Re: Loading speed

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: Loading speed
Дата	22 октября 2003 г. 14:35:55
Msg-id	11333.1066833313@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Loading speed (Cath Lawrence <Cath.Lawrence@anu.edu.au>)
Список	pgsql-novice

Дерево обсуждения

Cath Lawrence <Cath.Lawrence@anu.edu.au> writes:
> The atom table is a bit like this with some language changes:
> CREATE TABLE atom (
>         id                  INTEGER PRIMARY KEY DEFAULT
> nextval('atom_id_seq'),
>         protein_id          CHAR(4) REFERENCES protein,
>         chain_id            CHAR(1) NOT NULL,
>         amino_acid_id       INTEGER REFERENCES amino_acid,
>         atom_type           TEXT,
>         element             TEXT,
>         x                   FLOAT,
>         y                   FLOAT,
>         z                   FLOAT
> );

> With my simple-minded pygresql script, I have one insert statement per
> atom, and of course it crawls very feebly. Perhaps there's an
> insert_many method I haven't found yet (the docs suck, I will keep
> looking).

> Or perhaps I should write them out to a text file and use COPY?

COPY would be better than individual INSERTs, if you can bunch a
reasonably large number of rows into each COPY (the point being to cut
down per-statement overhead, of course).  You could perhaps avoid the
use of an intermediate file, if pygresql supports COPY FROM STDIN
(I don't know if it does).

However, I suspect the real problem is with poor performance of the
REFERENCES checks.  What is likely happening is that the planner thinks
the "protein" and "amino_acid" tables are small, and is generating plans
for the checks that only work well for small tables.  Once you get a
respectable amount of data into the referenced tables, the checks take
forever.

If you can load the "protein" and "amino_acid" tables first, then the
answer is simple --- do that, then VACUUM and/or ANALYZE those tables,
then start on the "atom" table.  If you want to load them in parallel
then I think your best bet is to drop and recreate all three tables
before you start loading, and do *not* run VACUUM or ANALYZE while the
tables are empty.  The default planner assumptions if no VACUUM or
ANALYZE has happened should give tolerable plans.

Another thing you can think about is to not do the REFERENCES checks
while loading data at all; that is, create the tables without
REFERENCES, load all the data, ANALYZE, then install the references
clauses with ALTER TABLE ADD FOREIGN KEY.  However this is only really
a good idea if you're pretty certain the initial data is clean ---
if you have any referential check failures, tracking them down will be
painful.  (7.4 would tell you the failing key value, but I don't think
7.3 does.)

            regards, tom lane

В списке pgsql-novice по дате отправления:

Предыдущее

От: Oliver Elphick
Дата: 22 октября 2003 г., 07:59:04
Сообщение: Re: Loading speed

Следующее

От: "big_mafa"
Дата: 22 октября 2003 г., 15:12:53
Сообщение: How to deal with smaller xlogs?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Loading speed

Предыдущее

Следующее