Please comment on pgsql speed at handling 550,000 records

Поиск
Список
Период
Сортировка
От Srinivas Iyyer
Тема Please comment on pgsql speed at handling 550,000 records
Дата
Msg-id 20060207175630.72838.qmail@web34515.mail.mud.yahoo.com
обсуждение исходный текст
Ответ на Foreign Keys With Arrays  (Ketema Harris <ketema@gmail.com>)
Ответы Re: Please comment on pgsql speed at handling 550,000 records  (Michael Swierczek <mike.swierczek@gmail.com>)
Re: Please comment on pgsql speed at handling 550,000 records  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-novice
dear group,
 I am running a file with 550,000 lines.  This data is
not so complex. The scheme is
Table A

seq_id | seq_name  | seq_identifier
1      | ABC       | NM_0000023
....

(23k lines)


Table B:

gos_id  |  go_id  | go_cat  | go_name
1        GO:00003 | P       | death
......
(7k)


Now table C

seq_id   | gos_id
1           1
1        |  2
1        |  200

...
(550K lines)




Table Temp_C
seq_name  |  go_id
ABC       | GO:9993934
ABC       | GO:3489343
....
(550,500 lines)


question:

TAble A and B populated easily without question.

Table C, is completely a relationship table.
I wrote a python script that writes all 500K lines
with select statements inside.

insert into tablec (seq_id,gos_id) values (
(select seq_id from table a where seq_name ='xxxx'),
(select gos_id from table b where go_id = 'xxxx'));


Such 500K + insert statements took a long time and I
had to abort it.

3.5 hrs of time, 300K lines got inserted


I wrote a plpgsql function:

create function gosup() returns integer AS '
DECLARE
    referrer_key RECORD;
BEIGN
    FOR referrer_key IN SELECT * from TEMP_C LOOP
       INSERT INTO tableC(seq_id,gos_id) values(
(select seq_id from table a where seq_name
=referrer_key.seq_name),
(select gos_id from table b where go_id =
referrer_key.go_id))
    END LOOP
return 0;
END;
' LANGUAGE plpgsql;

so this function is written to make the inserting
faster.

I used \copy command to create temp_c.


The ironical point is, even now it is taking over an
hour (still running as of now).

So experts,

do you really thing inserting 550K lines is really a
long time consuming step no matter what method you
choose (such as regular insert statements or writing a
procedure to automate this step on server side).

Could any one throw some light on DB performance (no
matter wheter it is postgres, oracle, SyBase or
MySQL).

In specific what is the state of efficiency in
postgres.

Thank you.

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

В списке pgsql-novice по дате отправления:

Предыдущее
От: Andreas Seltenreich
Дата:
Сообщение: Re: Foreign Keys With Arrays
Следующее
От: Michael Swierczek
Дата:
Сообщение: Re: Please comment on pgsql speed at handling 550,000 records