Re: Bulk Inserts

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: Bulk Inserts
Дата
Msg-id 5cd58ca7-ea45-01ee-bd80-c9366b872291@aklaver.com
обсуждение исходный текст
Ответ на Re: Bulk Inserts  (Souvik Bhattacherjee <kivuosb@gmail.com>)
Ответы Re: Bulk Inserts  (Souvik Bhattacherjee <kivuosb@gmail.com>)
Список pgsql-general
On 8/13/19 6:34 AM, Souvik Bhattacherjee wrote:
>  > If the selects are returning more then one row then you are already
>  > doing bulk inserts. If they are returning single rows or you want to
>  > batch them then you need some sort of code to do that. Something
>  > like(pseudo Python like code):
> 
>  > attr2_vals= [(10, 20, 30, 40), (50, 60, 70, 80)]
> 
>  > for val_batch in attr2_vals:
>          BEGIN
>          for id in val_batch:
>                  insert into tab2 (attr1, attr2) (select attr1, attr2
>                   from tab1 where attr2 = id)
>           COMMIT
> 
> For *EXP 1: inserts with multiple txn:*
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where 
> attr2 = 10);
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where 
> attr2 = 20);
> 
> tab1 has ~6M rows and there are only two values for the attribute attr2 in
> tab1 which are evenly distributed. So, yes, I guess I'm already doing 
> batching
> here.
> 
> Also, I ran the following two statements to see if their performances 
> are comparable.
> While STMT 1 always runs faster in my machine but their performances 
> seem to differ
> by a couple of seconds at most.
> 
> STMT 1: select attr1, attr2 into tab2 from tab1;
> STMT 2: insert into tab2 (select attr1, attr2 from tab1);

All I have left is:

select tab2 row_number () OVER (order by attr2, attr1 )AS id, attr1, 
attr2 into tab2 from tab1;

That will not create a serial type in the id column though. You can 
attach a sequence to that column. Something like:

1) create sequence tab2_id start <max id + 1> owned by tab2.id;

2) alter table tab2 alter COLUMN id set default nextval('tab2_id');



> 
> However adding the serial id column as an ALTER TABLE statement actually 
> takes more time
> than inserting the tuples, so the combined total time is more than 
> double the time taken to insert
> the tuples into tab2 without serial id column.
> 
> Best,
> -SB
> 
> 
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



В списке pgsql-general по дате отправления:

Предыдущее
От: Alban Hertroys
Дата:
Сообщение: Re: A GROUP BY question
Следующее
От: rihad
Дата:
Сообщение: Changing work_mem