Re: Bulk Inserts

Поиск
Список
Период
Сортировка
От Souvik Bhattacherjee
Тема Re: Bulk Inserts
Дата
Msg-id CAANrPSejKbNVwMBEioHipAM3B6TA8=M_dxiaXHg82BDiO4MOdQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Bulk Inserts  (Adrian Klaver <adrian.klaver@aklaver.com>)
Ответы Re: Bulk Inserts  (Adrian Klaver <adrian.klaver@aklaver.com>)
Список pgsql-general
> If the selects are returning more then one row then you are already
> doing bulk inserts. If they are returning single rows or you want to
> batch them then you need some sort of code to do that. Something
> like(pseudo Python like code):

> attr2_vals= [(10, 20, 30, 40), (50, 60, 70, 80)]

> for val_batch in attr2_vals:
        BEGIN
        for id in val_batch:
                insert into tab2 (attr1, attr2) (select attr1, attr2
                 from tab1 where attr2 = id)
         COMMIT

For EXP 1: inserts with multiple txn:
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 10);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where attr2 = 20);

tab1 has ~6M rows and there are only two values for the attribute attr2 in
tab1 which are evenly distributed. So, yes, I guess I'm already doing batching
here.

Also, I ran the following two statements to see if their performances are comparable. 
While STMT 1 always runs faster in my machine but their performances seem to differ
by a couple of seconds at most.

STMT 1: select attr1, attr2 into tab2 from tab1;
STMT 2: insert into tab2 (select attr1, attr2 from tab1);

However adding the serial id column as an ALTER TABLE statement actually takes more time 
than inserting the tuples, so the combined total time is more than double the time taken to insert
the tuples into tab2 without serial id column.

Best,
-SB



On Sun, Aug 11, 2019 at 11:11 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 8/10/19 7:47 PM, Souvik Bhattacherjee wrote:
> Hi Adrian,
>
> Thanks for the response.
>
>  > Yes, but you will some code via client or function that batches the
>  > inserts for you.
>
> Could you please elaborate a bit on how EXP 1 could be performed such
> that it uses bulk inserts?

I guess it comes down to what you define as bulk inserts. From your OP:

EXP 1: inserts with multiple txn:
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
attr2 = 10);
insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
attr2 = 20);

If the selects are returning more then one row then you are already
doing bulk inserts. If they are returning single rows or you want to
batch them then you need some sort of code to do that. Something
like(pseudo Python like code):

attr2_vals= [(10, 20, 30, 40), (50, 60, 70, 80)]

for val_batch in attr2_vals:
        BEGIN
        for id in val_batch:
                insert into tab2 (attr1, attr2) (select attr1, attr2
                 from tab1 where attr2 = id)
         COMMIT

>
> Best,
> -SB



--
Adrian Klaver
adrian.klaver@aklaver.com

В списке pgsql-general по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: A GROUP BY question
Следующее
От: Alban Hertroys
Дата:
Сообщение: Re: A GROUP BY question