Thread: postgres_fdw batching vs. (re)creating the tuple slots


postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:
Hi,

while looking at the other thread related to postgres_fdw batching [1]
and testing with very large batches, I noticed this disappointing
behavior when inserting 1M rows (just integers, nothing fancy):

no batching: 64782 ms
100 rows:     2118 ms
32767 rows:  41115 ms

Pretty nice improvement when batching 100 rows, but then it all goes
wrong for some reason.

The problem is pretty obvious from a perf profile:


  --100.00%--ExecModifyTable
   |
    --99.70%--ExecInsert
     |
     |--50.87%--MakeSingleTupleTableSlot
     |  |
     |   --50.85%--MakeTupleTableSlot
     |             |
     |              --50.70%--IncrTupleDescRefCount
     |                        |
     |                         --50.69%--ResourceOwnerRememberTupleDesc
     |                                   |
     |                                    --50.69%--ResourceArrayAdd
     |
     |--48.18%--ExecBatchInsert
     |  |
     |   --47.92%--ExecDropSingleTupleTableSlot
     |             |
     |             |--47.17%--DecrTupleDescRefCount
     |             |          |
     |             |           --47.15%--ResourceOwnerForgetTupleDesc
     |             |                     |
     |             |                      --47.14%--ResourceArrayRemove
     |             |
     |              --0.53%--ExecClearTuple
     |
      --0.60%--ExecCopySlot


There are two problems at play here. Firstly, the way it's coded now,
the slots are pretty much re-created for each batch. So with 1M rows and
batches of 32k rows, that's ~30 drop/create cycles. That seems a bit
wasteful, and it shouldn't be too difficult to keep the slots across
batches. (We can't initialize all the slots in advance, because we don't
know how many will be needed, but we don't have to release them between
batches.)
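To illustrate the intended fix, here is a toy Python model (all names
invented; the real code is C in postgres_fdw): grow the slot pool lazily
up to the largest batch seen so far, and keep it alive across batches
instead of dropping and re-creating it each time.

```python
class SlotCache:
    """Toy model: grow the slot pool on demand, reuse it across batches."""

    def __init__(self):
        self.slots = []      # kept alive for the whole insert
        self.created = 0     # how many slots were ever allocated

    def get_batch_slots(self, batch):
        # Allocate only the slots this batch needs beyond what we have.
        while len(self.slots) < len(batch):
            self.slots.append({"values": None})
            self.created += 1
        # Copy the new data into every slot on every batch, not just
        # when a slot is created.
        for slot, row in zip(self.slots, batch):
            slot["values"] = row
        return self.slots[: len(batch)]


cache = SlotCache()
cache.get_batch_slots([1, 2, 3])
cache.get_batch_slots([4, 5])    # reuses the existing slots
assert cache.created == 3        # no drop/create per batch
```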

The other problem is that ResourceArrayAdd/Remove seem to behave a bit
poorly with very many elements - I'm not sure if it's O(N^2) or worse,
but growing the array and linear searches seem to be a bit expensive.

I'll take a look at fixing the first point, but I'm not entirely sure
how much that will improve the situation.


regards


[1]
https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Andres Freund
Date:
Hi,

On 2021-05-30 22:22:10 +0200, Tomas Vondra wrote:
> There are two problems at play, here. Firstly, the way it's coded now
> the slots are pretty much re-created for each batch. So with 1M rows and
> batches of 32k rows, that's ~30x drop/create. That seems a bit wasteful,
> and it shouldn't be too difficult to keep the slots across batches. (We
> can't initialize all the slots in advance, because we don't know how
> many will be needed, but we don't have to release them between batches.)

Yea, that sounds like an obvious improvement.


> I'll take a look at fixing the first point, but I'm not entirely sure
> how much will that improve the situation.

Hm - I'd not expect this to still show up in the profile afterwards,
when you insert >> 32k rows. Still annoying when a smaller number is
inserted, of course.


> The other problem is that ResourceArrayAdd/Remove seem to behave a bit
> poorly with very many elements - I'm not sure if it's O(N^2) or worse,
> but growing the array and linear searches seem to be a bit expensive.

Hm. I assume this is using the hashed representation of a resowner array
most of the time, not the array one? I suspect the problem is that
pretty quickly the ResourceArrayRemove() degrades to a linear search,
because all of the resowner entries are the same, so the hashing doesn't
help us at all. The peril of a simplistic open-coded hash table :(
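A toy Python model of that degradation (assuming linear probing, which
is roughly what the open-coded table does; the numbers and names are
illustrative only): when every remembered resource is the same pointer,
all entries land in one probe run, so the k-th insert has to probe past
k duplicates and the total cost grows quadratically.

```python
def total_probes_inserting(n):
    """Open-addressed table with linear probing; every resource is the
    same value, so every insert hashes to the same home bucket and must
    probe past all earlier duplicates to find a free cell."""
    cap = 2 * n + 1
    table = [None] * cap
    home = 0                    # all identical values share one bucket
    probes = 0
    for _ in range(n):
        i = home
        while table[i] is not None:   # walk the run of duplicates
            i = (i + 1) % cap
            probes += 1
        table[i] = "tupdesc"
        probes += 1
    return probes


# n(n+1)/2 total probes: doubling n roughly quadruples the work.
assert total_probes_inserting(100) == 5050
assert total_probes_inserting(200) == 20100
```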

I think in this specific situation the easiest workaround is to use a
copy of the tuple desc, instead of the one in the relcache - the copy
won't be refcounted.
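A sketch of that workaround as a toy Python model. In PostgreSQL,
CreateTupleDescCopy() returns a copy with tdrefcount == -1, and the
pin/unpin code skips the refcount and resowner bookkeeping for
non-refcounted descriptors; the class and function names below are
invented for illustration.

```python
class ResOwner:
    def __init__(self):
        self.remembered = []   # stands in for the O(N)-prone array


class TupleDesc:
    def __init__(self, refcounted):
        # Mirrors tdrefcount: -1 means "not reference counted",
        # which is what CreateTupleDescCopy() produces.
        self.refcount = 0 if refcounted else -1


def make_slot(desc, owner):
    """Toy PinTupleDesc: only refcounted descs touch the resowner."""
    if desc.refcount >= 0:
        desc.refcount += 1
        owner.remembered.append(desc)   # the expensive bookkeeping
    return {"desc": desc}


owner = ResOwner()
copied_desc = TupleDesc(refcounted=False)   # like CreateTupleDescCopy()
for _ in range(1000):
    make_slot(copied_desc, owner)
assert owner.remembered == []   # no resowner traffic for the copy
```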

The whole tupledesc refcount / resowner stuff is a mess. We don't really
utilize it much, and pay a pretty steep price for maintaining it.

This'd be less of an issue if we didn't store one resowner item for each
reference, but kept track of the refcount one tupdesc resowner item
has. But there's no space to store that right now, nor is it easy to
make space, due to the way comparisons work for resowner.

Greetings,

Andres Freund



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-30 22:22:10 +0200, Tomas Vondra wrote:
>> The other problem is that ResourceArrayAdd/Remove seem to behave a bit
>> poorly with very many elements - I'm not sure if it's O(N^2) or worse,
>> but growing the array and linear searches seem to be a bit expensive.

> Hm. I assume this is using the hashed representation of a resowner array
> most of the time, not the array one? I suspect the problem is that
> pretty quickly the ResourceArrayRemove() degrades to a linear search,
> because all of the resowner entries are the same, so the hashing doesn't
> help us at all. The peril of a simplistic open-coded hash table :(

Not only does ResourceArrayRemove degrade, but so does ResourceArrayAdd.

> I think in this specific situation the easiest workaround is to use a
> copy of the tuple desc, instead of the one in the relcache - the copy
> won't be refcounted.

Probably.  There's no obvious reason why these transient slots need
a long-lived tupdesc.  But it does seem like the hashing scheme somebody
added to resowners is a bit too simplistic.  It ought to be able to
cope with lots of refs to the same object, or at least not be extra-awful
for that case.

            regards, tom lane



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Andres Freund
Date:
Hi,

On 2021-05-30 17:10:59 -0400, Tom Lane wrote:
> But it does seem like the hashing scheme somebody added to resowners
> is a bit too simplistic.  It ought to be able to cope with lots of
> refs to the same object, or at least not be extra-awful for that case.

It's not really the hashing that's the problem, right? The array
representation would have nearly the same problem, I think?

It doesn't seem trivial to improve it without making resowner.c's
representation a good bit more complicated. Right now there's no space
to store a 'per resowner & tupdesc refcount'. We can't even just make
the tuple desc reference a separate allocation (of (tupdesc, refcount)),
because ResourceArrayRemove() relies on testing for equality with ==.

I think we'd basically need an additional version of ResourceArray (type
+ functions) which can store some additional data for each entry?
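A toy Python model of that idea (entirely hypothetical; resowner.c has
no such structure, for the comparison-by-== reasons above): one entry
per distinct resource plus a per-entry refcount makes both remember and
forget cheap even with tens of thousands of references to one tupdesc.

```python
class RefcountedResourceArray:
    """Hypothetical: one entry per distinct resource, with a refcount,
    instead of N duplicate entries."""

    def __init__(self):
        self.entries = {}   # resource -> refcount

    def remember(self, res):
        self.entries[res] = self.entries.get(res, 0) + 1   # O(1)

    def forget(self, res):
        n = self.entries[res] - 1                          # O(1)
        if n == 0:
            del self.entries[res]
        else:
            self.entries[res] = n


ra = RefcountedResourceArray()
for _ in range(32767):
    ra.remember("tupdesc")
assert len(ra.entries) == 1   # one entry, not 32767
for _ in range(32767):
    ra.forget("tupdesc")
assert ra.entries == {}
```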

Greetings,

Andres Freund



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-30 17:10:59 -0400, Tom Lane wrote:
>> But it does seem like the hashing scheme somebody added to resowners
>> is a bit too simplistic.  It ought to be able to cope with lots of
>> refs to the same object, or at least not be extra-awful for that case.

> It's not really the hashing that's the problem, right? The array
> representation would have nearly the same problem, I think?

ResourceArrayAdd would have zero problem.  ResourceArrayRemove is
O(1) as long as resources are removed in reverse order ... which
is effectively true if they're all the same resource.  So while
I've not tested, I believe that this particular case would have
no issue at all with the old resowner implementation, stupid
though that was.
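A toy Python model of that observation (the back-to-front scan is meant
to match the spirit of the old linear resowner code, though the exact
implementation differed): when all entries are the same resource,
scanning from the end finds a match on the first probe, so each removal
is effectively O(1).

```python
def remove_last_match(arr, value):
    """Scan from the end (most recently remembered first) and
    swap-delete on match.  Identical resources released LIFO hit the
    last element immediately."""
    probes = 0
    for i in range(len(arr) - 1, -1, -1):
        probes += 1
        if arr[i] == value:
            arr[i] = arr[-1]
            arr.pop()
            return probes
    raise KeyError(value)


arr = ["tupdesc"] * 1000
assert remove_last_match(arr, "tupdesc") == 1   # one probe per removal
assert len(arr) == 999
```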

> It doesn't seem trivial to improve it without making resowner.c's
> representation a good bit more complicated.

Dunno, I have not studied the new version at all.

            regards, tom lane



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:
Hi,

Here are three WIP patches that fix the regression for me. The first
part is from [1] and makes large batches work, 0002 creates a copy of
the tupledesc so as not to cause issues in the resource owner, and 0003
ensures we only initialize the slots once (not per batch).

With the patches applied, the timings look like this:

 batch          timing
 ----------------------
     1    64194.942 ms
    10     7233.785 ms
   100     2244.255 ms
   32k     1372.175 ms

which seems fine. I still need to get this properly tested etc. and make
sure nothing is left over.

regards


[1]
https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:
Argh! I forgot the attachments, of course.

On 6/4/21 1:48 PM, Tomas Vondra wrote:
> Hi,
> 
> Here's two WIP patches that fixes the regression for me. The first part
> is from [1], so make large batches work, 0002 just creates a copy of the
> tupledesc to not cause issues in resource owner, 0003 ensures we only
> initialize the slots once (not per batch).
> 
> With the patches applied, the timings look like this:
> 
>  batch          timing
>  ----------------------
>      1    64194.942 ms
>     10     7233.785 ms
>    100     2244.255 ms
>    32k     1372.175 ms
> 
> which seems fine. I still need to get this properly tested etc. and make
> sure nothing is left over.
> 
> regards
> 
> 
> [1]
> https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> 

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:
Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two 
nested loops (worked for simple postgres_fdw cases, but "make check" 
failed).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Bharath Rupireddy
Date:
On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> Here's a v2 fixing a silly bug with reusing the same variable in two
> nested loops (worked for simple postgres_fdw cases, but "make check"
> failed).

I applied these patches and ran make check in the postgres_fdw contrib
module, and I saw a server crash. Is it the same failure you were
describing above?

With Regards,
Bharath Rupireddy.



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:

On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
> On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> Here's a v2 fixing a silly bug with reusing the same variable in two
>> nested loops (worked for simple postgres_fdw cases, but "make check"
>> failed).
> 
> I applied these patches and ran make check in postgres_fdw contrib
> module, I saw a server crash. Is it the same failure you were saying
> above?
> 

Nope, that was causing an infinite loop. This is just a silly mistake on
my side - I forgot to replace the i/j variable inside the loop. Here's v3.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Bharath Rupireddy
Date:
On Wed, Jun 9, 2021 at 4:38 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
> > On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> Hi,
> >>
> >> Here's a v2 fixing a silly bug with reusing the same variable in two
> >> nested loops (worked for simple postgres_fdw cases, but "make check"
> >> failed).
> >
> > I applied these patches and ran make check in postgres_fdw contrib
> > module, I saw a server crash. Is it the same failure you were saying
> > above?
> >
>
> Nope, that was causing infinite loop. This is jut a silly mistake on my
> side - I forgot to replace the i/j variable inside the loop. Here's v3.

Thanks. The postgres_fdw regression test execution time does not
increase much with the patches, even with the test case added by the
commit below. With and without the patches attached in this thread, the
execution times are 5 sec and 17 sec respectively - so these patches
actually reduce the execution time for the test case added by that
commit.

commit cb92703384e2bb3fa0a690e5dbb95ad333c2b44c
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   Tue Jun 8 20:22:18 2021 +0200

    Adjust batch size in postgres_fdw to not use too many parameters

With Regards,
Bharath Rupireddy.



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:
On 6/9/21 1:08 PM, Tomas Vondra wrote:
> 
> 
> On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
>> On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> Hi,
>>>
>>> Here's a v2 fixing a silly bug with reusing the same variable in two
>>> nested loops (worked for simple postgres_fdw cases, but "make check"
>>> failed).
>>
>> I applied these patches and ran make check in postgres_fdw contrib
>> module, I saw a server crash. Is it the same failure you were saying
>> above?
>>
> 
> Nope, that was causing infinite loop. This is jut a silly mistake on my
> side - I forgot to replace the i/j variable inside the loop. Here's v3.
> 
> regards
> 

FWIW I've pushed this, after improving the comments a little bit.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Alexander Pyhalov
Date:
Tomas Vondra wrote on 2021-06-12 00:01:
> On 6/9/21 1:08 PM, Tomas Vondra wrote:
>> 
>> 
>> On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
>>> On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Here's a v2 fixing a silly bug with reusing the same variable in two
>>>> nested loops (worked for simple postgres_fdw cases, but "make check"
>>>> failed).
>>> 
>>> I applied these patches and ran make check in postgres_fdw contrib
>>> module, I saw a server crash. Is it the same failure you were saying
>>> above?
>>> 
>> 
>> Nope, that was causing infinite loop. This is jut a silly mistake on 
>> my
>> side - I forgot to replace the i/j variable inside the loop. Here's 
>> v3.
>> 
>> regards
>> 
> 
> FWIW I've pushed this, after improving the comments a little bit.
> 
> 
> regards

Hi.
It seems this commit

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   Fri Jun 11 20:19:48 2021 +0200

     Optimize creation of slots for FDW bulk inserts

has broken batch insert for partitions with unique indexes.

Earlier the case worked as expected, inserting 1000 tuples. Now it exits 
with

ERROR:  duplicate key value violates unique constraint "p0_pkey"
DETAIL:  Key (x)=(1) already exists.
CONTEXT:  remote SQL command: INSERT INTO public.batch_table_p0(x, 
field1, field2) VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9), ($10, 
$11, $12), ($13, $14, $15), ($16, $17, $18), ($19, $20, $21), ($22, $23, 
$24), ($25, $26, $27), ($28, $29, $30), ($31, $32, $33), ($34, $35, 
$36), ($37, $38, $39), ($40, $41, $42), ($43, $44, $45), ($46, $47, 
$48), ($49, $50, $51), ($52, $53, $54), ($55, $56, $57), ($58, $59, 
$60), ($61, $62, $63), ($64, $65, $66), ($67, $68, $69), ($70, $71, 
$72), ($73, $74, $75), ($76, $77, $78), ($79, $80, $81), ($82, $83, 
$84), ($85, $86, $87), ($88, $89, $90), ($91, $92, $93), ($94, $95, 
$96), ($97, $98, $99), ($100, $101, $102), ($103, $104, $105), ($106, 
$107, $108), ($109, $110, $111), ($112, $113, $114), ($115, $116, $117), 
($118, $119, $120), ($121, $122, $123), ($124, $125, $126), ($127, $128, 
$129), ($130, $131, $132), ($133, $134, $135), ($136, $137, $138), 
($139, $140, $141), ($142, $143, $144), ($145, $146, $147), ($148, $149, 
$150), ($151, $152, $153), ($154, $155, $156), ($157, $158, $159), 
($160, $161, $162), ($163, $164, $165), ($166, $167, $168), ($169, $170, 
$171), ($172, $173, $174), ($175, $176, $177), ($178, $179, $180), 
($181, $182, $183), ($184, $185, $186), ($187, $188, $189), ($190, $191, 
$192), ($193, $194, $195), ($196, $197, $198), ($199, $200, $201), 
($202, $203, $204), ($205, $206, $207), ($208, $209, $210), ($211, $212, 
$213), ($214, $215, $216), ($217, $218, $219), ($220, $221, $222), 
($223, $224, $225), ($226, $227, $228), ($229, $230, $231), ($232, $233, 
$234), ($235, $236, $237), ($238, $239, $240), ($241, $242, $243), 
($244, $245, $246), ($247, $248, $249), ($250, $251, $252), ($253, $254, 
$255), ($256, $257, $258), ($259, $260, $261), ($262, $263, $264), 
($265, $266, $267), ($268, $269, $270), ($271, $272, $273), ($274, $275, 
$276), ($277, $278, $279), ($280, $281, $282), ($283, $284, $285), 
($286, $287, $288), ($289, $290, $291), ($292, $293, $294), ($295, $296, 
$297), ($298, $299, $300)


-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachments

Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:
On 6/16/21 2:36 PM, Alexander Pyhalov wrote:
> 
> Hi.
> It seems this commit
> 
> commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
> Author: Tomas Vondra <tomas.vondra@postgresql.org>
> Date:   Fri Jun 11 20:19:48 2021 +0200
> 
>      Optimize creation of slots for FDW bulk inserts
> 
> has broken batch insert for partitions with unique indexes.
> 

Thanks for the report and reproducer!

Turns out this is a mind-bogglingly silly bug I made in b676ac443b :-( 
The data is copied into the slots only in the branch that initializes 
them, so the subsequent batches just insert the same data over and over.

The attached patch fixes that, and adds a regression test (a bit smaller 
version of your reproducer). I'll get this committed shortly.
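For readers following along, here is a toy Python model of the bug and
the fix (invented names; the real code is C in postgres_fdw): the slot
creation branch correctly runs only once per slot, but the buggy version
also copied the row data only inside that branch, so every later batch
re-sent the first batch's rows - hence the duplicate key error.

```python
def run_batches(batches, buggy):
    """Slots are created once and reused; the buggy variant copies the
    data only in the creation branch, the fixed one on every batch."""
    slots, sent = [], []
    for batch in batches:
        for j, row in enumerate(batch):
            if j >= len(slots):
                slots.append({"values": row})   # create once, copy here
            elif not buggy:
                slots[j]["values"] = row        # fixed: copy every batch
        sent.extend(s["values"] for s in slots[: len(batch)])
    return sent


batches = [[1, 2], [3, 4]]
assert run_batches(batches, buggy=True) == [1, 2, 1, 2]   # duplicates!
assert run_batches(batches, buggy=False) == [1, 2, 3, 4]
```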

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: postgres_fdw batching vs. (re)creating the tuple slots

From: Tomas Vondra
Date:
On 6/16/21 4:23 PM, Tomas Vondra wrote:
> On 6/16/21 2:36 PM, Alexander Pyhalov wrote:
>>
>> Hi.
>> It seems this commit
>>
>> commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
>> Author: Tomas Vondra <tomas.vondra@postgresql.org>
>> Date:   Fri Jun 11 20:19:48 2021 +0200
>>
>>      Optimize creation of slots for FDW bulk inserts
>>
>> has broken batch insert for partitions with unique indexes.
>>
> 
> Thanks for the report and reproducer!
> 
> Turns out this is a mind-bogglingly silly bug I made in b676ac443b :-( 
> The data is copied into the slots only in the branch that initializes 
> them, so the subsequent batches just insert the same data over and over.
> 
> The attached patch fixes that, and adds a regression test (a bit smaller 
> version of your reproducer). I'll get this committed shortly.
> 

Pushed, after a bit more cleanup and testing.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company