Discussion: postgres_fdw batching vs. (re)creating the tuple slots
Hi,

while looking at the other thread related to postgres_fdw batching [1] and testing with very large batches, I noticed this disappointing behavior when inserting 1M rows (just integers, nothing fancy):

    no batching: 64782 ms
    100 rows:     2118 ms
    32767 rows:  41115 ms

Pretty nice improvement when batching 100 rows, but then it all goes wrong for some reason. The problem is pretty obvious from a perf profile:

    --100.00%--ExecModifyTable
       |
       --99.70%--ExecInsert
          |
          |--50.87%--MakeSingleTupleTableSlot
          |   |
          |   --50.85%--MakeTupleTableSlot
          |      |
          |      --50.70%--IncrTupleDescRefCount
          |         |
          |         --50.69%--ResourceOwnerRememberTupleDesc
          |            |
          |            --50.69%--ResourceArrayAdd
          |
          |--48.18%--ExecBatchInsert
          |   |
          |   --47.92%--ExecDropSingleTupleTableSlot
          |      |
          |      |--47.17%--DecrTupleDescRefCount
          |      |   |
          |      |   --47.15%--ResourceOwnerForgetTupleDesc
          |      |      |
          |      |      --47.14%--ResourceArrayRemove
          |      |
          |      --0.53%--ExecClearTuple
          |
          --0.60%--ExecCopySlot

There are two problems at play here. Firstly, the way it's coded now, the slots are pretty much re-created for each batch. So with 1M rows and batches of 32k rows, that's ~30x drop/create. That seems a bit wasteful, and it shouldn't be too difficult to keep the slots across batches. (We can't initialize all the slots in advance, because we don't know how many will be needed, but we don't have to release them between batches.)

The other problem is that ResourceArrayAdd/Remove seem to behave a bit poorly with very many elements - I'm not sure if it's O(N^2) or worse, but growing the array and linear searches seem to be a bit expensive.

I'll take a look at fixing the first point, but I'm not entirely sure how much that will improve the situation.

regards

[1] https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2021-05-30 22:22:10 +0200, Tomas Vondra wrote:
> There are two problems at play here. Firstly, the way it's coded now
> the slots are pretty much re-created for each batch. So with 1M rows and
> batches of 32k rows, that's ~30x drop/create. That seems a bit wasteful,
> and it shouldn't be too difficult to keep the slots across batches. (We
> can't initialize all the slots in advance, because we don't know how
> many will be needed, but we don't have to release them between batches.)

Yea, that sounds like an obvious improvement.

> I'll take a look at fixing the first point, but I'm not entirely sure
> how much that will improve the situation.

Hm - I'd not expect this to still show up in the profile afterwards, when you insert >> 32k rows. Still annoying when a smaller number is inserted, of course.

> The other problem is that ResourceArrayAdd/Remove seem to behave a bit
> poorly with very many elements - I'm not sure if it's O(N^2) or worse,
> but growing the array and linear searches seem to be a bit expensive.

Hm. I assume this is using the hashed representation of a resowner array most of the time, not the array one? I suspect the problem is that pretty quickly the ResourceArrayRemove() degrades to a linear search, because all of the resowner entries are the same, so the hashing doesn't help us at all. The peril of a simplistic open-coded hash table :(

I think in this specific situation the easiest workaround is to use a copy of the tuple desc, instead of the one in the relcache - the copy won't be refcounted.

The whole tupledesc refcount / resowner stuff is a mess. We don't really utilize it much, and pay a pretty steep price for maintaining it. This'd be less of an issue if we didn't store one resowner item for each reference, but kept track of the refcount one tupdesc resowner item has. But there's no space to store that right now, nor is it easy to make space, due to the way comparisons work for resowner.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-30 22:22:10 +0200, Tomas Vondra wrote:
>> The other problem is that ResourceArrayAdd/Remove seem to behave a bit
>> poorly with very many elements - I'm not sure if it's O(N^2) or worse,
>> but growing the array and linear searches seem to be a bit expensive.

> Hm. I assume this is using the hashed representation of a resowner array
> most of the time, not the array one? I suspect the problem is that
> pretty quickly the ResourceArrayRemove() degrades to a linear search,
> because all of the resowner entries are the same, so the hashing doesn't
> help us at all. The peril of a simplistic open-coded hash table :(

Not only does ResourceArrayRemove degrade, but so does ResourceArrayAdd.

> I think in this specific situation the easiest workaround is to use a
> copy of the tuple desc, instead of the one in the relcache - the copy
> won't be refcounted.

Probably. There's no obvious reason why these transient slots need a long-lived tupdesc.

But it does seem like the hashing scheme somebody added to resowners is a bit too simplistic. It ought to be able to cope with lots of refs to the same object, or at least not be extra-awful for that case.

regards, tom lane
Hi,

On 2021-05-30 17:10:59 -0400, Tom Lane wrote:
> But it does seem like the hashing scheme somebody added to resowners
> is a bit too simplistic. It ought to be able to cope with lots of
> refs to the same object, or at least not be extra-awful for that case.

It's not really the hashing that's the problem, right? The array representation would have nearly the same problem, I think?

It doesn't seem trivial to improve it without making resowner.c's representation a good bit more complicated. Right now there's no space to store a 'per resowner & tupdesc refcount'. We can't even just make the tuple desc reference a separate allocation (of (tupdesc, refcount)), because ResourceArrayRemove() relies on testing for equality with ==. I think we'd basically need an additional version of ResourceArray (type + functions) which can store some additional data for each entry?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-30 17:10:59 -0400, Tom Lane wrote:
>> But it does seem like the hashing scheme somebody added to resowners
>> is a bit too simplistic. It ought to be able to cope with lots of
>> refs to the same object, or at least not be extra-awful for that case.

> It's not really the hashing that's the problem, right? The array
> representation would have nearly the same problem, I think?

ResourceArrayAdd would have zero problem. ResourceArrayRemove is O(1) as long as resources are removed in reverse order ... which is effectively true if they're all the same resource. So while I've not tested, I believe that this particular case would have no issue at all with the old resowner implementation, stupid though that was.

> It doesn't seem trivial to improve it without making resowner.c's
> representation a good bit more complicated.

Dunno, I have not studied the new version at all.

regards, tom lane
Hi,

Here are two WIP patches that fix the regression for me. The first part is from [1], to make large batches work; 0002 just creates a copy of the tupledesc to not cause issues in the resource owner, and 0003 ensures we only initialize the slots once (not per batch).

With the patches applied, the timings look like this:

    batch    timing
    ----------------------
        1    64194.942 ms
       10     7233.785 ms
      100     2244.255 ms
      32k     1372.175 ms

which seems fine. I still need to get this properly tested etc. and make sure nothing is left over.

regards

[1] https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Argh! I forgot the attachments, of course.

On 6/4/21 1:48 PM, Tomas Vondra wrote:
> Hi,
>
> Here are two WIP patches that fix the regression for me. The first part
> is from [1], to make large batches work; 0002 just creates a copy of the
> tupledesc to not cause issues in the resource owner, and 0003 ensures we
> only initialize the slots once (not per batch).
>
> With the patches applied, the timings look like this:
>
>     batch    timing
>     ----------------------
>         1    64194.942 ms
>        10     7233.785 ms
>       100     2244.255 ms
>       32k     1372.175 ms
>
> which seems fine. I still need to get this properly tested etc. and make
> sure nothing is left over.
>
> regards
>
> [1]
> https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two nested loops (worked for simple postgres_fdw cases, but "make check" failed).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> Here's a v2 fixing a silly bug with reusing the same variable in two
> nested loops (worked for simple postgres_fdw cases, but "make check"
> failed).

I applied these patches and ran make check in the postgres_fdw contrib module, and I saw a server crash. Is it the same failure you were describing above?

With Regards,
Bharath Rupireddy.
On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
> On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Hi,
>>
>> Here's a v2 fixing a silly bug with reusing the same variable in two
>> nested loops (worked for simple postgres_fdw cases, but "make check"
>> failed).
>
> I applied these patches and ran make check in the postgres_fdw contrib
> module, and I saw a server crash. Is it the same failure you were
> describing above?

Nope, that was causing an infinite loop. This is just a silly mistake on my side - I forgot to replace the i/j variable inside the loop. Here's v3.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
On Wed, Jun 9, 2021 at 4:38 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
> > I applied these patches and ran make check in the postgres_fdw contrib
> > module, and I saw a server crash. Is it the same failure you were
> > describing above?
>
> Nope, that was causing an infinite loop. This is just a silly mistake on my
> side - I forgot to replace the i/j variable inside the loop. Here's v3.

Thanks. The postgres_fdw regression test execution time does not increase too much with the patches, even with the test case added by the commit below. With and without the patches attached in this thread, the execution times are 5 sec and 17 sec respectively. So, essentially, these patches reduce the execution time for the test case added by that commit.

    commit cb92703384e2bb3fa0a690e5dbb95ad333c2b44c
    Author: Tomas Vondra <tomas.vondra@postgresql.org>
    Date:   Tue Jun 8 20:22:18 2021 +0200

        Adjust batch size in postgres_fdw to not use too many parameters

With Regards,
Bharath Rupireddy.
On 6/9/21 1:08 PM, Tomas Vondra wrote:
>
> On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
>> I applied these patches and ran make check in the postgres_fdw contrib
>> module, and I saw a server crash. Is it the same failure you were
>> describing above?
>
> Nope, that was causing an infinite loop. This is just a silly mistake on my
> side - I forgot to replace the i/j variable inside the loop. Here's v3.

FWIW I've pushed this, after improving the comments a little bit.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra wrote on 2021-06-12 00:01:
> On 6/9/21 1:08 PM, Tomas Vondra wrote:
>> Nope, that was causing an infinite loop. This is just a silly mistake on
>> my side - I forgot to replace the i/j variable inside the loop. Here's
>> v3.
>
> FWIW I've pushed this, after improving the comments a little bit.

Hi.

It seems this commit

    commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
    Author: Tomas Vondra <tomas.vondra@postgresql.org>
    Date:   Fri Jun 11 20:19:48 2021 +0200

        Optimize creation of slots for FDW bulk inserts

has broken batch insert for partitions with unique indexes. Earlier the case worked as expected, inserting 1000 tuples. Now it exits with

ERROR: duplicate key value violates unique constraint "p0_pkey"
DETAIL: Key (x)=(1) already exists.
CONTEXT: remote SQL command: INSERT INTO public.batch_table_p0(x, field1, field2) VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9), ($10, $11, $12), ($13, $14, $15), ($16, $17, $18), ($19, $20, $21), ($22, $23, $24), ($25, $26, $27), ($28, $29, $30), ($31, $32, $33), ($34, $35, $36), ($37, $38, $39), ($40, $41, $42), ($43, $44, $45), ($46, $47, $48), ($49, $50, $51), ($52, $53, $54), ($55, $56, $57), ($58, $59, $60), ($61, $62, $63), ($64, $65, $66), ($67, $68, $69), ($70, $71, $72), ($73, $74, $75), ($76, $77, $78), ($79, $80, $81), ($82, $83, $84), ($85, $86, $87), ($88, $89, $90), ($91, $92, $93), ($94, $95, $96), ($97, $98, $99), ($100, $101, $102), ($103, $104, $105), ($106, $107, $108), ($109, $110, $111), ($112, $113, $114), ($115, $116, $117), ($118, $119, $120), ($121, $122, $123), ($124, $125, $126), ($127, $128, $129), ($130, $131, $132), ($133, $134, $135), ($136, $137, $138), ($139, $140, $141), ($142, $143, $144), ($145, $146, $147), ($148, $149, $150), ($151, $152, $153), ($154, $155, $156), ($157, $158, $159), ($160, $161, $162), ($163, $164, $165), ($166, $167, $168), ($169, $170, $171), ($172, $173, $174), ($175, $176, $177), ($178, $179, $180), ($181, $182, $183), ($184, $185, $186), ($187, $188, $189), ($190, $191, $192), ($193, $194, $195), ($196, $197, $198), ($199, $200, $201), ($202, $203, $204), ($205, $206, $207), ($208, $209, $210), ($211, $212, $213), ($214, $215, $216), ($217, $218, $219), ($220, $221, $222), ($223, $224, $225), ($226, $227, $228), ($229, $230, $231), ($232, $233, $234), ($235, $236, $237), ($238, $239, $240), ($241, $242, $243), ($244, $245, $246), ($247, $248, $249), ($250, $251, $252), ($253, $254, $255), ($256, $257, $258), ($259, $260, $261), ($262, $263, $264), ($265, $266, $267), ($268, $269, $270), ($271, $272, $273), ($274, $275, $276), ($277, $278, $279), ($280, $281, $282), ($283, $284, $285), ($286, $287, $288), ($289, $290, $291), ($292, $293, $294), ($295, $296, $297), ($298, $299, $300) -- Best regards, 
Alexander Pyhalov, Postgres Professional
Attachments
On 6/16/21 2:36 PM, Alexander Pyhalov wrote:
>
> Hi.
>
> It seems this commit
>
>     commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
>     Author: Tomas Vondra <tomas.vondra@postgresql.org>
>     Date:   Fri Jun 11 20:19:48 2021 +0200
>
>         Optimize creation of slots for FDW bulk inserts
>
> has broken batch insert for partitions with unique indexes.

Thanks for the report and reproducer!

Turns out this is a mind-bogglingly silly bug I made in b676ac443b :-( The data is copied into the slots only in the branch that initializes them, so the subsequent batches just insert the same data over and over.

The attached patch fixes that, and adds a regression test (a bit smaller version of your reproducer). I'll get this committed shortly.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
On 6/16/21 4:23 PM, Tomas Vondra wrote:
> On 6/16/21 2:36 PM, Alexander Pyhalov wrote:
>>
>> Hi.
>>
>> It seems this commit
>>
>>     commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
>>     Author: Tomas Vondra <tomas.vondra@postgresql.org>
>>     Date:   Fri Jun 11 20:19:48 2021 +0200
>>
>>         Optimize creation of slots for FDW bulk inserts
>>
>> has broken batch insert for partitions with unique indexes.
>
> Thanks for the report and reproducer!
>
> Turns out this is a mind-bogglingly silly bug I made in b676ac443b :-(
> The data is copied into the slots only in the branch that initializes
> them, so the subsequent batches just insert the same data over and over.
>
> The attached patch fixes that, and adds a regression test (a bit smaller
> version of your reproducer). I'll get this committed shortly.

Pushed, after a bit more cleanup and testing.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company