Thread: bulk_multi_insert infinite loops with large rows and small fill factors
COPY IN loops in heap_multi_insert(), extending the table until it fills the
disk, when trying to insert a wide row into a table with a low fillfactor.
Internally, fillfactor is implemented by reserving some space on each page.
For large enough rows and small enough fillfactors, heap_multi_insert() can't
fit the row even on a new, empty page, so it keeps allocating new pages but
is never able to place the row. It should always put at least one row on an
empty page.

In the excerpt below, saveFreeSpace is the space reserved for the fillfactor:

    while (ndone < ntuples)
    {
        ...
        /*
         * Find buffer where at least the next tuple will fit. If the page is
         * all-visible, this will also pin the requisite visibility map page.
         */
        buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
                                           ...);

        /* Put as many tuples as fit on this page */
        for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
        {
            HeapTuple   heaptup = heaptuples[ndone + nthispage];

            if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
                break;

            RelationPutHeapTuple(relation, buffer, heaptup);
        }

        ... do a bunch of dirtying and logging etc ...
    }

This was introduced in 9.2 as part of the bulk insert speedup.

One more point: in the case where we don't insert any rows, we still do all
the dirtying and logging work even though we did not modify the page. I have
tried to skip all this if no rows are added (nthispage == 0), but my access
method foo is sadly out of date, so someone should take a skeptical look at
that.

A test case and patch against 9.2.2 are attached. The patch fixes the problem
and passes make check. Most of the diff is just indentation changes. Whoever
tries this will want to test it on a small partition by itself.

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Attachments
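To make the failure arithmetic concrete, here is a minimal standalone C
sketch (illustrative only, not part of the original mail or its attachments;
the page-overhead constants approximate an 8 kB heap page, and the
reservation formula mirrors RelationGetTargetPageFreeSpace()):

    #include <stdio.h>

    #define BLCKSZ              8192    /* default PostgreSQL page size */
    #define PAGE_HEADER_SIZE    24      /* approx. sizeof(PageHeaderData) */
    #define LINE_POINTER_SIZE   4       /* one ItemIdData */
    #define MAXALIGN(len)       (((len) + 7) & ~7)  /* 8-byte alignment */

    int
    main(void)
    {
        int     fillfactor = 10;
        int     tuple_len = 2000;       /* a "wide" row */

        /* space the fillfactor reserves on every page */
        int     saveFreeSpace = BLCKSZ * (100 - fillfactor) / 100;

        /* free space on a freshly initialized, completely empty heap page */
        int     emptyPageFree = BLCKSZ - PAGE_HEADER_SIZE - LINE_POINTER_SIZE;

        /* the pre-fix test in heap_multi_insert(): fails even on a new page */
        if (emptyPageFree < MAXALIGN(tuple_len) + saveFreeSpace)
            printf("%d-byte tuple never fits: %d bytes free, %d required, "
                   "so the table is extended forever\n",
                   tuple_len, emptyPageFree,
                   MAXALIGN(tuple_len) + saveFreeSpace);
        return 0;
    }

With fillfactor 10 this prints a requirement of 9372 bytes against 8164
available, so no page, new or old, ever passes the check.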
On 2012-12-12 03:04:19 -0800, David Gould wrote:
> COPY IN loops in heap_multi_insert(), extending the table until it fills
> the disk, when trying to insert a wide row into a table with a low
> fillfactor. Internally, fillfactor is implemented by reserving some space
> on each page. For large enough rows and small enough fillfactors,
> heap_multi_insert() can't fit the row even on a new, empty page, so it
> keeps allocating new pages but is never able to place the row. It should
> always put at least one row on an empty page.

Heh. Nice one. Did you hit that in practice?

> One more point: in the case where we don't insert any rows, we still do
> all the dirtying and logging work even though we did not modify the page.
> I have tried to skip all this if no rows are added (nthispage == 0), but
> my access method foo is sadly out of date, so someone should take a
> skeptical look at that.
>
> A test case and patch against 9.2.2 are attached. The patch fixes the
> problem and passes make check. Most of the diff is just indentation
> changes. Whoever tries this will want to test it on a small partition by
> itself.

ISTM this would be fixed with a smaller footprint by just changing

    if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

to

    if (!PageIsEmpty(page) &&
        PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

I think that should work?

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
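For illustration, here is how that guard would read in context, a sketch of
the proposed change to heap_multi_insert()'s inner loop (a fragment, not the
fix that was eventually committed):

    /* Put as many tuples as fit on this page */
    for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
    {
        HeapTuple   heaptup = heaptuples[ndone + nthispage];

        /*
         * Skip the fillfactor reservation on an empty page, so at least
         * one tuple is always placed and the loop always makes progress.
         */
        if (!PageIsEmpty(page) &&
            PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
            break;

        RelationPutHeapTuple(relation, buffer, heaptup);
    }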
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Heikki Linnakangas
Date:
On 12.12.2012 13:27, Andres Freund wrote:
> On 2012-12-12 03:04:19 -0800, David Gould wrote:
>> One more point: in the case where we don't insert any rows, we still do
>> all the dirtying and logging work even though we did not modify the page.
>> I have tried to skip all this if no rows are added (nthispage == 0), but
>> my access method foo is sadly out of date, so someone should take a
>> skeptical look at that.
>>
>> A test case and patch against 9.2.2 are attached. The patch fixes the
>> problem and passes make check. Most of the diff is just indentation
>> changes. Whoever tries this will want to test it on a small partition by
>> itself.
>
> ISTM this would be fixed with a smaller footprint by just changing
>
>     if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> to
>
>     if (!PageIsEmpty(page) &&
>         PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> I think that should work?

Yeah, it seems that it should, although PageIsEmpty() is no guarantee that
the tuple fits: even when PageIsEmpty() returns true, there might be dead
line pointers consuming so much space that the tuple at hand doesn't fit.
However, RelationGetBufferForTuple() won't return such a page; it guarantees
that the first tuple does indeed fit on the page it returns. For the same
reason, the later check that at least one tuple was actually placed on the
page is not necessary.

I committed a slightly different version, which unconditionally puts the
first tuple on the page and only applies the free-space check to the
subsequent tuples. Since RelationGetBufferForTuple() guarantees that the
first tuple fits, we can trust that, like heap_insert does:

--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
         /* NO EREPORT(ERROR) from here till changes are logged */
         START_CRIT_SECTION();
 
-        /* Put as many tuples as fit on this page */
-        for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
+        /*
+         * RelationGetBufferForTuple has ensured that the first tuple fits.
+         * Put that on the page, and then as many other tuples as fit.
+         */
+        RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
+        for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
         {
             HeapTuple   heaptup = heaptuples[ndone + nthispage];

Thanks for the report!

- Heikki
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Wed, 12 Dec 2012 12:27:11 +0100
Andres Freund <andres@2ndquadrant.com> wrote:

> On 2012-12-12 03:04:19 -0800, David Gould wrote:
>> COPY IN loops in heap_multi_insert(), extending the table until it fills the
>
> Heh. Nice one. Did you hit that in practice?

Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
happens late in the initial setup script for new hosts. The first batch of
new hosts to be set up with 9.2 filled the ramdisk, oomed, and fell over
within a minute. Since the script sets up a lot of stuff, we had no idea at
first what oomed.

> ISTM this would be fixed with a smaller footprint by just changing
>
>     if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> to
>
>     if (!PageIsEmpty(page) &&
>         PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> I think that should work?

I like PageIsEmpty() better (and would have used it if I had known about
it), but I'm not so crazy about the negation.

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Heikki Linnakangas
Date:
On 12.12.2012 14:17, David Gould wrote:
> On Wed, 12 Dec 2012 12:27:11 +0100
> Andres Freund <andres@2ndquadrant.com> wrote:
>
>> On 2012-12-12 03:04:19 -0800, David Gould wrote:
>>> COPY IN loops in heap_multi_insert(), extending the table until it fills the
>>
>> Heh. Nice one. Did you hit that in practice?
>
> Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
> happens late in the initial setup script for new hosts. The first batch of
> new hosts to be set up with 9.2 filled the ramdisk, oomed, and fell over
> within a minute. Since the script sets up a lot of stuff, we had no idea
> at first what oomed.

The bug's been fixed now, but note that huge tuples like this will always
cause the table to be extended, even if there are completely empty pages in
the table after a vacuum. Even a completely empty existing page is not
considered spacious enough in this case, because it's still too small once
you take fillfactor into account, so the insertion will always extend the
table. If you regularly run into this situation, you might want to raise
your fillfactor.

- Heikki
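To put rough numbers on that cutoff, an illustrative sketch (not from the
thread; the page-overhead constant is approximate, and the exact limits also
depend on the tuple header and alignment):

    #include <stdio.h>

    #define BLCKSZ          8192    /* default PostgreSQL page size */
    #define PAGE_OVERHEAD   28      /* approx. page header + one line pointer */

    int
    main(void)
    {
        int     fillfactor;

        /* largest tuple that passes the fillfactor check on an empty page */
        for (fillfactor = 10; fillfactor <= 100; fillfactor += 30)
        {
            int     reserved = BLCKSZ * (100 - fillfactor) / 100;
            int     max_len = BLCKSZ - PAGE_OVERHEAD - reserved;

            printf("fillfactor %3d: %4d bytes reserved, tuples wider than "
                   "~%d bytes always extend the table\n",
                   fillfactor, reserved, max_len);
        }
        return 0;
    }

At fillfactor 10 the threshold is only about 790 bytes; raising the
fillfactor raises it accordingly.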
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Wed, 12 Dec 2012 13:56:08 +0200
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> However, RelationGetBufferForTuple() won't return such a page; it
> guarantees that the first tuple does indeed fit on the page it returns.
> For the same reason, the later check that at least one tuple was actually
> placed on the page is not necessary.
>
> I committed a slightly different version, which unconditionally puts the
> first tuple on the page and only applies the free-space check to the
> subsequent tuples. Since RelationGetBufferForTuple() guarantees that the
> first tuple fits, we can trust that, like heap_insert does:
>
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
>          /* NO EREPORT(ERROR) from here till changes are logged */
>          START_CRIT_SECTION();
>  
> -        /* Put as many tuples as fit on this page */
> -        for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
> +        /*
> +         * RelationGetBufferForTuple has ensured that the first tuple fits.
> +         * Put that on the page, and then as many other tuples as fit.
> +         */
> +        RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
> +        for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
>          {
>              HeapTuple   heaptup = heaptuples[ndone + nthispage];

I don't know if this is the same thing. At least in the comments I was
reading while trying to figure this out, there was some concern that someone
else could change the free space on the page. Does
RelationGetBufferForTuple() guarantee against this too?

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Heikki Linnakangas
Date:
On 12.12.2012 14:24, David Gould wrote:
> I don't know if this is the same thing. At least in the comments I was
> reading while trying to figure this out, there was some concern that
> someone else could change the free space on the page. Does
> RelationGetBufferForTuple() guarantee against this too?

Yeah, RelationGetBufferForTuple() grabs a lock on the page before returning
it. For comparison, plain heap_insert does simply this:

>     buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
>                                        InvalidBuffer, options, bistate,
>                                        &vmbuffer, NULL);
>
>     /* NO EREPORT(ERROR) from here till changes are logged */
>     START_CRIT_SECTION();
>
>     RelationPutHeapTuple(relation, buffer, heaptup);

- Heikki
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Wed, 12 Dec 2012 14:23:12 +0200
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> The bug's been fixed now, but note that huge tuples like this will always
> cause the table to be extended, even if there are completely empty pages
> in the table after a vacuum. Even a completely empty existing page is not
> considered spacious enough in this case, because it's still too small once
> you take fillfactor into account, so the insertion will always extend the
> table. If you regularly run into this situation, you might want to raise
> your fillfactor.

Actually, we'd like it lower. Ideally, one row per page. We lose noticeable
performance when we raise fillfactor above 10. Even 20 is slower.

During busy times these hosts sometimes fall into a stable state with very
high CPU use, mostly in s_lock() and LWLockAcquire() and I think PinBuffer,
plus very high system CPU in the scheduler (I don't have the perf trace in
front of me, so take this with a grain of salt). In this mode they fall from
the normal 7000 queries per second to below 3000. Once in this state they
tend to stay that way. If we turn down the number of incoming requests, they
go back to normal.

Our conjecture is that most requests are for only a few keys, so we have
multiple sessions contending for a few pages and convoying in the buffer
manager. The table is under 20k rows, but the hot items are probably only a
couple hundred different rows. The busy processes are doing reads only, but
there is some update activity on this table too.

Ah, found an email with the significant part of the perf output:

> ... set number of client threads = number of postgres backends = 70. That
> way all my threads have constant access to a backend and they just spin in
> a tight loop running the same query over and over (with different values).
> ... this seems to have tapped into 9.2's resonant frequency; right now
> we're spending almost all our time spin locking. ...
>
>     762377.00  71.0%  s_lock         /usr/local/bin/postgres
>      22279.00   2.1%  LWLockAcquire  /usr/local/bin/postgres
>      18916.00   1.8%  LWLockRelease  /usr/local/bin/postgres

I was trying to resurrect the pthread s_lock() patch to see if that helps,
but it did not apply at all and I have not had time to pursue it.

We have tried many different numbers of processes and get the best results
with about ten fewer active PostgreSQL backends than HT cores. The system is
128GB with:

    processor       : 79
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 47
    model name      : Intel(R) Xeon(R) CPU E7-L8867 @ 2.13GHz
    stepping        : 2
    cpu MHz         : 2128.478
    cache size      : 30720 KB

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
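As a rough illustration of why contended spinning collapses throughput, here
is a simplified test-and-set loop with backoff in the spirit of s_lock()
(hypothetical code, not from the thread; the real s_lock() adaptively tunes
its spin count via spins_per_delay and sleeps with pg_usleep):

    #include <stdatomic.h>
    #include <unistd.h>

    /* initialize with ATOMIC_FLAG_INIT before first use */
    typedef atomic_flag slock_t;

    static void
    spin_lock(slock_t *lock)
    {
        int     spins = 0;

        while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        {
            /*
             * Every waiter burns CPU re-running this loop; once enough
             * backends pile up, they spend most of their time right here,
             * which matches the s_lock-heavy profile quoted above. After
             * a while, sleep so someone else can make progress.
             */
            if (++spins > 1000)
            {
                usleep(1000);
                spins = 0;
            }
        }
    }

    static void
    spin_unlock(slock_t *lock)
    {
        atomic_flag_clear_explicit(lock, memory_order_release);
    }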
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> The bug's been fixed now, but note that huge tuples like this will always
> cause the table to be extended, even if there are completely empty pages
> in the table after a vacuum. Even a completely empty existing page is not
> considered spacious enough in this case, because it's still too small once
> you take fillfactor into account, so the insertion will always extend the
> table.

Seems like that's a bug in itself: there's no reason to reject an empty
existing page.

			regards, tom lane
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Robert Haas
Date:
On Wed, Dec 12, 2012 at 8:29 AM, David Gould <daveg@sonic.net> wrote:
> We lose noticeable performance when we raise fillfactor above 10. Even 20
> is slower.

Whoa.

> During busy times these hosts sometimes fall into a stable state with very
> high CPU use, mostly in s_lock() and LWLockAcquire() and I think PinBuffer,
> plus very high system CPU in the scheduler (I don't have the perf trace in
> front of me, so take this with a grain of salt). In this mode they fall
> from the normal 7000 queries per second to below 3000.

I have seen signs of something similar to this when running pgbench -S
tests at high concurrency. I've never been able to track down where the
problem is happening. My belief is that once a spinlock starts to be
contended, there's some kind of death spiral that can't be arrested until
the workload eases up. But I haven't had much luck identifying exactly
which spinlock is the problem, or if it even is just one...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Fri, 14 Dec 2012 15:39:44 -0500
Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Dec 12, 2012 at 8:29 AM, David Gould <daveg@sonic.net> wrote:
> > We lose noticeable performance when we raise fillfactor above 10. Even
> > 20 is slower.
>
> Whoa.

Any interest in a fillfactor patch to place exactly one row per page? That
would be the least contended. There are applications where it might help.

> > During busy times these hosts sometimes fall into a stable state with
> > very high CPU use, mostly in s_lock() and LWLockAcquire() and I think
> > PinBuffer, plus very high system CPU in the scheduler (I don't have the
> > perf trace in front of me, so take this with a grain of salt). In this
> > mode they fall from the normal 7000 queries per second to below 3000.
>
> I have seen signs of something similar to this when running pgbench -S
> tests at high concurrency. I've never been able to track down where

I think I may have seen that with pgbench -S too. I did not have time to
investigate more, but out of a sequence of three-minute runs I got most runs
at 300k+ qps, but a couple were around 200k qps.

> the problem is happening. My belief is that once a spinlock starts to be
> contended, there's some kind of death spiral that can't be arrested until
> the workload eases up. But I haven't had much luck identifying exactly
> which spinlock is the problem, or if it even is just one...

I agree about the death spiral. I think what happens is that all the
backends get synchronized by waiting, and then they are more likely to
contend again.

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.