Thread: bulk_multi_insert infinite loops with large rows and small fill factors
COPY IN loops in heap_multi_insert(), extending the table until it fills the
disk, when trying to insert a wide row into a table with a low fillfactor.
Internally, fillfactor is implemented by reserving some space on each page.
For large enough rows and small enough fillfactors, heap_multi_insert() can't
fit the row even on a new, empty page, so it keeps allocating new pages but
is never able to place the row. It should always put at least one row on an
empty page.

In the excerpt below, saveFreeSpace is the space reserved for the fillfactor:

    while (ndone < ntuples)
    {
        ...
        /*
         * Find buffer where at least the next tuple will fit. If the page is
         * all-visible, this will also pin the requisite visibility map page.
         */
        buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
                                           ...);

        /* Put as many tuples as fit on this page */
        for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
        {
            HeapTuple   heaptup = heaptuples[ndone + nthispage];

            if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
                break;

            RelationPutHeapTuple(relation, buffer, heaptup);
        }

        ... do a bunch of dirtying and logging etc ...
    }

This was introduced in 9.2 as part of the bulk insert speedup.

One more point: in the case where we don't insert any rows, we still do all
the dirtying and logging work even though we did not modify the page. I have
tried to skip all this if no rows are added (nthispage == 0), but my access
method foo is sadly out of date, so someone should take a skeptical look at
that.

A test case and patch against 9.2.2 are attached. The patch fixes the problem
and passes make check. Most of the diff is just indentation changes. Whoever
tries this will want to test it on a small partition by itself.

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Attachments
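To make the failure arithmetic concrete, here is a minimal standalone C
sketch (illustrative only, not part of the original mail or its attachments;
the page-overhead constants approximate an 8 kB heap page, and the
reservation formula mirrors RelationGetTargetPageFreeSpace()):

    #include <stdio.h>

    #define BLCKSZ              8192    /* default PostgreSQL page size */
    #define PAGE_HEADER_SIZE    24      /* approx. sizeof(PageHeaderData) */
    #define LINE_POINTER_SIZE   4       /* one ItemIdData */
    #define MAXALIGN(len)       (((len) + 7) & ~7)  /* 8-byte alignment */

    int
    main(void)
    {
        int     fillfactor = 10;
        int     tuple_len = 2000;       /* a "wide" row */

        /* space the fillfactor reserves on every page */
        int     saveFreeSpace = BLCKSZ * (100 - fillfactor) / 100;

        /* free space on a freshly initialized, completely empty heap page */
        int     emptyPageFree = BLCKSZ - PAGE_HEADER_SIZE - LINE_POINTER_SIZE;

        /* the pre-fix test in heap_multi_insert(): fails even on a new page */
        if (emptyPageFree < MAXALIGN(tuple_len) + saveFreeSpace)
            printf("%d-byte tuple never fits: %d bytes free, %d required, "
                   "so the table is extended forever\n",
                   tuple_len, emptyPageFree,
                   MAXALIGN(tuple_len) + saveFreeSpace);
        return 0;
    }

With fillfactor 10 this prints a requirement of 9372 bytes against 8164
available, so no page, new or old, ever passes the check.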
On 2012-12-12 03:04:19 -0800, David Gould wrote:
> COPY IN loops in heap_multi_insert(), extending the table until it fills
> the disk, when trying to insert a wide row into a table with a low
> fillfactor. Internally, fillfactor is implemented by reserving some space
> on each page. For large enough rows and small enough fillfactors,
> heap_multi_insert() can't fit the row even on a new, empty page, so it
> keeps allocating new pages but is never able to place the row. It should
> always put at least one row on an empty page.

Heh. Nice one. Did you hit that in practice?

> One more point: in the case where we don't insert any rows, we still do
> all the dirtying and logging work even though we did not modify the page.
> I have tried to skip all this if no rows are added (nthispage == 0), but
> my access method foo is sadly out of date, so someone should take a
> skeptical look at that.
>
> A test case and patch against 9.2.2 are attached. The patch fixes the
> problem and passes make check. Most of the diff is just indentation
> changes. Whoever tries this will want to test it on a small partition by
> itself.

ISTM this would be fixed with a smaller footprint by just changing

    if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

to

    if (!PageIsEmpty(page) &&
        PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)

I think that should work?

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
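For illustration, here is how that guard would read in context, a sketch of
the proposed change to heap_multi_insert()'s inner loop (a fragment, not the
fix that was eventually committed):

    /* Put as many tuples as fit on this page */
    for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
    {
        HeapTuple   heaptup = heaptuples[ndone + nthispage];

        /*
         * Skip the fillfactor reservation on an empty page, so at least
         * one tuple is always placed and the loop always makes progress.
         */
        if (!PageIsEmpty(page) &&
            PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
            break;

        RelationPutHeapTuple(relation, buffer, heaptup);
    }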
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Heikki Linnakangas
Date:
On 12.12.2012 13:27, Andres Freund wrote:
> On 2012-12-12 03:04:19 -0800, David Gould wrote:
>> One more point: in the case where we don't insert any rows, we still do
>> all the dirtying and logging work even though we did not modify the page.
>> I have tried to skip all this if no rows are added (nthispage == 0), but
>> my access method foo is sadly out of date, so someone should take a
>> skeptical look at that.
>>
>> A test case and patch against 9.2.2 are attached. The patch fixes the
>> problem and passes make check. Most of the diff is just indentation
>> changes. Whoever tries this will want to test it on a small partition by
>> itself.
>
> ISTM this would be fixed with a smaller footprint by just changing
>
>     if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> to
>
>     if (!PageIsEmpty(page) &&
>         PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> I think that should work?

Yeah, it seems that it should, although PageIsEmpty() is no guarantee that
the tuple fits: even when PageIsEmpty() returns true, there might be dead
line pointers consuming so much space that the tuple at hand doesn't fit.
However, RelationGetBufferForTuple() won't return such a page; it guarantees
that the first tuple does indeed fit on the page it returns. For the same
reason, the later check that at least one tuple was actually placed on the
page is not necessary.

I committed a slightly different version, which unconditionally puts the
first tuple on the page and only applies the free-space check to the
subsequent tuples. Since RelationGetBufferForTuple() guarantees that the
first tuple fits, we can trust that, like heap_insert does:

--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
         /* NO EREPORT(ERROR) from here till changes are logged */
         START_CRIT_SECTION();
 
-        /* Put as many tuples as fit on this page */
-        for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
+        /*
+         * RelationGetBufferForTuple has ensured that the first tuple fits.
+         * Put that on the page, and then as many other tuples as fit.
+         */
+        RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
+        for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
         {
             HeapTuple   heaptup = heaptuples[ndone + nthispage];

Thanks for the report!

- Heikki
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Wed, 12 Dec 2012 12:27:11 +0100
Andres Freund <andres@2ndquadrant.com> wrote:

> On 2012-12-12 03:04:19 -0800, David Gould wrote:
>> COPY IN loops in heap_multi_insert(), extending the table until it fills the
>
> Heh. Nice one. Did you hit that in practice?

Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
happens late in the initial setup script for new hosts. The first batch of
new hosts to be set up with 9.2 filled the ramdisk, oomed, and fell over
within a minute. Since the script sets up a lot of stuff, we had no idea at
first what oomed.

> ISTM this would be fixed with a smaller footprint by just changing
>
>     if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> to
>
>     if (!PageIsEmpty(page) &&
>         PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> I think that should work?

I like PageIsEmpty() better (and would have used it if I had known about
it), but I'm not so crazy about the negation.

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Heikki Linnakangas
Date:
On 12.12.2012 14:17, David Gould wrote:
> On Wed, 12 Dec 2012 12:27:11 +0100
> Andres Freund <andres@2ndquadrant.com> wrote:
>
>> On 2012-12-12 03:04:19 -0800, David Gould wrote:
>>> COPY IN loops in heap_multi_insert(), extending the table until it fills the
>>
>> Heh. Nice one. Did you hit that in practice?
>
> Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
> happens late in the initial setup script for new hosts. The first batch of
> new hosts to be set up with 9.2 filled the ramdisk, oomed, and fell over
> within a minute. Since the script sets up a lot of stuff, we had no idea
> at first what oomed.

The bug's been fixed now, but note that huge tuples like this will always
cause the table to be extended, even if there are completely empty pages in
the table after a vacuum. Even a completely empty existing page is not
considered spacious enough in this case, because it's still too small once
you take fillfactor into account, so the insertion will always extend the
table. If you regularly run into this situation, you might want to raise
your fillfactor.

- Heikki
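To put rough numbers on that cutoff, an illustrative sketch (not from the
thread; the page-overhead constant is approximate, and the exact limits also
depend on the tuple header and alignment):

    #include <stdio.h>

    #define BLCKSZ          8192    /* default PostgreSQL page size */
    #define PAGE_OVERHEAD   28      /* approx. page header + one line pointer */

    int
    main(void)
    {
        int     fillfactor;

        /* largest tuple that passes the fillfactor check on an empty page */
        for (fillfactor = 10; fillfactor <= 100; fillfactor += 30)
        {
            int     reserved = BLCKSZ * (100 - fillfactor) / 100;
            int     max_len = BLCKSZ - PAGE_OVERHEAD - reserved;

            printf("fillfactor %3d: %4d bytes reserved, tuples wider than "
                   "~%d bytes always extend the table\n",
                   fillfactor, reserved, max_len);
        }
        return 0;
    }

At fillfactor 10 the threshold is only about 790 bytes; raising the
fillfactor raises it accordingly.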
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Wed, 12 Dec 2012 13:56:08 +0200
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> However, RelationGetBufferForTuple() won't return such a page; it
> guarantees that the first tuple does indeed fit on the page it returns.
> For the same reason, the later check that at least one tuple was actually
> placed on the page is not necessary.
>
> I committed a slightly different version, which unconditionally puts the
> first tuple on the page and only applies the free-space check to the
> subsequent tuples. Since RelationGetBufferForTuple() guarantees that the
> first tuple fits, we can trust that, like heap_insert does:
>
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
>          /* NO EREPORT(ERROR) from here till changes are logged */
>          START_CRIT_SECTION();
>  
> -        /* Put as many tuples as fit on this page */
> -        for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
> +        /*
> +         * RelationGetBufferForTuple has ensured that the first tuple fits.
> +         * Put that on the page, and then as many other tuples as fit.
> +         */
> +        RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
> +        for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
>          {
>              HeapTuple   heaptup = heaptuples[ndone + nthispage];

I don't know if this is the same thing. At least in the comments I was
reading while trying to figure this out, there was some concern that someone
else could change the free space on the page. Does
RelationGetBufferForTuple() guarantee against this too?

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Heikki Linnakangas
Date:
On 12.12.2012 14:24, David Gould wrote:
> I don't know if this is the same thing. At least in the comments I was
> reading while trying to figure this out, there was some concern that
> someone else could change the free space on the page. Does
> RelationGetBufferForTuple() guarantee against this too?

Yeah, RelationGetBufferForTuple() grabs a lock on the page before returning
it. For comparison, plain heap_insert does simply this:

>     buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
>                                        InvalidBuffer, options, bistate,
>                                        &vmbuffer, NULL);
>
>     /* NO EREPORT(ERROR) from here till changes are logged */
>     START_CRIT_SECTION();
>
>     RelationPutHeapTuple(relation, buffer, heaptup);

- Heikki
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Wed, 12 Dec 2012 14:23:12 +0200
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

> The bug's been fixed now, but note that huge tuples like this will always
> cause the table to be extended, even if there are completely empty pages
> in the table after a vacuum. Even a completely empty existing page is not
> considered spacious enough in this case, because it's still too small once
> you take fillfactor into account, so the insertion will always extend the
> table. If you regularly run into this situation, you might want to raise
> your fillfactor.

Actually, we'd like it lower. Ideally, one row per page. We lose noticeable
performance when we raise fillfactor above 10. Even 20 is slower.

During busy times these hosts sometimes fall into a stable state with very
high CPU use, mostly in s_lock() and LWLockAcquire() and I think PinBuffer,
plus very high system CPU in the scheduler (I don't have the perf trace in
front of me, so take this with a grain of salt). In this mode they fall from
the normal 7000 queries per second to below 3000. Once in this state they
tend to stay that way. If we turn down the number of incoming requests, they
go back to normal.

Our conjecture is that most requests are for only a few keys, so we have
multiple sessions contending for a few pages and convoying in the buffer
manager. The table is under 20k rows, but the hot items are probably only a
couple hundred different rows. The busy processes are doing reads only, but
there is some update activity on this table too.

Ah, found an email with the significant part of the perf output:

> ... set number of client threads = number of postgres backends = 70. That
> way all my threads have constant access to a backend and they just spin in
> a tight loop running the same query over and over (with different values).
> ... this seems to have tapped into 9.2's resonant frequency; right now
> we're spending almost all our time spin locking. ...
>
>     762377.00  71.0%  s_lock         /usr/local/bin/postgres
>      22279.00   2.1%  LWLockAcquire  /usr/local/bin/postgres
>      18916.00   1.8%  LWLockRelease  /usr/local/bin/postgres

I was trying to resurrect the pthread s_lock() patch to see if that helps,
but it did not apply at all and I have not had time to pursue it.

We have tried many different numbers of processes and get the best results
with about ten fewer active PostgreSQL backends than HT cores. The system is
128GB with:

    processor       : 79
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 47
    model name      : Intel(R) Xeon(R) CPU E7-L8867 @ 2.13GHz
    stepping        : 2
    cpu MHz         : 2128.478
    cache size      : 30720 KB

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
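As a rough illustration of why contended spinning collapses throughput, here
is a simplified test-and-set loop with backoff in the spirit of s_lock()
(hypothetical code, not from the thread; the real s_lock() adaptively tunes
its spin count via spins_per_delay and sleeps with pg_usleep):

    #include <stdatomic.h>
    #include <unistd.h>

    /* initialize with ATOMIC_FLAG_INIT before first use */
    typedef atomic_flag slock_t;

    static void
    spin_lock(slock_t *lock)
    {
        int     spins = 0;

        while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        {
            /*
             * Every waiter burns CPU re-running this loop; once enough
             * backends pile up, they spend most of their time right here,
             * which matches the s_lock-heavy profile quoted above. After
             * a while, sleep so someone else can make progress.
             */
            if (++spins > 1000)
            {
                usleep(1000);
                spins = 0;
            }
        }
    }

    static void
    spin_unlock(slock_t *lock)
    {
        atomic_flag_clear_explicit(lock, memory_order_release);
    }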
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> The bug's been fixed now, but note that huge tuples like this will always
> cause the table to be extended, even if there are completely empty pages
> in the table after a vacuum. Even a completely empty existing page is not
> considered spacious enough in this case, because it's still too small once
> you take fillfactor into account, so the insertion will always extend the
> table.

Seems like that's a bug in itself: there's no reason to reject an empty
existing page.

			regards, tom lane
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
Robert Haas
Date:
On Wed, Dec 12, 2012 at 8:29 AM, David Gould <daveg@sonic.net> wrote:
> We lose noticeable performance when we raise fillfactor above 10. Even 20
> is slower.

Whoa.

> During busy times these hosts sometimes fall into a stable state with very
> high CPU use, mostly in s_lock() and LWLockAcquire() and I think PinBuffer,
> plus very high system CPU in the scheduler (I don't have the perf trace in
> front of me, so take this with a grain of salt). In this mode they fall
> from the normal 7000 queries per second to below 3000.

I have seen signs of something similar to this when running pgbench -S
tests at high concurrency. I've never been able to track down where the
problem is happening. My belief is that once a spinlock starts to be
contended, there's some kind of death spiral that can't be arrested until
the workload eases up. But I haven't had much luck identifying exactly
which spinlock is the problem, or if it even is just one...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: bulk_multi_insert infinite loops with large rows and small fill factors
From:
David Gould
Date:
On Fri, 14 Dec 2012 15:39:44 -0500
Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Dec 12, 2012 at 8:29 AM, David Gould <daveg@sonic.net> wrote:
> > We lose noticeable performance when we raise fillfactor above 10. Even
> > 20 is slower.
>
> Whoa.

Any interest in a fillfactor patch to place exactly one row per page? That
would be the least contended. There are applications where it might help.

> > During busy times these hosts sometimes fall into a stable state with
> > very high CPU use, mostly in s_lock() and LWLockAcquire() and I think
> > PinBuffer, plus very high system CPU in the scheduler (I don't have the
> > perf trace in front of me, so take this with a grain of salt). In this
> > mode they fall from the normal 7000 queries per second to below 3000.
>
> I have seen signs of something similar to this when running pgbench -S
> tests at high concurrency. I've never been able to track down where

I think I may have seen that with pgbench -S too. I did not have time to
investigate more, but out of a sequence of three-minute runs I got most runs
at 300k+ qps, but a couple were around 200k qps.

> the problem is happening. My belief is that once a spinlock starts to be
> contended, there's some kind of death spiral that can't be arrested until
> the workload eases up. But I haven't had much luck identifying exactly
> which spinlock is the problem, or if it even is just one...

I agree about the death spiral. I think what happens is that all the
backends get synchronized by waiting, and then they are more likely to
contend again.

-dg

--
David Gould                 510 282 0869            daveg@sonic.net
If simplicity worked, the world would be overrun with insects.