Обсуждение: vac truncation scan problems

Поиск
Список
Период
Сортировка

vac truncation scan problems

От
Jeff Janes
Дата:
After freeing up the rows at the end of the table so it is eligible for truncation, then running a manual VACUUM to actually release the space, I kept running into the problem that the truncation scan was consistently suspended and then aborted due to a conflicting lock requested/held.  

But the perversity is that that conflicting lock request can only be coming, as far as I can tell, from the autovac process.  I'm not sure how this happens, as I thought autovac never waited for locks but only obtained one if it were instantaneously available, but that it is the only explanation I can think of.

I'm not seeing this in 9.4, but I'm not sure how deterministic it is so maybe that is just luck.

On an otherwise idle system:

pgbench -i -s 1000 -n

alter table pgbench_accounts drop CONSTRAINT pgbench_accounts_pkey ;
delete from pgbench_accounts; 
vacuum verbose pgbench_accounts;

As soon as truncation scan starts, it suspends and then quickly terminates.

Anyone have theories on what's going on?

Cheers,

Jeff

Re: vac truncation scan problems

От
Jeff Janes
Дата:
On Mon, Mar 30, 2015 at 8:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
After freeing up the rows at the end of the table so it is eligible for truncation, then running a manual VACUUM to actually release the space, I kept running into the problem that the truncation scan was consistently suspended and then aborted due to a conflicting lock requested/held.  

But the perversity is that that conflicting lock request can only be coming, as far as I can tell, from the autovac process.  I'm not sure how this happens, as I thought autovac never waited for locks but only obtained one if it were instantaneously available, but that it is the only explanation I can think of.

I'm not seeing this in 9.4, but I'm not sure how deterministic it is so maybe that is just luck.


It looks like the culprit is this:

commit 0d831389749a3baaced7b984205b9894a82444b9
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Wed Mar 18 11:52:33 2015 -0300

    Rationalize vacuuming options and parameters

I'd guess the autovac nature of the autovac process is getting lost in there where, but I don't see where.

Cheers,

Jeff

Re: vac truncation scan problems

От
Kyotaro HORIGUCHI
Дата:
Hi, this is a bug in the commit 0d831389749a3baaced7b984205b9894a82444b9 .

It allows vucuum freeze to be skipped and inversely lets regular
vacuum wait for lock. The attched patch fixes it.


In table_recheck_autovac, vacuum options are determined as following,
>     tab->at_vacoptions = VACOPT_SKIPTOAST |>         (dovacuum ? VACOPT_VACUUM : 0) |>         (doanalyze ?
VACOPT_ANALYZE: 0) |
 
!>         (wraparound ? VACOPT_NOWAIT : 0);

The line prefixed by '!' looks inverted.

At Mon, 30 Mar 2015 23:42:51 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in
<CAMkU=1xmTEiaY=5oMHsSQo5vd9V1Ze4kNLL0qN2eH0P_GXOaYw@mail.gmail.com>
jeff.janes> On Mon, Mar 30, 2015 at 8:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > But the perversity is that that conflicting lock request can only be
> > coming, as far as I can tell, from the autovac process.  I'm not sure how
> > this happens, as I thought autovac never waited for locks but only obtained
> > one if it were instantaneously available, but that it is the only
> > explanation I can think of.
> >
> > I'm not seeing this in 9.4, but I'm not sure how deterministic it is so
> > maybe that is just luck.
> >

Good catch!

> It looks like the culprit is this:
> 
> commit 0d831389749a3baaced7b984205b9894a82444b9
> Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date:   Wed Mar 18 11:52:33 2015 -0300
> 
>     Rationalize vacuuming options and parameters
> 
> I'd guess the autovac nature of the autovac process is getting lost in
> there where, but I don't see where.

You're right, the patch does something like following,


+        tab->at_vacoptions = VACOPT_SKIPTOAST |
+            (dovacuum ? VACOPT_VACUUM : 0) |
+            (doanalyze ? VACOPT_ANALYZE : 0) |
+            (wraparound ? VACOPT_NOWAIT : 0);
..
-    if (!tab->at_wraparound)
-        vacstmt.options = VACOPT_NOWAIT;

This inverses the meaning of at_wraparound. No previous version
isn't affected by this.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: vac truncation scan problems

От
Michael Paquier
Дата:


On Tue, Mar 31, 2015 at 3:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Mar 30, 2015 at 8:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
After freeing up the rows at the end of the table so it is eligible for truncation, then running a manual VACUUM to actually release the space, I kept running into the problem that the truncation scan was consistently suspended and then aborted due to a conflicting lock requested/held.  

But the perversity is that that conflicting lock request can only be coming, as far as I can tell, from the autovac process.  I'm not sure how this happens, as I thought autovac never waited for locks but only obtained one if it were instantaneously available, but that it is the only explanation I can think of.

I'm not seeing this in 9.4, but I'm not sure how deterministic it is so maybe that is just luck.


It looks like the culprit is this:

commit 0d831389749a3baaced7b984205b9894a82444b9
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Wed Mar 18 11:52:33 2015 -0300

    Rationalize vacuuming options and parameters

I'd guess the autovac nature of the autovac process is getting lost in there where, but I don't see where.

Hm. I ran a couple of tests and am noticing that a manual VACUUM is not able to truncate all the pages (it should, no?)... For example with your test case on REL9_4_STABLE VACUUM VERBOSE reports that all the pages are truncated:
INFO:  00000: "pgbench_accounts": truncated 16394 to 0 pages

OK, on HEAD this does not seem to work:
INFO:  00000: "pgbench_accounts": truncated 16394 to 13554 pages

But if I try as well with 4559167c (0d831389~1) I am getting a similar result:
INFO:  00000: "pgbench_accounts": truncated 16394 to 3309 pages

I will try to bisect to the origin of that. This may be related to what you are seeing.
Regards,
--
Michael

Re: vac truncation scan problems

От
Jeff Janes
Дата:
On Tue, Mar 31, 2015 at 1:28 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hi, this is a bug in the commit 0d831389749a3baaced7b984205b9894a82444b9 .

It allows vucuum freeze to be skipped and inversely lets regular
vacuum wait for lock. The attched patch fixes it.


In table_recheck_autovac, vacuum options are determined as following,

 >      tab->at_vacoptions = VACOPT_SKIPTOAST |
 >              (dovacuum ? VACOPT_VACUUM : 0) |
 >              (doanalyze ? VACOPT_ANALYZE : 0) |
!>              (wraparound ? VACOPT_NOWAIT : 0);

The line prefixed by '!' looks inverted.

Thanks, it is obvious once you see it!

Your patch solved the problem for me.

Cheers,

Jeff

Re: vac truncation scan problems

От
Jeff Janes
Дата:
On Tue, Mar 31, 2015 at 1:29 AM, Michael Paquier <michael.paquier@gmail.com> wrote:


On Tue, Mar 31, 2015 at 3:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Mar 30, 2015 at 8:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
After freeing up the rows at the end of the table so it is eligible for truncation, then running a manual VACUUM to actually release the space, I kept running into the problem that the truncation scan was consistently suspended and then aborted due to a conflicting lock requested/held.  

But the perversity is that that conflicting lock request can only be coming, as far as I can tell, from the autovac process.  I'm not sure how this happens, as I thought autovac never waited for locks but only obtained one if it were instantaneously available, but that it is the only explanation I can think of.

I'm not seeing this in 9.4, but I'm not sure how deterministic it is so maybe that is just luck.


It looks like the culprit is this:

commit 0d831389749a3baaced7b984205b9894a82444b9
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Wed Mar 18 11:52:33 2015 -0300

    Rationalize vacuuming options and parameters

I'd guess the autovac nature of the autovac process is getting lost in there where, but I don't see where.

Hm. I ran a couple of tests and am noticing that a manual VACUUM is not able to truncate all the pages (it should, no?)... For example with your test case on REL9_4_STABLE VACUUM VERBOSE reports that all the pages are truncated:
INFO:  00000: "pgbench_accounts": truncated 16394 to 0 pages

OK, on HEAD this does not seem to work:
INFO:  00000: "pgbench_accounts": truncated 16394 to 13554 pages

But if I try as well with 4559167c (0d831389~1) I am getting a similar result:
INFO:  00000: "pgbench_accounts": truncated 16394 to 3309 pages

I was not seeing that on commits before 0d831389--although often once the truncation scan had run for a while without problem, I would abort the session and call it good, as my failures were always quick ones.  

Did it tell you why?  If it surrendered the lock to a competing process, it should report that as previous INFO messages. If it doesn't give one of those, then it probably just thinks there are some tuples it can't remove yet somewhere.  What did it give earlier up in the verbose output, for the number of removed and nonremovable tuples?
 
Cheers,

Jeff

Re: vac truncation scan problems

От
Michael Paquier
Дата:


On Wed, Apr 1, 2015 at 2:18 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Mar 31, 2015 at 1:28 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hi, this is a bug in the commit 0d831389749a3baaced7b984205b9894a82444b9 .

It allows vucuum freeze to be skipped and inversely lets regular
vacuum wait for lock. The attched patch fixes it.


In table_recheck_autovac, vacuum options are determined as following,

 >      tab->at_vacoptions = VACOPT_SKIPTOAST |
 >              (dovacuum ? VACOPT_VACUUM : 0) |
 >              (doanalyze ? VACOPT_ANALYZE : 0) |
!>              (wraparound ? VACOPT_NOWAIT : 0);

The line prefixed by '!' looks inverted.

Thanks, it is obvious once you see it!

Nice catch, Horiguchi-san.
--
Michael

Re: vac truncation scan problems

От
Michael Paquier
Дата:


On Wed, Apr 1, 2015 at 2:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
Did it tell you why?  If it surrendered the lock to a competing process, it should report that as previous INFO messages. If it doesn't give one of those, then it probably just thinks there are some tuples it can't remove yet somewhere.  What did it give earlier up in the verbose output, for the number of removed and nonremovable tuples?

I just had an extra look at that, and I just got trapped a0f5954a that bumped max_wal_size from 128GB to 1GB.. Sorry for the noise.
--
Michael

Re: vac truncation scan problems

От
Kyotaro HORIGUCHI
Дата:
By the way, what shoud we do about this?
- Waiting for someone's picking up this.
- Making another thread to attract notice
- Otherwise..

At Wed, 1 Apr 2015 10:49:55 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTMBd6=5i3ZOg9t6A0km4VZ=wNt4_rwOzPVm683-aQ=Qw@mail.gmail.com>
> On Wed, Apr 1, 2015 at 2:18 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> 
> > On Tue, Mar 31, 2015 at 1:28 AM, Kyotaro HORIGUCHI <
> > horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >
> >> Hi, this is a bug in the commit 0d831389749a3baaced7b984205b9894a82444b9 .
> >>
> >> It allows vucuum freeze to be skipped and inversely lets regular
> >> vacuum wait for lock. The attched patch fixes it.
> >>
> >>
> >> In table_recheck_autovac, vacuum options are determined as following,
> >>
> >>  >      tab->at_vacoptions = VACOPT_SKIPTOAST |
> >>  >              (dovacuum ? VACOPT_VACUUM : 0) |
> >>  >              (doanalyze ? VACOPT_ANALYZE : 0) |
> >> !>              (wraparound ? VACOPT_NOWAIT : 0);
> >>
> >> The line prefixed by '!' looks inverted.
> >>
> >
> > Thanks, it is obvious once you see it!
> >
> 
> Nice catch, Horiguchi-san.

Thank you:)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: vac truncation scan problems

От
Michael Paquier
Дата:


On Wed, Apr 1, 2015 at 4:35 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
By the way, what should we do about this?

 - Waiting for someone's picking up this.
 - Making another thread to attract notice
 - Otherwise..

I am sure someone will show up quickly and push the fix you provided.
--
Michael
 

Re: vac truncation scan problems

От
Kyotaro HORIGUCHI
Дата:
Hi,

At Wed, 1 Apr 2015 16:50:41 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTxVdpju+a5rk3p2VGe_gHaVk+ht97_HUgWfG9ULyho9w@mail.gmail.com>
> I am sure someone will show up quickly and push the fix you provided.

Ok, I'll be a good boy.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: vac truncation scan problems

От
Alvaro Herrera
Дата:
Kyotaro HORIGUCHI wrote:
> Hi, this is a bug in the commit 0d831389749a3baaced7b984205b9894a82444b9 .
> 
> It allows vucuum freeze to be skipped and inversely lets regular
> vacuum wait for lock. The attched patch fixes it.
> 
> 
> In table_recheck_autovac, vacuum options are determined as following,
> 
>  >     tab->at_vacoptions = VACOPT_SKIPTOAST |
>  >         (dovacuum ? VACOPT_VACUUM : 0) |
>  >         (doanalyze ? VACOPT_ANALYZE : 0) |
> !>         (wraparound ? VACOPT_NOWAIT : 0);
> 
> The line prefixed by '!' looks inverted.

You're absolutely right.  My mistake.  Pushed your patch, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: vac truncation scan problems

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Kyotaro HORIGUCHI wrote:
>> The line prefixed by '!' looks inverted.

> You're absolutely right.  My mistake.  Pushed your patch, thanks.

Don't see any such commit from here?
        regards, tom lane