Обсуждение: autovacuum process blocks without reporting a deadlock

Поиск
Список
Период
Сортировка

autovacuum process blocks without reporting a deadlock

От
"Thomas Chille"
Дата:
Hi anybody,

I step in just one of our identically customer databases in a kind of
a deadlock with Autovacuum involved.

One Autovacuum process stuck in the middle of the night and it seemed
that it compete with another Select process for an index:

[14398 / 2007-11-21 00:52:04 CET]CONTEXT:  SQL statement "UPDATE
hst_timerecording SET id_timerecording_join = NULL WHERE
id_timerecording_join = -1"
PL/pgSQL function "set_id_timerecording_join" line 121 at SQL statement
        SQL statement "UPDATE hst_timerecording SET sales_volume =
NULL, sales_volume_commission = NULL WHERE business_day =  $1  AND
id_employee =  $2 "
        PL/pgSQL function "compress_salaries_day" line 168 at SQL statement
        SQL statement "SELECT  compress_salaries_day( $1 , NULL, NULL)"
        PL/pgSQL function "compress" line 460 at perform
[14398 / 2007-11-21 00:52:04 CET]LOCATION:  exec_stmt_raise, pl_exec.c:2110
[14391 / 2007-11-21 00:52:14 CET]DEBUG:  00000: index
"hst_timerecording_id_timerecording_idx" now contains 8537 row
versions in 61 pages
[14391 / 2007-11-21 00:52:14 CET]DETAIL:  4454 index row versions were removed.
        12 index pages have been deleted, 0 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 3.47 sec.
[14391 / 2007-11-21 00:52:14 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736

This are the last log entires for these both processes. Over 9 hours
later, i can see them allready running  in the process list :

14391 ?        S      0:00 postgres: autovacuum process
backoffice_db
14398 ?        S      0:02 postgres: spoon backoffice_db office(39302)
SELECT waiting

This happens every night and a dump restore wont help.

I dont set any explicit locks and so i would expect that no deadlock
could occure? And when, then i would expect that it would be logged
together with the dead locked relations?

Can anyone give a tipp, please?

Version: PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 4.0.2 (Debian 4.0.2-2)

regards
thomas

Re: autovacuum process blocks without reporting a deadlock

От
Alvaro Herrera
Дата:
Thomas Chille escribió:

> This are the last log entires for these both processes. Over 9 hours
> later, i can see them allready running  in the process list :
>
> 14391 ?        S      0:00 postgres: autovacuum process
> backoffice_db
> 14398 ?        S      0:02 postgres: spoon backoffice_db office(39302)
> SELECT waiting

Can you paste the relevant lines from pg_locks?

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)

Re: autovacuum process blocks without reporting a deadlock

От
"Thomas Chille"
Дата:
i have to wait till monday, then i can provide these lines.

thanks,
thomas

On Nov 23, 2007 1:32 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Thomas Chille escribió:
>
> > This are the last log entires for these both processes. Over 9 hours
> > later, i can see them allready running  in the process list :
> >
> > 14391 ?        S      0:00 postgres: autovacuum process
> > backoffice_db
> > 14398 ?        S      0:02 postgres: spoon backoffice_db office(39302)
> > SELECT waiting
>
> Can you paste the relevant lines from pg_locks?
>
> --
> Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
> "Estoy de acuerdo contigo en que la verdad absoluta no existe...
> El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
>

Re: autovacuum process blocks without reporting a deadlock

От
Tom Lane
Дата:
"Thomas Chille" <thomas.chille@gmail.com> writes:
> One Autovacuum process stuck in the middle of the night and it seemed
> that it compete with another Select process for an index:

> [14391 / 2007-11-21 00:52:14 CET]DEBUG:  00000: index
> "hst_timerecording_id_timerecording_idx" now contains 8537 row
> versions in 61 pages

What other indexes does that table have?

            regards, tom lane

Re: autovacuum process blocks without reporting a deadlock

От
"Thomas Chille"
Дата:
On Nov 24, 2007 6:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What other indexes does that table have?
>
>                         regards, tom lane
>

Hi,

last night it happend again. In the log-snippet u can see all indexes
of this table:

[9293 / 2007-11-26 21:46:28 CET]CONTEXT:  SQL statement "UPDATE
hst_timerecording SET id_timerecording_join = NULL WHERE
id_timerecording_join = -1"
        PL/pgSQL function "set_id_timerecording_join" line 121 at SQL statement
        SQL statement "UPDATE hst_timerecording SET sales_volume =
NULL, sales_volume_commission = NULL WHERE business_day =  $1  AND
id_employee =  $2 "
        PL/pgSQL function "compress_salaries_day" line 168 at SQL statement
        SQL statement "SELECT  compress_salaries_day( $1 , NULL, NULL)"
        PL/pgSQL function "compress" line 460 at perform
[9293 / 2007-11-26 21:46:28 CET]LOCATION:  exec_stmt_raise, pl_exec.c:2110
[9317 / 2007-11-26 21:46:34 CET]DEBUG:  00000: index
"hst_timerecording_business_day_idx" now contains 8640 row versions in
80 pages
[9317 / 2007-11-26 21:46:34 CET]DETAIL:  4469 index row versions were removed.
        13 index pages have been deleted, 9 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 3.40 sec.
[9317 / 2007-11-26 21:46:34 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736
[9317 / 2007-11-26 21:46:39 CET]DEBUG:  00000: index
"hst_timerecording_id_employee_idx" now contains 8640 row versions in
95 pages
[9317 / 2007-11-26 21:46:39 CET]DETAIL:  4469 index row versions were removed.
        10 index pages have been deleted, 6 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 4.22 sec.
[9317 / 2007-11-26 21:46:39 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736
[9317 / 2007-11-26 21:46:43 CET]DEBUG:  00000: index
"hst_timerecording_id_timerecording_idx" now contains 8640 row
versions in 97 pages
[9317 / 2007-11-26 21:46:43 CET]DETAIL:  4469 index row versions were removed.
        11 index pages have been deleted, 5 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 4.40 sec.
[9317 / 2007-11-26 21:46:43 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736

This are again the last log-entries of the blocking processes.
Obviously the three indexes where successful vacuumed?

I think this are the relevant pg_locks entries:

relation        75685778        75686189
                         9017862     25467   AccessShareLock f
relation        75685778        75686189
                         9009323     9317    ShareUpdateExclusiveLock
      t
relation        75685778        75686189
                         9009312     9293    AccessShareLock t
relation        75685778        75686189
                         9009312     9293    RowExclusiveLock        t
relation        75685778        75686189
                         9009312     9293    AccessExclusiveLock     f
relation        75685778        75686189
                         9012978     28370   AccessShareLock f

75686189 is the table hst_timerecording. for me it looks like the
autovacuum is not releasing the blocking ShareUpdateExclusiveLock?

I hope this infos could help.

regards, t
thomas

Re: autovacuum process blocks without reporting a deadlock

От
Alvaro Herrera
Дата:
Thomas Chille wrote:

> I think this are the relevant pg_locks entries:
>
> relation        75685778        75686189
>                          9017862     25467   AccessShareLock f
> relation        75685778        75686189
>                          9009323     9317    ShareUpdateExclusiveLock
>       t
> relation        75685778        75686189
>                          9009312     9293    AccessShareLock t
> relation        75685778        75686189
>                          9009312     9293    RowExclusiveLock        t
> relation        75685778        75686189
>                          9009312     9293    AccessExclusiveLock     f
> relation        75685778        75686189
>                          9012978     28370   AccessShareLock f
>
> 75686189 is the table hst_timerecording. for me it looks like the
> autovacuum is not releasing the blocking ShareUpdateExclusiveLock?

What are the column headings?  I find this difficult to read.

Please post the whole of pg_locks.  I may be missing something but I
think we're missing part of the picture here.  Autovacuum does not seem
to be locking on anything.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)

Re: autovacuum process blocks without reporting a deadlock

От
"Thomas Chille"
Дата:
On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
hat are the column headings?  I find this difficult to read.
>
> Please post the whole of pg_locks.  I may be missing something but I
> think we're missing part of the picture here.  Autovacuum does not seem
> to be locking on anything.

Unfortunately i logged the pg_locks-state not well formated. i added
now the heading manually. maybe it is better or i have to wait till
tomorrow morning.

but what i think i can see, is that the process with pid 9317 is
holding a ShareUpdateExclusiveLock

locktype   |  database       |  relation       |   transaction |  pid
|  mode                             |  granted
relation    |  75685778    | 75686189   |    9009323    |   9317   |
ShareUpdateExclusiveLock | t

and process with pid 9293 is waiting to get AccessExclusiveLock on the
same relation:

relation    |  75685778    | 75686189   |    9009312    |   9293   |
AccessShareLock |  f

... and 9317 is the autvacuumprocess?

afterkiling it, erything is fine.

i attached the full pg_lock.

regards,
thomas

Вложения

Re: autovacuum process blocks without reporting a deadlock

От
Alvaro Herrera
Дата:
Thomas Chille wrote:
> On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> hat are the column headings?  I find this difficult to read.
> >
> > Please post the whole of pg_locks.  I may be missing something but I
> > think we're missing part of the picture here.  Autovacuum does not seem
> > to be locking on anything.
>
> Unfortunately i logged the pg_locks-state not well formated. i added
> now the heading manually. maybe it is better or i have to wait till
> tomorrow morning.
>
> but what i think i can see, is that the process with pid 9317 is
> holding a ShareUpdateExclusiveLock

You missed that 9317 is also holding an ExclusiveLock.  What are the
vacuum_cost_delay and autovacuum_vacuum_cost_delay setting?  I think
what's happening here is that you're being bitten by the bug that made
autovac sleep because of vacuum_delay, with the exclusive lock held
trying to truncate the table.

The solution is to upgrade to 8.1.10.

Also I just noticed you're on 8.1.4.  This is a bad idea because of
another ancient autovacuum bug.  As soon as you upgrade, allow
connections to template0 (update pg_database, set datallowconn), connect
to it and issue VACUUM FREEZE.  Then disallow connections to it again.

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)

Re: autovacuum process blocks without reporting a deadlock

От
Alvaro Herrera
Дата:
Alvaro Herrera wrote:
> Thomas Chille wrote:
> > On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > hat are the column headings?  I find this difficult to read.
> > >
> > > Please post the whole of pg_locks.  I may be missing something but I
> > > think we're missing part of the picture here.  Autovacuum does not seem
> > > to be locking on anything.
> >
> > Unfortunately i logged the pg_locks-state not well formated. i added
> > now the heading manually. maybe it is better or i have to wait till
> > tomorrow morning.
> >
> > but what i think i can see, is that the process with pid 9317 is
> > holding a ShareUpdateExclusiveLock
>
> You missed that 9317 is also holding an ExclusiveLock.

Sorry, I misread it.  My analysis is bogus :-)  I suggest you upgrade
anyway because there are nasty bugs in the version you are using.

I assume 77862554, 77862552 and 77862553 are indexes on the 75686189
table?

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)

Re: autovacuum process blocks without reporting a deadlock

От
"Thomas Chille"
Дата:
yes, u are right.

this are the 3 involved indexes:

hst_timerecording_business_day_idx on hst_timerecording
hst_timerecording_id_employee_idxon hst_timerecording
hst_timerecording_id_timerecording_idxon hst_timerecording

lg t

On Nov 27, 2007 4:07 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Alvaro Herrera wrote:
> > Thomas Chille wrote:
> > > On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > hat are the column headings?  I find this difficult to read.
> > > >
> > > > Please post the whole of pg_locks.  I may be missing something but I
> > > > think we're missing part of the picture here.  Autovacuum does not seem
> > > > to be locking on anything.
> > >
> > > Unfortunately i logged the pg_locks-state not well formated. i added
> > > now the heading manually. maybe it is better or i have to wait till
> > > tomorrow morning.
> > >
> > > but what i think i can see, is that the process with pid 9317 is
> > > holding a ShareUpdateExclusiveLock
> >
> > You missed that 9317 is also holding an ExclusiveLock.
>
> Sorry, I misread it.  My analysis is bogus :-)  I suggest you upgrade
> anyway because there are nasty bugs in the version you are using.
>
> I assume 77862554, 77862552 and 77862553 are indexes on the 75686189
> table?
>
> --
> Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
>
> "La experiencia nos dice que el hombre peló millones de veces las patatas,
> pero era forzoso admitir la posibilidad de que en un caso entre millones,
> las patatas pelarían al hombre" (Ijon Tichy)
>

Re: autovacuum process blocks without reporting a deadlock

От
Tom Lane
Дата:
"Thomas Chille" <thomas@chille.de> writes:
> I think this are the relevant pg_locks entries:

> relation        75685778        75686189
>                          9017862     25467   AccessShareLock f
> relation        75685778        75686189
>                          9009323     9317    ShareUpdateExclusiveLock
>       t
> relation        75685778        75686189
>                          9009312     9293    AccessShareLock t
> relation        75685778        75686189
>                          9009312     9293    RowExclusiveLock        t
> relation        75685778        75686189
>                          9009312     9293    AccessExclusiveLock     f
> relation        75685778        75686189
>                          9012978     28370   AccessShareLock f

I don't think the vacuum is the problem here.  The problem is process
9293, which for some reason is trying to get AccessExclusiveLock, and is
blocked behind autovac, and everything else is stacking up behind it.

You didn't happen to note what 9293 was doing did you?  It's living
fairly dangerously in any case by trying to acquire exclusive lock
when it already holds a bunch of other lower-level locks; that's a
recipe for deadlock if I ever saw one.

            regards, tom lane

Re: autovacuum process blocks without reporting a deadlock

От
"Thomas Chille"
Дата:
On Nov 27, 2007 4:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You didn't happen to note what 9293 was doing did you?  It's living
> fairly dangerously in any case by trying to acquire exclusive lock
> when it already holds a bunch of other lower-level locks; that's a
> recipe for deadlock if I ever saw one.
>
>                         regards, tom lane
>

Ah ok, 9293 is a triggerd process and tries to "ALTER TABLE ...
DISABLE TRIGGER (other trigger)" and so implicitly tries to acquire an
AccessExclusiveLock and runs in a deadlock?

So is better not to use "ALTER TABLE ... " in triggerfunctions,
because there are always existing lower-level locks?

regards,
thomas

Re: autovacuum process blocks without reporting a deadlock

От
Tom Lane
Дата:
"Thomas Chille" <thomas@chille.de> writes:
> Ah ok, 9293 is a triggerd process and tries to "ALTER TABLE ...
> DISABLE TRIGGER (other trigger)" and so implicitly tries to acquire an
> AccessExclusiveLock and runs in a deadlock?

Well, you're certainly risking deadlock with that; and even if no
actual deadlock happens, waiting for the exclusive lock needed to
do the ALTER can cause the kind of lock queueing you had here.

I'd suggest trying to think of a way to solve your problem that doesn't
need trigger enabling/disabling.

            regards, tom lane