Обсуждение: VACUUM FULL results in deadlock

Поиск
Список
Период
Сортировка

VACUUM FULL results in deadlock

От
Manuel Rigger
Дата:
Hi everyone,

When executing multiple threads that execute VACUUM FULL on distinct
databases, a deadlock like the following can occur:

ERROR: deadlock detected
  Detail: Process 16407 waits for AccessShareLock on relation 1260 of
database 0; blocked by process 16404.
Process 16404 waits for RowExclusiveLock on relation 1214 of database
0; blocked by process 16407.
  Hint: See server log for query details.

This is unexpected, because the documentation does not mention that
VACUUM FULL can result in a deadlock. Also, VACUUM without a table
argument should affect only the current database [1]:

"Without a table_and_columns list, VACUUM processes every table and
materialized view in the current database that the current user has
permission to vacuum."

To reproduce such a deadlock, I've attached a Java program that first
creates 32 databases (test0 to test31), and then starts 32 threads,
each one connecting to one of the databases (with superuser
privileges). Every thread then repeatedly executes "VACUUM FULL".
Within a few seconds, deadlock error messages should pop up.

I'm using the following Postgres version: psql (11.4 (Ubuntu
11.4-1.pgdg19.04+1)).

Is this a bug?

Best,
Manuel

[1] https://www.postgresql.org/docs/11/sql-vacuum.html

Вложения

Re: VACUUM FULL results in deadlock

От
Manuel Rigger
Дата:
Hi everyone,

Did anyone try to verify this? I found a number of other race
conditions where VACUUM causes errors such as "ERROR: found unexpected
null value in index "i0", "invalid input syntax for type boolean", or
"missing chunk number 0 for toast value 13171 in pg_toast_2619".
Fixing this would help me to narrow down these issues.

Best,
Manuel

On Sat, Jun 29, 2019 at 5:51 PM Manuel Rigger <rigger.manuel@gmail.com> wrote:
>
> Hi everyone,
>
> When executing multiple threads that execute VACUUM FULL on distinct
> databases, a deadlock like the following can occur:
>
> ERROR: deadlock detected
>   Detail: Process 16407 waits for AccessShareLock on relation 1260 of
> database 0; blocked by process 16404.
> Process 16404 waits for RowExclusiveLock on relation 1214 of database
> 0; blocked by process 16407.
>   Hint: See server log for query details.
>
> This is unexpected, because the documentation does not mention that
> VACUUM FULL can result in a deadlock. Also, VACUUM without a table
> argument should affect only the current database [1]:
>
> "Without a table_and_columns list, VACUUM processes every table and
> materialized view in the current database that the current user has
> permission to vacuum."
>
> To reproduce such a deadlock, I've attached a Java program that first
> creates 32 databases (test0 to test31), and then starts 32 threads,
> each one connecting to one of the databases (with superuser
> privileges). Every thread then repeatedly executes "VACUUM FULL".
> Within a few seconds, deadlock error messages should pop up.
>
> I'm using the following Postgres version: psql (11.4 (Ubuntu
> 11.4-1.pgdg19.04+1)).
>
> Is this a bug?
>
> Best,
> Manuel
>
> [1] https://www.postgresql.org/docs/11/sql-vacuum.html



Re: VACUUM FULL results in deadlock

От
Robert Haas
Дата:
On Thu, Jul 4, 2019 at 9:12 AM Manuel Rigger <rigger.manuel@gmail.com> wrote:
> Did anyone try to verify this? I found a number of other race
> conditions where VACUUM causes errors such as "ERROR: found unexpected
> null value in index "i0", "invalid input syntax for type boolean", or
> "missing chunk number 0 for toast value 13171 in pg_toast_2619".
> Fixing this would help me to narrow down these issues.

I don't know whether or not this is a bug, but my guess is that it
isn't.  pg_database and several other system catalogs are shared
relations, which means that they are shared by all databases. So, it's
not too surprising that you could get some interaction between VACUUM
FULL command in different databases.  Routine use of VACUUM FULL is
something that should be avoided, so it doesn't seem like a huge
problem to me that you can't run 32 of them at the same time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: VACUUM FULL results in deadlock

От
Manuel Rigger
Дата:
Thanks for your comment! I can also reproduce deadlocks for other
statements that operate on distinct databases, at least also for
ANALYZE. The documentation states that such statements operate on
their current databases. If this is not a bug, should maybe the
documentation be updated to mention this caveat?

Best,
Manuel

On Fri, Jul 5, 2019 at 3:17 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jul 4, 2019 at 9:12 AM Manuel Rigger <rigger.manuel@gmail.com> wrote:
> > Did anyone try to verify this? I found a number of other race
> > conditions where VACUUM causes errors such as "ERROR: found unexpected
> > null value in index "i0", "invalid input syntax for type boolean", or
> > "missing chunk number 0 for toast value 13171 in pg_toast_2619".
> > Fixing this would help me to narrow down these issues.
>
> I don't know whether or not this is a bug, but my guess is that it
> isn't.  pg_database and several other system catalogs are shared
> relations, which means that they are shared by all databases. So, it's
> not too surprising that you could get some interaction between VACUUM
> FULL command in different databases.  Routine use of VACUUM FULL is
> something that should be avoided, so it doesn't seem like a huge
> problem to me that you can't run 32 of them at the same time.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



Re: VACUUM FULL results in deadlock

От
Robert Haas
Дата:
On Fri, Jul 5, 2019 at 9:27 AM Manuel Rigger <rigger.manuel@gmail.com> wrote:
> Thanks for your comment! I can also reproduce deadlocks for other
> statements that operate on distinct databases, at least also for
> ANALYZE. The documentation states that such statements operate on
> their current databases. If this is not a bug, should maybe the
> documentation be updated to mention this caveat?

Possibly. The trick is always to find a good place to put things like
this -- they need to not consume a disproportionate amount of space
relative to the importance of an issue, and they need to be inserted
into a place in the documentation where they make logical sense and
are likely to be seen by users. If you have a good idea, feel free to
submit a patch, perhaps on pgsql-docs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: VACUUM FULL results in deadlock

От
Alvaro Herrera
Дата:
On 2019-Jul-05, Robert Haas wrote:

> On Thu, Jul 4, 2019 at 9:12 AM Manuel Rigger <rigger.manuel@gmail.com> wrote:
> > Did anyone try to verify this? I found a number of other race
> > conditions where VACUUM causes errors such as "ERROR: found unexpected
> > null value in index "i0", "invalid input syntax for type boolean", or
> > "missing chunk number 0 for toast value 13171 in pg_toast_2619".
> > Fixing this would help me to narrow down these issues.
> 
> I don't know whether or not this is a bug, but my guess is that it
> isn't.  pg_database and several other system catalogs are shared
> relations, which means that they are shared by all databases. So, it's
> not too surprising that you could get some interaction between VACUUM
> FULL command in different databases.  Routine use of VACUUM FULL is
> something that should be avoided, so it doesn't seem like a huge
> problem to me that you can't run 32 of them at the same time.

Actually, in my read of the code, VACUUM FULL is supposed to obtain a
InvalidOid-database lock of the relation being vacuumed if it's a shared
one, so it seems to me that it should work -- namely that when vacuuming
a global table, any other vacuum of that table should be blocked
regardless of what database it occurs in.

On the other hand, not all the errors that Manuel reports are obviously
related to global catalogs.  For example, why is it complaining about a
missing toast value in pg_statistic's toast table?  Do global tables
have statistics in specific databases' pg_statistic?  How does analyze
work sensibly in that case? Maybe that's okay, but still the whole thing
should be blocked by the InvalidOid-database-level lock.

I agree that you should not be running 32 full vacuums at once, much
less in a loop, but if you do, they shouldn't result in weird corruption
such as the ones reported.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM FULL results in deadlock

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I agree that you should not be running 32 full vacuums at once, much
> less in a loop, but if you do, they shouldn't result in weird corruption
> such as the ones reported.

I agree that we probably should look closer rather than just writing this
off.  However, without a more concrete report there's no way to look
closer.

            regards, tom lane



Re: VACUUM FULL results in deadlock

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I agree that you should not be running 32 full vacuums at once, much
> less in a loop, but if you do, they shouldn't result in weird corruption
> such as the ones reported.

BTW, looking at the OIDs in the original problem report, they are for
pg_authid and pg_shdepend, both of which are shared catalogs.  It does not
seem terribly surprising to me that sets of operations that take out
exclusive locks on such catalogs are subject to deadlocks, especially not
when you consider that yet other sessions also have need to read those
catalogs.  So I'm more or less in agreement with Robert that the deadlock
errors aren't very interesting.  But I still concur with Alvaro that
those other error messages probably shouldn't be happening.

            regards, tom lane



Re: VACUUM FULL results in deadlock

От
Manuel Rigger
Дата:
On Fri, Jul 5, 2019 at 3:17 PM Robert Haas <robertmhaas@gmail.com> wrote:

> I don't know whether or not this is a bug, but my guess is that it
> isn't.  pg_database and several other system catalogs are shared
> relations, which means that they are shared by all databases. So, it's
> not too surprising that you could get some interaction between VACUUM
> FULL command in different databases.  Routine use of VACUUM FULL is
> something that should be avoided, so it doesn't seem like a huge
> problem to me that you can't run 32 of them at the same time.
>

To clarify (since it came up a couple of times in this thread): The
deadlocks can also be observed with significantly less threads and
(many) statements in between. My initial bug report used 32 threads
and only VACUUM to make it easy to quickly reproduce the deadlocks.



Re: VACUUM FULL results in deadlock

От
Manuel Rigger
Дата:
I assume that you are talking about the corruptions. I'm still trying
to make them reproducible. So far, I could not reproduce them by
trying to repeatedly replay the recorded statements.

Best,
Manuel

On Fri, Jul 5, 2019 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > I agree that you should not be running 32 full vacuums at once, much
> > less in a loop, but if you do, they shouldn't result in weird corruption
> > such as the ones reported.
>
> I agree that we probably should look closer rather than just writing this
> off.  However, without a more concrete report there's no way to look
> closer.
>
>                         regards, tom lane