pg_dump does way too much before acquiring table locks

From: Tom Lane
Date: Sun, Oct 17, 2021 02:45:13 PM -0400
While poking at pg_dump for some work I'll show later, I grew quite
unhappy with the extent to which people have ignored this advice
given near the head of getTables():

     * Note: in this phase we should collect only a minimal amount of
     * information about each table, basically just enough to decide if it is
     * interesting. We must fetch all tables in this phase because otherwise
     * we cannot correctly identify inherited columns, owned sequences, etc.

Far from collecting "a minimal amount of information", we have
repeatedly stuffed extra joins and expensive sub-selects into this
query.  That's fairly inefficient if we aren't interested in dumping
every table, but it's much worse than that: we are collecting all this
info *before we have lock* on the tables.  That means we are at
serious risk of data skew in any place where we consult server-side
functions, eg pg_get_partkeydef().  For example:

regression=# create table foo(f1 int) partition by range(f1);
CREATE TABLE
regression=# begin; drop table foo;
BEGIN
DROP TABLE

Now start a pg_dump in another shell session, wait a second or two,
and

regression=*# commit;
COMMIT

and the pg_dump blows up:

$ pg_dump -s regression
pg_dump: error: query failed: ERROR:  could not open relation with OID 37796

(Note: it's not so easy to provoke this failure manually before
e2ff7d9a8 guaranteed that pg_dump would wait around for you,
but it's certainly possible.)

To add insult to injury, all that work being done in this query
makes the time window for trouble wider.

Testing on the regression database, which isn't all that big,
I observe the getTables() query taking 140 ms, substantially longer
than the next slowest thing done by "pg_dump -s regression".
It seems that the largest chunk of this can be blamed on the
sub-selects added by the pg_init_privs patch: EXPLAIN ANALYZE
puts their total runtime at ~100ms.
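
One low-tech way to reproduce that kind of measurement, rather than
extracting the query and running EXPLAIN ANALYZE by hand, is to log
statement durations on the server while pg_dump runs; a rough sketch,
assuming a superuser connection:

ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_catalog.pg_reload_conf();

-- ... run "pg_dump -s regression" from another session, then look
-- for the big getTables() query and its duration in the server log ...

ALTER SYSTEM RESET log_min_duration_statement;
SELECT pg_catalog.pg_reload_conf();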

I believe that the pg_init_privs sub-selects can probably be
nuked, or at least their cost can be paid later.  The other work
I mentioned has proven that we do not actually need to know
the ACL situation to determine whether a table is "interesting",
so we don't have to do that work before acquiring lock.

However, that still leaves us doing several inessential joins,
not to mention those unsafe partition-examination functions,
before acquiring lock.

I am thinking that the best fix is to make getTables() perform
two queries (or, possibly, split it into two functions).  The
first pass would acquire only the absolutely minimal amount of
data needed to decide whether a table is "interesting", and then
lock such tables.  Then we'd go back to fill in the remaining
data.  While it's fairly annoying to read pg_class twice, it
looks like the extra query might only take a handful of msec.

Also, if we don't do it like that, it seems that we'd have to
add entirely new queries to call pg_get_partkeydef() and
pg_get_expr(relpartbound) in.  So I'm not really seeing a
better-performing alternative.
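
For concreteness, a rough sketch of the shape that two-pass approach
could take (illustration only, not the actual pg_dump SQL: the real
getTables() query fetches far more columns and applies more filters,
and the relkind list and OID array below are just placeholders):

-- Pass 1: the bare minimum needed to decide which tables are
-- "interesting".
SELECT c.oid, c.relname, c.relnamespace, c.relkind
FROM pg_catalog.pg_class c
WHERE c.relkind IN ('r', 'p', 'v', 'm', 'S', 'f');

-- pg_dump then locks each interesting table, e.g.
--   LOCK TABLE public.foo IN ACCESS SHARE MODE;

-- Pass 2: with locks held, it's safe to call the server-side
-- functions that would fail against a concurrently-dropped relation
-- (plus whatever other columns the first pass skipped).
SELECT c.oid,
       CASE WHEN c.relkind = 'p'
            THEN pg_catalog.pg_get_partkeydef(c.oid) END AS partkeydef,
       pg_catalog.pg_get_expr(c.relpartbound, c.oid) AS partbound
FROM pg_catalog.pg_class c
WHERE c.oid = ANY ('{12345}'::pg_catalog.oid[]);  -- OIDs we locked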

Thoughts?

            regards, tom lane



Re: Assorted improvements in pg_dump

From: Justin Pryzby
Date:
On Wed, Oct 20, 2021 at 05:14:45PM -0400, Tom Lane wrote:
> Lastly, patch 0003 addresses the concern I raised at [3] that it's
> unsafe to call pg_get_partkeydef() and pg_get_expr(relpartbound)
> in getTables().  Looking closer I realized that we can't cast
> pg_class.reloftype to regtype at that point either, since regtypeout
> is going to notice if the type has been concurrently dropped.
> 
> In [3] I'd imagined that we could just delay those calls to a second
> query in getTables(), but that doesn't work at all: if we apply
> these functions to every row of pg_class, we still risk failure
> against any relation that we didn't lock.  So there seems little
> alternative but to push these functions out to secondary queries
> executed later.
> 
> Arguably, 0003 is a bug fix that we should consider back-patching.
> However, I've not heard field reports of the problems it fixes,
> so maybe there's no need to bother.

> [3] https://www.postgresql.org/message-id/1462940.1634496313%40sss.pgh.pa.us

FYI, I see this issue happen in a production environment.

Grepping logfiles on ~40 servers, I see it happened at least 3 times since
October 1.

Our backup script is probably particularly sensitive to this: it separately
dumps each "old" partition once and for all.  We're more likely to hit this
since pg_dump is called in a loop.

I never reported it, since I think it's a documented issue, and it's no great
problem so long as the dump succeeds the next day.  But it'd be a pain to find
that the backup was incomplete when we needed it.  Also, I use the backups to
migrate to new servers, and it would be a pain to start the job at a
calculated time, expecting it to finish at the beginning of a coordinated
maintenance window, only to discover that it had failed and needed to be rerun
with fingers crossed.

On Sun, Oct 17, 2021 at 02:45:13PM -0400, Tom Lane wrote:
> While poking at pg_dump for some work I'll show later, I grew quite
> unhappy with the extent to which people have ignored this advice
> given near the head of getTables():
> 
>      * Note: in this phase we should collect only a minimal amount of
>      * information about each table, basically just enough to decide if it is
>      * interesting. We must fetch all tables in this phase because otherwise
>      * we cannot correctly identify inherited columns, owned sequences, etc.
> 
> Far from collecting "a minimal amount of information", we have
> repeatedly stuffed extra joins and expensive sub-selects into this
> query.  That's fairly inefficient if we aren't interested in dumping
> every table, but it's much worse than that: we are collecting all this
> info *before we have lock* on the tables.  That means we are at
> serious risk of data skew in any place where we consult server-side
> functions, eg pg_get_partkeydef().  For example:
> 
> regression=# create table foo(f1 int) partition by range(f1);
> CREATE TABLE
> regression=# begin; drop table foo;
> BEGIN
> DROP TABLE
> 
> Now start a pg_dump in another shell session, wait a second or two,
> and
> 
> regression=*# commit;
> COMMIT
> 
> and the pg_dump blows up:
> 
> $ pg_dump -s regression
> pg_dump: error: query failed: ERROR:  could not open relation with OID 37796
...



Re: Assorted improvements in pg_dump

From: Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Wed, Oct 20, 2021 at 05:14:45PM -0400, Tom Lane wrote:
>> Arguably, 0003 is a bug fix that we should consider back-patching.
>> However, I've not heard field reports of the problems it fixes,
>> so maybe there's no need to bother.

> FYI, I see this issue happen in a production environment.
> I never reported it, since I think it's a documented issue, and it's no great
> problem so long as the dump succeeds the next day.  But it'd be a pain to
> find that the backup was incomplete when we needed it.  Also, I use the
> backups to migrate to new servers, and it would be a pain to start the job
> at a calculated time, expecting it to finish at the beginning of a
> coordinated maintenance window, only to discover that it had failed and
> needed to be rerun with fingers crossed.

Yeah, if you're dropping tables all the time, pg_dump is going to have
a problem with that.  The fix I'm suggesting here would only ensure
that you get a "clean" failure at the LOCK TABLE command --- but from
an operational standpoint, that's little improvement.

The natural response is to consider retrying the whole dump after a lock
failure.  I'm not sure if it'd be practical to do that within pg_dump
itself, as opposed to putting a loop in whatever script calls it.

            regards, tom lane