Thread: BUG #19086: pg_dump --data-only selects but does not use index definitions for the dumped tables.


BUG #19086: pg_dump --data-only selects but does not use index definitions for the dumped tables.

From: PG Bug reporting form
Date: Tue, 14 Oct 2025 08:13:52 +0000
The following bug has been logged on the website:

Bug reference:      19086
Logged by:          Andrew Bille
Email address:      andrewbille@gmail.com
PostgreSQL version: 18.0
Operating system:   Debian 12
Description:

When system indexes are corrupted, collecting index definitions can take a
really long time.

Synthetic example:

DO $$
DECLARE
    i INTEGER;
    j INTEGER;
BEGIN
    FOR i IN 1..15000 LOOP
        EXECUTE 'CREATE TABLE tab' || i || ' as SELECT 1 as f';
        FOR j IN 1..5 LOOP
            EXECUTE 'CREATE index idx_tab' || i || '_' || j || ' ON tab' || i || '(f)';
        END LOOP;
    END LOOP;
END;
$$;

With ignore_system_indexes = on:

time pg_dump --data-only test > test.sql

real    62m44,582s
user    0m0,576s
sys     0m0,259s

of which
LOG:  duration: 3474423.683 ms  statement: SELECT t.tableoid, t.oid,
i.indrelid, t.relname AS indexname, t.relpages, t.reltuples,
t.relallvisible, 0 AS relallfrozen, pg_catalog.pg_get_indexdef(i.indexrelid)
AS indexdef, i.indkey, i.indisclustered, c.contype, c.conname,
c.condeferrable, c.condeferred, c.tableoid AS contableoid, c.oid AS conoid,
pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, CASE WHEN
i.indexprs IS NOT NULL THEN (SELECT pg_catalog.array_agg(attname ORDER BY
attnum)  FROM pg_catalog.pg_attribute   WHERE attrelid = i.indexrelid) ELSE
NULL END AS indattnames, (SELECT spcname FROM pg_catalog.pg_tablespace s
WHERE s.oid = t.reltablespace) AS tablespace, t.reloptions AS indreloptions,
i.indisreplident, inh.inhparent AS parentidx, i.indnkeyatts AS indnkeyatts,
i.indnatts AS indnatts, (SELECT pg_catalog.array_agg(attnum ORDER BY attnum)
FROM pg_catalog.pg_attribute   WHERE attrelid = i.indexrelid AND
attstattarget >= 0) AS indstatcols, (SELECT
pg_catalog.array_agg(attstattarget ORDER BY attnum)   FROM
pg_catalog.pg_attribute   WHERE attrelid = i.indexrelid AND
attstattarget >= 0) AS indstatvals, i.indnullsnotdistinct, NULL AS conperiod
FROM

unnest('{16385,16393,16401,16409,16417,16425,16433,16441,16449,16457,16465,16473,16481,16489,16497,16505,16513,16521,16529,16537,16545,16553,16561,16569,16577,16585,16593,16601,16609,16617,16625,16633,16641,16649,16657,16665,16673,16681,16689,16697,16705,16713,16721,16729,16737,16745,16753,16761,16769,16777,16785,16793,16801,16809,16817,16825,16833,16841,16849,16857,16865,16873,16881,16889,16897,16905,16913,16921,16929,16937,16945,16953,16961,16969,16977,16985,16993,17001,17009,17017,17025,17033,17041,17049,17057,17065,17073,17081,17089,17097,17105,17113,17121,17129,17137,17145,17153,17161,17169,17177,17185,17193,17201,17209,17217
... 36361,136369,136377}'::pg_catalog.oid[]) AS src(tbloid)
        JOIN pg_catalog.pg_index i ON (src.tbloid = i.indrelid) JOIN
pg_catalog.pg_class t ON (t.oid = i.indexrelid) JOIN pg_catalog.pg_class t2
ON (t2.oid = i.indrelid) LEFT JOIN pg_catalog.pg_constraint c ON (i.indrelid
= c.conrelid AND i.indexrelid = c.conindid AND c.contype IN ('p','u','x'))
LEFT JOIN pg_catalog.pg_inherits inh ON (inh.inhrelid = indexrelid) WHERE
(i.indisvalid OR t2.relkind = 'p') AND i.indisready ORDER BY i.indrelid,
indexname

With a simple patch (which passes the tests):

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index a1976fae607..471e62da735 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -95,7 +95,7 @@ static IndxInfo *findIndexByOid(Oid oid);
  *       Collect information about all potentially dumpable objects
  */
 TableInfo *
-getSchemaData(Archive *fout, int *numTablesPtr)
+getSchemaData(Archive *fout, int *numTablesPtr, bool dataOnly)
 {
        TableInfo  *tblinfo;
        ExtensionInfo *extinfo;
@@ -211,11 +211,14 @@ getSchemaData(Archive *fout, int *numTablesPtr)
        pg_log_info("reading partitioning data");
        getPartitioningInfo(fout);

-       pg_log_info("reading indexes");
-       getIndexes(fout, tblinfo, numTables);
+       if (!dataOnly)
+       {
+               pg_log_info("reading indexes");
+               getIndexes(fout, tblinfo, numTables);

-       pg_log_info("flagging indexes in partitioned tables");
-       flagInhIndexes(fout, tblinfo, numTables);
+               pg_log_info("flagging indexes in partitioned tables");
+               flagInhIndexes(fout, tblinfo, numTables);
+       }

        pg_log_info("reading extended statistics");
        getExtendedStatistics(fout);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 641bece12c7..ef8bd786371 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1090,7 +1090,7 @@ main(int argc, char **argv)
         * Now scan the database and create DumpableObject structs for all the
         * objects we intend to dump.
         */
-       tblinfo = getSchemaData(fout, &numTables);
+       tblinfo = getSchemaData(fout, &numTables, data_only);

        if (dopt.dumpData)
        {
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index fa6d1a510f7..83e097404a3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -750,7 +750,7 @@ typedef struct _SubRelInfo
  *     common utility functions
  */

-extern TableInfo *getSchemaData(Archive *fout, int *numTablesPtr);
+extern TableInfo *getSchemaData(Archive *fout, int *numTablesPtr, bool dataOnly);

 extern void AssignDumpId(DumpableObject *dobj);
 extern void recordAdditionalCatalogID(CatalogId catId, DumpableObject *dobj);


we have:

time pg_dump --data-only test > test.sql

real    7m45,533s
user    0m0,726s
sys     0m0,634s

In a real case of system index corruption ... the dump can take an enormous
amount of time.


On Tue, Oct 14, 2025 at 08:13:52AM +0000, PG Bug reporting form wrote:
> When system indexes are corrupted, collecting index definitions can take a
> really long time.

I don't think this qualifies as a bug, but avoiding pg_dump queries when
possible seems like a good idea.  Is this a hypothetical problem or
something you've experienced?

> -       pg_log_info("reading indexes");
> -       getIndexes(fout, tblinfo, numTables);
> +       if (!dataOnly)
> +       {
> +               pg_log_info("reading indexes");
> +               getIndexes(fout, tblinfo, numTables);
> 
> -       pg_log_info("flagging indexes in partitioned tables");
> -       flagInhIndexes(fout, tblinfo, numTables);
> +               pg_log_info("flagging indexes in partitioned tables");
> +               flagInhIndexes(fout, tblinfo, numTables);
> +       }

I think we ordinarily leave it up to the get*() functions to return early.
For example, getPartitioningInfo() has this near the top:

    /* needn't bother if not dumping data */
    if (!fout->dopt->dumpData)
        return;

Also, we need to be certain that nothing getIndexes() gathers is ever used
for --data-only dumps.  That seems plausible, but I haven't looked closely.

-- 
nathan



Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Oct 14, 2025 at 08:13:52AM +0000, PG Bug reporting form wrote:
>> When system indexes are corrupted, collecting index definitions can take a
>> really long time.

> I don't think this qualifies as a bug, but avoiding pg_dump queries when
> possible seems like a good idea.  Is this a hypothetical problem or
> something you've experienced?

Even if it's been seen in the field, it hardly qualifies as a
justification for complicating pg_dump's behavior.  There's no reason
to think that catalog corruption preferentially affects indexes, or
does so in this particular way.  Should we also not collect data on
functions, types, etc etc?

> Also, we need to be certain that nothing getIndexes() gathers is ever used
> for --data-only dumps.  That seems plausible, but I haven't looked closely.

Yeah, that's the main thing I'd worry about here.  For example,
ignoring some objects would likely lead to different conclusions about
the database objects' dependency web, possibly leading to changes in
dump order or other effects.  Maybe there's no problem in practice,
but this unsupported bug report doesn't really motivate me to do the
analysis to see if it'd be all right.

            regards, tom lane



On Tue, Oct 14, 2025 at 03:50:15PM -0400, Tom Lane wrote:
> Even if it's been seen in the field, it hardly qualifies as a
> justification for complicating pg_dump's behavior.  There's no reason
> to think that catalog corruption preferentially affects indexes, or
> does so in this particular way.  Should we also not collect data on
> functions, types, etc etc?

FWIW the getIndexes() query does tend to be one of the slowest, even with
intact system indexes.  I've no concrete proposals, but there might be some
room for improvement.  I don't think we gain all that much by simply
avoiding the query in probably-somewhat-rare use-cases.  IMHO it ought to
be reworked for efficiency.

-- 
nathan



On Wed, 15 Oct 2025 at 11:03, Nathan Bossart <nathandbossart@gmail.com> wrote:
> FWIW the getIndexes() query does tend to be one of the slowest, even with
> intact system indexes.  I've no concrete proposals, but there might be some
> room for improvement.  I don't think we gain all that much by simply
> avoiding the query in probably-somewhat-rare use-cases.  IMHO it ought to
> be reworked for efficiency.

The extra slowness comes from the subqueries in the targetlist, 3 of
which scan pg_attribute using the same join condition. That results in
3 separate scans of pg_attribute, 2 more than needed.

The query could be made more efficient generally by doing a left join
to pg_attribute instead and then GROUP BY i.indexrelid.
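
In sketch form, that could look something like this (a hypothetical
minimal sketch, not the rewrite attached to this message; trimmed to
just the three pg_attribute aggregates, with FILTER standing in for the
per-column conditions):

SELECT i.indexrelid,
       pg_catalog.array_agg(a.attname ORDER BY a.attnum)
         FILTER (WHERE i.indexprs IS NOT NULL) AS indattnames,
       pg_catalog.array_agg(a.attnum ORDER BY a.attnum)
         FILTER (WHERE a.attstattarget >= 0) AS indstatcols,
       pg_catalog.array_agg(a.attstattarget ORDER BY a.attnum)
         FILTER (WHERE a.attstattarget >= 0) AS indstatvals
FROM pg_catalog.pg_index i
     LEFT JOIN pg_catalog.pg_attribute a ON a.attrelid = i.indexrelid
GROUP BY i.indexrelid;

That way pg_attribute is scanned once in total rather than three times
per returned index row.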

I tried rewriting the query so that pg_attribute is joined to rather
than subqueries. With 1500 tables I get:

master:

ignore_system_indexes = on
Execution Time: 6853.262 ms

ignore_system_indexes = off
Execution Time: 66.781 ms

Rewritten query:

ignore_system_indexes = on
Execution Time: 53.351 ms

ignore_system_indexes = off
Execution Time: 56.965 ms

David

Attachments
> Is this a hypothetical problem or something you've experienced?

The problem is real: while trying to rescue data from a large corrupted database, we gave up waiting on the index definition query after more than 3 hours.


On 2025-Oct-15, Andrew Bille wrote:

> > Is this a hypothetical problem or something you've experienced?
> 
> The problem is real: while trying to rescue data from a large corrupted
> database, we gave up waiting on the index definition query after more
> than 3 hours.

Maybe it'd be reasonable to start the recovery by doing VACUUM FULL of
all/some system catalogs.
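
Something like this, say (a hypothetical sketch, run as superuser on the
damaged database; whether it helps depends on which catalogs and indexes
are actually corrupted):

    REINDEX SYSTEM;
    VACUUM FULL pg_catalog.pg_class;
    VACUUM FULL pg_catalog.pg_attribute;
    VACUUM FULL pg_catalog.pg_index;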

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Thank you, that case has already been resolved and the data has been recovered.
A system reindex and VACUUM FULL didn't help.
The schema and indexes were not needed; a modified pg_dump helped us.


On Wed, Oct 15, 2025 at 12:56:25PM +1300, David Rowley wrote:
> I tried rewriting the query so that pg_attribute is joined to rather
> than subqueries. With 1500 tables I get:
> 
> master:
> 
> ignore_system_indexes = on
> Execution Time: 6853.262 ms
> 
> ignore_system_indexes = off
> Execution Time: 66.781 ms
> 
> Rewritten query:
> 
> ignore_system_indexes = on
> Execution Time: 53.351 ms
> 
> ignore_system_indexes = off
> Execution Time: 56.965 ms

I tried this with 100K tables and saw the following (ignore_system_indexes
= off):

master:
 Planning Time: 1.672 ms
 Execution Time: 4077.008 ms

rewritten:
 Planning Time: 1.641 ms
 Execution Time: 3427.206 ms

I tried this with ignore_system_indexes = on, too, but it was very slow.

-- 
nathan



On Thu, 16 Oct 2025 at 05:24, Nathan Bossart <nathandbossart@gmail.com> wrote:
> I tried this with ignore_system_indexes = on, too, but it was very slow.

Seems to be due to pg_get_indexdef / pg_get_constraintdef operating on
a cold catcache. Getting rid of those, the rewritten version runs in
1.8 seconds with 100k tables for me.

That invalidates my timings from yesterday as I must have run the
rewritten query once the catcache was populated. Here are updated
results.

ignore_system_indexes = on 1.5k tables:
master: Execution Time: 26167.741 ms
rewritten: Execution Time: 16661.774 ms

ignore_system_indexes = off 1.5k tables:
master: Execution Time: 105.415 ms
rewritten: Execution Time: 97.030 ms

So not as good.

David



David Rowley <dgrowleyml@gmail.com> writes:
> Seems to be due to pg_get_indexdef / pg_get_constraintdef operating on
> a cold catcache. Getting rid of those, the rewritten version runs in
> 1.8 seconds with 100k tables for me.

I wonder how much win could be had by postponing those function calls
so that they only act on indexes we're going to dump.  It might be
a net loss in the default dump-everything case, though.

Also, it looks to me like getIndexes does not even look at the result
of pg_get_constraintdef unless contype == 'x'.  So there should be
some low-hanging fruit with

-                         "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
+                         "CASE WHEN c.contype = 'x' THEN "
+                         "pg_catalog.pg_get_constraintdef(c.oid, false) "
+                         "END AS condef, "

This wouldn't matter except with primary/unique constraints, but
surely there are plenty of those in a typical DB.

            regards, tom lane



On Thu, 16 Oct 2025 at 10:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Seems to be due to pg_get_indexdef / pg_get_constraintdef operating on
> > a cold catcache. Getting rid of those, the rewritten version runs in
> > 1.8 seconds with 100k tables for me.
>
> I wonder how much win could be had by postponing those function calls
> so that they only act on indexes we're going to dump.  It might be
> a net loss in the default dump-everything case, though.

Just to make sure I understand correctly: does that mean running a
query in dumpIndex() for each individual index being dumped, just to
call pg_get_indexdef()?

It would mean firing off quite a large number of queries to the
server, which might be especially bad when pg_dump is being run
remotely. I suppose ideally we'd have some matrix to indicate
everything we're going to need based on the given options and just
fetch those things. That'd be a pretty big overhaul.

> Also, it looks to me like getIndexes does not even look at the result
> of pg_get_constraintdef unless contype == 'x'.  So there should be
> some low-hanging fruit with
>
> -                         "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
> +                         "CASE WHEN c.contype = 'x' THEN "
> +                         "pg_catalog.pg_get_constraintdef(c.oid, false) "
> +                         "END AS condef, "
>
> This wouldn't matter except with primary/unique constraints, but
> surely there are plenty of those in a typical DB.

I expect that would help quite a bit. We do have NOT NULL constraints
in pg_constraint now, so on recent versions I expect it might be bigger
than pg_index in most cases, so the full-table scan in
pg_get_constraintdef_worker() with ignore_system_indexes = on could be
more painful than the same thing in pg_get_indexdef_worker().

David






David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 16 Oct 2025 at 10:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder how much win could be had by postponing those function calls
>> so that they only act on indexes we're going to dump.  It might be
>> a net loss in the default dump-everything case, though.

> Just to make sure I understand correctly, that means run a query in
> dumpIndex() specifically just for the index being dumped to call
> pg_get_indexdef()?

We could do it like that, but it's probably better to keep the
single-query approach.  A way that could work is

(1) Drop pg_get_indexdef and pg_get_constraintdef from getIndexes'
initial query.

(2) As we scan the results of that query, build new OID-list
strings listing just the index OIDs and constraint OIDs that
we require definition strings for.

(3) Still within getIndexes, run those queries and insert the
results into the arrays built in step 1.
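
For instance, the step-(3) follow-up could be queries of this shape,
reusing the unnest() idiom getIndexes already uses (the OID list here
is just a placeholder):

    SELECT u.oid, pg_catalog.pg_get_indexdef(u.oid) AS indexdef
    FROM unnest('{16385,16393}'::pg_catalog.oid[]) AS u(oid);

plus the equivalent with pg_catalog.pg_get_constraintdef(u.oid, false)
for the constraint OIDs.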

On its face, this will be slower than doing it all in one query,
at least in the normal case where we need most or all indexes.
But what I'm hoping is that we need at most one of the two
function calls for any one index, depending on whether it's a
constraint or not.  That could buy back enough to justify the
extra overhead, perhaps.

(Hmm ... but on the third hand, if we only need one of the
two strings, couldn't we mechanize that by wrapping the
pg_get_indexdef call in CASE WHEN c.contype IS DISTINCT FROM 'x'
?)

            regards, tom lane



On Thu, 16 Oct 2025 at 12:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (Hmm ... but on the third hand, if we only need one of the
> two strings, couldn't we mechanize that by wrapping the
> pg_get_indexdef call in CASE WHEN c.contype IS DISTINCT FROM 'x'
> ?)

Unless I'm mistaken, it looks like the "indexdef" field is used only
when there's no corresponding constraint with contype 'p', 'u' or
'x'. Wouldn't it be more like:

CASE WHEN c.conrelid IS NULL THEN
pg_catalog.pg_get_indexdef(i.indexrelid) ELSE '' END AS indexdef

which saves calling the function a bit more often than what you said... ?

The code I'm looking at is:

/* Plain secondary index */
indxinfo[j].indexconstraint = 0;

and "if (!is_constraint)" in dumpIndex().

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 16 Oct 2025 at 12:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (Hmm ... but on the third hand, if we only need one of the
>> two strings, couldn't we mechanize that by wrapping the
>> pg_get_indexdef call in CASE WHEN c.contype IS DISTINCT FROM 'x'
>> ?)

> Unless I'm mistaken, it looks like the "indexdef" field is used only
> when there's no corresponding constraint with contype 'p', 'u' or
> 'x'.

Ah.  I hadn't researched that, but it makes sense.

> Wouldn't it be more like:

> CASE WHEN c.conrelid IS NULL THEN
> pg_catalog.pg_get_indexdef(i.indexrelid) ELSE '' END AS indexdef

I'd leave out the ELSE so that you get a null if the function
isn't run, but yeah.  (The places saving these query results would
need PQgetisnull tests, too.)

            regards, tom lane



On Thu, 16 Oct 2025 at 13:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Wouldn't it be more like:
>
> > CASE WHEN c.conrelid IS NULL THEN
> > pg_catalog.pg_get_indexdef(i.indexrelid) ELSE '' END AS indexdef
>
> I'd leave out the ELSE so that you get a null if the function
> isn't run, but yeah.  (The places saving these query results would
> need PQgetisnull tests, too.)

Just to put that to the test, I tried the attached.

select 'create table t'||t||'(a int primary key, b int not null);
create index on t'||t||'(b)' from generate_Series(1,10000)t;
\gexec

master
$ PGOPTIONS='-c ignore_system_indexes=1' time pg_dump --schema-only
postgres >> /dev/null
2:23.75elapsed

$ PGOPTIONS='-c ignore_system_indexes=0' time pg_dump --schema-only
postgres >> /dev/null
0:01.08 elapsed

patched:
$ PGOPTIONS='-c ignore_system_indexes=1' time pg_dump --schema-only
postgres >> /dev/null
0:40.28elapsed

$ PGOPTIONS='-c ignore_system_indexes=0' time pg_dump --schema-only
postgres >> /dev/null
0:00.78elapsed

i.e. about 3.5x faster with ignore_system_indexes and 38% faster without.

David

Attachments
David Rowley <dgrowleyml@gmail.com> writes:
> Just to put that to the test, I tried the attached.

I'm confused by all the extraneous changes in this?

> i.e about 3.5x faster with ignore_system_indexes and 38% faster without.

Very promising, but I'm still confused.

            regards, tom lane



On Thu, 16 Oct 2025 at 14:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Just to put that to the test, I tried the attached.
>
> I'm confused by all the extraneous changes in this?

It includes the query rewrite too, in order to get rid of the
pg_attribute subqueries in the targetlist. There are 3 of those in
total. When ignore_system_indexes is on, that means 3 Seq Scans of
pg_attribute per returned row. The rewrite turns those into a single
join to pg_attribute, which allows the planner to hash or merge join to
it.  We could just do the conditional calling of the pg_get_*def()
functions, but performance would still be terrible for
ignore_system_indexes=on due to the Seq Scans, and slightly worse
overall.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 16 Oct 2025 at 14:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm confused by all the extraneous changes in this?

> It includes the query rewrite too in order to get rid of the
> subqueries in the targetlist to pg_attribute.

Ah, I've not reviewed that bit.

> We could just do the conditional calling of
> the pg_get_*def() functions, but performance would still be terrible
> for ignore_system_indexes=on due to the Seq Scans, and slightly worse
> overall.

Got it.  Definitely looks promising, but I'm too tired to review
the whole change.

            regards, tom lane