BUG #19086: pg_dump --data-only selects but does not use index definitions for the dumped tables
From: PG Bug reporting form
Date: Tue, 14 Oct 2025 08:13:52 +0000
The following bug has been logged on the website:

Bug reference:      19086
Logged by:          Andrew Bille
Email address:      andrewbille@gmail.com
PostgreSQL version: 18.0
Operating system:   Debian 12
Description:

In the case of system index corruption, collecting the index definitions
can take a really long time. Synthetic example:

DO $$
DECLARE
  i INTEGER;
  j INTEGER;
BEGIN
  FOR i IN 1..15000 LOOP
    EXECUTE 'CREATE TABLE tab' || i || ' AS SELECT 1 AS f';
    FOR j IN 1..5 LOOP
      EXECUTE 'CREATE INDEX idx_tab' || i || '_' || j
              || ' ON tab' || i || '(f)';
    END LOOP;
  END LOOP;
END;
$$;

With ignore_system_indexes = on:

$ time pg_dump --data-only test > test.sql
real    62m44,582s
user    0m0,576s
sys     0m0,259s

of which most of the time went into this statement:

LOG:  duration: 3474423.683 ms  statement:
SELECT t.tableoid, t.oid, i.indrelid, t.relname AS indexname,
  t.relpages, t.reltuples, t.relallvisible, 0 AS relallfrozen,
  pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef,
  i.indkey, i.indisclustered, c.contype, c.conname,
  c.condeferrable, c.condeferred, c.tableoid AS contableoid,
  c.oid AS conoid,
  pg_catalog.pg_get_constraintdef(c.oid, false) AS condef,
  CASE WHEN i.indexprs IS NOT NULL THEN
    (SELECT pg_catalog.array_agg(attname ORDER BY attnum)
     FROM pg_catalog.pg_attribute
     WHERE attrelid = i.indexrelid)
  ELSE NULL END AS indattnames,
  (SELECT spcname FROM pg_catalog.pg_tablespace s
   WHERE s.oid = t.reltablespace) AS tablespace,
  t.reloptions AS indreloptions, i.indisreplident,
  inh.inhparent AS parentidx,
  i.indnkeyatts AS indnkeyatts, i.indnatts AS indnatts,
  (SELECT pg_catalog.array_agg(attnum ORDER BY attnum)
   FROM pg_catalog.pg_attribute
   WHERE attrelid = i.indexrelid AND attstattarget >= 0) AS indstatcols,
  (SELECT pg_catalog.array_agg(attstattarget ORDER BY attnum)
   FROM pg_catalog.pg_attribute
   WHERE attrelid = i.indexrelid AND attstattarget >= 0) AS indstatvals,
  i.indnullsnotdistinct, NULL AS conperiod
FROM unnest('{16385,16393,16401,16409,16417,16425,16433,16441,16449,
  ... 36361,136369,136377}'::pg_catalog.oid[]) AS src(tbloid)
JOIN pg_catalog.pg_index i ON (src.tbloid = i.indrelid)
JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid)
JOIN pg_catalog.pg_class t2 ON (t2.oid = i.indrelid)
LEFT JOIN pg_catalog.pg_constraint c ON (i.indrelid = c.conrelid
  AND i.indexrelid = c.conindid AND c.contype IN ('p','u','x'))
LEFT JOIN pg_catalog.pg_inherits inh ON (inh.inhrelid = indexrelid)
WHERE (i.indisvalid OR t2.relkind = 'p') AND i.indisready
ORDER BY i.indrelid, indexname

With a simple patch (passes tests):

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index a1976fae607..471e62da735 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -95,7 +95,7 @@ static IndxInfo *findIndexByOid(Oid oid);
  * Collect information about all potentially dumpable objects
  */
 TableInfo *
-getSchemaData(Archive *fout, int *numTablesPtr)
+getSchemaData(Archive *fout, int *numTablesPtr, bool dataOnly)
 {
 	TableInfo  *tblinfo;
 	ExtensionInfo *extinfo;
@@ -211,11 +211,14 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	pg_log_info("reading partitioning data");
 	getPartitioningInfo(fout);
 
-	pg_log_info("reading indexes");
-	getIndexes(fout, tblinfo, numTables);
+	if (!dataOnly)
+	{
+		pg_log_info("reading indexes");
+		getIndexes(fout, tblinfo, numTables);
 
-	pg_log_info("flagging indexes in partitioned tables");
-	flagInhIndexes(fout, tblinfo, numTables);
+		pg_log_info("flagging indexes in partitioned tables");
+		flagInhIndexes(fout, tblinfo, numTables);
+	}
 
 	pg_log_info("reading extended statistics");
 	getExtendedStatistics(fout);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 641bece12c7..ef8bd786371 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1090,7 +1090,7 @@ main(int argc, char **argv)
 	 * Now scan the database and create DumpableObject structs for all the
 	 * objects we intend to dump.
 	 */
-	tblinfo = getSchemaData(fout, &numTables);
+	tblinfo = getSchemaData(fout, &numTables, data_only);
 
 	if (dopt.dumpData)
 	{
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index fa6d1a510f7..83e097404a3 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -750,7 +750,7 @@ typedef struct _SubRelInfo
  * common utility functions
  */
 
-extern TableInfo *getSchemaData(Archive *fout, int *numTablesPtr);
+extern TableInfo *getSchemaData(Archive *fout, int *numTablesPtr, bool dataOnly);
 
 extern void AssignDumpId(DumpableObject *dobj);
 extern void recordAdditionalCatalogID(CatalogId catId, DumpableObject *dobj);

we have:

$ time pg_dump --data-only test > test.sql
real    7m45,533s
user    0m0,726s
sys     0m0,634s

In a real case of system index corruption ... the dump can take an
enormous amount of time.
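(For anyone reproducing this: the LOG line above comes from
statement-duration logging. One way to enable it — an illustrative setup,
not necessarily what the reporter used, and the threshold is arbitrary:)

    -- Log any statement that runs for more than one second:
    ALTER SYSTEM SET log_min_duration_statement = '1s';
    SELECT pg_reload_conf();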
On Tue, Oct 14, 2025 at 08:13:52AM +0000, PG Bug reporting form wrote:
> In the case of system index corruption, collecting the index definitions
> can take a really long time.

I don't think this qualifies as a bug, but avoiding pg_dump queries when
possible seems like a good idea.  Is this a hypothetical problem or
something you've experienced?

> -    pg_log_info("reading indexes");
> -    getIndexes(fout, tblinfo, numTables);
> +    if (!dataOnly)
> +    {
> +        pg_log_info("reading indexes");
> +        getIndexes(fout, tblinfo, numTables);
>
> -    pg_log_info("flagging indexes in partitioned tables");
> -    flagInhIndexes(fout, tblinfo, numTables);
> +        pg_log_info("flagging indexes in partitioned tables");
> +        flagInhIndexes(fout, tblinfo, numTables);
> +    }

I think we ordinarily leave it up to the get*() functions to return early.
For example, getPartitioningInfo() has this near the top:

    /* needn't bother if not dumping data */
    if (!fout->dopt->dumpData)
        return;

Also, we need to be certain that nothing getIndexes() gathers is ever used
for --data-only dumps.  That seems plausible, but I haven't looked closely.

-- 
nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Oct 14, 2025 at 08:13:52AM +0000, PG Bug reporting form wrote:
>> In the case of system index corruption, collecting the index definitions
>> can take a really long time.

> I don't think this qualifies as a bug, but avoiding pg_dump queries when
> possible seems like a good idea.  Is this a hypothetical problem or
> something you've experienced?

Even if it's been seen in the field, it hardly qualifies as a
justification for complicating pg_dump's behavior.  There's no reason
to think that catalog corruption preferentially affects indexes, or
does so in this particular way.  Should we also not collect data on
functions, types, etc etc?

> Also, we need to be certain that nothing getIndexes() gathers is ever used
> for --data-only dumps.  That seems plausible, but I haven't looked closely.

Yeah, that's the main thing I'd worry about here.  For example, ignoring
some objects would likely lead to different conclusions about the
database objects' dependency web, possibly leading to changes in dump
order or other effects.  Maybe there's no problem in practice, but this
unsupported bug report doesn't really motivate me to do the analysis to
see if it'd be all right.

			regards, tom lane
On Tue, Oct 14, 2025 at 03:50:15PM -0400, Tom Lane wrote:
> Even if it's been seen in the field, it hardly qualifies as a
> justification for complicating pg_dump's behavior.  There's no reason
> to think that catalog corruption preferentially affects indexes, or
> does so in this particular way.  Should we also not collect data on
> functions, types, etc etc?

FWIW the getIndexes() query does tend to be one of the slowest, even with
intact system indexes.  I've no concrete proposals, but there might be
some room for improvement.  I don't think we gain all that much by simply
avoiding the query in probably-somewhat-rare use-cases.  IMHO it ought to
be reworked for efficiency.

-- 
nathan
On Wed, 15 Oct 2025 at 11:03, Nathan Bossart <nathandbossart@gmail.com> wrote:
> FWIW the getIndexes() query does tend to be one of the slowest, even with
> intact system indexes.  I've no concrete proposals, but there might be
> some room for improvement.  I don't think we gain all that much by simply
> avoiding the query in probably-somewhat-rare use-cases.  IMHO it ought to
> be reworked for efficiency.

The extra slowness comes from all the subqueries in the targetlist, 3 of
which go to pg_attribute using the same join condition.  That results in
3 separate scans of pg_attribute, 2 more than needed.  The query could be
made more efficient generally by doing a left join to pg_attribute
instead and then GROUP BY i.indexrelid.

I tried rewriting the query so that pg_attribute is joined to rather
than subqueried.  With 1500 tables I get:

master:

ignore_system_indexes = on
Execution Time: 6853.262 ms

ignore_system_indexes = off
Execution Time: 66.781 ms

Rewritten query:

ignore_system_indexes = on
Execution Time: 53.351 ms

ignore_system_indexes = off
Execution Time: 56.965 ms

David
Attachments
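(The general shape of David's rewrite, as a simplified self-contained
sketch — not the attached patch itself. The FILTER clauses stand in for
the differing WHERE conditions of the three original subqueries, and the
real query keeps all the other output columns and joins:)

    SELECT i.indexrelid,
           pg_catalog.array_agg(a.attname ORDER BY a.attnum) AS indattnames,
           pg_catalog.array_agg(a.attnum ORDER BY a.attnum)
               FILTER (WHERE a.attstattarget >= 0) AS indstatcols,
           pg_catalog.array_agg(a.attstattarget ORDER BY a.attnum)
               FILTER (WHERE a.attstattarget >= 0) AS indstatvals
    FROM pg_catalog.pg_index i
    LEFT JOIN pg_catalog.pg_attribute a ON a.attrelid = i.indexrelid
    GROUP BY i.indexrelid;

With a single join, the planner can hash or merge join pg_attribute once,
instead of starting a fresh scan of it for each subquery on every output
row.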
> Is this a hypothetical problem or something you've experienced?
The problem is real: when trying to rescue data from a large corrupted
database, we gave up waiting on the index definition query after more
than 3 hours.
On Wed, Oct 15, 2025 at 2:36 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Oct 14, 2025 at 08:13:52AM +0000, PG Bug reporting form wrote:
>> In the case of system index corruption, collecting the index definitions
>> can take a really long time.
>
> I don't think this qualifies as a bug, but avoiding pg_dump queries when
> possible seems like a good idea.  Is this a hypothetical problem or
> something you've experienced?
>
>> -    pg_log_info("reading indexes");
>> -    getIndexes(fout, tblinfo, numTables);
>> +    if (!dataOnly)
>> +    {
>> +        pg_log_info("reading indexes");
>> +        getIndexes(fout, tblinfo, numTables);
>>
>> -    pg_log_info("flagging indexes in partitioned tables");
>> -    flagInhIndexes(fout, tblinfo, numTables);
>> +        pg_log_info("flagging indexes in partitioned tables");
>> +        flagInhIndexes(fout, tblinfo, numTables);
>> +    }
>
> I think we ordinarily leave it up to the get*() functions to return early.
> For example, getPartitioningInfo() has this near the top:
>
>     /* needn't bother if not dumping data */
>     if (!fout->dopt->dumpData)
>         return;
>
> Also, we need to be certain that nothing getIndexes() gathers is ever used
> for --data-only dumps.  That seems plausible, but I haven't looked closely.
>
> --
> nathan
On 2025-Oct-15, Andrew Bille wrote:

> > Is this a hypothetical problem or something you've experienced?
>
> The problem is real: when trying to rescue data from a large corrupted
> database, we gave up waiting on the index definition query after more
> than 3 hours.

Maybe it'd be reasonable to start the recovery by doing VACUUM FULL of
all/some system catalogs.

-- 
Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
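(For concreteness, that recovery attempt might look like the following —
an illustrative sequence limited to the catalogs the slow getIndexes()
query reads; REINDEX SYSTEM must be run by a superuser and outside a
transaction block:)

    -- Rewrite the catalogs the index-definition query touches ...
    VACUUM FULL pg_catalog.pg_index;
    VACUUM FULL pg_catalog.pg_class;
    VACUUM FULL pg_catalog.pg_attribute;
    VACUUM FULL pg_catalog.pg_constraint;
    -- ... then rebuild all system indexes.
    REINDEX SYSTEM;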
Thank you, that case has already been resolved and the data has been
recovered. System reindex and VACUUM FULL didn't help. The schema and
indexes were not needed; a modified pg_dump helped us.
On Wed, Oct 15, 2025 at 3:56 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> On 2025-Oct-15, Andrew Bille wrote:
>
>>> Is this a hypothetical problem or something you've experienced?
>>
>> The problem is real: when trying to rescue data from a large corrupted
>> database, we gave up waiting on the index definition query after more
>> than 3 hours.
>
> Maybe it'd be reasonable to start the recovery by doing VACUUM FULL of
> all/some system catalogs.
>
> --
> Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
On Wed, Oct 15, 2025 at 12:56:25PM +1300, David Rowley wrote:
> I tried rewriting the query so that pg_attribute is joined to rather
> than subqueried.  With 1500 tables I get:
>
> master:
>
> ignore_system_indexes = on
> Execution Time: 6853.262 ms
>
> ignore_system_indexes = off
> Execution Time: 66.781 ms
>
> Rewritten query:
>
> ignore_system_indexes = on
> Execution Time: 53.351 ms
>
> ignore_system_indexes = off
> Execution Time: 56.965 ms

I tried this with 100K tables and saw the following
(ignore_system_indexes = off):

master:
Planning Time: 1.672 ms
Execution Time: 4077.008 ms

rewritten:
Planning Time: 1.641 ms
Execution Time: 3427.206 ms

I tried this with ignore_system_indexes = on, too, but it was very slow.

-- 
nathan
On Thu, 16 Oct 2025 at 05:24, Nathan Bossart <nathandbossart@gmail.com> wrote:
> I tried this with ignore_system_indexes = on, too, but it was very slow.

Seems to be due to pg_get_indexdef / pg_get_constraintdef operating on a
cold catcache.  Getting rid of those, the rewritten version runs in 1.8
seconds with 100k tables for me.

That invalidates my timings from yesterday, as I must have run the
rewritten query once the catcache was populated.  Here are updated
results.

ignore_system_indexes = on, 1.5k tables:

master: Execution Time: 26167.741 ms
rewritten: Execution Time: 16661.774 ms

ignore_system_indexes = off, 1.5k tables:

master: Execution Time: 105.415 ms
rewritten: Execution Time: 97.030 ms

So not as good.

David
David Rowley <dgrowleyml@gmail.com> writes:
> Seems to be due to pg_get_indexdef / pg_get_constraintdef operating on a
> cold catcache.  Getting rid of those, the rewritten version runs in 1.8
> seconds with 100k tables for me.

I wonder how much win could be had by postponing those function calls
so that they only act on indexes we're going to dump.  It might be
a net loss in the default dump-everything case, though.

Also, it looks to me like getIndexes does not even look at the result
of pg_get_constraintdef unless contype == 'x'.  So there should be
some low-hanging fruit with

-    "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
+    "CASE WHEN c.contype = 'x' THEN "
+    "pg_catalog.pg_get_constraintdef(c.oid, false) "
+    "END AS condef, "

This wouldn't matter except with primary/unique constraints, but
surely there are plenty of those in a typical DB.

			regards, tom lane
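(The same guard pulled out into a standalone query, for illustration —
the actual change is to the string literal built in getIndexes():)

    SELECT c.oid, c.conname, c.contype,
           CASE WHEN c.contype = 'x'
                THEN pg_catalog.pg_get_constraintdef(c.oid, false)
           END AS condef  -- NULL for 'p'/'u': the function is never called
    FROM pg_catalog.pg_constraint c
    WHERE c.contype IN ('p', 'u', 'x');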
On Thu, 16 Oct 2025 at 10:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Seems to be due to pg_get_indexdef / pg_get_constraintdef operating on a
>> cold catcache.  Getting rid of those, the rewritten version runs in 1.8
>> seconds with 100k tables for me.
>
> I wonder how much win could be had by postponing those function calls
> so that they only act on indexes we're going to dump.  It might be
> a net loss in the default dump-everything case, though.

Just to make sure I understand correctly, does that mean running a query
in dumpIndex() specifically for the index being dumped, just to call
pg_get_indexdef()?  It would mean firing off quite a large number of
queries to the server, which might be especially bad when pg_dump is
being run remotely.

I suppose ideally we'd have some matrix to indicate everything we're
going to need based on the given options and just fetch those things.
That'd be a pretty big overhaul.

> Also, it looks to me like getIndexes does not even look at the result
> of pg_get_constraintdef unless contype == 'x'.  So there should be
> some low-hanging fruit with
>
> -    "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
> +    "CASE WHEN c.contype = 'x' THEN "
> +    "pg_catalog.pg_get_constraintdef(c.oid, false) "
> +    "END AS condef, "
>
> This wouldn't matter except with primary/unique constraints, but
> surely there are plenty of those in a typical DB.

I expect that would help quite a bit.  We do have NOT NULL constraints
in that table now, so for recent versions I expect pg_constraint might
be bigger than pg_index in most cases, and the full table scan in
pg_get_constraintdef_worker() with ignore_system_indexes = on could be
more painful than the same thing in pg_get_indexdef_worker().

David
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 16 Oct 2025 at 10:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder how much win could be had by postponing those function calls
>> so that they only act on indexes we're going to dump.  It might be
>> a net loss in the default dump-everything case, though.

> Just to make sure I understand correctly, does that mean running a query
> in dumpIndex() specifically for the index being dumped, just to call
> pg_get_indexdef()?

We could do it like that, but it's probably better to keep the
single-query approach.  A way that could work is

(1) Drop pg_get_indexdef and pg_get_constraintdef from getIndexes'
initial query.

(2) As we scan the results of that query, build new OID-list strings
listing just the index OIDs and constraint OIDs that we require
definition strings for.

(3) Still within getIndexes, run those queries and insert the results
into the arrays built in step 1.

On its face, this will be slower than doing it all in one query, at
least in the normal case where we need most or all indexes.  But what
I'm hoping is that we need at most one of the two function calls for
any one index, depending on whether it's a constraint or not.  That
could buy back enough to justify the extra overhead, perhaps.

(Hmm ... but on the third hand, if we only need one of the two strings,
couldn't we mechanize that by wrapping the pg_get_indexdef call in
CASE WHEN c.contype IS DISTINCT FROM 'x'?)

			regards, tom lane
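(A sketch of what the step-(3) follow-up query could look like; the OID
list here is illustrative and would be assembled in pg_dump from the
first query's results:)

    SELECT i.indexrelid,
           pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef
    FROM unnest('{16391,16398}'::pg_catalog.oid[]) AS src(idxoid)
    JOIN pg_catalog.pg_index i ON i.indexrelid = src.idxoid;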
On Thu, 16 Oct 2025 at 12:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (Hmm ... but on the third hand, if we only need one of the two strings,
> couldn't we mechanize that by wrapping the pg_get_indexdef call in
> CASE WHEN c.contype IS DISTINCT FROM 'x'?)

Unless I'm mistaken, it looks like the "indexdef" field is used only
when there's no corresponding constraint with contype 'p', 'u' or 'x'.
Wouldn't it be more like:

CASE WHEN c.conrelid IS NULL THEN
pg_catalog.pg_get_indexdef(i.indexrelid) ELSE '' END AS indexdef

which saves calling the function a bit more often than what you
said ... ?

The code I'm looking at is:

    /* Plain secondary index */
    indxinfo[j].indexconstraint = 0;

and "if (!is_constraint)" in dumpIndex().

David
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 16 Oct 2025 at 12:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (Hmm ... but on the third hand, if we only need one of the two strings,
>> couldn't we mechanize that by wrapping the pg_get_indexdef call in
>> CASE WHEN c.contype IS DISTINCT FROM 'x'?)

> Unless I'm mistaken, it looks like the "indexdef" field is used only
> when there's no corresponding constraint with contype 'p', 'u' or 'x'.

Ah.  I hadn't researched that, but it makes sense.

> Wouldn't it be more like:

> CASE WHEN c.conrelid IS NULL THEN
> pg_catalog.pg_get_indexdef(i.indexrelid) ELSE '' END AS indexdef

I'd leave out the ELSE so that you get a null if the function
isn't run, but yeah.  (The places saving these query results would
need PQgetisnull tests, too.)

			regards, tom lane
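(Put together as a standalone query, the guard with the ELSE omitted
would look roughly like this — illustrative only; the join condition
mirrors the one getIndexes() already uses:)

    SELECT i.indexrelid,
           CASE WHEN c.conrelid IS NULL
                THEN pg_catalog.pg_get_indexdef(i.indexrelid)
           END AS indexdef  -- NULL, and no function call, when a
                            -- constraint owns the index
    FROM pg_catalog.pg_index i
    LEFT JOIN pg_catalog.pg_constraint c
           ON c.conrelid = i.indrelid
          AND c.conindid = i.indexrelid
          AND c.contype IN ('p', 'u', 'x');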
On Thu, 16 Oct 2025 at 13:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Wouldn't it be more like:
>
>> CASE WHEN c.conrelid IS NULL THEN
>> pg_catalog.pg_get_indexdef(i.indexrelid) ELSE '' END AS indexdef
>
> I'd leave out the ELSE so that you get a null if the function
> isn't run, but yeah.  (The places saving these query results would
> need PQgetisnull tests, too.)

Just to put that to the test, I tried the attached.

select 'create table t'||t||'(a int primary key, b int not null); create index on t'||t||'(b)'
from generate_series(1,10000) t;
\gexec

master:

$ PGOPTIONS='-c ignore_system_indexes=1' time pg_dump --schema-only postgres >> /dev/null
2:23.75elapsed

$ PGOPTIONS='-c ignore_system_indexes=0' time pg_dump --schema-only postgres >> /dev/null
0:01.08elapsed

patched:

$ PGOPTIONS='-c ignore_system_indexes=1' time pg_dump --schema-only postgres >> /dev/null
0:40.28elapsed

$ PGOPTIONS='-c ignore_system_indexes=0' time pg_dump --schema-only postgres >> /dev/null
0:00.78elapsed

i.e. about 3.5x faster with ignore_system_indexes and 38% faster without.

David
Attachments
David Rowley <dgrowleyml@gmail.com> writes:
> Just to put that to the test, I tried the attached.

I'm confused by all the extraneous changes in this?

> i.e. about 3.5x faster with ignore_system_indexes and 38% faster without.

Very promising, but I'm still confused.

			regards, tom lane
On Thu, 16 Oct 2025 at 14:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Just to put that to the test, I tried the attached.
>
> I'm confused by all the extraneous changes in this?

It includes the query rewrite too, in order to get rid of the subqueries
in the targetlist that go to pg_attribute.  There are 3 of those in
total.  When ignore_system_indexes is on, that means 3 Seq Scans of
pg_attribute per returned row.  The rewrite gets rid of that and turns
it into a single join to pg_attribute, which allows the planner to hash
or merge join to it.

We could just do the conditional calling of the pg_get_*def()
functions, but performance would still be terrible for
ignore_system_indexes=on due to the Seq Scans, and slightly worse
overall.

David
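(The per-row scans are easy to see with EXPLAIN on a cut-down version of
one subquery. ignore_system_indexes can only be set at connection start,
e.g. via PGOPTIONS, so the query below is illustrative:)

    -- Run under: PGOPTIONS='-c ignore_system_indexes=1' psql
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT i.indexrelid,
           (SELECT pg_catalog.array_agg(attname ORDER BY attnum)
              FROM pg_catalog.pg_attribute
             WHERE attrelid = i.indexrelid) AS indattnames
    FROM pg_catalog.pg_index i;
    -- The plan shows a SubPlan whose Seq Scan on pg_attribute is
    -- executed once per pg_index row.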
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 16 Oct 2025 at 14:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm confused by all the extraneous changes in this?

> It includes the query rewrite too, in order to get rid of the subqueries
> in the targetlist that go to pg_attribute.

Ah, I've not reviewed that bit.

> We could just do the conditional calling of the pg_get_*def()
> functions, but performance would still be terrible for
> ignore_system_indexes=on due to the Seq Scans, and slightly worse
> overall.

Got it.  Definitely looks promising, but I'm too tired to review
the whole change.

			regards, tom lane