Обсуждение: improve performance of pg_dump with many sequences

Поиск

Список

Период

Сортировка

improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

03 мая 2024 г., 02:51:40

Similar to 'pg_dump --binary-upgrade' [0], we can speed up pg_dump with
many sequences by gathering the required information in a single query
instead of two queries per sequence.  The attached patches are
works-in-progress, but here are the results I see on my machine for
'pg_dump --schema-only --binary-upgrade' with a million sequences:

          HEAD : 6m22.809s
           [0] : 1m54.701s
[0] + attached : 0m38.233s

I'm not sure I have all the details correct in 0003, and we might want to
separate the table into two tables which are only populated when the
relevant section is dumped.  Furthermore, the query in 0003 is a bit goofy
because it needs to dance around a bug reported elsewhere [1].

[0] https://postgr.es/m/20240418041712.GA3441570%40nathanxps13
[1] https://postgr.es/m/20240501005730.GA594666%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Вложения

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

09 июля 2024 г., 19:11:51

rebased

-- 
nathan

Вложения

Re: improve performance of pg_dump with many sequences

От

"Euler Taveira"

Дата:

10 июля 2024 г., 20:08:56

On Tue, Jul 9, 2024, at 4:11 PM, Nathan Bossart wrote:

rebased

Nice improvement. The numbers for a realistic scenario (10k sequences) are

for i in `seq 1 10000`; do echo "CREATE SEQUENCE s$i;"; done > /tmp/s.sql

master:

real 0m1,141s

user 0m0,056s

sys 0m0,147s

patched:

real 0m0,410s

user 0m0,045s

sys 0m0,103s

You are changing internal representation from char to int64. Is the main goal to

validate catalog data? What if there is a new sequence data type whose

representation is not an integer?

This code path is adding zero byte to the last position of the fixed string. I

suggest that the zero byte is added to the position after the string length.

Assert(strlen(PQgetvalue(res, 0, 0)) < sizeof(seqtype));

strncpy(seqtype, PQgetvalue(res, 0, 0), sizeof(seqtype));

seqtype[sizeof(seqtype) - 1] = '\0';

Something like

l = strlen(PQgetvalue(res, 0, 0));

Assert(l < sizeof(seqtype));

strncpy(seqtype, PQgetvalue(res, 0, 0), l);

seqtype[l] = '\0';

Another suggestion is to use a constant for seqtype

char seqtype[MAX_SEQNAME_LEN];

and simplify the expression:

size_t seqtype_sz = sizeof(((SequenceItem *) 0)->seqtype);

If you are not planning to apply 0003, make sure you fix collectSequences() to

avoid versions less than 10. Move this part to 0002.

@@ -17233,11 +17235,24 @@ collectSequences(Archive *fout)

PGresult *res;

const char *query;

+ if (fout->remoteVersion < 100000)

+ return;

Since you apply a fix for pg_sequence_last_value function, you can simplify the

query in 0003. CASE is not required.

I repeated the same test but not applying 0003.

patched (0001 and 0002):

real 0m0,290s

user 0m0,038s

sys 0m0,104s

I'm not sure if 0003 is worth. Maybe if you have another table like you

suggested.

Euler Taveira

EDB https://www.enterprisedb.com/

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

10 июля 2024 г., 22:05:11

On Wed, Jul 10, 2024 at 05:08:56PM -0300, Euler Taveira wrote:
> Nice improvement. The numbers for a realistic scenario (10k sequences) are

Thanks for taking a look!

> You are changing internal representation from char to int64. Is the main goal to
> validate catalog data? What if there is a new sequence data type whose
> representation is not an integer?

IIRC 0001 was primarily intended to reduce the amount of memory needed for
the sorted table.  Regarding a new sequence data type, I'm assuming we'll
have much bigger fish to fry if we do that (e.g., pg_sequence uses int8 for
the values), and I'd hope that adjusting this code wouldn't be too
difficult, anyway.

> This code path is adding zero byte to the last position of the fixed string. I
> suggest that the zero byte is added to the position after the string length.

I'm not following why that would be a better approach.  strncpy() will add
a NUL to the end of the string unless it doesn't fit in the buffer, in
which case we'll add our own via "seqtype[sizeof(seqtype) - 1] = '\0'".
Furthermore, the compiler can determine the position where the NUL should
be placed, whereas placing it at the end of the copied string requires a
runtime strlen().

> l = strlen(PQgetvalue(res, 0, 0));
> Assert(l < sizeof(seqtype));
> strncpy(seqtype, PQgetvalue(res, 0, 0), l);
> seqtype[l] = '\0';

I think the strncpy() should really be limited to the size of the seqtype
buffer.  IMHO an Assert is not sufficient.

> If you are not planning to apply 0003, make sure you fix collectSequences() to
> avoid versions less than 10. Move this part to 0002.

Yeah, no need to create the table if we aren't going to use it.

> Since you apply a fix for pg_sequence_last_value function, you can simplify the
> query in 0003. CASE is not required.

Unfortunately, I think we have to keep this workaround since older minor
releases of PostgreSQL don't have the fix.

> patched (0001 and 0002):
> real 0m0,290s
> user 0m0,038s
> sys 0m0,104s
> 
> I'm not sure if 0003 is worth. Maybe if you have another table like you
> suggested.

What pg_dump command did you test here?  Did you dump the sequence data, or
was this --schema-only?

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

"Euler Taveira"

Дата:

11 июля 2024 г., 02:52:33

On Wed, Jul 10, 2024, at 7:05 PM, Nathan Bossart wrote:

I'm not following why that would be a better approach. strncpy() will add
a NUL to the end of the string unless it doesn't fit in the buffer, in
which case we'll add our own via "seqtype[sizeof(seqtype) - 1] = '\0'".
Furthermore, the compiler can determine the position where the NUL should
be placed, whereas placing it at the end of the copied string requires a
runtime strlen().

Nevermind, you are copying the whole buffer (n = sizeof(seqtype)).

Unfortunately, I think we have to keep this workaround since older minor
releases of PostgreSQL don't have the fix.

Hmm. Right.

What pg_dump command did you test here? Did you dump the sequence data, or
was this --schema-only?

time pg_dump -f - -s -d postgres

Euler Taveira

EDB https://www.enterprisedb.com/

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

12 июля 2024 г., 02:09:17

On Wed, Jul 10, 2024 at 11:52:33PM -0300, Euler Taveira wrote:
> On Wed, Jul 10, 2024, at 7:05 PM, Nathan Bossart wrote:
>> Unfortunately, I think we have to keep this workaround since older minor
>> releases of PostgreSQL don't have the fix.
> 
> Hmm. Right.

On second thought, maybe we should just limit this improvement to the minor
releases with the fix so that we _can_ get rid of the workaround.  Or we
could use the hacky workaround only for versions with the bug.

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

16 июля 2024 г., 21:36:15

On Thu, Jul 11, 2024 at 09:09:17PM -0500, Nathan Bossart wrote:
> On second thought, maybe we should just limit this improvement to the minor
> releases with the fix so that we _can_ get rid of the workaround.  Or we
> could use the hacky workaround only for versions with the bug.

Here is a new version of the patch set.  The main differences are 1) we no
longer gather the sequence data for schema-only dumps and 2) 0003 uses a
simplified query for dumps on v18 and newer.  I considered also using a
slightly simplified query for dumps on versions with the
unlogged-sequences-on-standbys fix, but I felt that wasn't worth the extra
code.

Unfortunately, I've also discovered a problem with 0003.
pg_sequence_last_value() returns NULL when is_called is false, in which
case we assume last_value == seqstart, which is, sadly, bogus due to
commands like ALTER SEQUENCE [RE]START WITH.  AFAICT there isn't an easy
way around this.  We could either create a giant query that gathers the
information from all sequences in the database, or we could introduce a new
function in v18 that returns everything we need (which would only help for
upgrades _from_ v18).  Assuming I'm not missing a better option, I think
the latter is the better choice, and I still think it's worth doing even
though it probably won't help anyone for ~2.5 years.

-- 
nathan

Вложения

Re: improve performance of pg_dump with many sequences

От

Michael Paquier

Дата:

17 июля 2024 г., 02:30:04

On Tue, Jul 16, 2024 at 04:36:15PM -0500, Nathan Bossart wrote:
> Unfortunately, I've also discovered a problem with 0003.
> pg_sequence_last_value() returns NULL when is_called is false, in which
> case we assume last_value == seqstart, which is, sadly, bogus due to
> commands like ALTER SEQUENCE [RE]START WITH.  AFAICT there isn't an easy
> way around this.  We could either create a giant query that gathers the
> information from all sequences in the database, or we could introduce a new
> function in v18 that returns everything we need (which would only help for
> upgrades _from_ v18).  Assuming I'm not missing a better option, I think
> the latter is the better choice, and I still think it's worth doing even
> though it probably won't help anyone for ~2.5 years.

Yeah, I have bumped on the same issue.  In the long term, I also think
that we'd better have pg_sequence_last_value() return a row with
is_called and the value scanned.  As you say, it won't help except
when upgrading from versions of Postgres that are at least to v18,
assuming that this change gets in the tree, but that would be much
better in the long term and time flies fast.

See 0001 as of this area:
https://www.postgresql.org/message-id/ZnPIUPMmp5TzBPC2%40paquier.xyz
--
Michael

Вложения

signature.asc

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

17 июля 2024 г., 03:23:08

On Wed, Jul 17, 2024 at 11:30:04AM +0900, Michael Paquier wrote:
> Yeah, I have bumped on the same issue.  In the long term, I also think
> that we'd better have pg_sequence_last_value() return a row with
> is_called and the value scanned.  As you say, it won't help except
> when upgrading from versions of Postgres that are at least to v18,
> assuming that this change gets in the tree, but that would be much
> better in the long term and time flies fast.

AFAICT pg_sequence_last_value() is basically an undocumented internal
function only really intended for use by the pg_sequences system view, so
changing the function like this for v18 might not be out of the question.
Otherwise, I think we'd have to create a strikingly similar function with
slightly different behavior, which would be a bizarre place to end up.

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

17 июля 2024 г., 18:48:00

On Tue, Jul 16, 2024 at 10:23:08PM -0500, Nathan Bossart wrote:
> On Wed, Jul 17, 2024 at 11:30:04AM +0900, Michael Paquier wrote:
>> Yeah, I have bumped on the same issue.  In the long term, I also think
>> that we'd better have pg_sequence_last_value() return a row with
>> is_called and the value scanned.  As you say, it won't help except
>> when upgrading from versions of Postgres that are at least to v18,
>> assuming that this change gets in the tree, but that would be much
>> better in the long term and time flies fast.
> 
> AFAICT pg_sequence_last_value() is basically an undocumented internal
> function only really intended for use by the pg_sequences system view, so
> changing the function like this for v18 might not be out of the question.
> Otherwise, I think we'd have to create a strikingly similar function with
> slightly different behavior, which would be a bizarre place to end up.

On second thought, I worry that this change might needlessly complicate the
pg_sequences system view.  Maybe we should just add a
pg_sequence_get_tuple() function that returns everything in
FormData_pg_sequence_data for a given sequence OID...

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

17 июля 2024 г., 18:59:26

Nathan Bossart <nathandbossart@gmail.com> writes:
> On second thought, I worry that this change might needlessly complicate the
> pg_sequences system view.  Maybe we should just add a
> pg_sequence_get_tuple() function that returns everything in
> FormData_pg_sequence_data for a given sequence OID...

Uh ... why do we need a function, rather than just

select * from pg_sequence

?

            regards, tom lane

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

17 июля 2024 г., 19:05:03

On Wed, Jul 17, 2024 at 02:59:26PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> On second thought, I worry that this change might needlessly complicate the
>> pg_sequences system view.  Maybe we should just add a
>> pg_sequence_get_tuple() function that returns everything in
>> FormData_pg_sequence_data for a given sequence OID...
> 
> Uh ... why do we need a function, rather than just
> 
> select * from pg_sequence

We can use that for dumpSequence(), but dumpSequenceData() requires
information from the sequence tuple itself.  Right now, we query each
sequence relation individually for that data, and I'm trying to find a way
to cut down on those round trips.

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

17 июля 2024 г., 19:11:47

Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Jul 17, 2024 at 02:59:26PM -0400, Tom Lane wrote:
>> Uh ... why do we need a function, rather than just
>> select * from pg_sequence

> We can use that for dumpSequence(), but dumpSequenceData() requires
> information from the sequence tuple itself.  Right now, we query each
> sequence relation individually for that data, and I'm trying to find a way
> to cut down on those round trips.

Ah, I confused FormData_pg_sequence_data with FormData_pg_sequence.
Sorry for the noise.

            regards, tom lane

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

18 июля 2024 г., 03:45:32

Here is an attempt at adding a new function that returns the sequence tuple
and using that to avoid querying each sequence relation individually in
dumpSequenceData().

If we instead wanted to change pg_sequence_last_value() to return both
is_called and last_value, I think we could modify the pg_sequences system
view to use a LATERAL subquery, i.e.,

    SELECT
        ...
        CASE
            WHEN L.is_called THEN L.last_value
            ELSE NULL
        END AS last_value
    FROM pg_sequence S
        ...
        JOIN LATERAL pg_sequence_last_value(S.seqrelid) L ON true
    ...

That doesn't seem so bad, and it'd avoid an extra pg_proc entry, but it
would probably break anything that calls pg_sequence_last_value() directly.
Thoughts?

-- 
nathan

Вложения

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

18 июля 2024 г., 03:58:21

Nathan Bossart <nathandbossart@gmail.com> writes:
> Here is an attempt at adding a new function that returns the sequence tuple
> and using that to avoid querying each sequence relation individually in
> dumpSequenceData().

Didn't read the patch yet, but ...

> If we instead wanted to change pg_sequence_last_value() to return both
> is_called and last_value, I think we could modify the pg_sequences system
> view to use a LATERAL subquery, i.e.,
> ...
> That doesn't seem so bad, and it'd avoid an extra pg_proc entry, but it
> would probably break anything that calls pg_sequence_last_value() directly.
> Thoughts?

... one more pg_proc entry is pretty cheap.  I think we should leave
pg_sequence_last_value alone.  We don't know if anyone is depending
on it, and de-optimizing the pg_sequences view doesn't seem like a
win either.

... okay, I lied, I looked at the patch.  Why are you testing

+    if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_USAGE) == ACLCHECK_OK &&

?  This is a substitute for a SELECT from the sequence and it seems
like it ought to demand exactly the same privilege as SELECT.
(If you want to get more technical, USAGE allows nextval() which
gives strictly less information than what this exposes; that's why
we're here after all.)  So there is a difference in the privilege
levels, which is another reason for not combining this with
pg_sequence_last_value.

            regards, tom lane

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

18 июля 2024 г., 18:22:10

On Wed, Jul 17, 2024 at 11:58:21PM -0400, Tom Lane wrote:
> ... okay, I lied, I looked at the patch.  Why are you testing
> 
> +    if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_USAGE) == ACLCHECK_OK &&
> 
> ?  This is a substitute for a SELECT from the sequence and it seems
> like it ought to demand exactly the same privilege as SELECT.
> (If you want to get more technical, USAGE allows nextval() which
> gives strictly less information than what this exposes; that's why
> we're here after all.)  So there is a difference in the privilege
> levels, which is another reason for not combining this with
> pg_sequence_last_value.

Oh, that's a good point.  I wrongly assumed the privilege checks would be
the same as pg_sequence_last_value().  I fixed this in v5.

I also polished the rest of the patches a bit.  Among other things, I
created an enum for the sequence data types to avoid the hacky strncpy()
stuff, which was causing weird CI failures [0].

[0] https://cirrus-ci.com/task/4614801962303488

-- 
nathan

Вложения

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

18 июля 2024 г., 20:29:19

I fixed a compiler warning on Windows in v6 of the patch set.  Sorry for
the noise.

-- 
nathan

Вложения

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

24 июля 2024 г., 18:36:34

I ran Euler's tests again on the v6 patch set.

    for i in `seq 1 10000`; do psql postgres -c "CREATE SEQUENCE s$i;"; done
    time pg_dump -f - -s -d postgres > /dev/null

    HEAD:        0.607s
    0001 + 0002: 0.094s
    all patches: 0.094s

Barring additional feedback, I am planning to commit these patches early
next week.

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

31 июля 2024 г., 15:19:01

Committed.

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

29 декабря 2025 г., 20:26:01

Nathan Bossart <nathandbossart@gmail.com> writes:
> Committed.

In the no-good-deed-goes-unpunished department: pg_dump's use
of pg_get_sequence_data() (nee pg_sequence_read_tuple()) is
evidently responsible for the complaint in bug #19365 [1]
that pg_dump can no longer survive concurrent sequence drops.

Given that that function already silently returns NULLs if the
sequence isn't readable for other reasons, I think it'd be
sane to make it silently return NULL if the sequence isn't
there anymore.  Unfortunately, that looks like it'd require
nontrivial restructuring of init_sequence().

Or maybe we could make it not use init_sequence()?  For the moment
a plain try_relation_open and check that it's a sequence should do,
but I'm not sure how that'd fit into people's plans for future
improvement of the sequence API.

There are other reasons not to like use of init_sequence in this
code path, too.  pg_dump's session will build a SeqTable entry for
every sequence in the database, which there could be a lot of,
and it will acquire RowExclusiveLock on every sequence and hold
that to the end of the dump, which seems likely to be troublesome
from a concurrency standpoint.  Since pg_get_sequence_data is a
read-only operation this lock level feels wrong.

BTW, I'm unconvinced that pg_dump behaves sanely when this function
does return nulls.  I think the ideal thing would be for it to skip
issuing setval(), but right now it looks like it will issue one with
garbage values.

            regards, tom lane

[1] https://www.postgresql.org/message-id/19365-6245240d8b926327%40postgresql.org

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

08 января, 02:06:22

I'm still looking into this, but here are some preliminary thoughts.

On Mon, Dec 29, 2025 at 12:26:01PM -0500, Tom Lane wrote:
> In the no-good-deed-goes-unpunished department: pg_dump's use
> of pg_get_sequence_data() (nee pg_sequence_read_tuple()) is
> evidently responsible for the complaint in bug #19365 [1]
> that pg_dump can no longer survive concurrent sequence drops.

This seems to be reproducible on older versions.  With a well-timed sleep
right before dumpSequenceData()'s pre-v18 query, I can produce a
relation-does-not-exist error with a concurrent sequence drop.  Perhaps v18
made this easier to reach, but given it moved the sequence tuple access to
collectSequences()'s query, I'm not sure why that would be.

> BTW, I'm unconvinced that pg_dump behaves sanely when this function
> does return nulls.  I think the ideal thing would be for it to skip
> issuing setval(), but right now it looks like it will issue one with
> garbage values.

Before v18, pg_dump just ERRORs due to insufficient privileges on a
sequence.  IMHO that makes sense.  If you ask pg_dump to dump something you
don't have privileges on, I'd expect it to error instead of silently
skipping it.

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

08 января, 02:13:48

Nathan Bossart <nathandbossart@gmail.com> writes:
> Before v18, pg_dump just ERRORs due to insufficient privileges on a
> sequence.  IMHO that makes sense.  If you ask pg_dump to dump something you
> don't have privileges on, I'd expect it to error instead of silently
> skipping it.

That would be a fine argument were it not that collectSequences()
tries to vacuum up the data for every sequence in the DB, whether
the user has asked to dump them all or not.  In other places in
pg_dump, we avoid such problems by restricting which tables we
ask for data about ... but not here.

            regards, tom lane

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

08 января, 02:27:31

On Wed, Jan 07, 2026 at 06:13:48PM -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> Before v18, pg_dump just ERRORs due to insufficient privileges on a
>> sequence.  IMHO that makes sense.  If you ask pg_dump to dump something you
>> don't have privileges on, I'd expect it to error instead of silently
>> skipping it.
> 
> That would be a fine argument were it not that collectSequences()
> tries to vacuum up the data for every sequence in the DB, whether
> the user has asked to dump them all or not.  In other places in
> pg_dump, we avoid such problems by restricting which tables we
> ask for data about ... but not here.

I meant that we could teach pg_dump to error in dumpSequenceData() if it
sees nulls for the sequence in question.

-- 
nathan

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

08 января, 03:24:52

Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Jan 07, 2026 at 06:13:48PM -0500, Tom Lane wrote:
>> That would be a fine argument were it not that collectSequences()
>> tries to vacuum up the data for every sequence in the DB, whether
>> the user has asked to dump them all or not.

> I meant that we could teach pg_dump to error in dumpSequenceData() if it
> sees nulls for the sequence in question.

Ah, gotcha; I thought you were talking about changing
pg_get_sequence_data() to throw an error instead of returning nulls.

            regards, tom lane

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

08 января, 20:36:38

On Wed, Jan 07, 2026 at 07:24:52PM -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> On Wed, Jan 07, 2026 at 06:13:48PM -0500, Tom Lane wrote:
>>> That would be a fine argument were it not that collectSequences()
>>> tries to vacuum up the data for every sequence in the DB, whether
>>> the user has asked to dump them all or not.
> 
>> I meant that we could teach pg_dump to error in dumpSequenceData() if it
>> sees nulls for the sequence in question.
> 
> Ah, gotcha; I thought you were talking about changing
> pg_get_sequence_data() to throw an error instead of returning nulls.

Here is a patch that does this along with what you described upthread,
i.e., teaching pg_get_sequence_data to return nulls for missing sequences.
Apparently pg_dump still runs through dumpSequenceData() for schema-only
dumps, which is a problem for this patch.  I've taught it to immediately
return for schema-only dumps to evade this problem.  That seems like a win
for older versions, too, as they will no longer run useless queries.

I believe this helps the reporter's case, as their problem involves dumping
one schema while dropping another, which v18 indeed makes worse because (as
you mentioned) we gather data for all sequences in the database.

-- 
nathan

Вложения

v1-0001-pg_dump-fix-use-of-pg_get_sequence_data.patch

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

08 января, 21:19:39

Nathan Bossart <nathandbossart@gmail.com> writes:
> Here is a patch that does this along with what you described upthread,
> i.e., teaching pg_get_sequence_data to return nulls for missing sequences.
> Apparently pg_dump still runs through dumpSequenceData() for schema-only
> dumps, which is a problem for this patch.  I've taught it to immediately
> return for schema-only dumps to evade this problem.  That seems like a win
> for older versions, too, as they will no longer run useless queries.

> I believe this helps the reporter's case, as their problem involves dumping
> one schema while dropping another, which v18 indeed makes worse because (as
> you mentioned) we gather data for all sequences in the database.

Looks plausible to me.  (I didn't test, just read the code.)

One nitpicky point is that try_sequence_open() will still error out
if it is given an OID that is a non-sequence relation.  I think it'd
be more desirable for it to close the relation again and return NULL.
That's probably insignificant for pg_dump's usage, because we could
only hit the case with very improbable OID wraparound timing.  But
I think our experience with catalog-inspection functions similar to
pg_get_sequence_data is that it's usually better to return NULL than
throw an error.

            regards, tom lane

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

08 января, 22:34:28

On Thu, Jan 08, 2026 at 01:19:39PM -0500, Tom Lane wrote:
> One nitpicky point is that try_sequence_open() will still error out
> if it is given an OID that is a non-sequence relation.  I think it'd
> be more desirable for it to close the relation again and return NULL.
> That's probably insignificant for pg_dump's usage, because we could
> only hit the case with very improbable OID wraparound timing.  But
> I think our experience with catalog-inspection functions similar to
> pg_get_sequence_data is that it's usually better to return NULL than
> throw an error.

Hm.  That makes sense, but both try_table_open and try_index_open error for
wrong relkinds.  I could change all of the try_*_open functions to return
NULL in that case, or I could just open-code the relkind check in
pg_get_sequence_data after try_relation_open (and have it return NULL for
non-sequences).  I'm leaning towards the latter, if for no other reason
than it might be slightly nicer for back-patching (e.g., smaller, no new
extern functions).

-- 
nathan

Вложения

v2-0001-pg_dump-fix-use-of-pg_get_sequence_data.patch

Re: improve performance of pg_dump with many sequences

От

Tom Lane

Дата:

08 января, 22:54:53

Nathan Bossart <nathandbossart@gmail.com> writes:
> Hm.  That makes sense, but both try_table_open and try_index_open error for
> wrong relkinds.  I could change all of the try_*_open functions to return
> NULL in that case, or I could just open-code the relkind check in
> pg_get_sequence_data after try_relation_open (and have it return NULL for
> non-sequences).  I'm leaning towards the latter, if for no other reason
> than it might be slightly nicer for back-patching (e.g., smaller, no new
> extern functions).

WFM.

            regards, tom lane

Re: improve performance of pg_dump with many sequences

От

Nathan Bossart

Дата:

09 января, 19:17:12

On Thu, Jan 08, 2026 at 02:54:53PM -0500, Tom Lane wrote:
> WFM.

Thanks, committed.

-- 
nathan

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: improve performance of pg_dump with many sequences

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения