Обсуждение: improve performance of pg_dump with many sequences
Similar to 'pg_dump --binary-upgrade' [0], we can speed up pg_dump with
many sequences by gathering the required information in a single query
instead of two queries per sequence. The attached patches are
works-in-progress, but here are the results I see on my machine for
'pg_dump --schema-only --binary-upgrade' with a million sequences:
HEAD : 6m22.809s
[0] : 1m54.701s
[0] + attached : 0m38.233s
I'm not sure I have all the details correct in 0003, and we might want to
separate the table into two tables which are only populated when the
relevant section is dumped. Furthermore, the query in 0003 is a bit goofy
because it needs to dance around a bug reported elsewhere [1].
[0] https://postgr.es/m/20240418041712.GA3441570%40nathanxps13
[1] https://postgr.es/m/20240501005730.GA594666%40nathanxps13
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Вложения
On Tue, Jul 9, 2024, at 4:11 PM, Nathan Bossart wrote:
rebased
Nice improvement. The numbers for a realistic scenario (10k sequences) are
for i in `seq 1 10000`; do echo "CREATE SEQUENCE s$i;"; done > /tmp/s.sql
master:
real 0m1,141s
user 0m0,056s
sys 0m0,147s
patched:
real 0m0,410s
user 0m0,045s
sys 0m0,103s
You are changing internal representation from char to int64. Is the main goal to
validate catalog data? What if there is a new sequence data type whose
representation is not an integer?
This code path is adding zero byte to the last position of the fixed string. I
suggest that the zero byte is added to the position after the string length.
Assert(strlen(PQgetvalue(res, 0, 0)) < sizeof(seqtype));
strncpy(seqtype, PQgetvalue(res, 0, 0), sizeof(seqtype));
seqtype[sizeof(seqtype) - 1] = '\0';
Something like
l = strlen(PQgetvalue(res, 0, 0));
Assert(l < sizeof(seqtype));
strncpy(seqtype, PQgetvalue(res, 0, 0), l);
seqtype[l] = '\0';
Another suggestion is to use a constant for seqtype
char seqtype[MAX_SEQNAME_LEN];
and simplify the expression:
size_t seqtype_sz = sizeof(((SequenceItem *) 0)->seqtype);
If you are not planning to apply 0003, make sure you fix collectSequences() to
avoid versions less than 10. Move this part to 0002.
@@ -17233,11 +17235,24 @@ collectSequences(Archive *fout)
PGresult *res;
const char *query;
+ if (fout->remoteVersion < 100000)
+ return;
+
Since you apply a fix for pg_sequence_last_value function, you can simplify the
query in 0003. CASE is not required.
I repeated the same test but not applying 0003.
patched (0001 and 0002):
real 0m0,290s
user 0m0,038s
sys 0m0,104s
I'm not sure if 0003 is worth. Maybe if you have another table like you
suggested.
On Wed, Jul 10, 2024 at 05:08:56PM -0300, Euler Taveira wrote: > Nice improvement. The numbers for a realistic scenario (10k sequences) are Thanks for taking a look! > You are changing internal representation from char to int64. Is the main goal to > validate catalog data? What if there is a new sequence data type whose > representation is not an integer? IIRC 0001 was primarily intended to reduce the amount of memory needed for the sorted table. Regarding a new sequence data type, I'm assuming we'll have much bigger fish to fry if we do that (e.g., pg_sequence uses int8 for the values), and I'd hope that adjusting this code wouldn't be too difficult, anyway. > This code path is adding zero byte to the last position of the fixed string. I > suggest that the zero byte is added to the position after the string length. I'm not following why that would be a better approach. strncpy() will add a NUL to the end of the string unless it doesn't fit in the buffer, in which case we'll add our own via "seqtype[sizeof(seqtype) - 1] = '\0'". Furthermore, the compiler can determine the position where the NUL should be placed, whereas placing it at the end of the copied string requires a runtime strlen(). > l = strlen(PQgetvalue(res, 0, 0)); > Assert(l < sizeof(seqtype)); > strncpy(seqtype, PQgetvalue(res, 0, 0), l); > seqtype[l] = '\0'; I think the strncpy() should really be limited to the size of the seqtype buffer. IMHO an Assert is not sufficient. > If you are not planning to apply 0003, make sure you fix collectSequences() to > avoid versions less than 10. Move this part to 0002. Yeah, no need to create the table if we aren't going to use it. > Since you apply a fix for pg_sequence_last_value function, you can simplify the > query in 0003. CASE is not required. Unfortunately, I think we have to keep this workaround since older minor releases of PostgreSQL don't have the fix. > patched (0001 and 0002): > real 0m0,290s > user 0m0,038s > sys 0m0,104s > > I'm not sure if 0003 is worth. Maybe if you have another table like you > suggested. What pg_dump command did you test here? Did you dump the sequence data, or was this --schema-only? -- nathan
On Wed, Jul 10, 2024, at 7:05 PM, Nathan Bossart wrote:
I'm not following why that would be a better approach. strncpy() will adda NUL to the end of the string unless it doesn't fit in the buffer, inwhich case we'll add our own via "seqtype[sizeof(seqtype) - 1] = '\0'".Furthermore, the compiler can determine the position where the NUL shouldbe placed, whereas placing it at the end of the copied string requires aruntime strlen().
Nevermind, you are copying the whole buffer (n = sizeof(seqtype)).
Unfortunately, I think we have to keep this workaround since older minorreleases of PostgreSQL don't have the fix.
Hmm. Right.
What pg_dump command did you test here? Did you dump the sequence data, orwas this --schema-only?
time pg_dump -f - -s -d postgres
On Wed, Jul 10, 2024 at 11:52:33PM -0300, Euler Taveira wrote: > On Wed, Jul 10, 2024, at 7:05 PM, Nathan Bossart wrote: >> Unfortunately, I think we have to keep this workaround since older minor >> releases of PostgreSQL don't have the fix. > > Hmm. Right. On second thought, maybe we should just limit this improvement to the minor releases with the fix so that we _can_ get rid of the workaround. Or we could use the hacky workaround only for versions with the bug. -- nathan
On Thu, Jul 11, 2024 at 09:09:17PM -0500, Nathan Bossart wrote: > On second thought, maybe we should just limit this improvement to the minor > releases with the fix so that we _can_ get rid of the workaround. Or we > could use the hacky workaround only for versions with the bug. Here is a new version of the patch set. The main differences are 1) we no longer gather the sequence data for schema-only dumps and 2) 0003 uses a simplified query for dumps on v18 and newer. I considered also using a slightly simplified query for dumps on versions with the unlogged-sequences-on-standbys fix, but I felt that wasn't worth the extra code. Unfortunately, I've also discovered a problem with 0003. pg_sequence_last_value() returns NULL when is_called is false, in which case we assume last_value == seqstart, which is, sadly, bogus due to commands like ALTER SEQUENCE [RE]START WITH. AFAICT there isn't an easy way around this. We could either create a giant query that gathers the information from all sequences in the database, or we could introduce a new function in v18 that returns everything we need (which would only help for upgrades _from_ v18). Assuming I'm not missing a better option, I think the latter is the better choice, and I still think it's worth doing even though it probably won't help anyone for ~2.5 years. -- nathan
Вложения
On Tue, Jul 16, 2024 at 04:36:15PM -0500, Nathan Bossart wrote: > Unfortunately, I've also discovered a problem with 0003. > pg_sequence_last_value() returns NULL when is_called is false, in which > case we assume last_value == seqstart, which is, sadly, bogus due to > commands like ALTER SEQUENCE [RE]START WITH. AFAICT there isn't an easy > way around this. We could either create a giant query that gathers the > information from all sequences in the database, or we could introduce a new > function in v18 that returns everything we need (which would only help for > upgrades _from_ v18). Assuming I'm not missing a better option, I think > the latter is the better choice, and I still think it's worth doing even > though it probably won't help anyone for ~2.5 years. Yeah, I have bumped on the same issue. In the long term, I also think that we'd better have pg_sequence_last_value() return a row with is_called and the value scanned. As you say, it won't help except when upgrading from versions of Postgres that are at least to v18, assuming that this change gets in the tree, but that would be much better in the long term and time flies fast. See 0001 as of this area: https://www.postgresql.org/message-id/ZnPIUPMmp5TzBPC2%40paquier.xyz -- Michael
Вложения
On Wed, Jul 17, 2024 at 11:30:04AM +0900, Michael Paquier wrote: > Yeah, I have bumped on the same issue. In the long term, I also think > that we'd better have pg_sequence_last_value() return a row with > is_called and the value scanned. As you say, it won't help except > when upgrading from versions of Postgres that are at least to v18, > assuming that this change gets in the tree, but that would be much > better in the long term and time flies fast. AFAICT pg_sequence_last_value() is basically an undocumented internal function only really intended for use by the pg_sequences system view, so changing the function like this for v18 might not be out of the question. Otherwise, I think we'd have to create a strikingly similar function with slightly different behavior, which would be a bizarre place to end up. -- nathan
On Tue, Jul 16, 2024 at 10:23:08PM -0500, Nathan Bossart wrote: > On Wed, Jul 17, 2024 at 11:30:04AM +0900, Michael Paquier wrote: >> Yeah, I have bumped on the same issue. In the long term, I also think >> that we'd better have pg_sequence_last_value() return a row with >> is_called and the value scanned. As you say, it won't help except >> when upgrading from versions of Postgres that are at least to v18, >> assuming that this change gets in the tree, but that would be much >> better in the long term and time flies fast. > > AFAICT pg_sequence_last_value() is basically an undocumented internal > function only really intended for use by the pg_sequences system view, so > changing the function like this for v18 might not be out of the question. > Otherwise, I think we'd have to create a strikingly similar function with > slightly different behavior, which would be a bizarre place to end up. On second thought, I worry that this change might needlessly complicate the pg_sequences system view. Maybe we should just add a pg_sequence_get_tuple() function that returns everything in FormData_pg_sequence_data for a given sequence OID... -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> On second thought, I worry that this change might needlessly complicate the
> pg_sequences system view. Maybe we should just add a
> pg_sequence_get_tuple() function that returns everything in
> FormData_pg_sequence_data for a given sequence OID...
Uh ... why do we need a function, rather than just
select * from pg_sequence
?
regards, tom lane
On Wed, Jul 17, 2024 at 02:59:26PM -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> On second thought, I worry that this change might needlessly complicate the >> pg_sequences system view. Maybe we should just add a >> pg_sequence_get_tuple() function that returns everything in >> FormData_pg_sequence_data for a given sequence OID... > > Uh ... why do we need a function, rather than just > > select * from pg_sequence We can use that for dumpSequence(), but dumpSequenceData() requires information from the sequence tuple itself. Right now, we query each sequence relation individually for that data, and I'm trying to find a way to cut down on those round trips. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Jul 17, 2024 at 02:59:26PM -0400, Tom Lane wrote:
>> Uh ... why do we need a function, rather than just
>> select * from pg_sequence
> We can use that for dumpSequence(), but dumpSequenceData() requires
> information from the sequence tuple itself. Right now, we query each
> sequence relation individually for that data, and I'm trying to find a way
> to cut down on those round trips.
Ah, I confused FormData_pg_sequence_data with FormData_pg_sequence.
Sorry for the noise.
regards, tom lane
Here is an attempt at adding a new function that returns the sequence tuple
and using that to avoid querying each sequence relation individually in
dumpSequenceData().
If we instead wanted to change pg_sequence_last_value() to return both
is_called and last_value, I think we could modify the pg_sequences system
view to use a LATERAL subquery, i.e.,
SELECT
...
CASE
WHEN L.is_called THEN L.last_value
ELSE NULL
END AS last_value
FROM pg_sequence S
...
JOIN LATERAL pg_sequence_last_value(S.seqrelid) L ON true
...
That doesn't seem so bad, and it'd avoid an extra pg_proc entry, but it
would probably break anything that calls pg_sequence_last_value() directly.
Thoughts?
--
nathan
Вложения
Nathan Bossart <nathandbossart@gmail.com> writes:
> Here is an attempt at adding a new function that returns the sequence tuple
> and using that to avoid querying each sequence relation individually in
> dumpSequenceData().
Didn't read the patch yet, but ...
> If we instead wanted to change pg_sequence_last_value() to return both
> is_called and last_value, I think we could modify the pg_sequences system
> view to use a LATERAL subquery, i.e.,
> ...
> That doesn't seem so bad, and it'd avoid an extra pg_proc entry, but it
> would probably break anything that calls pg_sequence_last_value() directly.
> Thoughts?
... one more pg_proc entry is pretty cheap. I think we should leave
pg_sequence_last_value alone. We don't know if anyone is depending
on it, and de-optimizing the pg_sequences view doesn't seem like a
win either.
... okay, I lied, I looked at the patch. Why are you testing
+ if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_USAGE) == ACLCHECK_OK &&
? This is a substitute for a SELECT from the sequence and it seems
like it ought to demand exactly the same privilege as SELECT.
(If you want to get more technical, USAGE allows nextval() which
gives strictly less information than what this exposes; that's why
we're here after all.) So there is a difference in the privilege
levels, which is another reason for not combining this with
pg_sequence_last_value.
regards, tom lane
On Wed, Jul 17, 2024 at 11:58:21PM -0400, Tom Lane wrote: > ... okay, I lied, I looked at the patch. Why are you testing > > + if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_USAGE) == ACLCHECK_OK && > > ? This is a substitute for a SELECT from the sequence and it seems > like it ought to demand exactly the same privilege as SELECT. > (If you want to get more technical, USAGE allows nextval() which > gives strictly less information than what this exposes; that's why > we're here after all.) So there is a difference in the privilege > levels, which is another reason for not combining this with > pg_sequence_last_value. Oh, that's a good point. I wrongly assumed the privilege checks would be the same as pg_sequence_last_value(). I fixed this in v5. I also polished the rest of the patches a bit. Among other things, I created an enum for the sequence data types to avoid the hacky strncpy() stuff, which was causing weird CI failures [0]. [0] https://cirrus-ci.com/task/4614801962303488 -- nathan
Вложения
I fixed a compiler warning on Windows in v6 of the patch set. Sorry for the noise. -- nathan
Вложения
I ran Euler's tests again on the v6 patch set.
for i in `seq 1 10000`; do psql postgres -c "CREATE SEQUENCE s$i;"; done
time pg_dump -f - -s -d postgres > /dev/null
HEAD: 0.607s
0001 + 0002: 0.094s
all patches: 0.094s
Barring additional feedback, I am planning to commit these patches early
next week.
--
nathan
Committed. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> Committed.
In the no-good-deed-goes-unpunished department: pg_dump's use
of pg_get_sequence_data() (nee pg_sequence_read_tuple()) is
evidently responsible for the complaint in bug #19365 [1]
that pg_dump can no longer survive concurrent sequence drops.
Given that that function already silently returns NULLs if the
sequence isn't readable for other reasons, I think it'd be
sane to make it silently return NULL if the sequence isn't
there anymore. Unfortunately, that looks like it'd require
nontrivial restructuring of init_sequence().
Or maybe we could make it not use init_sequence()? For the moment
a plain try_relation_open and check that it's a sequence should do,
but I'm not sure how that'd fit into people's plans for future
improvement of the sequence API.
There are other reasons not to like use of init_sequence in this
code path, too. pg_dump's session will build a SeqTable entry for
every sequence in the database, which there could be a lot of,
and it will acquire RowExclusiveLock on every sequence and hold
that to the end of the dump, which seems likely to be troublesome
from a concurrency standpoint. Since pg_get_sequence_data is a
read-only operation this lock level feels wrong.
BTW, I'm unconvinced that pg_dump behaves sanely when this function
does return nulls. I think the ideal thing would be for it to skip
issuing setval(), but right now it looks like it will issue one with
garbage values.
regards, tom lane
[1] https://www.postgresql.org/message-id/19365-6245240d8b926327%40postgresql.org
I'm still looking into this, but here are some preliminary thoughts. On Mon, Dec 29, 2025 at 12:26:01PM -0500, Tom Lane wrote: > In the no-good-deed-goes-unpunished department: pg_dump's use > of pg_get_sequence_data() (nee pg_sequence_read_tuple()) is > evidently responsible for the complaint in bug #19365 [1] > that pg_dump can no longer survive concurrent sequence drops. This seems to be reproducible on older versions. With a well-timed sleep right before dumpSequenceData()'s pre-v18 query, I can produce a relation-does-not-exist error with a concurrent sequence drop. Perhaps v18 made this easier to reach, but given it moved the sequence tuple access to collectSequences()'s query, I'm not sure why that would be. > BTW, I'm unconvinced that pg_dump behaves sanely when this function > does return nulls. I think the ideal thing would be for it to skip > issuing setval(), but right now it looks like it will issue one with > garbage values. Before v18, pg_dump just ERRORs due to insufficient privileges on a sequence. IMHO that makes sense. If you ask pg_dump to dump something you don't have privileges on, I'd expect it to error instead of silently skipping it. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> Before v18, pg_dump just ERRORs due to insufficient privileges on a
> sequence. IMHO that makes sense. If you ask pg_dump to dump something you
> don't have privileges on, I'd expect it to error instead of silently
> skipping it.
That would be a fine argument were it not that collectSequences()
tries to vacuum up the data for every sequence in the DB, whether
the user has asked to dump them all or not. In other places in
pg_dump, we avoid such problems by restricting which tables we
ask for data about ... but not here.
regards, tom lane
On Wed, Jan 07, 2026 at 06:13:48PM -0500, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> Before v18, pg_dump just ERRORs due to insufficient privileges on a >> sequence. IMHO that makes sense. If you ask pg_dump to dump something you >> don't have privileges on, I'd expect it to error instead of silently >> skipping it. > > That would be a fine argument were it not that collectSequences() > tries to vacuum up the data for every sequence in the DB, whether > the user has asked to dump them all or not. In other places in > pg_dump, we avoid such problems by restricting which tables we > ask for data about ... but not here. I meant that we could teach pg_dump to error in dumpSequenceData() if it sees nulls for the sequence in question. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Jan 07, 2026 at 06:13:48PM -0500, Tom Lane wrote:
>> That would be a fine argument were it not that collectSequences()
>> tries to vacuum up the data for every sequence in the DB, whether
>> the user has asked to dump them all or not.
> I meant that we could teach pg_dump to error in dumpSequenceData() if it
> sees nulls for the sequence in question.
Ah, gotcha; I thought you were talking about changing
pg_get_sequence_data() to throw an error instead of returning nulls.
regards, tom lane
On Wed, Jan 07, 2026 at 07:24:52PM -0500, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> On Wed, Jan 07, 2026 at 06:13:48PM -0500, Tom Lane wrote: >>> That would be a fine argument were it not that collectSequences() >>> tries to vacuum up the data for every sequence in the DB, whether >>> the user has asked to dump them all or not. > >> I meant that we could teach pg_dump to error in dumpSequenceData() if it >> sees nulls for the sequence in question. > > Ah, gotcha; I thought you were talking about changing > pg_get_sequence_data() to throw an error instead of returning nulls. Here is a patch that does this along with what you described upthread, i.e., teaching pg_get_sequence_data to return nulls for missing sequences. Apparently pg_dump still runs through dumpSequenceData() for schema-only dumps, which is a problem for this patch. I've taught it to immediately return for schema-only dumps to evade this problem. That seems like a win for older versions, too, as they will no longer run useless queries. I believe this helps the reporter's case, as their problem involves dumping one schema while dropping another, which v18 indeed makes worse because (as you mentioned) we gather data for all sequences in the database. -- nathan
Вложения
Nathan Bossart <nathandbossart@gmail.com> writes:
> Here is a patch that does this along with what you described upthread,
> i.e., teaching pg_get_sequence_data to return nulls for missing sequences.
> Apparently pg_dump still runs through dumpSequenceData() for schema-only
> dumps, which is a problem for this patch. I've taught it to immediately
> return for schema-only dumps to evade this problem. That seems like a win
> for older versions, too, as they will no longer run useless queries.
> I believe this helps the reporter's case, as their problem involves dumping
> one schema while dropping another, which v18 indeed makes worse because (as
> you mentioned) we gather data for all sequences in the database.
Looks plausible to me. (I didn't test, just read the code.)
One nitpicky point is that try_sequence_open() will still error out
if it is given an OID that is a non-sequence relation. I think it'd
be more desirable for it to close the relation again and return NULL.
That's probably insignificant for pg_dump's usage, because we could
only hit the case with very improbable OID wraparound timing. But
I think our experience with catalog-inspection functions similar to
pg_get_sequence_data is that it's usually better to return NULL than
throw an error.
regards, tom lane
On Thu, Jan 08, 2026 at 01:19:39PM -0500, Tom Lane wrote: > One nitpicky point is that try_sequence_open() will still error out > if it is given an OID that is a non-sequence relation. I think it'd > be more desirable for it to close the relation again and return NULL. > That's probably insignificant for pg_dump's usage, because we could > only hit the case with very improbable OID wraparound timing. But > I think our experience with catalog-inspection functions similar to > pg_get_sequence_data is that it's usually better to return NULL than > throw an error. Hm. That makes sense, but both try_table_open and try_index_open error for wrong relkinds. I could change all of the try_*_open functions to return NULL in that case, or I could just open-code the relkind check in pg_get_sequence_data after try_relation_open (and have it return NULL for non-sequences). I'm leaning towards the latter, if for no other reason than it might be slightly nicer for back-patching (e.g., smaller, no new extern functions). -- nathan
Вложения
Nathan Bossart <nathandbossart@gmail.com> writes:
> Hm. That makes sense, but both try_table_open and try_index_open error for
> wrong relkinds. I could change all of the try_*_open functions to return
> NULL in that case, or I could just open-code the relkind check in
> pg_get_sequence_data after try_relation_open (and have it return NULL for
> non-sequences). I'm leaning towards the latter, if for no other reason
> than it might be slightly nicer for back-patching (e.g., smaller, no new
> extern functions).
WFM.
regards, tom lane
On Thu, Jan 08, 2026 at 02:54:53PM -0500, Tom Lane wrote: > WFM. Thanks, committed. -- nathan