Обсуждение: Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

Поиск
Список
Период
Сортировка

Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Frédéric Yhuel
Дата:

On 6/3/25 17:34, Dimitrios Apostolou wrote:
> The backend process for each of the above ALTER TABLE commands, does not
>    parallelize the foreign key checks for the different partitions. I
>    know, because in the logs I see gigabytes of temporary files being
>    written, with the CONTEXT showing queries issued incrementally on
>    all the different partitions:
> 
>    :LOG:      temporary file: path "pg_tblspc/16390/PG_17_202406281/ 
> pgsql_tmp/pgsql_tmp3363462.579", size 1073741824
>    :CONTEXT:  SQL statement "SELECT fk."columnX" FROM ONLY 
> "public"."table_partition_214" fk
>               LEFT OUTER JOIN ONLY "public"."another_table" pk
>                   ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX")
>               WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)"
> 
>    Why can't the backend issue these queries in parallel workers?

This has been discussed here: 
https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com

Perhaps we should exhume this patch, but I believe the optimal strategy 
is to perform a VACUUM between the data and post-data to build the 
visibility map. The anti-join can then use an efficient index-only scan.

Best regards,
Frédéric




Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Dimitrios Apostolou
Дата:
On Wed, 4 Jun 2025, Frédéric Yhuel wrote:

> On 6/3/25 17:34, Dimitrios Apostolou wrote:
>>  The backend process for each of the above ALTER TABLE commands, does not
>>     parallelize the foreign key checks for the different partitions. I
>>     know, because in the logs I see gigabytes of temporary files being
>>     written, with the CONTEXT showing queries issued incrementally on
>>     all the different partitions:
>>
>>    :LOG:      temporary file: path "pg_tblspc/16390/PG_17_202406281/
>>  pgsql_tmp/pgsql_tmp3363462.579", size 1073741824
>>    :CONTEXT:  SQL statement "SELECT fk."columnX" FROM ONLY
>>  "public"."table_partition_214" fk
>>                LEFT OUTER JOIN ONLY "public"."another_table" pk
>>                    ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX")
>>                WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)"
>>
>>     Why can't the backend issue these queries in parallel workers?
>
> This has been discussed here:
> https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com
>
> Perhaps we should exhume this patch, but I believe the optimal strategy is to
> perform a VACUUM between the data and post-data to build the visibility map.
> The anti-join can then use an efficient index-only scan.

Thanks for pointing to this patch.
Since I run each of the pg_restore sections separately, I will try to
manually do a VACUUM after the "data" and before the "post-data" section.

In general I have noticed most operations are slower after a succesful
pg_restore until VACUUM is complete, which is unfortunate as the database
is huge and it takes days to run. Something I have on my list to try, is
whether a COPY FREEZE would alleviate all this trouble, since all tuples
are immediately visible then. Maybe a patch for a new pg_restore
option --freeze is a better solution. Are my assumptions right?

Thanks,
Dimitris

Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Frédéric Yhuel
Дата:

On 6/4/25 16:12, Dimitrios Apostolou wrote:
> In general I have noticed most operations are slower after a succesful 
> pg_restore until VACUUM is complete, which is unfortunate as the 
> database is huge and it takes days to run. Something I have on my list 
> to try, is whether a COPY FREEZE would alleviate all this trouble, since 
> all tuples are immediately visible then. Maybe a patch for a new 
> pg_restore option --freeze is a better solution. Are my assumptions right?

It seems that the idea has already been discussed: 

https://www.postgresql.org/message-id/flat/CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr-xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b

I've CCed Bruce Mojman, in the hope that he can tell us more about it.





Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Frédéric Yhuel
Дата:

On 6/5/25 16:13, Frédéric Yhuel wrote:
> 
> 
> On 6/4/25 16:12, Dimitrios Apostolou wrote:
>> In general I have noticed most operations are slower after a succesful 
>> pg_restore until VACUUM is complete, which is unfortunate as the 
>> database is huge and it takes days to run. Something I have on my list 
>> to try, is whether a COPY FREEZE would alleviate all this trouble, 
>> since all tuples are immediately visible then. Maybe a patch for a new 
>> pg_restore option --freeze is a better solution. Are my assumptions 
>> right?
> 
> It seems that the idea has already been discussed: https:// 
> www.postgresql.org/message-id/flat/ 
> CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr- 
> xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b
> 
> I've CCed Bruce Mojman, in the hope that he can tell us more about it.
> 
> 

(It might be more interesting now than 12 years ago thanks to this 
patch: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7db0cd2145f2bce84cac92402e205e4d2b045bf2)




[PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Dimitrios Apostolou
Дата:
On Thu, 5 Jun 2025, Frédéric Yhuel wrote:
>
> On 6/4/25 16:12, Dimitrios Apostolou wrote:
>>  In general I have noticed most operations are slower after a succesful
>>  pg_restore until VACUUM is complete, which is unfortunate as the database
>>  is huge and it takes days to run. Something I have on my list to try, is
>>  whether a COPY FREEZE would alleviate all this trouble, since all tuples
>>  are immediately visible then. Maybe a patch for a new pg_restore option
>>  --freeze is a better solution. Are my assumptions right?
>
> It seems that the idea has already been discussed:
>
https://www.postgresql.org/message-id/flat/CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr-xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b
>
> I've CCed Bruce Mojman, in the hope that he can tell us more about it.

Thanks for all the pointers, it shows that changes in postgres are harder
than they appear.

FWIW I implemented a pg_restore --freeze patch, see attached. It needs
another patch of mine from [1] that implements pg_restore --data-only
--clean, which for parallel restores encases each COPY in its own
transaction and prepends it with a TRUNCATE. All feedback is welcome.

[1] https://www.postgresql.org/message-id/c61263f2-7472-5dd8-703d-01e683421f61%40gmx.net

It works really fast for the data, and I see that some, but not all items
from section=post-data, start parallel plans. For example I see CREATE
INDEX spawns parallel workers.

But unfortunately the item in question (ADD FOREIGN KEY) is not parallel
(probably because the discussion [2] you posted in your previous email
never concluded). I /think/ though it's reading all the data faster than
before, but still has to go through terabytes of data and this takes a
long time, for each of the foreign keys it adds.

[2] https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52@dalibo.com

Still I wonder why pg_restore can't issue many ADD FOREIGN
KEY for the same table in parallel.


Regards,
Dimitris

Вложения

Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Dimitrios Apostolou
Дата:
>
> FWIW I implemented a pg_restore --freeze patch, see attached. It needs
> another patch of mine from [1] that implements pg_restore --data-only
> --clean, which for parallel restores encases each COPY in its own transaction
> and prepends it with a TRUNCATE. All feedback is welcome.
>
> [1] https://www.postgresql.org/message-id/c61263f2-7472-5dd8-703d-01e683421f61%40gmx.net
>
> It works really fast for the data, and I see that some, but not all items
> from section=post-data, start parallel plans. For example I see CREATE INDEX
> spawns parallel workers.

I added it to July's commitfest, mostly to trigger discussion around the
issue.

https://commitfest.postgresql.org/patch/5826/

Not sure how to mark on the commitfest page that the patch requires
another patch from another commitfest entry.


Dimitris




Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Stepan Neretin
Дата:


On Mon, Jul 21, 2025 at 12:24 PM Dimitrios Apostolou <jimis@gmx.net> wrote:
>
> FWIW I implemented a pg_restore --freeze patch, see attached. It needs
> another patch of mine from [1] that implements pg_restore --data-only
> --clean, which for parallel restores encases each COPY in its own transaction
> and prepends it with a TRUNCATE. All feedback is welcome.
>
> [1] https://www.postgresql.org/message-id/c61263f2-7472-5dd8-703d-01e683421f61%40gmx.net
>
> It works really fast for the data, and I see that some, but not all items
> from section=post-data, start parallel plans. For example I see CREATE INDEX
> spawns parallel workers.

I added it to July's commitfest, mostly to trigger discussion around the
issue.

https://commitfest.postgresql.org/patch/5826/

Not sure how to mark on the commitfest page that the patch requires
another patch from another commitfest entry.


Dimitris






Hello Dimitris and fellow hackers,

I'm And Warda(currently unavailable :( ) picking this patch up from the July CommitFest for review. Thanks, Dimitris, for submitting it and for tackling this well-known performance issue with restoring very large databases. The problem of `ALTER TABLE ... ADD FOREIGN KEY` taking days after a restore due to the lack of a visibility map is a significant pain point, and your proposed solution is a logical step.

I've reviewed the patch and the discussion in the thread. I've attached a rebased version of the patch against the current master.

The core idea of adding a `--freeze` option to `pg_restore` to leverage `COPY ... WITH (FREEZE)` is sound. It directly addresses the problem of needing a very long `VACUUM` run after data loading to make subsequent `post-data` steps, like foreign key validation, performant. The benefit isn't just a minor speedup in the `COPY` itself but the major time-saving of avoiding a full-table `VACUUM` on a massive dataset.

I ran a simple performance script based on your patch. On a small test case, I saw a consistent 10-15% speedup during the data-loading phase. While this is encouraging, the main benefit, as noted, is in the steps that follow.

=== Summary ===
freeze average: 1.205 seconds
nofreeze average: 1.324 seconds                                                                                                                                                                                      
~/.pg-vanilla/bin ᐅ ./tst.sh

=== Summary ===
freeze average: 1.362 seconds
nofreeze average: 1.480 seconds

#!/bin/bash
set -euo pipefail

DUMP_FILE="parttest.dump"
DB_PREFIX="parttest"
JOBS=$(nproc)
ROUNDS=3

timings() {
  local mode=$1
  local total_time=0

  for i in $(seq 1 $ROUNDS); do
    DB_NAME="${DB_PREFIX}_${mode}_$i"
    psql -q -c "DROP DATABASE IF EXISTS $DB_NAME"
    psql -q -c "CREATE DATABASE $DB_NAME"

    start_time=$(date +%s.%N)

    if [ "$mode" == "freeze" ]; then
      pg_restore -d "$DB_NAME" -j "$JOBS" --clean --if-exists --freeze "$DUMP_FILE" > /dev/null
    else
      pg_restore -d "$DB_NAME" -j "$JOBS" --clean --if-exists "$DUMP_FILE" > /dev/null
    fi

    end_time=$(date +%s.%N)
    duration=$(echo "$end_time - $start_time" | bc)
    total_time=$(echo "$total_time + $duration" | bc)
  done

  average=$(echo "scale=3; $total_time / $ROUNDS" | bc)
  echo "$mode average: $average seconds"
}

freeze_avg=$(timings freeze)
nofreeze_avg=$(timings nofreeze)

echo ""
echo "=== Summary ==="
echo "$freeze_avg"
echo "$nofreeze_avg"

The main area of concern is the implementation method in `pg_backup_archiver.c`. The patch introduces a `do_copy` function that modifies the `COPY` statement string to inject the `WITH (FREEZE)` option.

```c
if (cp_end - 10 > cp &&
    strncmp(cp_end - 10, "FROM stdin", 10) == 0
    )
    cp_is_well_formed = true;
```

This approach of string parsing is quite fragile. It makes a strong assumption that the `COPY` statement generated by `pg_dump` will always end with `... FROM stdin;` (preceded by optional whitespace). While this is true now, it creates a tight and brittle coupling between `pg_dump`'s output format and `pg_restore`'s parsing logic. Any future changes to `pg_dump` that might add options or slightly alter the `COPY` syntax could break this feature silently or with a non-obvious warning.

A more robust solution would be to avoid string manipulation entirely. `pg_restore` should assemble the `COPY` command from its constituent parts (table name, column list, etc.) and then conditionally append the `WITH (FREEZE)` clause before the final `FROM stdin;`. This would decouple the feature from the exact string representation in the dump archive.

An alternative—and arguably cleaner—approach might be to shift this logic to pg_dump.

One important consideration that needs to be highlighted in the documentation for this feature is the impact on WAL generation. `COPY ... WITH (FREEZE)` is known to generate significantly more WAL traffic because it must log the freezing of tuples, which can be a surprise for users. Maybe we can insert already frozen pages?

Additionally, it should be noted that the freeze option only works correctly when performing a fresh load of data into empty tables.

Thanks again for your work on this.

Regards,
Stepan Neretin
 
Вложения

Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

От
Dimitrios Apostolou
Дата:
Hello Stepan!

I can see you tested my patch by itself, and with the following command:

On Monday 2025-07-21 07:58, Stepan Neretin wrote:

>      pg_restore -d "$DB_NAME" -j "$JOBS" --clean --if-exists --freeze "$DUMP_FILE" > /dev/null

On the other hand, I have tested combining --freeze with --data-only
--clean, as implemented by another patch of mine at:


https://www.postgresql.org/message-id/flat/413c1cd8-1d6d-90ba-ac7b-b226a4dad5ed%40gmx.net#44f06ce85a14a6238125a0ad8c9767db

In fact I was considering that patch as a requirement to this one, but I
see that you are using my patch independently to issue --clean --freeze
without --data-only. As far as I understand, this will also issue a
TRUNCATE before each COPY, and the benefits of FREEZE will be the same.
Thank you for revealing this to me.


>The main area of concern is the implementation method in `pg_backup_archiver.c`. The patch introduces a `do_copy`
functionthat modifies the `COPY` statement string to inject the `WITH (FREEZE)` option. 
>
>```c
>if (cp_end - 10 > cp &&
>    strncmp(cp_end - 10, "FROM stdin", 10) == 0
>    )
>    cp_is_well_formed = true;
>```
>
>This approach of string parsing is quite fragile. It makes a strong assumption that the `COPY` statement generated by
`pg_dump`will always end with `... FROM stdin;` (preceded by optional whitespace). While this is true now, it creates a
tightand brittle coupling between `pg_dump`'s output format and 
>`pg_restore`'s parsing logic. Any future changes to `pg_dump` that might add options or slightly alter the `COPY`
syntaxcould break this feature silently or with a non-obvious warning. 

It troubled me too. At least I have a pg_log_warning() message in this
case, and nothing detrimental will happen, it's just that the FREEZE
option will not be used and the message will be logged.

>
>A more robust solution would be to avoid string manipulation entirely. `pg_restore` should assemble the `COPY` command
fromits constituent parts (table name, column list, etc.) and then conditionally append the `WITH (FREEZE)` clause
beforethe final `FROM stdin;`. This would decouple the feature from 
>the exact string representation in the dump archive.

Totally agree, but this sounds impossible to implement. Unfortunately
pg_dump generates SQL commands, and pg_restore edits them and executes
them. I wouldn't know where to start to change this whole architecture.

>
>An alternative—and arguably cleaner—approach might be to shift this logic to pg_dump.

Thinking it more thoroughly, pg_dump is not the place to compose the
restoration SQL commands. The proper place is pg_restore.
I guess it's being done in pg_dump for historical reasons, but I don't
think pg_dump should complicate its commands further (e.g. by adding
WITH FREEZE), as this is a choice to make during restore.
I think I've seen more places in pg_restore that edit the
commands before executing them.


>One important consideration that needs to be highlighted in the documentation for this feature is the impact on WAL
generation.`COPY ... WITH (FREEZE)` is known to generate significantly more WAL traffic because it must log the
freezingof tuples, which can be a surprise for users. Maybe we can insert 
>already frozen pages?

There should be no WAL traffic at all. If the transaction starts with a
TRUNCATE TABLE then COPY skips the wal completely. This gives a big
performance and stability gain when bulk-loading data, and this is what
my 2 patches try to leverage in more cases.

And as far as I can tell, if there is no TRUNCATE in the same
transaction, then pg_restore will output error like the following:

ERROR:  cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction

I hope such an erroris acceptable, since the sysadmin can just remove
--freeze then and re-run the command.

>Additionally, it should be noted that the freeze option only works correctly when performing a fresh load of data into
emptytables. 

Do you think this patch has chances of going in by itself? If so then
yes, I should definitely update the docs and submit it again
individually.


Thank you for the rebase and review.

Dimitris