Обсуждение: Feature Proposal: schema renaming in pg_dump/pg_restore

Поиск
Список
Период
Сортировка

Feature Proposal: schema renaming in pg_dump/pg_restore

От
Brad Arndt
Дата:
First time going through the contribution process with postgres, but from what I read in the documentation, the best place to start with a new feature proposal is an email to this mailing list for feedback.  So here goes...

Use Case:
We use pg_dump/pg_restore both as a mechanism to "archive"/"restore" data as well as a mechanism to move data between databases.  As part of both of these operations, one thing we frequently run into is the need to rename the schema as part of the process (either to fit a naming convention, or due to naming conflicts).

For a concrete example, suppose we have the following:
  • database_1
    • schema_1
    • schema_2
  • database_2
    • schema_1
And we wanted to copy database_1.schema_1 to database_2.schema_2.

What we have tried so far (and the drawbacks)

1. Using temporary schemas to avoid collisions.  

In our example above, this looks like the following:
  a. Clone database_1.schema_1 to database_1.temp_schema
  b. dump/restore database_1.temp_schema to database_2.temp_schema
  c. rename database_2.temp_schema to database_2.schema_2
  d. delete database_1.temp_schema

The obvious downside to this approach is the need for the temp schemas.  Especially on database_1 where the data must be duplicated temporarily.

2. "Post-processing" the output of pg_dump to do renaming

Specifically:
a. pg_dump database_1.schema_1 using SQL/text output
b. piping that output to a process which uses regular expressions to replace schema_1 with schema_2
c. piping that output into pgsql for restore 

While we were able to make this workflow work, it has a couple of drawbacks:
1. regular expression replacement can be brittle to maintain as pg_dump output changes
2. because we needed access to the text sql statements, we were limited to sql/text output from pg_dump and pgsql import which precluded us from taking advantage of many of the performance optimizations the other formats of pg_dump/pg_restore provide

My Proposal:
Build in the ability to rename schemas as part of pg_dump (and to a lesser extent pg_restore - more on that below) by adding a --rename-schema flag.

The flag could support:
- rename all schemas to a single replacement: --rename-schema <replacement name>
- rename a specific schema: --rename-schema <src schema>:<replacement name>
- or a combination (multiple flag occurrences): --rename-schema <schema1>:<schema1 replacement> --rename-schema <replacement for all other schemas>

For example, the following commands would all rename schema_1 to schema_2 in the dump output:
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_1:schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema *:schema_2

Secondary Feature: pg_restore replacement
Renaming as part of the dump works great when we know the ultimate restore name at the time of dumping.  However, there are times we may want to dump a db to cold storage, and then restore it at a later time to an arbitary schema. Renaming on the restore is a little less precise than on dump as in most cases we are just working with the raw sql statements created by the dump.  So for this use case, I'm proposing a more simplistic text find/replace.  Because of the unstructured nature, it would be on the user to ensure that their replacement string is unique.  For example (again, using the above scenario):

1. dump database_1.schema_1 and use the dump rename feature to rename schema1 to some_very_unique_schema in the dump file
2. restore at a later time doing a simple text replacement:
    pg_restore ... --replace some_very_unique_schema:schema_1


What I have done so far / Next Steps:
I have a working proof of concept built on top of postgres 16.8 (what we currently have in production).  

If the community feels this is a reasonable feature to add, I would like to go through the process of building on top of the release branch and submitting the patch.  However, I wanted to get feedback on whether this is a reasonable feature to pursue first.

Open to any and all feedback. Thanks!!

-Brad





Re: Feature Proposal: schema renaming in pg_dump/pg_restore

От
Tom Lane
Дата:
Brad Arndt <brad.arndt@gmail.com> writes:
> *My Proposal:*
> Build in the ability to rename schemas as part of pg_dump (and to a lesser
> extent pg_restore - more on that below) by adding a --rename-schema flag.

This has been proposed before, and the discussions always foundered
on the fact that pg_dump has no understanding of the contents of
function bodies.  For that matter it doesn't really understand
the contents of views, default expressions, etc either.  So any such
feature would be dangerously incomplete: it'd not be very much safer
than just doing a string substitution on "pg_dump -s" output.

I concede that it could be useful anyway to some people in some
use-cases, but we don't generally like to put development and support
effort into things that would have to come with big red warning flags.
Inevitably, people would use the feature carelessly, shoot themselves
in the foot, and complain to us about it.

            regards, tom lane