Обсуждение: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

Поиск
Список
Период
Сортировка

[HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robins Tharakan
Дата:
Hi,

Attached is a patch adds a --no-comments argument to pg_dump to skip generation of COMMENT statements when generating a backup. This is crucial for non-superusers to restore a database backup in a Single Transaction. Currently, this requires one to remove COMMENTs via scripts, which is inelegant at best.

A similar discussion had taken place earlier [1], however that stopped (with a Todo entry) since it required more work at the backend. 

This patch provides a stop-gap solution until then. If the feedback is to tackle this is by filtering comments for specific DB objects, an argument name (for e.g. --no-extension-comments or something) would help and I could submit a revised patch.

Alternatively, if there are no objections, I could submit this to the Commitfest.

References:

-
robins
​​
Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Greetings,

* Robins Tharakan (tharakan@gmail.com) wrote:
> Attached is a patch adds a --no-comments argument to pg_dump to skip
> generation of COMMENT statements when generating a backup. This is crucial
> for non-superusers to restore a database backup in a Single Transaction.
> Currently, this requires one to remove COMMENTs via scripts, which is
> inelegant at best.

Having --no-comments seems generally useful to me, in any case.

> A similar discussion had taken place earlier [1], however that stopped
> (with a Todo entry) since it required more work at the backend.

Well, that was one possible solution (COMMENT IF NOT EXISTS).  That does
seem like it'd be nice too, though what I think we really want here is
something like:

CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;

That general capability has been asked for and discussed before, but
it's complicated because we support comments on lots of objects and
doesn't address other possible issues with this approach (comments
aren't the only things that could exist on plpgsql, and I can't see us
having a way to get every component included in a single command...).

That leads to an interesting thought which we could consider, which
would be represented in some crazy syntax such as:

CREATE EXTENSION IF NOT EXISTS plpgsql ... ( COMMENT xyz; GRANT USAGE ON EXTENSION plpgsql whatever;
);

> This patch provides a stop-gap solution until then. If the feedback is to
> tackle this is by filtering comments for specific DB objects, an argument
> name (for e.g. --no-extension-comments or something) would help and I could
> submit a revised patch.

That seems like it might be a bit too specific.

> Alternatively, if there are no objections, I could submit this to the
> Commitfest.

Yes, please add it to the commitfest.

Thanks!

Stephen

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
"David G. Johnston"
Дата:
On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Robins Tharakan (tharakan@gmail.com) wrote:
> Attached is a patch adds a --no-comments argument to pg_dump to skip
> generation of COMMENT statements when generating a backup. This is crucial
> for non-superusers to restore a database backup in a Single Transaction.
> Currently, this requires one to remove COMMENTs via scripts, which is
> inelegant at best.

Having --no-comments seems generally useful to me, in any case.

I​t smacks of being excessive to me.
CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;

​A less verbose way to add comments to objects would be nice but we have an immediate problem that we either need to solve or document a best practice for.

COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to what seems to me is the underlying problem...that people don't want a non-functional (usually...) aspect preventing successful restoration.

COMMENT ON object TRY 'text'  -- i.e., replace the word IS with TRY

If the attempt to comment fails for any reason log a warning (maybe) but otherwise ignore the result and continue on without invoking an error.

One suggestion I've seen is to simply "COMMENT ON EXTENSION plpgsql IS NULL;"  If that works in the scenarios people are currently dealing with then I'd say we should advise that such an action be taken for those whom wish to generate dumps that can be loaded by non-super-users.  If the affected users cannot make that work then maybe we should just remove the comment from the extension.  People can lookup "plpgsql" in the docs easily enough and "PL/pgSQL procedural language" doesn't do anything more than expand the acronym.

David J.

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
David,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Robins Tharakan (tharakan@gmail.com) wrote:
> > > Attached is a patch adds a --no-comments argument to pg_dump to skip
> > > generation of COMMENT statements when generating a backup. This is
> > crucial
> > > for non-superusers to restore a database backup in a Single Transaction.
> > > Currently, this requires one to remove COMMENTs via scripts, which is
> > > inelegant at best.
> >
> > Having --no-comments seems generally useful to me, in any case.
>
> I​t smacks of being excessive to me.
> ​

I have a hard time having an issue with an option that exists to exclude
a particular type of object from being in the dump.  A user who never
uses large objects/blobs might feel the same way about "--no-blobs", or
a user who never uses ACLs wondering why we have a "--no-privileges".
In the end, these are also possible components that users may wish to
have for their own reasons.

What I, certainly, agree isn't ideal is requiring users to use such an
option to generate a database-level dump file (assuming they have access
to more-or-less all database objects) which can be restored as a
non-superuser, that's why I was a bit hesitant about this particular
solution to the overall problem.

I do agree that if there is simply no use-case, ever, for wishing to
strip comments from a database then it might be excessive to provide
such an option, but I tend to feel that there is a reasonable, general,
use-case for the option.

> > CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
>
> ​A less verbose way to add comments to objects would be nice but we have an
> immediate problem that we either need to solve or document a best practice
> for.

The above would be a solution to the immediate problem in as much as
adding COMMENT IF NOT EXISTS would be.

> COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to
> what seems to me is the underlying problem...that people don't want a
> non-functional (usually...) aspect preventing successful restoration.

I tend to disagree with this characterization- I'm of the opinion that
people are, rightly, confused as to why we bother to try and add a
COMMENT to an object which we didn't actually end up creating (as it
already existed), and then throw an error on it to boot.  Were pg_dump a
bit more intelligent of an application, it would realize that once the
CREATE ... IF NOT EXISTS returned a notice saying "this thing already
existed" that it would realize that it shouldn't try to adjust the
attributes of that object, as it was already existing.  That, however,
would preclude the ability of pg_dump to produce a text file used for
restore, unless we started to write into that text file DO blocks, which
I doubt would go over very well.

> COMMENT ON object TRY 'text'  -- i.e., replace the word IS with TRY

I'm not sure that I see why we should invent wholly new syntax for
something which we already have in IF NOT EXISTS, or why this should
really be viewed or considered any differently from the IF NOT EXISTS
syntax.

Perhaps you could elaborate as to how this is different from IF NOT
EXISTS?

> If the attempt to comment fails for any reason log a warning (maybe) but
> otherwise ignore the result and continue on without invoking an error.

In the IF NOT EXISTS case we log a NOTICE, which seems like it's what
would be appropriate here also, again begging the question of it this is
really different from IF NOT EXISTS in a meaningful way.

> One suggestion I've seen is to simply "COMMENT ON EXTENSION plpgsql IS
> NULL;"  If that works in the scenarios people are currently dealing with
> then I'd say we should advise that such an action be taken for those whom
> wish to generate dumps that can be loaded by non-super-users.  If the
> affected users cannot make that work then maybe we should just remove the
> comment from the extension.  People can lookup "plpgsql" in the docs easily
> enough and "PL/pgSQL procedural language" doesn't do anything more than
> expand the acronym.

Removing the comment as a way to deal with our deficiency in this area
strikes me as akin to adding planner hints.

Thanks!

Stephen

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
"David G. Johnston"
Дата:
Stephen,

On Tue, May 30, 2017 at 8:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
David,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Robins Tharakan (tharakan@gmail.com) wrote:
> > > Attached is a patch adds a --no-comments argument to pg_dump to skip
> > > generation of COMMENT statements when generating a backup. This is
> > crucial
> > > for non-superusers to restore a database backup in a Single Transaction.
> > > Currently, this requires one to remove COMMENTs via scripts, which is
> > > inelegant at best.
> >
> > Having --no-comments seems generally useful to me, in any case.
>
> I​t smacks of being excessive to me.
> ​

I have a hard time having an issue with an option that exists to exclude
a particular type of object from being in the dump.

​Excessive with respect to the problem at hand.  A single comment in the dump is unable to be restored.  Because of that we are going to require people to omit every comment in their database.  The loss of all that information is in excess of what is required to solve the stated problem which is how I was thinking of excessive.

I agree that the general idea of allowing users to choose to include or exclude comments is worth discussion along the same lines of large objects and privileges.


> > CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
>
> ​A less verbose way to add comments to objects would be nice but we have an
> immediate problem that we either need to solve or document a best practice
> for.

The above would be a solution to the immediate problem in as much as
adding COMMENT IF NOT EXISTS would be.

​I believe it would take a lot longer, possibly even until 12, to get the linked comment feature committed compared ​to committing COMMENT IF NOT EXISTS or some variation (or putting in a hack for that matter).


> COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to
> what seems to me is the underlying problem...that people don't want a
> non-functional (usually...) aspect preventing successful restoration.

I tend to disagree with this characterization- I'm of the opinion that
people are, rightly, confused as to why we bother to try and add a
COMMENT to an object which we didn't actually end up creating (as it
already existed), and then throw an error on it to boot.

My characterization does actually describe the current system though.  While I won't doubt that people do hold your belief it is an underlying mis-understanding as to how PostgreSQL works since comments aren't, as you say below, actual attributes but rather objects in their own right.  I would love to have someone solve the systemic problem here.  But the actual complaint can probably be addressed without it.
 
  Were pg_dump a
bit more intelligent of an application, it would realize that once the
CREATE ... IF NOT EXISTS returned a notice saying "this thing already
existed" that it would realize that it shouldn't try to adjust the
attributes of that object, as it was already existing.

​pg_dump isn't in play during the restore which is where the error occurs.

During restore you have pg_restore+psql - and having cross-statement logic in those applications is likely a non-starter. That is ultimately the problem here, and which is indeed fixed by the outstanding proposal of embedding COMMENT within the CREATE/ALTER object commands.  But today, comments are independent objects and solving the problem within that domain will probably prove easier than changing how the system treats comments.


> COMMENT ON object TRY 'text'  -- i.e., replace the word IS with TRY

Perhaps you could elaborate as to how this is different from IF NOT
EXISTS?


​If the comment on plpgsql were removed for some reason the COMMENT ON IF NOT EXISTS would fire and then it would fail due to permissions.  With "TRY" whether the comment (or object for that matter) exists or not the new comment would be attempted and if the permission failure kicked in it wouldn't care.​
 
 If the
> affected users cannot make that work then maybe we should just remove the
> comment from the extension. 

Removing the comment as a way to deal with our deficiency in this area
strikes me as akin to adding planner hints.

​Maybe, but the proposal you are supporting has been around for years, with people generally in favor of it, and hasn't happened yet.  At some point I'd rather hold my nose and fix the problem myself than wait for the professional to arrive and do it right.  Any, hey, we've had multiple planner hints since 8.4 ;)

David J.

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
David,

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Tue, May 30, 2017 at 8:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * David G. Johnston (david.g.johnston@gmail.com) wrote:
> > > On Fri, May 26, 2017 at 7:47 AM, Stephen Frost <sfrost@snowman.net>
> > wrote:
> > > > * Robins Tharakan (tharakan@gmail.com) wrote:
> > > > > Attached is a patch adds a --no-comments argument to pg_dump to skip
> > > > > generation of COMMENT statements when generating a backup. This is
> > > > crucial
> > > > > for non-superusers to restore a database backup in a Single
> > Transaction.
> > > > > Currently, this requires one to remove COMMENTs via scripts, which is
> > > > > inelegant at best.
> > > >
> > > > Having --no-comments seems generally useful to me, in any case.
> > >
> > > I​t smacks of being excessive to me.
> > > ​
> >
> > I have a hard time having an issue with an option that exists to exclude
> > a particular type of object from being in the dump.
>
> ​Excessive with respect to the problem at hand.

Fair enough.  I tend to agree with that sentiment.

> A single comment in the
> dump is unable to be restored.  Because of that we are going to require
> people to omit every comment in their database.  The loss of all that
> information is in excess of what is required to solve the stated problem
> which is how I was thinking of excessive.

That would be most unfortunate, I agree.

> I agree that the general idea of allowing users to choose to include or
> exclude comments is worth discussion along the same lines of large objects
> and privileges.

Good, then we can at least consider this particular feature as being one
we're generally interested in and move forward with it, even if we also,
perhaps, come up with a better solution to the specific issue at hand.

> > > > CREATE EXTENSION IF NOT EXISTS plpgsql ... COMMENT blah;
> > >
> > > ​A less verbose way to add comments to objects would be nice but we have
> > an
> > > immediate problem that we either need to solve or document a best
> > practice
> > > for.
> >
> > The above would be a solution to the immediate problem in as much as
> > adding COMMENT IF NOT EXISTS would be.
>
> ​I believe it would take a lot longer, possibly even until 12, to get the
> linked comment feature committed compared ​to committing COMMENT IF NOT
> EXISTS or some variation (or putting in a hack for that matter).

Perhaps, but I'm not convinced that such speculation really helps move
us forward.

> > > COMMENT IF NOT EXISTS has been brought up but it doesn't actually map to
> > > what seems to me is the underlying problem...that people don't want a
> > > non-functional (usually...) aspect preventing successful restoration.
> >
> > I tend to disagree with this characterization- I'm of the opinion that
> > people are, rightly, confused as to why we bother to try and add a
> > COMMENT to an object which we didn't actually end up creating (as it
> > already existed), and then throw an error on it to boot.
>
> My characterization does actually describe the current system though.

I'm not entirely sure what I was getting at above, to be honest, but I
believe we're generally on the same page here.  I certainly agree that
users don't expect a pg_dump to not be able to be successfully restored.
What I may have been getting at is simply that it's not about a lack of
COMMENT IF NOT EXISTS, in an ideal world, but perhaps that's what we
need to solve this particular issue, for now.

> While I won't doubt that people do hold your belief it is an underlying
> mis-understanding as to how PostgreSQL works since comments aren't, as you
> say below, actual attributes but rather objects in their own right.  I
> would love to have someone solve the systemic problem here.  But the actual
> complaint can probably be addressed without it.

Right, I tend to follow your line of thought here.

> >   Were pg_dump a
> > bit more intelligent of an application, it would realize that once the
> > CREATE ... IF NOT EXISTS returned a notice saying "this thing already
> > existed" that it would realize that it shouldn't try to adjust the
> > attributes of that object, as it was already existing.
>
> ​pg_dump isn't in play during the restore which is where the error occurs.

Ah, but pg_dump could potentially dump out more complicated logic than
it does today.  We currently presume that there is never any need to
reason about the results of a prior command before executing the next in
pg_dump's output.  In some ways, it's rather impressive that we've
gotten this far with that assumption, but ensuring that is the case
means that our users are also able to rely on that and write simple
scripts which can be rerun to reset the database to a specific state.

> During restore you have pg_restore+psql - and having cross-statement logic
> in those applications is likely a non-starter.

It would certainly be a very large shift from what we generate today. :)

> That is ultimately the
> problem here, and which is indeed fixed by the outstanding proposal of
> embedding COMMENT within the CREATE/ALTER object commands.  But today,
> comments are independent objects and solving the problem within that domain
> will probably prove easier than changing how the system treats comments.

I agree that adding COMMENT IF NOT EXISTS (or perhaps COMMENT ... TRY)
would be simpler than changing the syntax for every command to support a
COMMENT attribute being included.  The issue here, however, is what I
mentioned previously- COMMENTs aren't the only attributes of an object
and if we really want to support "CREATE OBJECT + ATTRIBUTES IF NOT
EXISTS, otherwise do nothing" then that approach simply doesn't work,
without heavily modifying our syntax to allow much greater flexibility
than we've ever had before.

That's why I was suggesting that we have a mechanism for passing a set
of sub-commands to be performed on an object *if* we end up creating it,
on the basis of CREATE ... IF NOT EXISTS.

> > > COMMENT ON object TRY 'text'  -- i.e., replace the word IS with TRY
> >
> > Perhaps you could elaborate as to how this is different from IF NOT
> > EXISTS?
>
> ​If the comment on plpgsql were removed for some reason the COMMENT ON IF
> NOT EXISTS would fire and then it would fail due to permissions.  With
> "TRY" whether the comment (or object for that matter) exists or not the new
> comment would be attempted and if the permission failure kicked in it
> wouldn't care.​

Ah, I see that distinction.  I'm wondering how it might relate to other
attributes which an object might have and if we need to have similar
options for each (or perhaps we do?  I've not looked, but I don't recall
anything similar offhand).

> >  If the
> > > affected users cannot make that work then maybe we should just remove the
> > > comment from the extension.
> >
> > Removing the comment as a way to deal with our deficiency in this area
> > strikes me as akin to adding planner hints.
>
> ​Maybe, but the proposal you are supporting has been around for years, with
> people generally in favor of it, and hasn't happened yet.  At some point
> I'd rather hold my nose and fix the problem myself than wait for the
> professional to arrive and do it right.  Any, hey, we've had multiple
> planner hints since 8.4 ;)

That's a fair point, and might allow a quick-fix (I seriously doubt
anyone would really miss the comment we have on plpgsql today), but I'm
going to continue to push back a bit on this as I'd really like a better
solution, even if it's a bit more work.  If such a solution doesn't
materialize before the last CF for PG 11, then I will support a motion
to simply remove the comment.

Thanks!

Stephen

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
>> Having --no-comments seems generally useful to me, in any case.
>
> It smacks of being excessive to me.

It sounds perfectly sensible to me.  It's not exactly an elegant
solution to the original problem, but it's a reasonable switch on its
own merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
> <david.g.johnston@gmail.com> wrote:
>>> Having --no-comments seems generally useful to me, in any case.

>> It smacks of being excessive to me.

> It sounds perfectly sensible to me.  It's not exactly an elegant
> solution to the original problem, but it's a reasonable switch on its
> own merits.

I dunno.  What's the actual use-case, other than as a bad workaround
to a problem we should fix a different way?
        regards, tom lane



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> >>> Having --no-comments seems generally useful to me, in any case.
>
> >> It smacks of being excessive to me.
>
> > It sounds perfectly sensible to me.  It's not exactly an elegant
> > solution to the original problem, but it's a reasonable switch on its
> > own merits.
>
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

Perhaps it's a bit of a stretch, I'll admit, but certainly "minmization"
and "obfuscation" come to mind, which are often done in other fields and
might well apply in very specific cases to PG schemas.  I can certainly
also see a case being made that you'd like to extract a schema-only dump
which doesn't include any comments because the comments have information
that you'd rather not share publicly, while the schema itself is fine to
share.  Again, a bit of a stretch, but not unreasonable.

Otherwise, well, for my 2c anyway, feels like it's simply fleshing out
the options which correspond to the different components of an object.
We provide similar for ACLs, security labels, and tablespace
association.  If there are other components of an object which we should
consider adding an option to exclude, I'm all ears, personally
(indexes?).  Also, with the changes that I've made to pg_dump, I'm
hopeful that such options will end up requiring a very minor amount of
code to implement.  There's more work to be done in that area too,
certainly, but I do feel like it's better than it was.

I definitely would like to see more flexibility in this area in general.

Thanks!

Stephen

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Thu, Jun 1, 2017 at 10:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

Well, that's a fair point.  I don't have a specific use case in mind.
However, I also don't think that options for controlling what gets
dumped should be subjected to extreme vetting.  On the strength mostly
of my own experiences trying to solve database problems in the real
world, I generally think that pg_dump could benefit from significantly
more options to control what gets dumped.  The existing options are
pretty good, but it's not that hard to run into a situation that they
don't cover.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Fabrízio Mello
Дата:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

It's a very simple change and I have not to complain about source and documentation changes.

But I wonder the lack of tap tests of this new pg_dump behavior. Shouldn't we add tap tests?

The new status of this patch is: Waiting on Author

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
David Fetter
Дата:
On Thu, Jun 01, 2017 at 10:05:09PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, May 30, 2017 at 8:55 PM, David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> >>> Having --no-comments seems generally useful to me, in any case.
> 
> >> It smacks of being excessive to me.
> 
> > It sounds perfectly sensible to me.  It's not exactly an elegant
> > solution to the original problem, but it's a reasonable switch on
> > its own merits.
> 
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

The one I run into frequently is in a proprietary fork, RDS Postgres.
It'll happily dump out COMMENT ON EXTENSION plpgsq IS ...
which is great as far as it goes, but errors out when you try to
reload it.

While bending over backwards to support proprietary forks strikes me
as a terrible idea, I'd like to enable pg_dump to produce and consume
ToCs just as pg_restore does with its -l/-L options.  This would
provide the finest possible grain.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Michael Paquier
Дата:
On Tue, Jul 18, 2017 at 3:45 AM, David Fetter <david@fetter.org> wrote:
> The one I run into frequently is in a proprietary fork, RDS Postgres.
> It'll happily dump out COMMENT ON EXTENSION plpgsq IS ...
> which is great as far as it goes, but errors out when you try to
> reload it.
>
> While bending over backwards to support proprietary forks strikes me
> as a terrible idea, I'd like to enable pg_dump to produce and consume
> ToCs just as pg_restore does with its -l/-L options.  This would
> provide the finest possible grain.

Let's add as well a similar switch to pg_dumpall that gets pushed down
to all the created pg_dump commands. If this patch gets integrated
as-is this is going to be asked. And tests would be welcome.
-- 
Michael



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
David Fetter
Дата:
On Tue, Jul 18, 2017 at 08:38:25AM +0200, Michael Paquier wrote:
> On Tue, Jul 18, 2017 at 3:45 AM, David Fetter <david@fetter.org> wrote:
> > The one I run into frequently is in a proprietary fork, RDS Postgres.
> > It'll happily dump out COMMENT ON EXTENSION plpgsq IS ...
> > which is great as far as it goes, but errors out when you try to
> > reload it.
> >
> > While bending over backwards to support proprietary forks strikes me
> > as a terrible idea, I'd like to enable pg_dump to produce and consume
> > ToCs just as pg_restore does with its -l/-L options.  This would
> > provide the finest possible grain.
> 
> Let's add as well a similar switch to pg_dumpall that gets pushed down
> to all the created pg_dump commands. If this patch gets integrated
> as-is this is going to be asked. And tests would be welcome.

Excellent point about pg_dumpall.  I'll see what I can draft up in the
next day or two and report back.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robins Tharakan
Дата:

On 18 July 2017 at 23:55, David Fetter <david@fetter.org> wrote:
Excellent point about pg_dumpall.  I'll see what I can draft up in the
next day or two and report back.


​Hi David,

You may want to consider this patch (attached) which additionally has the pg_dumpall changes.
It would be great if you could help with the tests though, am unsure how to go about them.

-
robins​
Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Fabrízio de Royes Mello
Дата:

On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan <tharakan@gmail.com> wrote:
>
>
> On 18 July 2017 at 23:55, David Fetter <david@fetter.org> wrote:
>>
>> Excellent point about pg_dumpall.  I'll see what I can draft up in the
>> next day or two and report back.
>
>
>
> Hi David,
>
> You may want to consider this patch (attached) which additionally has the pg_dumpall changes.
> It would be great if you could help with the tests though, am unsure how to go about them.
>

You should add the properly sgml docs for this pg_dumpall change also.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Michael Paquier
Дата:
On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
> On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan <tharakan@gmail.com> wrote:
>> You may want to consider this patch (attached) which additionally has the
>> pg_dumpall changes.
>> It would be great if you could help with the tests though, am unsure how
>> to go about them.
>
> You should add the properly sgml docs for this pg_dumpall change also.

Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
extensions are in src/test/modules/test_pg_dump, but you just care
about the former with this patch. And if you implement some new tests,
look at the other tests and base your work on that.
--
Michael



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robins Tharakan
Дата:

On 20 July 2017 at 05:08, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Jul 19, 2017 at 8:59 PM,
​​
Fabrízio de Royes Mello

> You should add the properly sgml docs for this pg_dumpall change also.

Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
extensions are in src/test/modules/test_pg_dump, but you just care
about the former with this patch. And if you implement some new tests,
look at the other tests and base your work on that.

​Thanks Michael / 
Fabrízio.

Updated patch (attached) additionally adds SGML changes for pg_dumpall.
(I'll try to work on the tests, but sending this
​​
nonetheless
​).​


-
robins
Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robins Tharakan
Дата:

On 20 July 2017 at 05:14, Robins Tharakan <tharakan@gmail.com> wrote:

On 20 July 2017 at 05:08, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Jul 19, 2017 at 8:59 PM,
​​
Fabrízio de Royes Mello

> You should add the properly sgml docs for this pg_dumpall change also.

Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
extensions are in src/test/modules/test_pg_dump, but you just care
about the former with this patch. And if you implement some new tests,
look at the other tests and base your work on that.

​Thanks Michael / 
Fabrízio.

Updated patch (attached) additionally adds SGML changes for pg_dumpall.
(I'll try to work on the tests, but sending this
​​
nonetheless
​).​



Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall.
(Have zipped it since patch size jumped to ~40kb).

-
robins

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Fabrízio de Royes Mello
Дата:

On Mon, Aug 7, 2017 at 10:43 AM, Robins Tharakan <tharakan@gmail.com> wrote:
>
> On 20 July 2017 at 05:14, Robins Tharakan <tharakan@gmail.com> wrote:
>>
>> On 20 July 2017 at 05:08, Michael Paquier <michael.paquier@gmail.com> wrote:
>>>
>>> On Wed, Jul 19, 2017 at 8:59 PM,
>>> Fabrízio de Royes Mello
>>> > You should add the properly sgml docs for this pg_dumpall change also.
>>>
>>> Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
>>> extensions are in src/test/modules/test_pg_dump, but you just care
>>> about the former with this patch. And if you implement some new tests,
>>> look at the other tests and base your work on that.
>>
>>
>> Thanks Michael /
>> Fabrízio.
>>
>> Updated patch (attached) additionally adds SGML changes for pg_dumpall.
>> (I'll try to work on the tests, but sending this
>> nonetheless
>> ).
>>
>
> Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall.
> (Have zipped it since patch size jumped to ~40kb).
>

The patch applies cleanly to current master and all tests run without failures.

I also test against all current supported versions (9.2 ... 9.6) and didn't find any issue.

Changed status to "ready for commiter".

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Simon Riggs
Дата:
On 7 August 2017 at 16:14, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
>
> On Mon, Aug 7, 2017 at 10:43 AM, Robins Tharakan <tharakan@gmail.com> wrote:
>>
>> On 20 July 2017 at 05:14, Robins Tharakan <tharakan@gmail.com> wrote:
>>>
>>> On 20 July 2017 at 05:08, Michael Paquier <michael.paquier@gmail.com>
>>> wrote:
>>>>
>>>> On Wed, Jul 19, 2017 at 8:59 PM,
>>>> Fabrízio de Royes Mello
>>>> > You should add the properly sgml docs for this pg_dumpall change also.
>>>>
>>>> Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in
>>>> extensions are in src/test/modules/test_pg_dump, but you just care
>>>> about the former with this patch. And if you implement some new tests,
>>>> look at the other tests and base your work on that.
>>>
>>>
>>> Thanks Michael /
>>> Fabrízio.
>>>
>>> Updated patch (attached) additionally adds SGML changes for pg_dumpall.
>>> (I'll try to work on the tests, but sending this
>>> nonetheless
>>> ).
>>>
>>
>> Attached is an updated patch (v4) with basic tests for pg_dump /
>> pg_dumpall.
>> (Have zipped it since patch size jumped to ~40kb).
>>
>
> The patch applies cleanly to current master and all tests run without
> failures.
>
> I also test against all current supported versions (9.2 ... 9.6) and didn't
> find any issue.
>
> Changed status to "ready for commiter".

I get the problem, but not this solution. It's too specific and of
zero other value, yet not even exactly specific to the issue. We
definitely don't want --no-extension-comments, but --no-comments
removes ALL comments just to solve a weird problem. (Meta)Data loss,
surely?

Thinking ahead, are we going to add a new --no-objecttype switch every
time someone wants it?

It would make more sense to add something more general and extensible
such as --exclude-objects=comment
or similar name

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
"David G. Johnston"
Дата:
On Mon, Aug 21, 2017 at 2:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> The patch applies cleanly to current master and all tests run without
> failures.
>
> I also test against all current supported versions (9.2 ... 9.6) and didn't
> find any issue.
>
> Changed status to "ready for commiter".

I get the problem, but not this solution. It's too specific and of
zero other value, yet not even exactly specific to the issue. We
definitely don't want --no-extension-comments, but --no-comments
removes ALL comments just to solve a weird problem. (Meta)Data loss,
surely?

​It was argued, successfully IMO, to have this capability independent of any particular problem to be solved.  In that context I haven't given the proposed implementation much thought.

I do agree that this is a very unappealing solution for the stated problem and am hoping someone will either have an ingenious idea to solve it the right way or is willing to hold their nose and put in a hack.

David J.

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Mon, Aug 21, 2017 at 5:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Thinking ahead, are we going to add a new --no-objecttype switch every
> time someone wants it?

I'd personally be fine with --no-whatever for any whatever that might
be a subsidiary property of database objects.  We've got
--no-security-labels, --no-tablespaces, --no-owner, and
--no-privileges already, so what's wrong with --no-comments?

(We've also got --no-publications; I think it's arguable whether that
is the same kind of thing.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Michael Paquier
Дата:
On Sat, Sep 2, 2017 at 1:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 21, 2017 at 5:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Thinking ahead, are we going to add a new --no-objecttype switch every
>> time someone wants it?
>
> I'd personally be fine with --no-whatever for any whatever that might
> be a subsidiary property of database objects.  We've got
> --no-security-labels, --no-tablespaces, --no-owner, and
> --no-privileges already, so what's wrong with --no-comments?
>
> (We've also got --no-publications; I think it's arguable whether that
> is the same kind of thing.)

And --no-subscriptions in the same bucket.
-- 
Michael



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Simon Riggs
Дата:
On 1 September 2017 at 22:08, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sat, Sep 2, 2017 at 1:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Aug 21, 2017 at 5:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Thinking ahead, are we going to add a new --no-objecttype switch every
>>> time someone wants it?
>>
>> I'd personally be fine with --no-whatever for any whatever that might
>> be a subsidiary property of database objects.  We've got
>> --no-security-labels, --no-tablespaces, --no-owner, and
>> --no-privileges already, so what's wrong with --no-comments?
>>
>> (We've also got --no-publications; I think it's arguable whether that
>> is the same kind of thing.)
>
> And --no-subscriptions in the same bucket.

Yes, it is. I was suggesting that we remove those as well.

But back to the main point which is that --no-comments discards ALL
comments simply to exclude one pointless and annoying comment. That
runs counter to our stance that we do not allow silent data loss. I
want to solve the problem too. I accept that not everyone uses
comments, but if they do, spilling them all on the floor is a user
visible slip up that we should not be encouraging. Sorry y'all.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I'd personally be fine with --no-whatever for any whatever that might
>>> be a subsidiary property of database objects.  We've got
>>> --no-security-labels, --no-tablespaces, --no-owner, and
>>> --no-privileges already, so what's wrong with --no-comments?
>>>
>>> (We've also got --no-publications; I think it's arguable whether that
>>> is the same kind of thing.)
>>
>> And --no-subscriptions in the same bucket.
>
> Yes, it is. I was suggesting that we remove those as well.

That seems like a non-starter to me.  I have used those options many
times to solve real problems, and I'm sure other people have as well.
We wouldn't have ended up with all of these options if users didn't
want to control such things.

> But back to the main point which is that --no-comments discards ALL
> comments simply to exclude one pointless and annoying comment. That
> runs counter to our stance that we do not allow silent data loss. I
> want to solve the problem too. I accept that not everyone uses
> comments, but if they do, spilling them all on the floor is a user
> visible slip up that we should not be encouraging. Sorry y'all.

/me shrugs.

I agree that there could be better solutions to the original problem,
but I think there's no real argument that a user can't exclude
comments from a backup if they wish to do so.  As the OP already
pointed out, it can still be done without the switch; it's just more
annoying.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Michael Paquier
Дата:
On Thu, Sep 7, 2017 at 1:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> I'd personally be fine with --no-whatever for any whatever that might
>>>> be a subsidiary property of database objects.  We've got
>>>> --no-security-labels, --no-tablespaces, --no-owner, and
>>>> --no-privileges already, so what's wrong with --no-comments?
>>>>
>>>> (We've also got --no-publications; I think it's arguable whether that
>>>> is the same kind of thing.)
>>>
>>> And --no-subscriptions in the same bucket.
>>
>> Yes, it is. I was suggesting that we remove those as well.

FWIW, I do too. They are useful for given application code paths.

> That seems like a non-starter to me.  I have used those options many
> times to solve real problems, and I'm sure other people have as well.
> We wouldn't have ended up with all of these options if users didn't
> want to control such things.

As there begins to be many switches of this kind and much code
duplication, I think that some refactoring into a more generic switch
infrastructure would be nicer.
-- 
Michael



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Michael,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> As there begins to be many switches of this kind and much code
> duplication, I think that some refactoring into a more generic switch
> infrastructure would be nicer.

I have been thinking about this also and agree that it would be nicer,
but then these options would just become "shorthand" for the generic
switch.

Thanks!

Stephen

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Sun, Sep 10, 2017 at 6:25 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> As there begins to be many switches of this kind and much code
>> duplication, I think that some refactoring into a more generic switch
>> infrastructure would be nicer.
>
> I have been thinking about this also and agree that it would be nicer,
> but then these options would just become "shorthand" for the generic
> switch.

I don't really like the "generic switch infrastructure" concept.  I
think it will make specifying behaviors harder without any
corresponding benefit.  There are quite a few --no-xxx switches now
but the total number of them that we could conceivably end up with
doesn't seem to be a lot bigger than what we have already.

Now, if we want switches to exclude a certain kind of object (e.g.
table, function, text search configuration) from the backup
altogether, that should be done in some generic way, like
--exclude-object-type=table.  But that's not what this is about.  This
is about excluding a certain kind of property (comments, in this case)
from being backed up.  And an individual kind of object doesn't have
many more properties than what we already handle.

So I think this is just an excuse for turning --no-security-labels
into --no-object-property=security-label.  To me, that's just plain
worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Michael Paquier
Дата:
On Mon, Sep 11, 2017 at 11:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> So I think this is just an excuse for turning --no-security-labels
> into --no-object-property=security-label.  To me, that's just plain
> worse.

It does not seem that my thoughts here have been correctly transmitted
to your brain. I do not mean to change the user-facing options, just
to refactor the code internally so as --no-foo switches can be more
easily generated, added and handled as they are associated with an
object type. A portion of the complains is caused by the fact that a
lot of similar code is duplicated.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Mon, Sep 11, 2017 at 6:41 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Sep 11, 2017 at 11:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> So I think this is just an excuse for turning --no-security-labels
>> into --no-object-property=security-label.  To me, that's just plain
>> worse.
>
> It does not seem that my thoughts here have been correctly transmitted
> to your brain.

Dang, we really should have put more work into that inter-brain link.
PostgreSQL group mind FTW!

> I do not mean to change the user-facing options, just
> to refactor the code internally so as --no-foo switches can be more
> easily generated, added and handled as they are associated with an
> object type. A portion of the complains is caused by the fact that a
> lot of similar code is duplicated.

Ah, well.  No objection to refactoring away duplicate code, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Mon, Aug 7, 2017 at 11:14 AM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
> I also test against all current supported versions (9.2 ... 9.6) and didn't
> find any issue.
>
> Changed status to "ready for commiter".

On a very fast read this patch looks OK to me, but I'm a bit concerned
about whether we have consensus for it.  By my count the vote is 6-3
in favor of proceeding.

+1: Robins Tharakan, Stephen Frost, David Fetter, Fabrizio Mello,
Michael Paquier, Robert Haas
-1: David G. Johnston, Tom Lane, Simon Riggs

I guess that's probably sufficient to go forward, but does anyone wish
to dispute my characterization of their vote or the vote tally in
general?  Would anyone else like to cast a vote?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Robert, all,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Aug 7, 2017 at 11:14 AM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
> > I also test against all current supported versions (9.2 ... 9.6) and didn't
> > find any issue.
> >
> > Changed status to "ready for commiter".
>
> On a very fast read this patch looks OK to me, but I'm a bit concerned
> about whether we have consensus for it.  By my count the vote is 6-3
> in favor of proceeding.
>
> +1: Robins Tharakan, Stephen Frost, David Fetter, Fabrizio Mello,
> Michael Paquier, Robert Haas
> -1: David G. Johnston, Tom Lane, Simon Riggs
>
> I guess that's probably sufficient to go forward, but does anyone wish
> to dispute my characterization of their vote or the vote tally in
> general?  Would anyone else like to cast a vote?

This patch continues to drag on, so I'll try to spur it a bit.

David G. Jonston, you are listed as a '-1' on this but your last comment
on this thread was:

CAKFQuwZuYhh+KUzp76cdB4r8Z616ZwEWcHEEDq6vEqgdGP9QZw@mail.gmail.com

Where you commented:

> It was argued, successfully IMO, to have this capability independent of
> any particular problem to be solved.  In that context I haven't given the
> proposed implementation much thought.

> I do agree that this is a very unappealing solution for the stated problem
> and am hoping someone will either have an ingenious idea to solve it the
> right way or is willing to hold their nose and put in a hack.

Which didn't strike me as really being against this patch but rather a
complaint that making use of this feature to "fix" the non-superuser
Single Transaction restore was a hack, which is somewhat indepedent.
We have lots of features that people use in various very hacky ways, but
that doesn't mean those are bad features for us to have.

Simon, you are also a '-1' on this but your last comment also makes me
think you're unhappy with this feature being used in a hacky way but
perhaps aren't against the feature itself:

> But back to the main point which is that --no-comments discards ALL
> comments simply to exclude one pointless and annoying comment. That
> runs counter to our stance that we do not allow silent data loss. I
> want to solve the problem too. I accept that not everyone uses
> comments, but if they do, spilling them all on the floor is a user
> visible slip up that we should not be encouraging. Sorry y'all.

I really can't see accepting this as a data loss issue when this feature
is used only when requested by a user (it would *not* be enabled by
default; I'd be against that too).  We have other options would also be
considered data loss with this viewpoint but I can't imagine we would be
willing to remove them (eg: --no-privileges, or perhaps more clearly
--no-blobs, which was only recently added).

Tom, you're also listed as a '-1' on this but your last comment on this
thread was 16858.1496369109@sss.pgh.pa.us where you asked:

> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

which was answered by both Robert and I; Robert's reply being mainly
that it's likely to prove useful in the field even if we don't have a
specific use-case today, while I outlined at least feasible use-cases
(minimization, obfuscation).

For my 2c, I don't know that a large use-case is needed for this
particular feature to begin with.

In the end, I feel like this patch has actually been through the ringer
and back because it was brought up in the context of solving a problem
that we'd all like to fix in a better way.  Had it been simply dropped
on us as a "I'd like to not have comments in my pg_dump output and
here's a patch to allow me to do that" I suspect it would have been
committed without anywhere near this level of effort.

I'll just reiterate Robert's ask for people to clarify their positions.

If we're not moving forward, then the next step is to mark the patch as
Rejected.

Thanks!

Stephen

Вложения

[HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
"David G. Johnston"
Дата:
On Monday, January 22, 2018, Stephen Frost <sfrost@snowman.net> wrote:

In the end, I feel like this patch has actually been through the ringer
and back because it was brought up in the context of solving a problem
that we'd all like to fix in a better way.  Had it been simply dropped
on us as a "I'd like to not have comments in my pg_dump output and
here's a patch to allow me to do that" I suspect it would have been
committed without anywhere near this level of effort.

+0; but recognizing our users are going to remain annoyed by the existing problem and that telling them that this is the answer will not sit well.

David J.

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
> On Monday, January 22, 2018, Stephen Frost <sfrost@snowman.net> wrote:
>> In the end, I feel like this patch has actually been through the ringer
>> and back because it was brought up in the context of solving a problem
>> that we'd all like to fix in a better way.  Had it been simply dropped
>> on us as a "I'd like to not have comments in my pg_dump output and
>> here's a patch to allow me to do that" I suspect it would have been
>> committed without anywhere near this level of effort.

Indeed ... but it was *not* presented that way, and that's what makes
this somewhat of a difficult call.  You and Robert argued that there were
valid use-cases, but I feel like that was somewhat of an after-the-fact
justification, rather than an actual organic feature request.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> +0; but recognizing our users are going to remain annoyed by the existing
> problem and that telling them that this is the answer will not sit well.

FWIW, today's pg_dump refactoring got rid of one of the use-cases for
this, namely the COMMENT ON DATABASE aspect.  We got rid of another aspect
(creating/commenting on the public schema) some time ago, via a method
that was admittedly a bit of a hack but got the job done.  What seems to
be left is that given a default installation, pg_dump will emit a
"COMMENT ON EXTENSION plpgsql" that can't be restored by an unprivileged
user ... as well as a "CREATE EXTENSION IF NOT EXISTS plpgsql", which is
an utter kluge.  Maybe we can think of a rule that will exclude plpgsql
from dumps without causing too much havoc.

The most obvious hack is just to exclude PL objects with OIDs below 16384
from the dump.  An issue with that is that it'd lose any nondefault
changes to the ACL for plpgsql, which seems like a problem.  On the
other hand, I think the rule we have in place for the public schema is
preventing dumping local adjustments to that, and there have been no
complaints.  (Maybe I'm wrong and the initacl mechanism handles that
case?  If so, seems like we could get it to handle local changes to
plpgsql's ACL as well.)

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > On Monday, January 22, 2018, Stephen Frost <sfrost@snowman.net> wrote:
> >> In the end, I feel like this patch has actually been through the ringer
> >> and back because it was brought up in the context of solving a problem
> >> that we'd all like to fix in a better way.  Had it been simply dropped
> >> on us as a "I'd like to not have comments in my pg_dump output and
> >> here's a patch to allow me to do that" I suspect it would have been
> >> committed without anywhere near this level of effort.
>
> Indeed ... but it was *not* presented that way, and that's what makes
> this somewhat of a difficult call.  You and Robert argued that there were
> valid use-cases, but I feel like that was somewhat of an after-the-fact
> justification, rather than an actual organic feature request.

This was definitely the kind of follow-on work that I was anticipating
happening with the rework that was done to dump_contains in a9f0e8e5a2e.
I would think that we'd support enable/disable of all of the different
component types that pg_dump recognizes, and we're most of the way there
at this point.  The comment above the #define's in pg_dump.h even hints
at that- component types of an object which can be selected for dumping.

> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > +0; but recognizing our users are going to remain annoyed by the existing
> > problem and that telling them that this is the answer will not sit well.
>
> FWIW, today's pg_dump refactoring got rid of one of the use-cases for
> this, namely the COMMENT ON DATABASE aspect.  We got rid of another aspect
> (creating/commenting on the public schema) some time ago, via a method
> that was admittedly a bit of a hack but got the job done.  What seems to
> be left is that given a default installation, pg_dump will emit a
> "COMMENT ON EXTENSION plpgsql" that can't be restored by an unprivileged
> user ... as well as a "CREATE EXTENSION IF NOT EXISTS plpgsql", which is
> an utter kluge.  Maybe we can think of a rule that will exclude plpgsql
> from dumps without causing too much havoc.

Having a generic --exclude-extension=plpgsql might be interesting..  I
can certainly recall wishing for such an option when working with
postgis.

> The most obvious hack is just to exclude PL objects with OIDs below 16384
> from the dump.  An issue with that is that it'd lose any nondefault
> changes to the ACL for plpgsql, which seems like a problem.  On the
> other hand, I think the rule we have in place for the public schema is
> preventing dumping local adjustments to that, and there have been no
> complaints.  (Maybe I'm wrong and the initacl mechanism handles that
> case?  If so, seems like we could get it to handle local changes to
> plpgsql's ACL as well.)

Both the public schema and plpgsql's ACLs should be handled properly
(with local changes preserved) through the pg_init_privs system.  I'm
not sure what you're referring to regarding a rule preventing dumping
local adjustments to the public schema, as far as I can recall we've
basically always supported that.

Losing non-default ACLs for plpgsql seems like a step backwards.

Frankly, the comment on plpgsql is probably one of the most worthless
comments we have anyway- all it does is re-state information you
already know: it's a procedural language called 'plpgsql'.  I'd
suggest we could probably survive without it, though that is a long
route to "fixing" this, though at least we could tell people it's been
fixed in newer versions and there's a kludge available until then..

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> The most obvious hack is just to exclude PL objects with OIDs below 16384
>> from the dump.  An issue with that is that it'd lose any nondefault
>> changes to the ACL for plpgsql, which seems like a problem.  On the
>> other hand, I think the rule we have in place for the public schema is
>> preventing dumping local adjustments to that, and there have been no
>> complaints.  (Maybe I'm wrong and the initacl mechanism handles that
>> case?  If so, seems like we could get it to handle local changes to
>> plpgsql's ACL as well.)

> Both the public schema and plpgsql's ACLs should be handled properly
> (with local changes preserved) through the pg_init_privs system.  I'm
> not sure what you're referring to regarding a rule preventing dumping
> local adjustments to the public schema, as far as I can recall we've
> basically always supported that.

I went looking and realized that actually what we're interested in here
is the plpgsql extension, not the plpgsql language ... and in fact the
behavior I was thinking of is already there, except for some reason it's
only applied during binary upgrade.  So I think we should give serious
consideration to the attached, which just removes the binary_upgrade
condition (and adjusts nearby comments).  With this, I get empty dumps
for the default state of template1 and postgres, which is what one
really would expect.  And testing shows that if you modify the ACL
of language plpgsql, that does still come through as expected.

(Note: this breaks the parts of the pg_dump regression tests that
expect to see "CREATE LANGUAGE plpgsql".  I've not bothered to update
those, pending the decision on whether to do this.)

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d65ea54..87b15a1 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** selectDumpableAccessMethod(AccessMethodI
*** 1617,1637 ****
   * selectDumpableExtension: policy-setting subroutine
   *        Mark an extension as to be dumped or not
   *
!  * Normally, we dump all extensions, or none of them if include_everything
!  * is false (i.e., a --schema or --table switch was given).  However, in
!  * binary-upgrade mode it's necessary to skip built-in extensions, since we
   * assume those will already be installed in the target database.  We identify
   * such extensions by their having OIDs in the range reserved for initdb.
   */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
      /*
!      * Use DUMP_COMPONENT_ACL for from-initdb extensions, to allow users to
!      * change permissions on those objects, if they wish to, and have those
!      * changes preserved.
       */
!     if (dopt->binary_upgrade && extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
          extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
      else
          extinfo->dobj.dump = extinfo->dobj.dump_contains =
--- 1617,1637 ----
   * selectDumpableExtension: policy-setting subroutine
   *        Mark an extension as to be dumped or not
   *
!  * Built-in extensions should be skipped except for checking ACLs, since we
   * assume those will already be installed in the target database.  We identify
   * such extensions by their having OIDs in the range reserved for initdb.
+  * We dump all user-added extensions by default, or none of them if
+  * include_everything is false (i.e., a --schema or --table switch was given).
   */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
      /*
!      * Use DUMP_COMPONENT_ACL for built-in extensions, to allow users to
!      * change permissions on their member objects, if they wish to, and have
!      * those changes preserved.
       */
!     if (extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
          extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
      else
          extinfo->dobj.dump = extinfo->dobj.dump_contains =

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
I wrote:
> I went looking and realized that actually what we're interested in here
> is the plpgsql extension, not the plpgsql language ... and in fact the
> behavior I was thinking of is already there, except for some reason it's
> only applied during binary upgrade.  So I think we should give serious
> consideration to the attached, which just removes the binary_upgrade
> condition (and adjusts nearby comments).

In further testing of that, I noticed that it made the behavior of our
other bugaboo, the public schema, rather inconsistent.  With this
builtin-extensions hack, the plpgsql extension doesn't get dumped,
whether or not you say "clean".  But the public schema does get
dropped and recreated if you say "clean".  That's not helpful for
non-superuser users of pg_dump, so I think we should try to fix it.

Hence, the attached updated patch also makes selection of the public
schema use the DUMP_COMPONENT infrastructure rather than hardwired
code.

I note btw that the removed bit in getNamespaces() is flat out wrong
independently of these other considerations, because it makes the SQL
put into an archive file vary depending on whether --clean was specified
at pg_dump time.  That's not supposed to happen.

I've also updated the regression test this time, as I think this
is committable.

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f55aa36..fb03793 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*************** _printTocEntry(ArchiveHandle *AH, TocEnt
*** 3453,3481 ****
  {
      RestoreOptions *ropt = AH->public.ropt;

-     /*
-      * Avoid dumping the public schema, as it will already be created ...
-      * unless we are using --clean mode (and *not* --create mode), in which
-      * case we've previously issued a DROP for it so we'd better recreate it.
-      *
-      * Likewise for its comment, if any.  (We could try issuing the COMMENT
-      * command anyway; but it'd fail if the restore is done as non-super-user,
-      * so let's not.)
-      *
-      * XXX it looks pretty ugly to hard-wire the public schema like this, but
-      * it sits in a sort of no-mans-land between being a system object and a
-      * user object, so it really is special in a way.
-      */
-     if (!(ropt->dropSchema && !ropt->createDB))
-     {
-         if (strcmp(te->desc, "SCHEMA") == 0 &&
-             strcmp(te->tag, "public") == 0)
-             return;
-         if (strcmp(te->desc, "COMMENT") == 0 &&
-             strcmp(te->tag, "SCHEMA public") == 0)
-             return;
-     }
-
      /* Select owner, schema, and tablespace as necessary */
      _becomeOwner(AH, te);
      _selectOutputSchema(AH, te->namespace);
--- 3453,3458 ----
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d65ea54..50399c9 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** selectDumpableNamespace(NamespaceInfo *n
*** 1407,1412 ****
--- 1407,1425 ----
          /* Other system schemas don't get dumped */
          nsinfo->dobj.dump_contains = nsinfo->dobj.dump = DUMP_COMPONENT_NONE;
      }
+     else if (strcmp(nsinfo->dobj.name, "public") == 0)
+     {
+         /*
+          * The public schema is a strange beast that sits in a sort of
+          * no-mans-land between being a system object and a user object.  We
+          * don't want to dump creation or comment commands for it, because
+          * that complicates matters for non-superuser use of pg_dump.  But we
+          * should dump any ACL changes that have occurred for it, and of
+          * course we should dump contained objects.
+          */
+         nsinfo->dobj.dump = DUMP_COMPONENT_ACL;
+         nsinfo->dobj.dump_contains = DUMP_COMPONENT_ALL;
+     }
      else
          nsinfo->dobj.dump_contains = nsinfo->dobj.dump = DUMP_COMPONENT_ALL;

*************** selectDumpableAccessMethod(AccessMethodI
*** 1617,1637 ****
   * selectDumpableExtension: policy-setting subroutine
   *        Mark an extension as to be dumped or not
   *
!  * Normally, we dump all extensions, or none of them if include_everything
!  * is false (i.e., a --schema or --table switch was given).  However, in
!  * binary-upgrade mode it's necessary to skip built-in extensions, since we
   * assume those will already be installed in the target database.  We identify
   * such extensions by their having OIDs in the range reserved for initdb.
   */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
      /*
!      * Use DUMP_COMPONENT_ACL for from-initdb extensions, to allow users to
!      * change permissions on those objects, if they wish to, and have those
!      * changes preserved.
       */
!     if (dopt->binary_upgrade && extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
          extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
      else
          extinfo->dobj.dump = extinfo->dobj.dump_contains =
--- 1630,1650 ----
   * selectDumpableExtension: policy-setting subroutine
   *        Mark an extension as to be dumped or not
   *
!  * Built-in extensions should be skipped except for checking ACLs, since we
   * assume those will already be installed in the target database.  We identify
   * such extensions by their having OIDs in the range reserved for initdb.
+  * We dump all user-added extensions by default, or none of them if
+  * include_everything is false (i.e., a --schema or --table switch was given).
   */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
      /*
!      * Use DUMP_COMPONENT_ACL for built-in extensions, to allow users to
!      * change permissions on their member objects, if they wish to, and have
!      * those changes preserved.
       */
!     if (extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
          extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
      else
          extinfo->dobj.dump = extinfo->dobj.dump_contains =
*************** getNamespaces(Archive *fout, int *numNam
*** 4435,4463 ****
                            init_acl_subquery->data,
                            init_racl_subquery->data);

-         /*
-          * When we are doing a 'clean' run, we will be dropping and recreating
-          * the 'public' schema (the only object which has that kind of
-          * treatment in the backend and which has an entry in pg_init_privs)
-          * and therefore we should not consider any initial privileges in
-          * pg_init_privs in that case.
-          *
-          * See pg_backup_archiver.c:_printTocEntry() for the details on why
-          * the public schema is special in this regard.
-          *
-          * Note that if the public schema is dropped and re-created, this is
-          * essentially a no-op because the new public schema won't have an
-          * entry in pg_init_privs anyway, as the entry will be removed when
-          * the public schema is dropped.
-          *
-          * Further, we have to handle the case where the public schema does
-          * not exist at all.
-          */
-         if (dopt->outputClean)
-             appendPQExpBuffer(query, " AND pip.objoid <> "
-                               "coalesce((select oid from pg_namespace "
-                               "where nspname = 'public'),0)");
-
          appendPQExpBuffer(query, ") ");

          destroyPQExpBuffer(acl_subquery);
--- 4448,4453 ----
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 74730bf..3e9b4d9 100644
*** a/src/bin/pg_dump/t/002_pg_dump.pl
--- b/src/bin/pg_dump/t/002_pg_dump.pl
*************** qr/^ALTER (?!EVENT TRIGGER|LARGE OBJECT|
*** 1548,1577 ****
          all_runs  => 1,
          catch_all => 'COMMENT commands',
          regexp    => qr/^COMMENT ON EXTENSION plpgsql IS .*;/m,
!         like      => {
              clean                    => 1,
              clean_if_exists          => 1,
              createdb                 => 1,
              defaults                 => 1,
              exclude_dump_test_schema => 1,
              exclude_test_table       => 1,
              exclude_test_table_data  => 1,
              no_blobs                 => 1,
-             no_privs                 => 1,
              no_owner                 => 1,
              pg_dumpall_dbprivs       => 1,
              schema_only              => 1,
              section_pre_data         => 1,
!             with_oids                => 1, },
!         unlike => {
!             binary_upgrade         => 1,
!             column_inserts         => 1,
!             data_only              => 1,
!             only_dump_test_schema  => 1,
!             only_dump_test_table   => 1,
!             role                   => 1,
!             section_post_data      => 1,
!             test_schema_plus_blobs => 1, }, },

      'COMMENT ON TABLE dump_test.test_table' => {
          all_runs     => 1,
--- 1548,1578 ----
          all_runs  => 1,
          catch_all => 'COMMENT commands',
          regexp    => qr/^COMMENT ON EXTENSION plpgsql IS .*;/m,
!         # this shouldn't ever get emitted anymore
!         like      => {},
!         unlike => {
!             binary_upgrade           => 1,
              clean                    => 1,
              clean_if_exists          => 1,
+             column_inserts           => 1,
              createdb                 => 1,
+             data_only                => 1,
              defaults                 => 1,
              exclude_dump_test_schema => 1,
              exclude_test_table       => 1,
              exclude_test_table_data  => 1,
              no_blobs                 => 1,
              no_owner                 => 1,
+             no_privs                 => 1,
+             only_dump_test_schema    => 1,
+             only_dump_test_table     => 1,
              pg_dumpall_dbprivs       => 1,
+             role                     => 1,
              schema_only              => 1,
+             section_post_data        => 1,
              section_pre_data         => 1,
!             test_schema_plus_blobs   => 1,
!             with_oids                => 1, }, },

      'COMMENT ON TABLE dump_test.test_table' => {
          all_runs     => 1,
*************** qr/CREATE CAST \(timestamp with time zon
*** 2751,2783 ****
          regexp   => qr/^
              \QCREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\E
              /xm,
!         like => {
              clean                    => 1,
              clean_if_exists          => 1,
              createdb                 => 1,
              defaults                 => 1,
              exclude_dump_test_schema => 1,
              exclude_test_table       => 1,
              exclude_test_table_data  => 1,
              no_blobs                 => 1,
-             no_privs                 => 1,
              no_owner                 => 1,
!             pg_dumpall_dbprivs       => 1,
!             schema_only              => 1,
!             section_pre_data         => 1,
!             with_oids                => 1, },
!         unlike => {
!             binary_upgrade           => 1,
!             column_inserts           => 1,
!             data_only                => 1,
              only_dump_test_schema    => 1,
              only_dump_test_table     => 1,
              pg_dumpall_globals       => 1,
              pg_dumpall_globals_clean => 1,
              role                     => 1,
              section_data             => 1,
              section_post_data        => 1,
!             test_schema_plus_blobs   => 1, }, },

      'CREATE AGGREGATE dump_test.newavg' => {
          all_runs     => 1,
--- 2752,2785 ----
          regexp   => qr/^
              \QCREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\E
              /xm,
!         # this shouldn't ever get emitted anymore
!         like => {},
!         unlike => {
!             binary_upgrade           => 1,
              clean                    => 1,
              clean_if_exists          => 1,
+             column_inserts           => 1,
              createdb                 => 1,
+             data_only                => 1,
              defaults                 => 1,
              exclude_dump_test_schema => 1,
              exclude_test_table       => 1,
              exclude_test_table_data  => 1,
              no_blobs                 => 1,
              no_owner                 => 1,
!             no_privs                 => 1,
              only_dump_test_schema    => 1,
              only_dump_test_table     => 1,
+             pg_dumpall_dbprivs       => 1,
              pg_dumpall_globals       => 1,
              pg_dumpall_globals_clean => 1,
              role                     => 1,
+             schema_only              => 1,
              section_data             => 1,
              section_post_data        => 1,
!             section_pre_data         => 1,
!             test_schema_plus_blobs   => 1,
!             with_oids                => 1, }, },

      'CREATE AGGREGATE dump_test.newavg' => {
          all_runs     => 1,
*************** qr/CREATE TRANSFORM FOR integer LANGUAGE
*** 4565,4575 ****
          all_runs  => 1,
          catch_all => 'CREATE ... commands',
          regexp    => qr/^CREATE SCHEMA public;/m,
!         like      => {
!             clean           => 1,
!             clean_if_exists => 1, },
          unlike => {
              binary_upgrade           => 1,
              createdb                 => 1,
              defaults                 => 1,
              exclude_test_table       => 1,
--- 4567,4578 ----
          all_runs  => 1,
          catch_all => 'CREATE ... commands',
          regexp    => qr/^CREATE SCHEMA public;/m,
!         # this shouldn't ever get emitted anymore
!         like      => {},
          unlike => {
              binary_upgrade           => 1,
+             clean                    => 1,
+             clean_if_exists          => 1,
              createdb                 => 1,
              defaults                 => 1,
              exclude_test_table       => 1,
*************** qr/CREATE TRANSFORM FOR integer LANGUAGE
*** 5395,5402 ****
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP SCHEMA public;/m,
!         like      => { clean => 1 },
          unlike    => {
              clean_if_exists          => 1,
              pg_dumpall_globals_clean => 1, }, },

--- 5398,5407 ----
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP SCHEMA public;/m,
!         # this shouldn't ever get emitted anymore
!         like      => {},
          unlike    => {
+             clean                    => 1,
              clean_if_exists          => 1,
              pg_dumpall_globals_clean => 1, }, },

*************** qr/CREATE TRANSFORM FOR integer LANGUAGE
*** 5404,5420 ****
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP SCHEMA IF EXISTS public;/m,
!         like      => { clean_if_exists => 1 },
          unlike    => {
              clean                    => 1,
              pg_dumpall_globals_clean => 1, }, },

      'DROP EXTENSION plpgsql' => {
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP EXTENSION plpgsql;/m,
!         like      => { clean => 1, },
          unlike    => {
              clean_if_exists          => 1,
              pg_dumpall_globals_clean => 1, }, },

--- 5409,5429 ----
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP SCHEMA IF EXISTS public;/m,
!         # this shouldn't ever get emitted anymore
!         like      => {},
          unlike    => {
              clean                    => 1,
+             clean_if_exists          => 1,
              pg_dumpall_globals_clean => 1, }, },

      'DROP EXTENSION plpgsql' => {
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP EXTENSION plpgsql;/m,
!         # this shouldn't ever get emitted anymore
!         like      => {},
          unlike    => {
+             clean => 1,
              clean_if_exists          => 1,
              pg_dumpall_globals_clean => 1, }, },

*************** qr/CREATE TRANSFORM FOR integer LANGUAGE
*** 5494,5502 ****
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP EXTENSION IF EXISTS plpgsql;/m,
!         like      => { clean_if_exists => 1, },
          unlike    => {
              clean                    => 1,
              pg_dumpall_globals_clean => 1, }, },

      'DROP FUNCTION IF EXISTS dump_test.pltestlang_call_handler()' => {
--- 5503,5513 ----
          all_runs  => 1,
          catch_all => 'DROP ... commands',
          regexp    => qr/^DROP EXTENSION IF EXISTS plpgsql;/m,
!         # this shouldn't ever get emitted anymore
!         like      => {},
          unlike    => {
              clean                    => 1,
+             clean_if_exists          => 1,
              pg_dumpall_globals_clean => 1, }, },

      'DROP FUNCTION IF EXISTS dump_test.pltestlang_call_handler()' => {
*************** qr/^GRANT SELECT ON TABLE measurement_y2
*** 6264,6274 ****
              \Q--\E\n\n
              \QGRANT USAGE ON SCHEMA public TO PUBLIC;\E
              /xm,
!         like => {
!             clean           => 1,
!             clean_if_exists => 1, },
          unlike => {
              binary_upgrade           => 1,
              createdb                 => 1,
              defaults                 => 1,
              exclude_dump_test_schema => 1,
--- 6275,6286 ----
              \Q--\E\n\n
              \QGRANT USAGE ON SCHEMA public TO PUBLIC;\E
              /xm,
!         # this shouldn't ever get emitted anymore
!         like => {},
          unlike => {
              binary_upgrade           => 1,
+             clean                    => 1,
+             clean_if_exists          => 1,
              createdb                 => 1,
              defaults                 => 1,
              exclude_dump_test_schema => 1,
*************** qr/^GRANT SELECT ON TABLE measurement_y2
*** 6537,6542 ****
--- 6549,6556 ----
              /xm,
          like => {
              binary_upgrade           => 1,
+             clean                    => 1,
+             clean_if_exists          => 1,
              createdb                 => 1,
              defaults                 => 1,
              exclude_dump_test_schema => 1,
*************** qr/^GRANT SELECT ON TABLE measurement_y2
*** 6549,6556 ****
              section_pre_data         => 1,
              with_oids                => 1, },
          unlike => {
-             clean                    => 1,
-             clean_if_exists          => 1,
              only_dump_test_schema    => 1,
              only_dump_test_table     => 1,
              pg_dumpall_globals_clean => 1,
--- 6563,6568 ----
*************** qr/^GRANT SELECT ON TABLE measurement_y2
*** 6576,6593 ****
              exclude_test_table_data  => 1,
              no_blobs                 => 1,
              no_owner                 => 1,
              pg_dumpall_dbprivs       => 1,
              schema_only              => 1,
              section_pre_data         => 1,
              with_oids                => 1, },
          unlike => {
-             only_dump_test_schema    => 1,
-             only_dump_test_table     => 1,
              pg_dumpall_globals_clean => 1,
-             role                     => 1,
              section_data             => 1,
!             section_post_data        => 1,
!             test_schema_plus_blobs   => 1, }, },

      'REVOKE commands' => {    # catch-all for REVOKE commands
          all_runs => 0,               # catch-all
--- 6588,6605 ----
              exclude_test_table_data  => 1,
              no_blobs                 => 1,
              no_owner                 => 1,
+             only_dump_test_schema    => 1,
+             only_dump_test_table     => 1,
              pg_dumpall_dbprivs       => 1,
+             role                     => 1,
              schema_only              => 1,
              section_pre_data         => 1,
+             test_schema_plus_blobs   => 1,
              with_oids                => 1, },
          unlike => {
              pg_dumpall_globals_clean => 1,
              section_data             => 1,
!             section_post_data        => 1, }, },

      'REVOKE commands' => {    # catch-all for REVOKE commands
          all_runs => 0,               # catch-all

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I wrote:
> > I went looking and realized that actually what we're interested in here
> > is the plpgsql extension, not the plpgsql language ... and in fact the
> > behavior I was thinking of is already there, except for some reason it's
> > only applied during binary upgrade.  So I think we should give serious
> > consideration to the attached, which just removes the binary_upgrade
> > condition (and adjusts nearby comments).
>
> In further testing of that, I noticed that it made the behavior of our
> other bugaboo, the public schema, rather inconsistent.  With this
> builtin-extensions hack, the plpgsql extension doesn't get dumped,
> whether or not you say "clean".  But the public schema does get
> dropped and recreated if you say "clean".  That's not helpful for
> non-superuser users of pg_dump, so I think we should try to fix it.

I'm not entirely sure about trying to also support --clean for
non-superusers..  We've long had that the public schema is dropped and
recreated with --clean and it seems likely that at least some users are
depending on us doing that.  In any case, it's certainly not a change
that I think we could backpatch.  Perhaps we could just change it moving
forward (which would make me happier, really, since what I think we do
with these initdb-time things currently is a bit bizarre).

> Hence, the attached updated patch also makes selection of the public
> schema use the DUMP_COMPONENT infrastructure rather than hardwired
> code.
>
> I note btw that the removed bit in getNamespaces() is flat out wrong
> independently of these other considerations, because it makes the SQL
> put into an archive file vary depending on whether --clean was specified
> at pg_dump time.  That's not supposed to happen.

Yes, having that in getNamespaces() isn't correct but we need to do
something there, and I've been trying to figure out what.  The reason
it's there in the first place is that if you do --clean then the public
schema is dropped and recreated *but* the initdb-time ACL isn't put back
for it (and there's no ACL entries in the dump file, be it text or
custom, if the user didn't change the ACL from the initdb-time one).

What I was planning to do there was somehow inject the initdb-time ACL
for public into the set of things to restore when we're asked to do a
restore from a custom-format (or directory or other type which is
handled by pg_restore) dump.  We have to account for someone asking for
pg_dump --clean --no-privileges also though.  We would still need to do
something for text-based output though..

Note that this fix really needs to be back-patched and that we had
complaints about the --clean issue with the public schema on text-based
pg_dump's (which is why the hack was added to getNamespaces() in the
first place) and, more recently, for non-text based pg_dump's.

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> In further testing of that, I noticed that it made the behavior of our
>> other bugaboo, the public schema, rather inconsistent.  With this
>> builtin-extensions hack, the plpgsql extension doesn't get dumped,
>> whether or not you say "clean".  But the public schema does get
>> dropped and recreated if you say "clean".  That's not helpful for
>> non-superuser users of pg_dump, so I think we should try to fix it.

> I'm not entirely sure about trying to also support --clean for
> non-superusers..  We've long had that the public schema is dropped and
> recreated with --clean and it seems likely that at least some users are
> depending on us doing that.  In any case, it's certainly not a change
> that I think we could backpatch.  Perhaps we could just change it moving
> forward (which would make me happier, really, since what I think we do
> with these initdb-time things currently is a bit bizarre).

Sure, I was not proposing this for back-patch --- it depends on the
other stuff we've committed recently, anyway.

> Yes, having that in getNamespaces() isn't correct but we need to do
> something there, and I've been trying to figure out what.

I claim this is what ;-)

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> In further testing of that, I noticed that it made the behavior of our
> >> other bugaboo, the public schema, rather inconsistent.  With this
> >> builtin-extensions hack, the plpgsql extension doesn't get dumped,
> >> whether or not you say "clean".  But the public schema does get
> >> dropped and recreated if you say "clean".  That's not helpful for
> >> non-superuser users of pg_dump, so I think we should try to fix it.
>
> > I'm not entirely sure about trying to also support --clean for
> > non-superusers..  We've long had that the public schema is dropped and
> > recreated with --clean and it seems likely that at least some users are
> > depending on us doing that.  In any case, it's certainly not a change
> > that I think we could backpatch.  Perhaps we could just change it moving
> > forward (which would make me happier, really, since what I think we do
> > with these initdb-time things currently is a bit bizarre).
>
> Sure, I was not proposing this for back-patch --- it depends on the
> other stuff we've committed recently, anyway.
>
> > Yes, having that in getNamespaces() isn't correct but we need to do
> > something there, and I've been trying to figure out what.
>
> I claim this is what ;-)

Well, that would make things in v11 better, certainly.  I suppose for
back-patching, I can just go make the change to pg_restore that I
outlined and that would deal with the complaints we've gotten there.

I'm afraid we may still get some push-back from existing users of
--clean since, with the change you're proposing, we wouldn't be cleaning
up anything that's been done to the public schema when it comes to
comment changes or ACL changes, right?

A typical use-case of pg_dump with --clean being to 'reset' a system
after, say, dumping out some subset of a production system and then
loading it into the development environment that all of the devs have
full access to, and where there might have been changes to the 'public'
that you want to revert (to get it back to looking like how 'prod' was).
In current versions this should result in at least the ACLs on public
matching what they are on prod, along with the comment and any other
changes done to it, but we would lose that if we stop doing drop/create
of the public schema on --clean.

Then again, the DROP SCHEMA will fail if any objects end up created in
the public schema anyway, so it isn't like that's a complete solution
regardless.  I suppose it's a bit surprising that we haven't been asked
for a --clean-cascade option.

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> I'm afraid we may still get some push-back from existing users of
> --clean since, with the change you're proposing, we wouldn't be cleaning
> up anything that's been done to the public schema when it comes to
> comment changes or ACL changes, right?

No, if you have a nondefault ACL, that will still get applied.  This
arrangement would drop comment changes, but I can't get excited about
that; it's certainly far less of an inconvenience in that scenario
than dumping the comment is in non-superuser-restore scenarios.

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I'm afraid we may still get some push-back from existing users of
> > --clean since, with the change you're proposing, we wouldn't be cleaning
> > up anything that's been done to the public schema when it comes to
> > comment changes or ACL changes, right?
>
> No, if you have a nondefault ACL, that will still get applied.  This
> arrangement would drop comment changes, but I can't get excited about
> that; it's certainly far less of an inconvenience in that scenario
> than dumping the comment is in non-superuser-restore scenarios.

That nondefault ACL from the system the pg_dump was run on will get
applied *over-top* of whatever the current ACL on the system that the
restore is being run on, which may or may not be what's expected.

That's different from the case where the public schema is dropped and
recreated because, in that case, the schema will start out with an
empty ACL and the resulting ACL should end up matching what was on the
source system (ignoring the current issue with a non-clean pg_dump into
a custom format dump being used with a pg_restore --clean).

Just to be clear, I'm not suggesting that's a huge deal, but it's
certainly a change and something we should at least recognize and
contemplate.

I'm also not really worried about losing the comment.  I definitely
don't think it's worth putting in the kind of infrastructure that we put
in for init ACLs to handle comments being changed on initdb-time
objects.

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> No, if you have a nondefault ACL, that will still get applied.  This
>> arrangement would drop comment changes, but I can't get excited about
>> that; it's certainly far less of an inconvenience in that scenario
>> than dumping the comment is in non-superuser-restore scenarios.

> That nondefault ACL from the system the pg_dump was run on will get
> applied *over-top* of whatever the current ACL on the system that the
> restore is being run on, which may or may not be what's expected.

Fair point, but doesn't it apply equally to non-default ACLs on any
other system objects?  If you tweaked the permissions on say pg_ls_dir(),
then dump, then tweak them some more, you're going to get uncertain
results if you load that dump back into this database ... with or without
--clean, because we certainly aren't going to drop pinned objects.

I think we could jigger things so that we dump the definition of these
special quasi-system objects only if their ACLs are not default, but
it's not clear to me that that's really an improvement in the long run.
Seems like it's just making them even wartier.

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> No, if you have a nondefault ACL, that will still get applied.  This
> >> arrangement would drop comment changes, but I can't get excited about
> >> that; it's certainly far less of an inconvenience in that scenario
> >> than dumping the comment is in non-superuser-restore scenarios.
>
> > That nondefault ACL from the system the pg_dump was run on will get
> > applied *over-top* of whatever the current ACL on the system that the
> > restore is being run on, which may or may not be what's expected.
>
> Fair point, but doesn't it apply equally to non-default ACLs on any
> other system objects?  If you tweaked the permissions on say pg_ls_dir(),
> then dump, then tweak them some more, you're going to get uncertain
> results if you load that dump back into this database ... with or without
> --clean, because we certainly aren't going to drop pinned objects.

Yes, that's certainly true, the public schema is the only "special"
animal in this regard and making it less special (and more like
pg_ls_dir()) would definitely be nice.

> I think we could jigger things so that we dump the definition of these
> special quasi-system objects only if their ACLs are not default, but
> it's not clear to me that that's really an improvement in the long run.
> Seems like it's just making them even wartier.

Yeah, that would be worse, I agree.

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Fair point, but doesn't it apply equally to non-default ACLs on any
>> other system objects?  If you tweaked the permissions on say pg_ls_dir(),
>> then dump, then tweak them some more, you're going to get uncertain
>> results if you load that dump back into this database ... with or without
>> --clean, because we certainly aren't going to drop pinned objects.

> Yes, that's certainly true, the public schema is the only "special"
> animal in this regard and making it less special (and more like
> pg_ls_dir()) would definitely be nice.

I wonder if it'd be worth the trouble to invent a variant of REVOKE
that means "restore this object's permissions to default" --- that is,
either the ACL recorded in pg_init_privs if there is one, or NULL if
there's no pg_init_privs entry.  Then you could imagine pg_dump emitting
that command before trying to assign an ACL to any object it hadn't
created earlier in the run, rather than guessing about the current state
of the object's ACL.  (I'm not volunteering.)

>> I think we could jigger things so that we dump the definition of these
>> special quasi-system objects only if their ACLs are not default, but
>> it's not clear to me that that's really an improvement in the long run.
>> Seems like it's just making them even wartier.

> Yeah, that would be worse, I agree.

So are we at a consensus yet?

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> Fair point, but doesn't it apply equally to non-default ACLs on any
> >> other system objects?  If you tweaked the permissions on say pg_ls_dir(),
> >> then dump, then tweak them some more, you're going to get uncertain
> >> results if you load that dump back into this database ... with or without
> >> --clean, because we certainly aren't going to drop pinned objects.
>
> > Yes, that's certainly true, the public schema is the only "special"
> > animal in this regard and making it less special (and more like
> > pg_ls_dir()) would definitely be nice.
>
> I wonder if it'd be worth the trouble to invent a variant of REVOKE
> that means "restore this object's permissions to default" --- that is,
> either the ACL recorded in pg_init_privs if there is one, or NULL if
> there's no pg_init_privs entry.  Then you could imagine pg_dump emitting
> that command before trying to assign an ACL to any object it hadn't
> created earlier in the run, rather than guessing about the current state
> of the object's ACL.  (I'm not volunteering.)

I actually like that idea quite a bit..  Not really high on my priority
list though.

> >> I think we could jigger things so that we dump the definition of these
> >> special quasi-system objects only if their ACLs are not default, but
> >> it's not clear to me that that's really an improvement in the long run.
> >> Seems like it's just making them even wartier.
>
> > Yeah, that would be worse, I agree.
>
> So are we at a consensus yet?

You had me at "make public less special", I was just trying to make sure
we all understand what that means.

+1 from me for moving forward.

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> So are we at a consensus yet?

> You had me at "make public less special", I was just trying to make sure
> we all understand what that means.
> +1 from me for moving forward.

Applying this patch will leave us with the original pg_dump misbehavior
removed, AFAICS.  So now we are hard up against the question of whether
--no-comments can support its weight as a pure feature, rather than a
workaround for a misbehavior/bug.

I've been pretty much on the negative side of that question, but
I just posted a patch here:
https://postgr.es/m/32668.1516848577@sss.pgh.pa.us
that will have the effect of causing pg_restore to print ACLs,
comments, and seclabels in cases where it formerly didn't.
If you want to get back to the old behavior in those cases,
we have you covered with --no-acl and --no-security-labels ...
but not so much for comments.

Between that and Robert's point that we have --no-foo for every
other subsidiary object property, so why not comments, I'm prepared
to change my vote.

This isn't a positive review of the contents of the patch,
because I've not read it.  But I'm on board with the goal now.

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Robins Tharakan <tharakan@gmail.com> writes:
> Attached is an updated patch (v4) with basic tests for pg_dump / pg_dumpall.

I've reviewed and pushed this, after making some changes so that the
switch works in pg_restore as well, and minor cosmetic adjustments.

The changes in t/002_pg_dump.pl largely failed to apply, which is
partially due to the age of the patch but IMO speaks more to the
unmaintainability of that TAP test.  It still didn't run after I'd
manually fixed the merge failures, so I gave up in disgust and
did not push any of the test changes.  If someone is excited enough
about testing this, they can submit a followon patch for that,
but I judge it not really worth either the human effort or the
future testing cycles.

(Am I right in thinking that 002_pg_dump.pl is, by design, roughly
O(N^2) in the number of features tested?  Ick.)

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> The changes in t/002_pg_dump.pl largely failed to apply, which is
> partially due to the age of the patch but IMO speaks more to the
> unmaintainability of that TAP test.  It still didn't run after I'd
> manually fixed the merge failures, so I gave up in disgust and
> did not push any of the test changes.  If someone is excited enough
> about testing this, they can submit a followon patch for that,
> but I judge it not really worth either the human effort or the
> future testing cycles.
> 
> (Am I right in thinking that 002_pg_dump.pl is, by design, roughly
> O(N^2) in the number of features tested?  Ick.)

Yeah, that 002 test is pretty nasty stuff.  I think we only put up with
it because it's the only idea we've come up with; maybe there are better
ideas.

Crazy idea: maybe a large fraction of that test could be replaced with
comparisons of the "pg_restore -l" output file rather than pg_dump's
text output (i.e. the TOC entry for each object, rather than the
object's textual representation.)  Sounds easier in code than current
implementation.  Separately, verify that textual representation for each
TOC entry type is what we expect.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Alvaro, Tom,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Tom Lane wrote:
>
> > The changes in t/002_pg_dump.pl largely failed to apply, which is
> > partially due to the age of the patch but IMO speaks more to the
> > unmaintainability of that TAP test.  It still didn't run after I'd
> > manually fixed the merge failures, so I gave up in disgust and
> > did not push any of the test changes.  If someone is excited enough
> > about testing this, they can submit a followon patch for that,
> > but I judge it not really worth either the human effort or the
> > future testing cycles.
> >
> > (Am I right in thinking that 002_pg_dump.pl is, by design, roughly
> > O(N^2) in the number of features tested?  Ick.)

It certainly tries to cover a lot of combinations, but I did try to
write it to not completely cross everything against everything but
instead to try and cover the different code paths.

> Yeah, that 002 test is pretty nasty stuff.  I think we only put up with
> it because it's the only idea we've come up with; maybe there are better
> ideas.

I've discussed the 'better idea' for it previously, which is to
essentially break out the various parts of the big perl hash into
multilpe independent files that would be a lot easier to work with and
manage, and perhaps make them not be perl hashes but something else.  In
other words, make the core perl code similar to pg_regress and the
input/output files likewise similar to how pg_regress works.  Maybe we
should rework the whole thing to work with pg_regress directly but we'd
need to teach it about regexp's, I'd think..

> Crazy idea: maybe a large fraction of that test could be replaced with
> comparisons of the "pg_restore -l" output file rather than pg_dump's
> text output (i.e. the TOC entry for each object, rather than the
> object's textual representation.)  Sounds easier in code than current
> implementation.  Separately, verify that textual representation for each
> TOC entry type is what we expect.

I'm not sure how that's different..?  We do check that the textual
representation is what we expect, both directly from pg_dump and from
pg_restore output, and using the exact same code to check both (which I
generally think is a good thing since we want the results from both to
more-or-less match up).  What you're proposing here sounds like we're
throwing that away in favor of keeping all the same code to test the
textual representation but then only checking for listed contents from
pg_restore instead of checking that the textual representation is
correct.  That doesn't actually reduce the amount of code though..

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Alvaro Herrera
Дата:
Stephen Frost wrote:
> Alvaro, Tom,
> 
> * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

> > Crazy idea: maybe a large fraction of that test could be replaced with
> > comparisons of the "pg_restore -l" output file rather than pg_dump's
> > text output (i.e. the TOC entry for each object, rather than the
> > object's textual representation.)  Sounds easier in code than current
> > implementation.  Separately, verify that textual representation for each
> > TOC entry type is what we expect.
> 
> I'm not sure how that's different..?  We do check that the textual
> representation is what we expect, both directly from pg_dump and from
> pg_restore output, and using the exact same code to check both (which I
> generally think is a good thing since we want the results from both to
> more-or-less match up).  What you're proposing here sounds like we're
> throwing that away in favor of keeping all the same code to test the
> textual representation but then only checking for listed contents from
> pg_restore instead of checking that the textual representation is
> correct.  That doesn't actually reduce the amount of code though..

Well, the current implementation compares a dozen of pg_dump output text
files, three hundred lines apiece, against a thousand of regexes (give
or take).  Whenever there is a mismatch, what you get is "this regexp
failed to match <three hundred lines>" (or sometimes "matched when it
should have not"), so looking for the mismatch is quite annoying.

My proposal is that instead of looking at three hundred lines, you'd
look for 50 lines of `pg_restore -l` output -- is element XYZ in there
or not.  Quite a bit simpler for the guy adding a new test.  This tests
combinations of pg_dump switches: are we dumping the right set of
objects.

*Separately* test that each individual TOC entry type ("table data",
"index", "tablespace") is dumped as this or that SQL command, where you
create a separate dump file for each object type.  So you match a single
TOC entry to a dozen lines of SQL, half of them comments -- pretty easy
to see where a mismatch is.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Well, the current implementation compares a dozen of pg_dump output text
> files, three hundred lines apiece, against a thousand of regexes (give
> or take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match <three hundred lines>" (or sometimes "matched when it
> should have not"), so looking for the mismatch is quite annoying.

Yeah, I've been getting my nose rubbed in that over the past couple
days of pg_dump hacking: when you get a failure, the output is really
unfriendly, even by the seriously low standards of the rest of our
TAP tests.

I'm not sure if Alvaro's idea is the best way to improve that, but
it's something to think about.

The thing that was annoying me though is that it seems like every
test case interacts with every other test case, or at least way more
of them than one could wish, due to the construction of that big
hash constant.  That's what I find unmaintainable.

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Alvaro,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > Alvaro, Tom,
> >
> > * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
>
> > > Crazy idea: maybe a large fraction of that test could be replaced with
> > > comparisons of the "pg_restore -l" output file rather than pg_dump's
> > > text output (i.e. the TOC entry for each object, rather than the
> > > object's textual representation.)  Sounds easier in code than current
> > > implementation.  Separately, verify that textual representation for each
> > > TOC entry type is what we expect.
> >
> > I'm not sure how that's different..?  We do check that the textual
> > representation is what we expect, both directly from pg_dump and from
> > pg_restore output, and using the exact same code to check both (which I
> > generally think is a good thing since we want the results from both to
> > more-or-less match up).  What you're proposing here sounds like we're
> > throwing that away in favor of keeping all the same code to test the
> > textual representation but then only checking for listed contents from
> > pg_restore instead of checking that the textual representation is
> > correct.  That doesn't actually reduce the amount of code though..
>
> Well, the current implementation compares a dozen of pg_dump output text
> files, three hundred lines apiece, against a thousand of regexes (give
> or take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match <three hundred lines>" (or sometimes "matched when it
> should have not"), so looking for the mismatch is quite annoying.

Yeah, the big output isn't fun, I agree with that.

> My proposal is that instead of looking at three hundred lines, you'd
> look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not.  Quite a bit simpler for the guy adding a new test.  This tests
> combinations of pg_dump switches: are we dumping the right set of
> objects.

That might be possible to do, though I'm not sure that it'd really be
only 50 lines.

> *Separately* test that each individual TOC entry type ("table data",
> "index", "tablespace") is dumped as this or that SQL command, where you
> create a separate dump file for each object type.  So you match a single
> TOC entry to a dozen lines of SQL, half of them comments -- pretty easy
> to see where a mismatch is.

Yikes.  If you're thinking that we would call pg_restore independently
for each object, I'd be worried about the runtime increasing quite a
bit.  There's also a few cases where we check dependencies between
objects that we'd need to be mindful of, though those we might be able
to take care of, if we can keep the runtime down.  Certainly, part of
the way the existing code works was specifically to try and minimize the
runtime.  Perhaps the approach taken of minimizing the invocations and
building a bunch of regexps to run against the resulting output is more
expensive than running pg_restore a bunch of times, but I'd want to see
that to be sure as some really simple timings locally on a 6.5k -Fc dump
file seems to take 30-40ms just to produce the TOC.

This also doesn't really help with the big perl hash, but these two
ideas don't seem to conflict, so perhaps we could possibly still tackle
the big perl hash with the approach I described up-thread and also find
a way to implement this, if it doesn't cause the regression tests to be
too much slower.

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> My proposal is that instead of looking at three hundred lines, you'd
> look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not.  Quite a bit simpler for the guy adding a new test.  This tests
> combinations of pg_dump switches: are we dumping the right set of
> objects.

My counter-proposal is that we remove the test entirely.  It looks
like an unmaintainable and undocumented mess to me, and I doubt
whether the testing value is sufficient to justify the effort of
updating it every time anyone wants to change something in pg_dump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > My proposal is that instead of looking at three hundred lines, you'd
> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
> > combinations of pg_dump switches: are we dumping the right set of
> > objects.
>
> My counter-proposal is that we remove the test entirely.  It looks
> like an unmaintainable and undocumented mess to me, and I doubt
> whether the testing value is sufficient to justify the effort of
> updating it every time anyone wants to change something in pg_dump.

Considering it turned up multiple serious bugs, particularly in the
binary upgrade path, I can't disagree more.  If you have a counter
proposal which actually results in better test coverage, that'd be
fantastic, but I wholly reject the notion that we should be considering
reducing our test coverage of pg_dump.

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> > My proposal is that instead of looking at three hundred lines, you'd
>> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
>> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
>> > combinations of pg_dump switches: are we dumping the right set of
>> > objects.
>>
>> My counter-proposal is that we remove the test entirely.  It looks
>> like an unmaintainable and undocumented mess to me, and I doubt
>> whether the testing value is sufficient to justify the effort of
>> updating it every time anyone wants to change something in pg_dump.
>
> Considering it turned up multiple serious bugs, particularly in the
> binary upgrade path, I can't disagree more.  If you have a counter
> proposal which actually results in better test coverage, that'd be
> fantastic, but I wholly reject the notion that we should be considering
> reducing our test coverage of pg_dump.

I figured you would, but it's still my opinion.  I guess my basic
objection here is to the idea that we somehow know that the 6000+ line
test case file actually contains only correct tests.  That vastly
exceeds the ability of any normal human being to verify correctness,
especially given what's already been said about the interdependencies
between different parts of the file and the lack of adequate
documentation.

For example, I notice that the file contains 166 copies of
only_dump_test_schema  => 1 (with 4 different variations as to how
much whitespace is included). Some of those are in the "like" clause
and some are in the "unlike" clause.  If one of them were misplaced,
and pg_dump is actually producing the wrong output, the two errors
would cancel out and I suspect nobody would notice.  If somebody makes
a change to pg_dump that changes which things get dumped when --schema
is used, then a lot of those lines would need to moved around.  That
would be a huge nuisance.  If the author of such a patch just stuck
those lines where they make the tests pass, they might fail to notice
if one of them were actually in the wrong place, and a bug would go
undiscovered.  Some people probably have the mental stamina to audit
166 separate cases and make sure that each one is properly positioned,
but some people might not; and that's really just one test of many.
Really, every pg_dump change that alters behavior needs to
individually reconsider the position of every one of ~6000 lines in
this file, and nobody is really going to do that.

There's some rule that articulates what the effect of --schema is
supposed to be.  I don't know the exact rule, but it's probably
something like "global objects shouldn't get dumped and schema objects
should only get dumped if they're in that schema".  That rule ought to
be encoded into this file in some recognizable form.  It's encoded
there, but only in the positioning of hundreds of separate lines of
code that look identical but must be positioned differently based on a
human programmer's interpretation of how that rule applies to each
object type.  That's not a good way of implementing the rule; nobody
can look at this and say "oh, well I changed the rules for --schema,
so here's which things need to be updated".  You're not going to be
able to look at the diff somebody produces and have any idea whether
it's right, or at least not without a lot of very time-consuming
manual verification.  If you had just saved the output of pg_dump in a
file (maybe with OIDs and other variable bits stripped out) and
compared the new output against the old, it would give you just as
much code coverage but probably be a lot easier to maintain.  If you
had implemented the rules programmatically instead of encoding a giant
manually precomputed data structure in the test case file it would be
a lot more clearly correct and again easier to maintain.

I think you've chosen a terrible design and ought to throw the whole
thing away and start over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> >> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >> > My proposal is that instead of looking at three hundred lines, you'd
> >> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> >> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
> >> > combinations of pg_dump switches: are we dumping the right set of
> >> > objects.
> >>
> >> My counter-proposal is that we remove the test entirely.  It looks
> >> like an unmaintainable and undocumented mess to me, and I doubt
> >> whether the testing value is sufficient to justify the effort of
> >> updating it every time anyone wants to change something in pg_dump.
> >
> > Considering it turned up multiple serious bugs, particularly in the
> > binary upgrade path, I can't disagree more.  If you have a counter
> > proposal which actually results in better test coverage, that'd be
> > fantastic, but I wholly reject the notion that we should be considering
> > reducing our test coverage of pg_dump.
>
> I figured you would, but it's still my opinion.  I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests.  That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.

I don't mean to discount your opinion at all, but I'm quite concerned
about the code coverage of pg_dump.  I certainly agree that the testing
of pg_dump can and should be improved and I'd like to find time to work
on that, but simply throwing this out strikes me as a step backwards,
not forwards, even for its difficulty.

> For example, I notice that the file contains 166 copies of
> only_dump_test_schema  => 1 (with 4 different variations as to how
> much whitespace is included). Some of those are in the "like" clause
> and some are in the "unlike" clause.  If one of them were misplaced,
> and pg_dump is actually producing the wrong output, the two errors
> would cancel out and I suspect nobody would notice.  If somebody makes
> a change to pg_dump that changes which things get dumped when --schema
> is used, then a lot of those lines would need to moved around.  That
> would be a huge nuisance.  If the author of such a patch just stuck
> those lines where they make the tests pass, they might fail to notice
> if one of them were actually in the wrong place, and a bug would go
> undiscovered.  Some people probably have the mental stamina to audit
> 166 separate cases and make sure that each one is properly positioned,
> but some people might not; and that's really just one test of many.
> Really, every pg_dump change that alters behavior needs to
> individually reconsider the position of every one of ~6000 lines in
> this file, and nobody is really going to do that.
>
> There's some rule that articulates what the effect of --schema is
> supposed to be.  I don't know the exact rule, but it's probably
> something like "global objects shouldn't get dumped and schema objects
> should only get dumped if they're in that schema".  That rule ought to
> be encoded into this file in some recognizable form.  It's encoded
> there, but only in the positioning of hundreds of separate lines of
> code that look identical but must be positioned differently based on a
> human programmer's interpretation of how that rule applies to each
> object type.  That's not a good way of implementing the rule; nobody
> can look at this and say "oh, well I changed the rules for --schema,
> so here's which things need to be updated".  You're not going to be
> able to look at the diff somebody produces and have any idea whether
> it's right, or at least not without a lot of very time-consuming
> manual verification.  If you had just saved the output of pg_dump in a
> file (maybe with OIDs and other variable bits stripped out) and
> compared the new output against the old, it would give you just as
> much code coverage but probably be a lot easier to maintain.  If you
> had implemented the rules programmatically instead of encoding a giant
> manually precomputed data structure in the test case file it would be
> a lot more clearly correct and again easier to maintain.

The existing code already has some of these type of rules encoded in it,
specifically to reduce the number of like/unlike cases where it's
possible to do so- see the discussion of catch-all tests, described on
about line 282.  While there may be other such rules which can be
encoded, I don't believe they'd end up being nearly as clean as you're
suggesting unless we go through and make changes to pg_dump itself to
consistently behave according to those rules.  Perhaps that's a way to
go, I'll admit that I was focusing more on making sure that the
documented behavior is what pg_dump was actually doing, and so I hadn't
considered how we could simplify the testing if pg_dump was stipulated
to operate in a more specific manner.

The notion of using the TOC and then splitting up the output as
suggested by Alvaro seems like a good approach to me.  Perhaps there's a
way to do that and to also encode more rules about how pg_dump is
expected to operate.

> I think you've chosen a terrible design and ought to throw the whole
> thing away and start over.

I'll all for throwing away the existing test once we've got something
that covers at least what it does (ideally more, of course).

Thanks!

Stephen

Вложения

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I figured you would, but it's still my opinion.  I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests.  That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.

Yeah, that's a problem.  In the last two times I touched that file,
I just moved things between "like" and "unlike" categories until the
test passed.  If there were anything useful it had to tell me, it was a
complete failure at doing so.  I frankly won't even think about adding
new test cases to it, either.

I don't know how to make it better exactly, but I concur with Robert that
that test needs fundamental redesign of some kind to be maintainable.

            regards, tom lane


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Robert Haas
Дата:
On Fri, Jan 26, 2018 at 11:56 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> I think you've chosen a terrible design and ought to throw the whole
>> thing away and start over.
>
> I'll all for throwing away the existing test once we've got something
> that covers at least what it does (ideally more, of course).

I'm for throwing this away now.  It's a nuisance for other people to
maintain, and as Tom's reply makes clear (and it matches my
suspicions), they are maintaining it without really knowing whether
the updates are making are *correct*, just knowing that they *make the
tests pass*.  It's nice to make things turn green on the code coverage
report, but if we're not really verifying that the results are
correct, we're just kidding ourselves.  We'd get the same amount of
green on the code coverage report by running the pg_dump commands and
sending the output to /dev/null, and it would be a lot less work to
keep up to date.

I'm glad this helped you find some bugs.  It is only worth keeping if
it prevents other hackers from introducing bugs in the future.  I
doubt that it will have that effect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

От
Stephen Frost
Дата:
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 26, 2018 at 11:56 AM, Stephen Frost <sfrost@snowman.net> wrote:
> >> I think you've chosen a terrible design and ought to throw the whole
> >> thing away and start over.
> >
> > I'll all for throwing away the existing test once we've got something
> > that covers at least what it does (ideally more, of course).
>
> I'm for throwing this away now.  It's a nuisance for other people to
> maintain, and as Tom's reply makes clear (and it matches my
> suspicions), they are maintaining it without really knowing whether
> the updates are making are *correct*, just knowing that they *make the
> tests pass*.  It's nice to make things turn green on the code coverage
> report, but if we're not really verifying that the results are
> correct, we're just kidding ourselves.  We'd get the same amount of
> green on the code coverage report by running the pg_dump commands and
> sending the output to /dev/null, and it would be a lot less work to
> keep up to date.

There's not much to argue about if committers are simply hacking away at
it without actually considering if the test are correct or not.  I
suspect it's not quite as bad as being outright wrong- if individuals
are at least testing their changes thoroughly first and then making sure
that the tests pass then it's at least more likely that what's recorded
in the test suite is actually correct, but that's not great.  Further,
having contributors include updates to the test suite which are then
willfully thrown away is outright bad.

If that's worse than not having this test coverage for pg_dump, I'm not
sure.  What strikes me as likely is that removing this test from pg_dump
will just result in bugs being introduced because, for as much as
updating these tests are painful, actually testing all of the different
variations of pg_dump by hand for even a simple change is quite a bit of
work too.  I don't have any doubt that the reason bugs were found with
this is because there were permutations of pg_dump that weren't being
tested, either by individuals making the change or through the other
regression tests we have.  Indeed, the tests included were specifically
to get code coverage of cases which weren't already being tested.

> I'm glad this helped you find some bugs.  It is only worth keeping if
> it prevents other hackers from introducing bugs in the future.  I
> doubt that it will have that effect.

I'm not convinced that it won't, but I agree that we want something that
everyone will feel comfortable in maintaining and improving moving
forward.  I'll work on it and either submit a rework to make it easier
to maintain or a patch to remove it before the next CF.

Thanks!

Stephen

Вложения