Обсуждение: The missing pg_get_*def functions

Поиск
Список
Период
Сортировка

The missing pg_get_*def functions

От
Joel Jacobson
Дата:
Existing:

pg_get_constraintdef
pg_get_indexdef
pg_get_viewdef
pg_get_triggerdef
pg_get_functiondef
pg_get_ruledef

Missing:

pg_get_ts_templatedef
pg_get_ts_parserdef
pg_get_ts_configdef
pg_get_ts_dictdef
pg_get_databasedef
pg_get_namespacedef
pg_get_languagedef
pg_get_conversiondef
pg_get_castdef
pg_get_amprocdef
pg_get_operatordef
pg_get_amopdef
pg_get_opfamilydef
pg_get_opclassdef
pg_get_tabledef
pg_get_sequencedef
pg_get_typedef
pg_get_attrdef

I'm planning to implement these. Is there any reason why the remaining functions cannot be written as SQL functions instead of C? The input data to the C-functions in pg_dump.c comes from pg_catalog anyway, so I thought it would be a lot easier just to construct the definitions in SQL where you have convenient access to the pg_catalog.


Re: The missing pg_get_*def functions

От
Noah Misch
Дата:
On Mon, Apr 29, 2013 at 10:16:22PM +0100, Joel Jacobson wrote:
> Existing:
> 
> pg_get_constraintdef
> pg_get_indexdef
> pg_get_viewdef
> pg_get_triggerdef
> pg_get_functiondef
> pg_get_ruledef
> 
> Missing:
> 
> pg_get_ts_templatedef
> pg_get_ts_parserdef
> pg_get_ts_configdef
> pg_get_ts_dictdef
> pg_get_databasedef
> pg_get_namespacedef
> pg_get_languagedef
> pg_get_conversiondef
> pg_get_castdef
> pg_get_amprocdef
> pg_get_operatordef
> pg_get_amopdef
> pg_get_opfamilydef
> pg_get_opclassdef
> pg_get_tabledef
> pg_get_sequencedef
> pg_get_typedef
> pg_get_attrdef
> 
> I'm planning to implement these. Is there any reason why the remaining
> functions cannot be written as SQL functions instead of C? The input data
> to the C-functions in pg_dump.c comes from pg_catalog anyway, so I thought
> it would be a lot easier just to construct the definitions in SQL where you
> have convenient access to the pg_catalog.

Note that while the sql procedural language is fair game, plpgsql currently is
not.  We install it by default, but the DBA is free to drop it.

Those existing functions give a mostly-SnapshotNow picture of their objects,
but an sql-language implementation would give a normally-snapshotted picture.
That status quo is perhaps more an implementation accident than a designed
behavior.  Before proliferating functions like this, we should pick a snapshot
policy and stick to it.  See the block comment at the top of pg_dump.c.

Note also that minor releases can readily fix bugs in C-language functions,
but we have no infrastructure to update sql-language functions after initdb.
That flexibility is unfortunate to lose, particularly for something that
pg_dump depends on.  Now, the right thing is probably to design a mechanism
for applying simple catalog updates in concert with a minor release.  In the
mean time, its absence puts the sql PL at a nontrivial disadvantage here.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: The missing pg_get_*def functions

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> Note also that minor releases can readily fix bugs in C-language functions,
> but we have no infrastructure to update sql-language functions after initdb.
> That flexibility is unfortunate to lose, particularly for something that
> pg_dump depends on.

That alone would probably be sufficient reason why we would never allow
pg_dump to depend on any such thing (not that I see a compelling
argument for it to do so anyway...).

The long and the short of it here is that there isn't any very good
reason to migrate any of the existing pg_dump-side functionality into
server-side functions, and especially not server-side functions that
aren't in C.  One of the things that we frequently recommend when doing
upgrades is that you do the dump with the newer version's pg_dump, so
as to get the benefits of any bug fixes that are in it.  The more
dump functionality is on the server side, the less opportunity we have
to repair things that way.

It may be that the functions Joel proposes are worth having for other
tools to use, but I'm not in favor of making pg_dump use them.
        regards, tom lane



Re: The missing pg_get_*def functions

От
Robert Haas
Дата:
On Mon, Apr 29, 2013 at 7:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The long and the short of it here is that there isn't any very good
> reason to migrate any of the existing pg_dump-side functionality into
> server-side functions, ....

There are a number of good reasons to do just that, which have been
previously discussed.

> and especially not server-side functions that
> aren't in C.

+1.

> One of the things that we frequently recommend when doing
> upgrades is that you do the dump with the newer version's pg_dump, so
> as to get the benefits of any bug fixes that are in it.  The more
> dump functionality is on the server side, the less opportunity we have
> to repair things that way.

But why wouldn't we be able to fix the version in the server, if it
turns out to be buggy?  I suppose we wouldn't fix bugs discovered
after EOL, but I'm not sure that's a sufficient objection.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The missing pg_get_*def functions

От
Joel Jacobson
Дата:
On Tue, Apr 30, 2013 at 12:46 AM, Noah Misch <noah@leadboat.com> wrote:
> Note that while the sql procedural language is fair game, plpgsql currently is
> not.  We install it by default, but the DBA is free to drop it.

Right. The "sql" procedural language was what I had in mind.

> Those existing functions give a mostly-SnapshotNow picture of their objects,
> but an sql-language implementation would give a normally-snapshotted picture.

I assume "normally" is better than "mostly"?

> That status quo is perhaps more an implementation accident than a designed
> behavior.  Before proliferating functions like this, we should pick a snapshot
> policy and stick to it.  See the block comment at the top of pg_dump.c.

I didn't think there would be any reason to migrate the existing
functions from C to SQL, but this snapshot problem seems like a good
motive to do it. If they would all be written in SQL, the snapshot
problem would be solved, right?

> Note also that minor releases can readily fix bugs in C-language functions,
> but we have no infrastructure to update sql-language functions after initdb.
> That flexibility is unfortunate to lose, particularly for something that
> pg_dump depends on.  Now, the right thing is probably to design a mechanism
> for applying simple catalog updates in concert with a minor release.  In the
> mean time, its absence puts the sql PL at a nontrivial disadvantage here.

What do you mean with "infrastructure"? Isn't it as simple as CREATE
OR REPLACE FUNCTION? As long as the interface the pg_get_*def
functions don't change, I cannot see how simply replacing the existing
functions in a minor release upgrade could do any harm.



Re: The missing pg_get_*def functions

От
Joel Jacobson
Дата:
On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Noah Misch <noah@leadboat.com> writes:
>> Note also that minor releases can readily fix bugs in C-language functions,
>> but we have no infrastructure to update sql-language functions after initdb.
>> That flexibility is unfortunate to lose, particularly for something that
>> pg_dump depends on.
>
> That alone would probably be sufficient reason why we would never allow
> pg_dump to depend on any such thing (not that I see a compelling
> argument for it to do so anyway...).

It would be better to find a way to update sql-language functions in
minor upgrades, instead of shutting that door entirely for all future
implementation ideas involving sql-language functions in the
pg_catalog.

> The long and the short of it here is that there isn't any very good
> reason to migrate any of the existing pg_dump-side functionality into
> server-side functions, and especially not server-side functions that
> aren't in C.  One of the things that we frequently recommend when doing
> upgrades is that you do the dump with the newer version's pg_dump, so
> as to get the benefits of any bug fixes that are in it.  The more
> dump functionality is on the server side, the less opportunity we have
> to repair things that way.

There are two very good reasons in particular of doing exactly that:

a) Make it possible to develop tools which need to be able or create
(and/or drop) schema objects, such as a schema migration tool, capable
of migrating between different revisions of the schema. I started
working on such a project two years ago, but realised most of the
actual code I wrote was code to define how to create/drop objects, and
in what order based on pg_depend, and I thought such code would be
better to put in the core, and put the project on hold awaiting a
complete coverage of the pg_get_*def functions, but they never
happened, so here we are two years later. This is the project I'm
talking about: https://github.com/gluefinance/pov

b) Reduce the amount of hard-coded sql queries in pg_dump.c. Won't
help for previous versions, but perhaps the sql queries to support
older versions can be hidden in some separately included file at least
so you don't have to see it, which would make sense since the sql for
the old versions won't change and won't need the same amount of active
development as the code for the latest version. Makes no sense of
keeping the sql for all versions at the same place.

> It may be that the functions Joel proposes are worth having for other
> tools to use, but I'm not in favor of making pg_dump use them.

I don't agree with you as I don't think it's a very hard problem to
deal with updates of sql-language functions in the pg_catalog schema.
If we fix that first, this and a whole range of problems, where the
solution involves taking in-data from pg_catalog, and producing
out-data based on that, becomes super easy to achieve by writing
simple sql-language functions with direct and simple access to the
pg_catalog, instead of writing C functions with inline hard-coded sql.



Re: The missing pg_get_*def functions

От
Andrew Dunstan
Дата:
On 04/29/2013 11:47 PM, Joel Jacobson wrote:
>> Note also that minor releases can readily fix bugs in C-language functions,
>> but we have no infrastructure to update sql-language functions after initdb.
>> That flexibility is unfortunate to lose, particularly for something that
>> pg_dump depends on.  Now, the right thing is probably to design a mechanism
>> for applying simple catalog updates in concert with a minor release.  In the
>> mean time, its absence puts the sql PL at a nontrivial disadvantage here.
> What do you mean with "infrastructure"? Isn't it as simple as CREATE
> OR REPLACE FUNCTION? As long as the interface the pg_get_*def
> functions don't change, I cannot see how simply replacing the existing
> functions in a minor release upgrade could do any harm.
>


Minor releases are supposed not to require any such operations. You 
should normally be able to drop the binaries in place and restart. For C 
language functions that is indeed all you have to do, but that's not the 
case for SQL language functions, where the definition is contained in 
the catalogs, not the binary.

If all you want is SQL language functions, there is nothing to stop you 
from writing them and publishing them today as an extension.

cheers

andrew



Re: The missing pg_get_*def functions

От
Joel Jacobson
Дата:
On Tue, Apr 30, 2013 at 11:22 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Minor releases are supposed not to require any such operations. You should
> normally be able to drop the binaries in place and restart. For C language
> functions that is indeed all you have to do, but that's not the case for SQL
> language functions, where the definition is contained in the catalogs, not
> the binary.

Minor releases don't even require pg_dump/pg_restore, so I really
don't see the issue.

Nothing in the versioning policy* appears to be in conflict with fixing bugs in
pg_catalog sql-language functions. Maybe it's an unwritten rule minor versions
mustn't affect the pg_catalog. I thought the rule was to require no
data relational
modifications of pg_catalog between minor versions, but simply modifying or
adding functions, without breaking their interfaces, is not a modification of
the pg_catalog as I see it, as it couldn't affect any old code, since the tables
and columns would be untouched. Old code couldn't be affected of new functions,
or the inner-details of existing pg_catalog functions, as long as they
don't change
the interface, i.e., the IN/OUT parameters.

I would actually want to go a bit further and propose to move the sql queries
within the "else if (fout->remoteVersion >= ...) { ... }" statements also for
_old_ versions, and put them in sql-language functions, which, if you would
use a future version of pg_dump to dump an old version, would install
sql-language functions for the old version in pg_catalog, so they could
be called by the new pg_dump.

That way we could get rid of these kind of sql queries for _all_ versions,
and not just for future versions.

These functions would need to adhere to exactly the same interface,
where the OUT params must fit perfectly with the args passed to ArchiveEntry().

The existing pg_get_*def() are not comprehensive enough, as they still require
a lot of sql code to extract various other necessary fields.

*) http://www.postgresql.org/support/versioning/

> If all you want is SQL language functions, there is nothing to stop you from
> writing them and publishing them today as an extension.

Yes, maybe I will, as a proof-of-concept and test of how complex or simple it
would be, and how many lines of code pg_dump.c could be reduced with.



Re: The missing pg_get_*def functions

От
Andrew Dunstan
Дата:
On 04/30/2013 07:34 AM, Joel Jacobson wrote:
> On Tue, Apr 30, 2013 at 11:22 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Minor releases are supposed not to require any such operations. You should
>> normally be able to drop the binaries in place and restart. For C language
>> functions that is indeed all you have to do, but that's not the case for SQL
>> language functions, where the definition is contained in the catalogs, not
>> the binary.
> Minor releases don't even require pg_dump/pg_restore, so I really
> don't see the issue.


You have already been told what the issue is. There is no provision for 
updating the catalogs. Our users expect to be able to drop a minor 
upgrade binary in place and have nothing more to do. It's what they are 
used to and I at least have no interest in changing that. If you change 
the definition of provided SQL language functions you would need to 
apply those changes to each database in each cluster. Otherwise, a user 
will think they are on version X which has a fix for function Y and in 
fact they won't have the fix. That's a recipe for utter confusion.

All the existing pg_catalog SQL functions are one liners, apart from 
ts_debug().

>> If all you want is SQL language functions, there is nothing to stop you from
>> writing them and publishing them today as an extension.
> Yes, maybe I will, as a proof-of-concept and test of how complex or simple it
> would be, and how many lines of code pg_dump.c could be reduced with.
>


pg_dump does bulk operations in many cases, and one way we could make it 
faster would be to increase that, not reduce it (see comments where we 
get table attributes in pg_dump.c, for example). Providing singleton 
operations like this will not help it at all.

If your aim is to be able to replace all the code pg_dump runs by these 
singleton operations then I think it's almost certainly doomed to failure.

Having undertaken some of the exercise, I can also assure you that 
writing pg_get_table_def() in a single pure SQL statement will be .... 
challenging.

There is a case for having functions like these, but probably not for 
use by pg_dump, and I suspect they would best be done in C.

cheers

andrew




Re: The missing pg_get_*def functions

От
Stephen Frost
Дата:
Tom, all,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> It may be that the functions Joel proposes are worth having for other
> tools to use, but I'm not in favor of making pg_dump use them.

I care very little about migrating the pg_dump functionality into
server-side functions and a great deal about *having* such functionality
available server-side.  We've been around this time and time again:
there are use-cases beyond pg_dump for being able to get the definition
of an object.  Having to first figure out and then replicate what pg_dump
(or psql) does is no trivial feat.

In short, I think I agree w/ Tom here (though I contend that there *are*
such use-cases, not that 'it may be'.. :).  Let's drop the discussion
about changing pg_dump and/or psql and instead simply go implement these
functions.  We can revisit the pg_dump discussion 5 or 10 years down the
road, after we've seen how well these functions work and what uses they
are put to outside of our (relatively small) world.
Thanks,
    Stephen

Re: The missing pg_get_*def functions

От
Stephen Frost
Дата:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Apr 29, 2013 at 7:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > One of the things that we frequently recommend when doing
> > upgrades is that you do the dump with the newer version's pg_dump, so
> > as to get the benefits of any bug fixes that are in it.  The more
> > dump functionality is on the server side, the less opportunity we have
> > to repair things that way.
>
> But why wouldn't we be able to fix the version in the server, if it
> turns out to be buggy?  I suppose we wouldn't fix bugs discovered
> after EOL, but I'm not sure that's a sufficient objection.

There are other things beyond bugs here..  Changes in reserved keywords
is actually the biggest reason, ime, to use the newer pg_dump when
you're trying to move to a newer PG version.  I don't think we'd want to
either go to quoteing everything (yuck), or having a point release
suddenly change what gets quoted and what doesn't in a pg_dump..
Thanks,
    Stephen

Re: The missing pg_get_*def functions

От
Stephen Frost
Дата:
* Joel Jacobson (joel@trustly.com) wrote:
> On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That alone would probably be sufficient reason why we would never allow
> > pg_dump to depend on any such thing (not that I see a compelling
> > argument for it to do so anyway...).
>
> It would be better to find a way to update sql-language functions in
> minor upgrades, instead of shutting that door entirely for all future
> implementation ideas involving sql-language functions in the
> pg_catalog.

Go for it? :)  I don't think you'll find much disagreement here, but
it's no trivial thing to do either..  Perhaps some kind of one-time
startup "script" that gets run?  Or maybe an internal identifier that
tracks the version of the catalog and runs a script which updates it on
first start when there are changes to the catalog?  But what about any
dependencies which exist?  Are we confident enough in the existing
infrastructure for 'create-or-replace' that, if we don't change the API,
it'll just replace the contents and we won't hurt user objects?

> > The long and the short of it here is that there isn't any very good
> > reason to migrate any of the existing pg_dump-side functionality into
> > server-side functions, and especially not server-side functions that
> > aren't in C.  One of the things that we frequently recommend when doing
> > upgrades is that you do the dump with the newer version's pg_dump, so
> > as to get the benefits of any bug fixes that are in it.  The more
> > dump functionality is on the server side, the less opportunity we have
> > to repair things that way.
>
> There are two very good reasons in particular of doing exactly that:

[... good reasons to have pg_dump-like capability in the server ...]

There's a difference between 'migrate' and 'implement' that I'm not sure
you saw..  I agree w/ Tom that there's probably no really good reason to
*remove* the existing code in pg_dump.  Adding code which is similar to
the server is quite a different thing and, as Tom commented later, might
be useful for other tools to use.

> b) Reduce the amount of hard-coded sql queries in pg_dump.c.

Unless you're hacking on pg_dump quite a bit (which, if you are, you
probably need to understand/see the queries that it's making anyway), I
don't see why this is actually a problem..?  I'm thinking that you've
simply had to be in the guts of pg_dump enough, as some others of us
have also, that you grow tired of having to extract out the useful bits
from it.  If those bits are replicated as clean functions in the
backend, you won't need to be groking around in pg_dump anymore.. :)

> Won't
> help for previous versions, but perhaps the sql queries to support
> older versions can be hidden in some separately included file at least
> so you don't have to see it, which would make sense since the sql for
> the old versions won't change and won't need the same amount of active
> development as the code for the latest version. Makes no sense of
> keeping the sql for all versions at the same place.

I'd be all for a rework of pg_dump which more cleanly sets up the SQL
for each server version- feel free to work on such. :)

> > It may be that the functions Joel proposes are worth having for other
> > tools to use, but I'm not in favor of making pg_dump use them.
>
> I don't agree with you as I don't think it's a very hard problem to
> deal with updates of sql-language functions in the pg_catalog schema.

The above doesn't really seem like a response to the piece quoted, so
I'll respond to just what you wrote- I agree.  Let's figure out a way to
update sql-language functions.  Once we have that, it'll certainly open
up the door to having more of them.  I don't think we can punt on this
as a "well, we'll do it when we need it"; we've needed it in the past
(as I recall, we've had to tell people in the release notes to hack the
catalog for some issue or another at least once in most major revs..)
and it'd be great to have the problem solved.

> If we fix that first, this and a whole range of problems, where the
> solution involves taking in-data from pg_catalog, and producing
> out-data based on that, becomes super easy to achieve by writing
> simple sql-language functions with direct and simple access to the
> pg_catalog, instead of writing C functions with inline hard-coded sql.

Sounds fantastic.
Thanks,
    Stephen

Re: The missing pg_get_*def functions

От
Joel Jacobson
Дата:
On Tue, Apr 30, 2013 at 2:12 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Go for it? :)  I don't think you'll find much disagreement here, but
> it's no trivial thing to do either..  Perhaps some kind of one-time
> startup "script" that gets run?  Or maybe an internal identifier that

> The above doesn't really seem like a response to the piece quoted, so
> I'll respond to just what you wrote- I agree.  Let's figure out a way to
> update sql-language functions.  Once we have that, it'll certainly open
> up the door to having more of them.  I don't think we can punt on this
> as a "well, we'll do it when we need it"; we've needed it in the past
> (as I recall, we've had to tell people in the release notes to hack the
> catalog for some issue or another at least once in most major revs..)
> and it'd be great to have the problem solved.

> Sounds fantastic.

Thanks for the motivational speech! :) I'll start working on it today!



Re: The missing pg_get_*def functions

От
Stephen Frost
Дата:
* Joel Jacobson (joel@trustly.com) wrote:
> Thanks for the motivational speech! :) I'll start working on it today!

Great, but you should really come up with an initial design and get
feedback on it before you start coding up something. :)  I've outlined a
few very, very high-level ideas about what could be done, but there's a
*lot* of details that would need to be worked out here.  You might also
ask some of the packagers (eg: Martin Pitt w/ Ubuntu/Debian) about how
they've handled point-upgrades which have required catalog hackery.
Thanks,
    Stephen

Re: The missing pg_get_*def functions

От
Andres Freund
Дата:
On 2013-04-30 05:14:15 +0100, Joel Jacobson wrote:
> On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Noah Misch <noah@leadboat.com> writes:
> >> Note also that minor releases can readily fix bugs in C-language functions,
> >> but we have no infrastructure to update sql-language functions after initdb.
> >> That flexibility is unfortunate to lose, particularly for something that
> >> pg_dump depends on.
> >
> > That alone would probably be sufficient reason why we would never allow
> > pg_dump to depend on any such thing (not that I see a compelling
> > argument for it to do so anyway...).
> 
> It would be better to find a way to update sql-language functions in
> minor upgrades, instead of shutting that door entirely for all future
> implementation ideas involving sql-language functions in the
> pg_catalog.

I'd be very careful with jumping on this task. I am pretty sure its a
very good way to get very, very frustrated if you don't present a widely
accepted design beforehand. Doing this correctly is *far far* from easy.

Just a little collection of problems:
* You need to connect to all databases, not just one. There's no infrastructure for this.
* You need to do the update *before* allowing any external connections. Otherwise the feature won't be useful to fix
actualproblems. Again, there is no infrastructure for this.
 
* You need to do it in a way that a) doesn't slow down normal startup b) doesn't break if the update has only been
appliedto 9999 of 10000 databases.
 


FWIW I really, really doubt you can do all the functions referred to
upthread sensibly and correctly from SQL functions. The infrastructure
to do this just isn't there.

Besides, I really don't buy this helping pg_dump. I think the far more
realistic course here is to put some parts that are required by pg_dump
and by such functions in some common library that then can *also* be
used by such backend functions. But thats a humongous task that, besides
deep technical knowledge in lots of areas, requires quite a bit of
politics.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: The missing pg_get_*def functions

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> If all you want is SQL language functions, there is nothing to stop you 
> from writing them and publishing them today as an extension.

It's worth noting also that we actually *have* infrastructure for
updating extensions without initdb; unlike the initial contents of
pg_proc.  So this approach is more attractive than it might seem
on its face, assuming you are going to do this as SQL functions.
(I share the doubts expressed elsewhere as to how feasible that
actually is.)
        regards, tom lane



Re: The missing pg_get_*def functions

От
Andres Freund
Дата:
On 2013-04-30 15:57:02 +0200, Andres Freund wrote:
> On 2013-04-30 05:14:15 +0100, Joel Jacobson wrote:
> > On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Noah Misch <noah@leadboat.com> writes:
> > >> Note also that minor releases can readily fix bugs in C-language functions,
> > >> but we have no infrastructure to update sql-language functions after initdb.
> > >> That flexibility is unfortunate to lose, particularly for something that
> > >> pg_dump depends on.
> > >
> > > That alone would probably be sufficient reason why we would never allow
> > > pg_dump to depend on any such thing (not that I see a compelling
> > > argument for it to do so anyway...).
> > 
> > It would be better to find a way to update sql-language functions in
> > minor upgrades, instead of shutting that door entirely for all future
> > implementation ideas involving sql-language functions in the
> > pg_catalog.
> 
> I'd be very careful with jumping on this task. I am pretty sure its a
> very good way to get very, very frustrated if you don't present a widely
> accepted design beforehand. Doing this correctly is *far far* from easy.
> 
> Just a little collection of problems:
> * You need to connect to all databases, not just one. There's no
>   infrastructure for this.
> * You need to do the update *before* allowing any external
>   connections. Otherwise the feature won't be useful to fix actual
>   problems. Again, there is no infrastructure for this.
> * You need to do it in a way that a) doesn't slow down normal startup b)
>   doesn't break if the update has only been applied to 9999 of 10000
>   databases.

Another rather fundamental problem:

This obviously cannot be done directly on a standby. So either we cannot
rely on those updates having been performed or you need to update the
standby in lockstep with the primary. Neither seems acceptable.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: The missing pg_get_*def functions

От
Robert Haas
Дата:
On Tue, Apr 30, 2013 at 9:02 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Mon, Apr 29, 2013 at 7:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > One of the things that we frequently recommend when doing
>> > upgrades is that you do the dump with the newer version's pg_dump, so
>> > as to get the benefits of any bug fixes that are in it.  The more
>> > dump functionality is on the server side, the less opportunity we have
>> > to repair things that way.
>>
>> But why wouldn't we be able to fix the version in the server, if it
>> turns out to be buggy?  I suppose we wouldn't fix bugs discovered
>> after EOL, but I'm not sure that's a sufficient objection.
>
> There are other things beyond bugs here..  Changes in reserved keywords
> is actually the biggest reason, ime, to use the newer pg_dump when
> you're trying to move to a newer PG version.

Oh.  Good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The missing pg_get_*def functions

От
Noah Misch
Дата:
On Tue, Apr 30, 2013 at 04:47:58AM +0100, Joel Jacobson wrote:
> On Tue, Apr 30, 2013 at 12:46 AM, Noah Misch <noah@leadboat.com> wrote:
> > Those existing functions give a mostly-SnapshotNow picture of their objects,
> > but an sql-language implementation would give a normally-snapshotted picture.
> 
> I assume "normally" is better than "mostly"?

Inconsistent snapshot usage ("mostly"-anything) is bad, especially within the
confines of a single function.  pg_get_viewdef() grabs pg_rewrite.ev_action
using an MVCC snapshot, then interprets it relative to SnapshotNow.  That's a
paradox waiting to happen.  (Granted, the same could be said for *any* use of
SnapshotNow in the absence of locking.)

Whether all-normally-snapshotted (all-MVCC) beats all-SnapshotNow is less
clear-cut, but I tentatively think it would.

> > That status quo is perhaps more an implementation accident than a designed
> > behavior.  Before proliferating functions like this, we should pick a snapshot
> > policy and stick to it.  See the block comment at the top of pg_dump.c.
> 
> I didn't think there would be any reason to migrate the existing
> functions from C to SQL, but this snapshot problem seems like a good
> motive to do it. If they would all be written in SQL, the snapshot
> problem would be solved, right?

Nominally yes, but not because of difficulty using a normal MVCC snapshot from
C.  It's just that the sql PL uses nothing but normal MVCC snapshots.  So,
this isn't a sound reason to translate C to SQL.  In any case, I can't fathom
a prudent 100% sql implementation of pg_get_viewdef(), which needs to deparse
arbitrary queries.

> > Note also that minor releases can readily fix bugs in C-language functions,
> > but we have no infrastructure to update sql-language functions after initdb.
> > That flexibility is unfortunate to lose, particularly for something that
> > pg_dump depends on.  Now, the right thing is probably to design a mechanism
> > for applying simple catalog updates in concert with a minor release.  In the
> > mean time, its absence puts the sql PL at a nontrivial disadvantage here.
> 
> What do you mean with "infrastructure"? Isn't it as simple as CREATE
> OR REPLACE FUNCTION? As long as the interface the pg_get_*def
> functions don't change, I cannot see how simply replacing the existing
> functions in a minor release upgrade could do any harm.

Stephen described the sort of infrastructure I had in mind.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: The missing pg_get_*def functions

От
Dimitri Fontaine
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-04-30 05:14:15 +0100, Joel Jacobson wrote:
>> It would be better to find a way to update sql-language functions in
>> minor upgrades, instead of shutting that door entirely for all future
>> implementation ideas involving sql-language functions in the
>> pg_catalog.
>
> I'd be very careful with jumping on this task. I am pretty sure its a
> very good way to get very, very frustrated if you don't present a widely
> accepted design beforehand. Doing this correctly is *far far* from easy.
>
> Just a little collection of problems:
> * You need to connect to all databases, not just one. There's no
>   infrastructure for this.

I wonder if it wouldn't be possible to have a per database catalog
version and do the work either at first database connection or first use
of the modified function, with some more smarts (catversion changed, and
a list of tableoid, oid changes in the catalogs somewhere).

But basically, that means that I agree that doing this correctly is very
far from being easy. And maybe improving the extension feature list is
an easier way forward.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: The missing pg_get_*def functions

От
Stephen Frost
Дата:
* Dimitri Fontaine (dimitri@2ndQuadrant.fr) wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Just a little collection of problems:
> > * You need to connect to all databases, not just one. There's no
> >   infrastructure for this.
>
> I wonder if it wouldn't be possible to have a per database catalog
> version and do the work either at first database connection or first use
> of the modified function, with some more smarts (catversion changed, and
> a list of tableoid, oid changes in the catalogs somewhere).

I like the idea of having a per-database catversion, but we'd have to
have a cluster-wide version for the shared catalogs too.  I don't think
it's a good idea to wait until the first connection to a DB to apply the
update either..  My thinking was that we'd just kick off a backend for
each database (perhaps using the background worker structure) which
would then apply the update and exit, while the postmaster would wait
until all of these workers have finished before allowing general
connections.

> But basically, that means that I agree that doing this correctly is very
> far from being easy. And maybe improving the extension feature list is
> an easier way forward.

It likely is, but it would really be nice to be able to do catalog
updates like these in a better fashion than sticking some update command
into the release notes and hoping that someone reads them and runs the
command..
Thanks,
    Stephen

Re: The missing pg_get_*def functions

От
Dimitri Fontaine
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> It likely is, but it would really be nice to be able to do catalog
> updates like these in a better fashion than sticking some update command
> into the release notes and hoping that someone reads them and runs the
> command..

Agreed.

Another advantage of using more the extension infrastructure is that
shipping bug fixes in the C or SQL parts of them would allow a hot fix
to be shipped without restart when limited to an extension.

In-core installed-by-default extensions, anyone?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: The missing pg_get_*def functions

От
Stephen Frost
Дата:
* Dimitri Fontaine (dimitri@2ndQuadrant.fr) wrote:
> Another advantage of using more the extension infrastructure is that
> shipping bug fixes in the C or SQL parts of them would allow a hot fix
> to be shipped without restart when limited to an extension.

I'm actually not thrilled with the security update situation when it
comes to extensions.  It really is a disservice to our users to ask them
to independently manage each and every extension and upgrade each one of
them by hand.

> In-core installed-by-default extensions, anyone?

I'm not against this idea- but we *still* need to solve the problem of
how we update the catalog during a point release and, imv anyway, we
would need to be able to upgrade any in-core installed-by-default
extensions during a point release too, to address any security or other
issues from them.  Right now we get to slide by on upgrading extensions
by not having any in-core / installed-by-default ones and punting to the
user with "well, you installed it, not us, therefore you have to manage
it by hand for eternity"; that doesn't work when we're installing it for
them and they may not even know they've got it..
Thanks,
    Stephen

Re: The missing pg_get_*def functions

От
Andres Freund
Дата:
On 2013-05-06 14:34:52 +0200, Dimitri Fontaine wrote:
> In-core installed-by-default extensions, anyone?

We already have that in plpgsql ...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: The missing pg_get_*def functions

От
Dimitri Fontaine
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> I'm not against this idea- but we *still* need to solve the problem of
> how we update the catalog during a point release and, imv anyway, we
> would need to be able to upgrade any in-core installed-by-default
> extensions during a point release too, to address any security or other

In case it wasn't clear, I agree with your view here and consider the
capability to auto-upgrade extensions a must have. What I say is that if
you ship the .so part of an extension in a live system, the next backend
that starts will use that code, without a restart.

That does not allow us not to provide a way to force-reload modules
currently used in live backends, and we still need to be able to upgrade
the system catalogs and "extension catalogs" too, either at startup in
live operations.

Separating away some code and SQL into in-core installed-by-default
extensions means we have new problems and abilities, not that some
problem are solved by themselves.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support