Обсуждение: BUG #15344: pg_proc.proisagg was removed incompatibly in PostgreSQL 11

Поиск
Список
Период
Сортировка

BUG #15344: pg_proc.proisagg was removed incompatibly in PostgreSQL 11

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15344
Logged by:          Lukas Eder
Email address:      lukas.eder@gmail.com
PostgreSQL version: 11beta2
Operating system:   Debian
Description:

When comparing the current version (10) [1] and the developer version (11)
[2] of the pg_proc documentation, then it can be seen that the
pg_proc.proisagg column was removed backwards incompatibly. The
documentation states for [1]:

> The table contains data for aggregate functions as well as plain
functions. If proisagg is true, there should be a matching row in
pg_aggregate.

And for [2]:

> If prokind indicates that the entry is for an aggregate function, there
should be a matching row in pg_aggregate.

While I appreciate that an encoded enumeration is more powerful than a
simple boolean value, I think it would have been better if this change had
been done backwards compatibly. For example, there could still be a
pg_proc.proisagg column that defaults to (prokind = 'a').

[1] https://www.postgresql.org/docs/10/static/catalog-pg-proc.html
[2] https://www.postgresql.org/docs/11/static/catalog-pg-proc.html


Re: BUG #15344: pg_proc.proisagg was removed incompatibly inPostgreSQL 11

От
Andres Freund
Дата:
Hi,

On 2018-08-21 14:23:45 +0000, PG Bug reporting form wrote:
> When comparing the current version (10) [1] and the developer version (11)
> [2] of the pg_proc documentation, then it can be seen that the
> pg_proc.proisagg column was removed backwards incompatibly. The
> documentation states for [1]:

Please note that the pg_catalog.* tables (and views) are *NOT* intended
to backwards compatible between major versions. We change them in ways
backward incompatible all the time.


> While I appreciate that an encoded enumeration is more powerful than a
> simple boolean value, I think it would have been better if this change had
> been done backwards compatibly. For example, there could still be a
> pg_proc.proisagg column that defaults to (prokind = 'a').

When could we have removed that? The few people querying it would've
continued to rely on it until we've removed it, leading to the same
outcry.  And supporting both would've required more code, because we
would have to make sure they're consistent, and raise errors etc if not.

Greetings,

Andres Freund


Re: BUG #15344: pg_proc.proisagg was removed incompatibly inPostgreSQL 11

От
Lukas Eder
Дата:


On Tue, Aug 21, 2018 at 4:28 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-08-21 14:23:45 +0000, PG Bug reporting form wrote:
> When comparing the current version (10) [1] and the developer version (11)
> [2] of the pg_proc documentation, then it can be seen that the
> pg_proc.proisagg column was removed backwards incompatibly. The
> documentation states for [1]:

Please note that the pg_catalog.* tables (and views) are *NOT* intended
to backwards compatible between major versions. We change them in ways
backward incompatible all the time.

The pg_catalog tables do seem to be the only way to reverse engineer some more sophisticated things in the database. I imagine that this is being done by tool vendors like myself (jOOQ) quite a bit. And there are tons of Stack Overflow answers that show how to query the pg_catalog tables, all of them risking to be outdated between major versions. These queries are probably used by quite a few people in some home grown build tool, reverse engineering tool, etc.

I understand that backwards compatibility is quite a bit of extra work, but in cases like this particular one, the price to pay seems relatively low. Perhaps a new strategy could be to break things only if there is really no other solution?

Thanks,
Lukas

Re: BUG #15344: pg_proc.proisagg was removed incompatibly inPostgreSQL 11

От
Andres Freund
Дата:
On 2018-08-21 16:39:18 +0200, Lukas Eder wrote:
> On Tue, Aug 21, 2018 at 4:28 PM Andres Freund <andres@anarazel.de> wrote:
> 
> > Hi,
> >
> > On 2018-08-21 14:23:45 +0000, PG Bug reporting form wrote:
> > > When comparing the current version (10) [1] and the developer version
> > (11)
> > > [2] of the pg_proc documentation, then it can be seen that the
> > > pg_proc.proisagg column was removed backwards incompatibly. The
> > > documentation states for [1]:
> >
> > Please note that the pg_catalog.* tables (and views) are *NOT* intended
> > to backwards compatible between major versions. We change them in ways
> > backward incompatible all the time.
> >
> 
> The pg_catalog tables do seem to be the only way to reverse engineer some
> more sophisticated things in the database.

Yes, there's some things that aren't represented in a standardized way
in information_schema.*.  If that's the case it's good for tool vendors
to pipe up and ask for something intended to be externally visible.


> I imagine that this is being
> done by tool vendors like myself (jOOQ) quite a bit. And there are tons of
> Stack Overflow answers that show how to query the pg_catalog tables, all of
> them risking to be outdated between major versions.

People doing bad things on stackoverflow isn't very convincing.... ;)


> I understand that backwards compatibility is quite a bit of extra work, but
> in cases like this particular one, the price to pay seems relatively low.
> Perhaps a new strategy could be to break things only if there is really no
> other solution?

I mean we don't break things willy-nilly already. And there's plenty
cases where we kept things around for backward compatibility. But
usually the problem is that that means you have to keep the shim around
forever. Even if better solutions have been around for many years,
there'll be complaints about removing them.  So IMO the backward compat
price for pg_catalog.* has to be low enough that there's essentially no
point in keeping it around forever, or so painful that it'd cost a lot
of people a lot.

Greetings,

Andres Freund


Re: BUG #15344: pg_proc.proisagg was removed incompatibly inPostgreSQL 11

От
Lukas Eder
Дата:


On Tue, Aug 21, 2018 at 4:45 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-08-21 16:39:18 +0200, Lukas Eder wrote:
> On Tue, Aug 21, 2018 at 4:28 PM Andres Freund <andres@anarazel.de> wrote:
>
> > Hi,
> >
> > On 2018-08-21 14:23:45 +0000, PG Bug reporting form wrote:
> > > When comparing the current version (10) [1] and the developer version
> > (11)
> > > [2] of the pg_proc documentation, then it can be seen that the
> > > pg_proc.proisagg column was removed backwards incompatibly. The
> > > documentation states for [1]:
> >
> > Please note that the pg_catalog.* tables (and views) are *NOT* intended
> > to backwards compatible between major versions. We change them in ways
> > backward incompatible all the time.
> >
>
> The pg_catalog tables do seem to be the only way to reverse engineer some
> more sophisticated things in the database.

Yes, there's some things that aren't represented in a standardized way
in information_schema.*.  If that's the case it's good for tool vendors
to pipe up and ask for something intended to be externally visible.

I have mixed feelings about this. The information_schema is part of the SQL standard. I reckon that this particular information would belong in INFORMATION_SCHEMA.ROUTINES.ROUTINE_TYPE. The SQL:2011 standard (I don't have newer versions of the SQL/Schemata document) mentions these possible values for that column:

'PROCEDURE', 'FUNCTION', 'INSTANCE METHOD', 'STATIC METHOD', 'CONSTRUCTOR METHOD'

So, no aggregate functions or window functions as in pg_proc.prokind. When extending the standard, the functionality becomes a bit less standard, and risks breaking as well in the future, e.g. when the standard *does* add aggregate functions as a possible value, but not using the name PostgreSQL chooses now.

Which is why the vendor specific pg_catalog is so useful. Any value is acceptable in those tables as you do not have to coordinate their layout with the standard committee.

In Oracle, the dictionary views aren't following the information_schema standard, but are vastly richer than what the standard supports - just like pg_catalog. They are definitely kept backwards compatible for the same reason I've mentioned: Tool support.

A database product thrives on the quality of the tools supporting it. Making it hard for the tool vendors might mean there's less support for advanced features. From a market adoption perspective, in the long run, there is no option but to be more backwards compatible.
 
> I imagine that this is being
> done by tool vendors like myself (jOOQ) quite a bit. And there are tons of
> Stack Overflow answers that show how to query the pg_catalog tables, all of
> them risking to be outdated between major versions.

People doing bad things on stackoverflow isn't very convincing.... ;)

Your perception of "bad" is biased of course, just like mine. People use what's available, this has always been the case with any product. If there were "internal" and "public" catalog tables / views, then it would be more understandable that using the (internal) pg_catalog is being dismissed, but given that there is no option...

Here, have a quick google search for questions on Stack Overflow involving pg_proc.proisagg:

It returns 107 results on Stack Overflow alone. Some examples:


All of these answers are now outdated with PostgreSQL 11. Not only are they outdated, but if the answers were fixed for PostgreSQL 11, the fixed version wouldn't work on older PostgreSQL versions, because pg_proc.prokind didn't exist earlier.

I find that a relatively high price to pay in this case for the relatively easy solution to keep a pg_proc.proisagg computed column around for backwards compatibility.
 
> I understand that backwards compatibility is quite a bit of extra work, but
> in cases like this particular one, the price to pay seems relatively low.
> Perhaps a new strategy could be to break things only if there is really no
> other solution?

I mean we don't break things willy-nilly already. And there's plenty
cases where we kept things around for backward compatibility. But
usually the problem is that that means you have to keep the shim around
forever. Even if better solutions have been around for many years,
there'll be complaints about removing them.  So IMO the backward compat
price for pg_catalog.* has to be low enough that there's essentially no
point in keeping it around forever, or so painful that it'd cost a lot
of people a lot.

I agree that at some point, the price to pay is too high. But in this case, a boolean value has been replaced by an enumeration, so it would be really easy to maintain the boolean value as a computed column, right?

Re: BUG #15344: pg_proc.proisagg was removed incompatibly inPostgreSQL 11

От
Pavel Stehule
Дата:


Dne út 21. 8. 2018 17:04 uživatel Lukas Eder <lukas.eder@gmail.com> napsal:


On Tue, Aug 21, 2018 at 4:45 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-08-21 16:39:18 +0200, Lukas Eder wrote:
> On Tue, Aug 21, 2018 at 4:28 PM Andres Freund <andres@anarazel.de> wrote:
>
> > Hi,
> >
> > On 2018-08-21 14:23:45 +0000, PG Bug reporting form wrote:
> > > When comparing the current version (10) [1] and the developer version
> > (11)
> > > [2] of the pg_proc documentation, then it can be seen that the
> > > pg_proc.proisagg column was removed backwards incompatibly. The
> > > documentation states for [1]:
> >
> > Please note that the pg_catalog.* tables (and views) are *NOT* intended
> > to backwards compatible between major versions. We change them in ways
> > backward incompatible all the time.
> >
>
> The pg_catalog tables do seem to be the only way to reverse engineer some
> more sophisticated things in the database.

Yes, there's some things that aren't represented in a standardized way
in information_schema.*.  If that's the case it's good for tool vendors
to pipe up and ask for something intended to be externally visible.

I have mixed feelings about this. The information_schema is part of the SQL standard. I reckon that this particular information would belong in INFORMATION_SCHEMA.ROUTINES.ROUTINE_TYPE. The SQL:2011 standard (I don't have newer versions of the SQL/Schemata document) mentions these possible values for that column:

'PROCEDURE', 'FUNCTION', 'INSTANCE METHOD', 'STATIC METHOD', 'CONSTRUCTOR METHOD'

So, no aggregate functions or window functions as in pg_proc.prokind. When extending the standard, the functionality becomes a bit less standard, and risks breaking as well in the future, e.g. when the standard *does* add aggregate functions as a possible value, but not using the name PostgreSQL chooses now.

Which is why the vendor specific pg_catalog is so useful. Any value is acceptable in those tables as you do not have to coordinate their layout with the standard committee.

In Oracle, the dictionary views aren't following the information_schema standard, but are vastly richer than what the standard supports - just like pg_catalog. They are definitely kept backwards compatible for the same reason I've mentioned: Tool support.

A database product thrives on the quality of the tools supporting it. Making it hard for the tool vendors might mean there's less support for advanced features. From a market adoption perspective, in the long run, there is no option but to be more backwards compatible.
 
> I imagine that this is being
> done by tool vendors like myself (jOOQ) quite a bit. And there are tons of
> Stack Overflow answers that show how to query the pg_catalog tables, all of
> them risking to be outdated between major versions.

People doing bad things on stackoverflow isn't very convincing.... ;)

Your perception of "bad" is biased of course, just like mine. People use what's available, this has always been the case with any product. If there were "internal" and "public" catalog tables / views, then it would be more understandable that using the (internal) pg_catalog is being dismissed, but given that there is no option...

Here, have a quick google search for questions on Stack Overflow involving pg_proc.proisagg:

It returns 107 results on Stack Overflow alone. Some examples:


All of these answers are now outdated with PostgreSQL 11. Not only are they outdated, but if the answers were fixed for PostgreSQL 11, the fixed version wouldn't work on older PostgreSQL versions, because pg_proc.prokind didn't exist earlier.

I find that a relatively high price to pay in this case for the relatively easy solution to keep a pg_proc.proisagg computed column around for backwards compatibility.
 
> I understand that backwards compatibility is quite a bit of extra work, but
> in cases like this particular one, the price to pay seems relatively low.
> Perhaps a new strategy could be to break things only if there is really no
> other solution?

I mean we don't break things willy-nilly already. And there's plenty
cases where we kept things around for backward compatibility. But
usually the problem is that that means you have to keep the shim around
forever. Even if better solutions have been around for many years,
there'll be complaints about removing them.  So IMO the backward compat
price for pg_catalog.* has to be low enough that there's essentially no
point in keeping it around forever, or so painful that it'd cost a lot
of people a lot.

I agree that at some point, the price to pay is too high. But in this case, a boolean value has been replaced by an enumeration, so it would be really easy to maintain the boolean value as a computed column, right?

now, the logic of pgproc is different, and this column has not sense. Probably nobody is happy, but I dont see any benefit holds garbage in system catalogue. Pgproc is table, not view, so computed column is not possible feature, and if can be, then it has not any consistency there. 


Re: BUG #15344: pg_proc.proisagg was removed incompatibly inPostgreSQL 11

От
Bruce Momjian
Дата:
On Tue, Aug 21, 2018 at 05:04:13PM +0200, Lukas Eder wrote:
> On Tue, Aug 21, 2018 at 4:45 PM Andres Freund <andres@anarazel.de> wrote:
>      > I understand that backwards compatibility is quite a bit of extra work,
>     but
>     > in cases like this particular one, the price to pay seems relatively low.
>     > Perhaps a new strategy could be to break things only if there is really
>     no
>     > other solution?
> 
>     I mean we don't break things willy-nilly already. And there's plenty
>     cases where we kept things around for backward compatibility. But
>     usually the problem is that that means you have to keep the shim around
>     forever. Even if better solutions have been around for many years,
>     there'll be complaints about removing them.  So IMO the backward compat
>     price for pg_catalog.* has to be low enough that there's essentially no
>     point in keeping it around forever, or so painful that it'd cost a lot
>     of people a lot.
> 
> 
> I agree that at some point, the price to pay is too high. But in this case, a
> boolean value has been replaced by an enumeration, so it would be really easy
> to maintain the boolean value as a computed column, right?

We realize we are breaking some things by changing any API.  The
question is whether the breakage is worth the cleaner API for users
going forward.

People regularly complain when we break APIs, and ask us to justify
every decision, which we are willing to do.  However, few people
complain when we keep old APIs around.  The big issue is that keeping
old APIs around has a much higher cost than the API breakage we do, in
most cases.  This is because API breakage is a big hurt for a few
people, but keeping around an old API is a small hurt for many, and for
a long duration.  So, you hear about the big hurts, but not about the
many small ones, but that doesn't mean we should ignore the small ones.

Anyone who has used an API that allows you to do things in
several old and new ways knows the cost of such API complexity, and the
server programming complexity of keeping old APIs working properly.

Being specific, we almost never change the library API (libpq), and we
rarely change the SQL API, though there are cases listed the major
release notes.  We often change the administrators API and system system
catalog contents, as you have seen.  These decisions are all related to
the ability of users to detect and adjust to API changes, the ability to
keep the old API active, and the number of users effected.

Telling us we broke an API and that we were wrong really doesn't tell us
anything --- we already knew we broke it, and suspected some people
would be disappointed.  You need to walk into that calculus and tell us
why our estimates were wrong.  However, most users just don't have the
level of Postgres knowledge to really argue such cases, even if we are
wrong.

We have also made the decision long ago that we would rather present a
clean API to future users than avoid reasonable breakage for current
users.  People who want that policy changed must get quite involved so
they can argue with the calculus we are already using.  There also might
be cases where our API breakage is too extreme and another relational
database product might be a better fit.

I am thinking I should create a wiki page explaining how we make API
breakage decisions so we don't need to re-litigate this each time.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +