Обсуждение: extensions are hitting the ceiling

Поиск
Список
Период
Сортировка

extensions are hitting the ceiling

От
Eric Hanson
Дата:
Hi folks,

After months and years of really trying to make EXTENSIONs meet the requirements of my machinations, I have come to the conclusion that either a) I am missing something or b) they are architecturally flawed.  Or possibly both.

Admittedly, I might be trying to push extensions beyond what the great elephant in the sky ever intended. The general bent here is to try to achieve a level of modular reusable components similar to those in "traditional" programming environments like pip, gem, npm, cpan, etc. Personally, I am trying to migrate as much of my dev stack as possible away from the filesystem and into the database. Files, especially code, configuration, templates, permissions, manifests and other development files, would be much happier in a database where they have constraints and an information model and can be queried!

Regardless, it would be really great to be able to install an extension, and have it cascade down to multiple other extensions, which in turn cascade down to more, and have everything just work. Clearly, this was considered in the extension architecture, but I'm running into some problems making it a reality.  So here they are.


#1: Dependencies

Let's say we have two extensions, A and B, both of which depend on a third extension C, let's just say C is hstore.  A and B are written by different developers, and both contain in their .control file the line

        requires = 'hstore'

When A is installed, if A creates a schema, it puts hstore in that schema. If not, hstore is already installed, it uses it in that location.  How does the extension know where to reference hstore?

Then, when B is installed, it checks to see if extension hstore is installed, sees that it is, and moves on.  What if it expects it in a different place than A does? The hstore extension can only be installed once, in a single schema, but if multiple extensions depend on it and look for it in different places, they are incompatible.

I have heard talk of a way to write extensions so that they dynamically reference the schema of their dependencies, but sure don't know how that would work if it's possible.  The @extschema@ variable references the *current* extension's schema, but not there is no dynamic variable to reference the schema of a dependency.

Also it is possible in theory to dynamically set search_path to contain every schema of every dependency in play and then just not specify a schema when you use something in a dependency.  But this ANDs together all the scopes of all the dependencies of an extension, introducing potential for collisions, and is generally kind of clunky.


#2:  Data in Extensions

Extensions that are just a collection of functions and types seem to be the norm.  Extensions can contain what the docs call "configuration" data, but rows are really second class citizens:  They aren't tracked with pg_catalog.pg_depend, they aren't deleted when the extension is dropped, etc.

Sometimes it would make sense for an extension to contain *only* data, or insert some rows in a table that the extension doesn't "own", but has as a dependency.  For example, a "webserver" extension might contain a "resource" table that serves up the content of resources in the table at a specified path. But then, another extension, say an app, might want to just list the webserver extension as a dependency, and insert a few resource rows into it.  This is really from what I can tell beyond the scope of what extensions are capable of.


#3 pg_dump and Extensions

Tables created by extensions are skipped by pg_dump unless they are flagged at create time with:

        pg_catalog.pg_extension_config_dump('my_table', 'where id < 20')

However, there's no way that I can tell to mix and match rows and tables across multiple extensions, so pg_dump can't keep track of multiple extensions that contain rows in the same table.


I'd like an extension framework that can contain data as first class citizens, and can gracefully handle a dependency chain and share dependencies.  I have some ideas for a better approach, but they are pretty radical.  I thought I would send this out and see what folks think.

Thanks,
Eric
--
http://aquameta.org/

Re: extensions are hitting the ceiling

От
Chapman Flack
Дата:
On 03/18/19 22:38, Eric Hanson wrote:
> rows are really second class citizens:  They aren't tracked with
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> etc.

This. You have other interests as well, but this is the one I was thinking
about a few years ago in [1] (starting at "Ok, how numerous would be the
problems with this:").

Nobody ever chimed in to say how numerous they did or didn't think the
problems would be. I was actually thinking recently about sitting down
and trying to write that patch, as no one had exactly stood up to say
"oh heavens no, don't write that." But my round tuits are all deployed
elsewhere at the moment.

I'd still like to discuss the ideas.

-Chap

[1] https://www.postgresql.org/message-id/5685A2E7.6080209%40anastigmatix.net


Re: extensions are hitting the ceiling

От
Chapman Flack
Дата:
On 03/19/19 00:56, Chapman Flack wrote:
> Nobody ever chimed in to say how numerous they did or didn't think the
> problems would be. I was actually thinking recently about sitting down
> and trying to write that patch, as no one had exactly stood up to say
> "oh heavens no, don't write that."

Of course, one notable thing that has happened since I wrote that design
was that Oids have stopped being magical, or supported in user tables.
So a bit of "mutatis mutandis" is needed when reading it in 2019.

Regards,
-Chap


Re: extensions are hitting the ceiling

От
Jiří Fejfar
Дата:
Hi all!

We are also facing some issues when using extensions. We are using
them quite intensively as a tool for maintaining our custom "DB
applications" with versioning, all tables, data, regression tests...
We find extensions great! We do not need other tool like flyway. My
colleague already posted some report to bug mailing list [1] but with
no response.

Our observations correspond well with your outline:

#1: Dependencies

* It is not possible to specify the version of extension we are
dependent on in .control file.

#2: Data in Extensions

I am not sure about the name "Configuration" Tables. From my point of
view extensions can hold two sorts of data:
1) "static" data: delivered with extension, inserted by update
scripts; the same "static" data are present across multiple
installation of extension in the same version. This data are not
supposed to be dumped.
2) "dynamic" data: inserted by users, have to be included in dumps,
are marked with pg_extension_config_dump and are called
"configuration" tables/data ... but why "configuration"?

#3 pg_dump and Extensions

* We have described some behavior of pg_dump, which we believe are in
fact bugs: see [1] "1) pg_dump with --schema parameter" and "2)
Hanging OID in extconfig".
* Maybe it would be good to introduce new switch pg_dump --extension
extA dumping all "dynamic" data from extension tables regardless on
schema

#4: Extension owned

* It is not possible to alter extension owner

Thanks, Jiří & Ivo.

[1] https://www.postgresql.org/message-id/15616-260dc9cb3bec7e7e@postgresql.org


Re: extensions are hitting the ceiling

От
Eric Hanson
Дата:
On Mon, Mar 18, 2019 at 11:56 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 03/18/19 22:38, Eric Hanson wrote:
> rows are really second class citizens:  They aren't tracked with
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> etc.

This. You have other interests as well, but this is the one I was thinking
about a few years ago in [1] (starting at "Ok, how numerous would be the
problems with this:").

Cool!

First thoughts, it seems like a sensible way to go given the premise that extensions are immutable.  But -- I'd be a little concerned about the performance ramifications.  Usually there are not jillions of database objects in a extension, but if they started containing data, there sure could be jillions of rows.  Every row would have to be checked for existence as part of an extension on every insert or update, no?

Nobody ever chimed in to say how numerous they did or didn't think the
problems would be. I was actually thinking recently about sitting down
and trying to write that patch, as no one had exactly stood up to say
"oh heavens no, don't write that." But my round tuits are all deployed
elsewhere at the moment.

Likewise, if nobody tells me "oh sheeze extensions can already do all this" I'm going to assume they can't. :-)

I'd still like to discuss the ideas.

Me too!

Ok, I should probably come out and say it:  I think the user story of "There is some kind of packaging system that can contain both schema and data, and these packages can be installed and removed along with their dependencies atomically" is fairly obvious and desirable.  But getting there while accepting the premises that are currently baked into extensions might be a tall order.

Extensions have a middleware-ish aspect to them -- they are immutable and that immutability is checked and enforced at runtime.  That might scale just fine to a few dozen database objects that only check pg_depends on DDL operations, but if we introduce record tracking and start sticking sticks into the wheels of the DML, things could go south really quickly it seems.

I really like a more git-like pattern, where you are free to modify the working copy of a repository (or in this case an extension), and instead of being blocked from changing things, the system tells the user what has changed and how, and gives sensible options for what to do about it.  That way it doesn't incur a performance hit, and the user can do a kind of "git status" on their extension to show any changes.

How about an extension system whose first principle is that an extension is made up of rows, period.  What about the DDL you ask?  Well...

Imagine a system catalog whose sole purpose is to contain database object definitions like "CREATE VIEW ...", similar to those produced by pg_catalog.pg_get_viewdef(), pg_catalog.get_functiondef(), etc.  Let's call this catalog `def`. There is exactly one VIEW for every type of database object in PostgreSQL. def.table, def.role, def.sequence, def.operator, def.type, etc. Each def.* VIEW contains only two columns, `id` and `definition`.  The `id` column contains a unique identifier for the object, and the `definition` column contains the SQL statement that will recreate the object.

So, inside this system catalog is the SQL definition statement of every database object.  In theory, the contents of all the `definition` columns together would be similar to the contents of pg_dump --schema-only.

Now, imagine all these def.* views had insert triggers, so that on insert, it actually executes the contents of the `definition` column.  In theory, we could pg_restore the data in the def.* views, and it would recreate all the database objects. It could shift all that logic out of pg_dump and into the database.

So using the def.* catalog, we could package both "regular" table data and system objects via the contents of the def.* catalog views.  Packages are a collection rows, period. Build up from there.

I'm working on a prototype called bundle [1], it still has a ways to go but it's showing some promise.  It is going to require brining into PostgreSQL the missing pg_get_*def functions, as folks have talked about before [2].

Thanks,
Eric


Re: extensions are hitting the ceiling

От
Noah Misch
Дата:
On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
> #1: Dependencies
> 
> Let's say we have two extensions, A and B, both of which depend on a third
> extension C, let's just say C is hstore.  A and B are written by different
> developers, and both contain in their .control file the line
> 
>         requires = 'hstore'
> 
> When A is installed, if A creates a schema, it puts hstore in that schema.
> If not, hstore is already installed, it uses it in that location.  How does
> the extension know where to reference hstore?
> 
> Then, when B is installed, it checks to see if extension hstore is
> installed, sees that it is, and moves on.  What if it expects it in a
> different place than A does? The hstore extension can only be installed
> once, in a single schema, but if multiple extensions depend on it and look
> for it in different places, they are incompatible.
> 
> I have heard talk of a way to write extensions so that they dynamically
> reference the schema of their dependencies, but sure don't know how that
> would work if it's possible.  The @extschema@ variable references the
> *current* extension's schema, but not there is no dynamic variable to
> reference the schema of a dependency.

If desperate, you can do it like this:

  DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
    (SELECT nspname FROM pg_namespace n
     JOIN pg_extension ON n.oid = extnamespace
     WHERE extname = 'earthdistance' )); END $$;

Needless to say, that's too ugly.  Though probably unimportant in practice, it
also has a race condition vs. ALTER EXTENSION SET SCHEMA.

> Also it is possible in theory to dynamically set search_path to contain
> every schema of every dependency in play and then just not specify a schema
> when you use something in a dependency.  But this ANDs together all the
> scopes of all the dependencies of an extension, introducing potential for
> collisions, and is generally kind of clunky.

That's how it works today, and it has the problems you describe.  I discussed
some solution candidates here:
https://www.postgresql.org/message-id/20180710014308.GA805781@rfd.leadboat.com

The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.  I'm
attaching the proof of concept, for your information.

> #2:  Data in Extensions
> 
> Extensions that are just a collection of functions and types seem to be the
> norm.  Extensions can contain what the docs call "configuration" data, but
> rows are really second class citizens:  They aren't tracked with
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> etc.
> 
> Sometimes it would make sense for an extension to contain *only* data, or
> insert some rows in a table that the extension doesn't "own", but has as a
> dependency.  For example, a "webserver" extension might contain a
> "resource" table that serves up the content of resources in the table at a
> specified path. But then, another extension, say an app, might want to just
> list the webserver extension as a dependency, and insert a few resource
> rows into it.  This is really from what I can tell beyond the scope of what
> extensions are capable of.

I never thought of this use case.  Interesting.

Вложения

Re: extensions are hitting the ceiling

От
Eric Hanson
Дата:


On Tue, Apr 16, 2019 at 12:47 AM Noah Misch <noah@leadboat.com> wrote:
On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
> I have heard talk of a way to write extensions so that they dynamically
> reference the schema of their dependencies, but sure don't know how that
> would work if it's possible.  The @extschema@ variable references the
> *current* extension's schema, but not there is no dynamic variable to
> reference the schema of a dependency.

If desperate, you can do it like this:

  DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
    (SELECT nspname FROM pg_namespace n
     JOIN pg_extension ON n.oid = extnamespace
     WHERE extname = 'earthdistance' )); END $$;

Needless to say, that's too ugly.  Though probably unimportant in practice, it
also has a race condition vs. ALTER EXTENSION SET SCHEMA.

> Also it is possible in theory to dynamically set search_path to contain
> every schema of every dependency in play and then just not specify a schema
> when you use something in a dependency.  But this ANDs together all the
> scopes of all the dependencies of an extension, introducing potential for
> collisions, and is generally kind of clunky.

That's how it works today, and it has the problems you describe.  I discussed
some solution candidates here:
https://www.postgresql.org/message-id/20180710014308.GA805781@rfd.leadboat.com

The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.  I'm
attaching the proof of concept, for your information.

Interesting.

Why shelved?  I like it.  You said you lean toward 2b in the link above, but there is no 2b :-) but 1b was this option, which maybe you meant?

The other approach would be to have each extension be in it's own schema, whose name is fixed for life.  Then there are no collisions and no ambiguity about their location.   I don't use NPM but was just reading about how they converted their package namespace from a single global namespace with I think it was 30k packages in it, to @organization/packagename.  I don't know how folks would feel about a central namespace registry, I don't love the idea if we can find a way around it, but would settle for it if there's no better solution.  Either that or use a UUID as the schema name.  Truly hideous.  But it seems like your approach above with just dynamically looking up the extension's schema as a variable would solve everything.

There is the problem of sequencing, where extension A installs dependency extension B in it's own schema.  Then extension C also wants to use dependency B, but extension A is uninstalled and extension B is now still hanging around in A's old schema.  Not ideal but at least everything would still function.

I'll keep thinking about it...
 
> #2:  Data in Extensions
>
> Extensions that are just a collection of functions and types seem to be the
> norm.  Extensions can contain what the docs call "configuration" data, but
> rows are really second class citizens:  They aren't tracked with
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> etc.
>
> Sometimes it would make sense for an extension to contain *only* data, or
> insert some rows in a table that the extension doesn't "own", but has as a
> dependency.  For example, a "webserver" extension might contain a
> "resource" table that serves up the content of resources in the table at a
> specified path. But then, another extension, say an app, might want to just
> list the webserver extension as a dependency, and insert a few resource
> rows into it.  This is really from what I can tell beyond the scope of what
> extensions are capable of.

I never thought of this use case.  Interesting.

It's a *really* powerful pattern.  I am sure of this because I've been exploring it while developing a row packaging system modeled after git [1], and using it in conjunction with EXTENSIONs with extreme joy.  But one does rows, and the other does DDL, and this is not ideal.

Cheers,
Eric

Re: extensions are hitting the ceiling

От
Eric Hanson
Дата:


On Tue, Apr 16, 2019 at 4:24 AM Eric Hanson <eric@aquameta.com> wrote:


On Tue, Apr 16, 2019 at 12:47 AM Noah Misch <noah@leadboat.com> wrote:
On Mon, Mar 18, 2019 at 09:38:19PM -0500, Eric Hanson wrote:
> I have heard talk of a way to write extensions so that they dynamically
> reference the schema of their dependencies, but sure don't know how that
> would work if it's possible.  The @extschema@ variable references the
> *current* extension's schema, but not there is no dynamic variable to
> reference the schema of a dependency.

If desperate, you can do it like this:

  DO $$ BEGIN EXECUTE format('SELECT %I.earth()',
    (SELECT nspname FROM pg_namespace n
     JOIN pg_extension ON n.oid = extnamespace
     WHERE extname = 'earthdistance' )); END $$;

Needless to say, that's too ugly.  Though probably unimportant in practice, it
also has a race condition vs. ALTER EXTENSION SET SCHEMA.

> Also it is possible in theory to dynamically set search_path to contain
> every schema of every dependency in play and then just not specify a schema
> when you use something in a dependency.  But this ANDs together all the
> scopes of all the dependencies of an extension, introducing potential for
> collisions, and is generally kind of clunky.

That's how it works today, and it has the problems you describe.  I discussed
some solution candidates here:
https://www.postgresql.org/message-id/20180710014308.GA805781@rfd.leadboat.com

The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.  I'm
attaching the proof of concept, for your information.

Interesting.

Why shelved?  I like it.  You said you lean toward 2b in the link above, but there is no 2b :-) but 1b was this option, which maybe you meant?

The other approach would be to have each extension be in it's own schema, whose name is fixed for life.  Then there are no collisions and no ambiguity about their location.   I don't use NPM but was just reading about how they converted their package namespace from a single global namespace with I think it was 30k packages in it, to @organization/packagename.  I don't know how folks would feel about a central namespace registry, I don't love the idea if we can find a way around it, but would settle for it if there's no better solution.  Either that or use a UUID as the schema name.  Truly hideous.  But it seems like your approach above with just dynamically looking up the extension's schema as a variable would solve everything.

There is the problem of sequencing, where extension A installs dependency extension B in it's own schema.  Then extension C also wants to use dependency B, but extension A is uninstalled and extension B is now still hanging around in A's old schema.  Not ideal but at least everything would still function.

I'll keep thinking about it...

We would probably be wise to learn from what has gone (so I hear) terribly wrong with the Node / NPM packaging system (and I'm sure many before it), namely versioning.  What happens when two extensions require different versions of the same extension?  At a glance it almost seems unsolvable, given the constraint that an extension can only be installed once, and only at a single version.  I don't understand why that constraint exists though.

Eric




Re: extensions are hitting the ceiling

От
Eric Hanson
Дата:
On Tue, Apr 16, 2019 at 4:47 AM Eric Hanson <eric@aquameta.com> wrote:
We would probably be wise to learn from what has gone (so I hear) terribly wrong with the Node / NPM packaging system (and I'm sure many before it), namely versioning.  What happens when two extensions require different versions of the same extension?  At a glance it almost seems unsolvable, given the constraint that an extension can only be installed once, and only at a single version.  I don't understand why that constraint exists though.

How about this:

1. Extension can be installed once *per-version*.
2. Each version of an extension that is installed is assigned by the system a unique, hidden schema (similar to temp table schemas) whose name doesn't matter because the extension user will never need to know it.
3. There exists a dynamic variable, like you proposed above, but it includes version number as well.  @DEPNAME_VERSION_schema@ perhaps.  This variable would resolve to the system-assigned schema name of the extension specified, at the version specified.
4. Since sprinkling ones code with version numbers is awful, there exists a way (which I haven't thought of) to set a kind of search_path-type setting which sets in the current scope the version number of the extension that should be dereferenced, so developers can still use @DEPNAME_schema@.

This would allow multiple versions of extensions to coexist, and it would solve the problem with two extensions wanting the same dependency in different places.

It's radical, but extensions are radically broken.  A critique of the above would be that extensions still have a single global namespace, so personally I don't think it even goes far enough.

Cheers,
Eric

Re: extensions are hitting the ceiling

От
Jiří Fejfar
Дата:
Hi all!

I am sending our comments to mentioned issues. I was trying to send it
month ago
(https://www.postgresql.org/message-id/CA%2B8wVNUOt2Bh4x7YQEVoq5BfP%3DjM-F6cDYKxJiTODG_VCGhUVQ%40mail.gmail.com),
but it somehow doesn't append in the "thread" (sorry, I am new in
mailing list practice...).

My colleague already posted some report to bug mailing list
(https://www.postgresql.org/message-id/15616-260dc9cb3bec7e7e@postgresql.org)
but with no response.

On Tue, 19 Mar 2019 at 02:38, Eric Hanson <eric@aquameta.com> wrote:
>
> Hi folks,
>
> After months and years of really trying to make EXTENSIONs meet the requirements of my machinations, I have come to
theconclusion that either a) I am missing something or b) they are architecturally flawed.  Or possibly both. 
>
> Admittedly, I might be trying to push extensions beyond what the great elephant in the sky ever intended. The general
benthere is to try to achieve a level of modular reusable components similar to those in "traditional" programming
environmentslike pip, gem, npm, cpan, etc. Personally, I am trying to migrate as much of my dev stack as possible away
fromthe filesystem and into the database. Files, especially code, configuration, templates, permissions, manifests and
otherdevelopment files, would be much happier in a database where they have constraints and an information model and
canbe queried! 
>
> Regardless, it would be really great to be able to install an extension, and have it cascade down to multiple other
extensions,which in turn cascade down to more, and have everything just work. Clearly, this was considered in the
extensionarchitecture, but I'm running into some problems making it a reality.  So here they are. 
>
>
> #1: Dependencies
>
> Let's say we have two extensions, A and B, both of which depend on a third extension C, let's just say C is hstore.
Aand B are written by different developers, and both contain in their .control file the line 
>
>         requires = 'hstore'
>
> When A is installed, if A creates a schema, it puts hstore in that schema. If not, hstore is already installed, it
usesit in that location.  How does the extension know where to reference hstore? 
>
> Then, when B is installed, it checks to see if extension hstore is installed, sees that it is, and moves on.  What if
itexpects it in a different place than A does? The hstore extension can only be installed once, in a single schema, but
ifmultiple extensions depend on it and look for it in different places, they are incompatible. 
>
> I have heard talk of a way to write extensions so that they dynamically reference the schema of their dependencies,
butsure don't know how that would work if it's possible.  The @extschema@ variable references the *current* extension's
schema,but not there is no dynamic variable to reference the schema of a dependency. 
>
> Also it is possible in theory to dynamically set search_path to contain every schema of every dependency in play and
thenjust not specify a schema when you use something in a dependency.  But this ANDs together all the scopes of all the
dependenciesof an extension, introducing potential for collisions, and is generally kind of clunky. 
>

It is not possible to specify the version of extension we are
dependent on in .control file.

> #2:  Data in Extensions
>
> Extensions that are just a collection of functions and types seem to be the norm.  Extensions can contain what the
docscall "configuration" data, but rows are really second class citizens:  They aren't tracked with
pg_catalog.pg_depend,they aren't deleted when the extension is dropped, etc. 
>
> Sometimes it would make sense for an extension to contain *only* data, or insert some rows in a table that the
extensiondoesn't "own", but has as a dependency.  For example, a "webserver" extension might contain a "resource" table
thatserves up the content of resources in the table at a specified path. But then, another extension, say an app, might
wantto just list the webserver extension as a dependency, and insert a few resource rows into it.  This is really from
whatI can tell beyond the scope of what extensions are capable of. 
>

I am not sure about the name "Configuration" Tables. From my point of
view extensions can hold two sorts of data:
1) "static" data: delivered with extension, inserted by update
scripts; the same "static" data are present across multiple
installation of extension in the same version. This data are not
supposed to be dumped.
2) "dynamic" data: inserted by users, have to be included in dumps,
are marked with pg_extension_config_dump and are called
"configuration" tables/data ... but why "configuration"?

>
> #3 pg_dump and Extensions
>
> Tables created by extensions are skipped by pg_dump unless they are flagged at create time with:
>
>         pg_catalog.pg_extension_config_dump('my_table', 'where id < 20')
>
> However, there's no way that I can tell to mix and match rows and tables across multiple extensions, so pg_dump can't
keeptrack of multiple extensions that contain rows in the same table. 
>

We have described some behavior of pg_dump, which we believe are in
fact bugs: see [1] "1) pg_dump with --schema parameter" and "2)
Hanging OID in extconfig".
Maybe it would be good to introduce new switch pg_dump --extension
extA dumping all "dynamic" data from extension tables regardless on
schema

>
> I'd like an extension framework that can contain data as first class citizens, and can gracefully handle a dependency
chainand share dependencies.  I have some ideas for a better approach, but they are pretty radical.  I thought I would
sendthis out and see what folks think. 
>
> Thanks,
> Eric
> --
> http://aquameta.org/

#4: Extension owned

It is not possible to alter extension owner

Thanks for consideration, Jiří & Ivo.

[1] https://www.postgresql.org/message-id/15616-260dc9cb3bec7e7e@postgresql.org



Re: extensions are hitting the ceiling

От
Noah Misch
Дата:
On Tue, Apr 16, 2019 at 04:24:20AM -0500, Eric Hanson wrote:
> On Tue, Apr 16, 2019 at 12:47 AM Noah Misch <noah@leadboat.com> wrote:
> > https://www.postgresql.org/message-id/20180710014308.GA805781@rfd.leadboat.com
> >
> > The @DEPNAME_schema@ thing was trivial to implement, but I shelved it.
> > I'm attaching the proof of concept, for your information.

> Why shelved?  I like it.  You said you lean toward 2b in the link above,
> but there is no 2b :-) but 1b was this option, which maybe you meant?

(2) is a mutation of (1), so (2b) exists by mutating (1b) according to the
description of (2).  In other words, (2b) would be this:

  Drop relocatable=true from extensions that have cause to do so (by adding a
  new version number and versioned control file): cube, earthdistance,
  pageinspect, pg_freespacemap, xml2.  Do likewise for others as needed in the
  future.  To relocate an affected extension, drop and recreate it.  Warn
  about relocatable=true in non-core extensions.  Expand @DEPNAME_schema@ in
  extension SQL files.  Use @cube_schema@ to refer to the right objects.

I shelved it because thread
http://postgr.es/m/flat/20180830070609.GA1485875@rfd.leadboat.com did not
accept it as a solution for contrib/ extensions.  If it's not good enough for
contrib/, it's not good enough for this problem space.

> The other approach would be to have each extension be in it's own schema,
> whose name is fixed for life.  Then there are no collisions and no
> ambiguity about their location.   I don't use NPM but was just reading
> about how they converted their package namespace from a single global
> namespace with I think it was 30k packages in it,
> to @organization/packagename.  I don't know how folks would feel about a
> central namespace registry, I don't love the idea if we can find a way
> around it, but would settle for it if there's no better solution.  Either
> that or use a UUID as the schema name.  Truly hideous.  But it seems like
> your approach above with just dynamically looking up the extension's schema
> as a variable would solve everything.

That's like how C/C++/Java identifiers work, turning each @DEPNAME_schema@
into a constant.  If we were starting from scratch, that's attractive.
Unfortunately, folks have applications that expect to use e.g. public.earth().
We'd need a big benefit to justify obligating those users to migrate.  If we
had @DEPNAME_schema@, communities of users could decide to adopt a local
convention of a fixed schema per extension.  Other communities of users,
particularly those with substantial stable code, could retain their current
schema usage patterns.

Thanks,
nm