Re: extensions are hitting the ceiling

Поиск
Список
Период
Сортировка
От Eric Hanson
Тема Re: extensions are hitting the ceiling
Дата
Msg-id CACA6kxjUMciJN_Y95xR6XWTLrJeiciOL8WNxinHLDoi3yNiE_A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: extensions are hitting the ceiling  (Chapman Flack <chap@anastigmatix.net>)
Список pgsql-hackers
On Mon, Mar 18, 2019 at 11:56 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 03/18/19 22:38, Eric Hanson wrote:
> rows are really second class citizens:  They aren't tracked with
> pg_catalog.pg_depend, they aren't deleted when the extension is dropped,
> etc.

This. You have other interests as well, but this is the one I was thinking
about a few years ago in [1] (starting at "Ok, how numerous would be the
problems with this:").

Cool!

First thoughts, it seems like a sensible way to go given the premise that extensions are immutable.  But -- I'd be a little concerned about the performance ramifications.  Usually there are not jillions of database objects in a extension, but if they started containing data, there sure could be jillions of rows.  Every row would have to be checked for existence as part of an extension on every insert or update, no?

Nobody ever chimed in to say how numerous they did or didn't think the
problems would be. I was actually thinking recently about sitting down
and trying to write that patch, as no one had exactly stood up to say
"oh heavens no, don't write that." But my round tuits are all deployed
elsewhere at the moment.

Likewise, if nobody tells me "oh sheeze extensions can already do all this" I'm going to assume they can't. :-)

I'd still like to discuss the ideas.

Me too!

Ok, I should probably come out and say it:  I think the user story of "There is some kind of packaging system that can contain both schema and data, and these packages can be installed and removed along with their dependencies atomically" is fairly obvious and desirable.  But getting there while accepting the premises that are currently baked into extensions might be a tall order.

Extensions have a middleware-ish aspect to them -- they are immutable and that immutability is checked and enforced at runtime.  That might scale just fine to a few dozen database objects that only check pg_depends on DDL operations, but if we introduce record tracking and start sticking sticks into the wheels of the DML, things could go south really quickly it seems.

I really like a more git-like pattern, where you are free to modify the working copy of a repository (or in this case an extension), and instead of being blocked from changing things, the system tells the user what has changed and how, and gives sensible options for what to do about it.  That way it doesn't incur a performance hit, and the user can do a kind of "git status" on their extension to show any changes.

How about an extension system whose first principle is that an extension is made up of rows, period.  What about the DDL you ask?  Well...

Imagine a system catalog whose sole purpose is to contain database object definitions like "CREATE VIEW ...", similar to those produced by pg_catalog.pg_get_viewdef(), pg_catalog.get_functiondef(), etc.  Let's call this catalog `def`. There is exactly one VIEW for every type of database object in PostgreSQL. def.table, def.role, def.sequence, def.operator, def.type, etc. Each def.* VIEW contains only two columns, `id` and `definition`.  The `id` column contains a unique identifier for the object, and the `definition` column contains the SQL statement that will recreate the object.

So, inside this system catalog is the SQL definition statement of every database object.  In theory, the contents of all the `definition` columns together would be similar to the contents of pg_dump --schema-only.

Now, imagine all these def.* views had insert triggers, so that on insert, it actually executes the contents of the `definition` column.  In theory, we could pg_restore the data in the def.* views, and it would recreate all the database objects. It could shift all that logic out of pg_dump and into the database.

So using the def.* catalog, we could package both "regular" table data and system objects via the contents of the def.* catalog views.  Packages are a collection rows, period. Build up from there.

I'm working on a prototype called bundle [1], it still has a ways to go but it's showing some promise.  It is going to require brining into PostgreSQL the missing pg_get_*def functions, as folks have talked about before [2].

Thanks,
Eric


В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Fetter
Дата:
Сообщение: Re: Willing to fix a PQexec() in libpq module
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: Concurrency bug with vacuum full (cluster) and toast