Re: Custom Scan APIs (Re: Custom Plan node)

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: Custom Scan APIs (Re: Custom Plan node)
Дата
Msg-id CA+U5nMJeZK=av0_QVZeeapYqt41GRB1ocDJgLzO9+GhVDauKvQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Custom Scan APIs (Re: Custom Plan node)  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Ответы Re: Custom Scan APIs (Re: Custom Plan node)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Custom Scan APIs (Re: Custom Plan node)  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Список pgsql-hackers
On 24 March 2014 10:25, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> Brief summary of the current approach that has been revised from my
> original submission through the discussion on pgsql-hackers:
>
> The plannode was renamed to CustomPlan, instead of CustomScan, because
> it dropped all the hardcoded portion that assumes the custom-node shall
> perform as alternative scan or join method, because it prevents this
> custom-node to perform as other stuff; like sort or append potentially.
> According to the suggestion by Tom, I put a structure that contains
> several function pointers on the new CustomPlan node, and extension will
> allocate an object that extends CustomPlan node.
> It looks like polymorphism in object oriented programming language.
> The core backend knows abstracted set of methods defined in the
> tables of function pointers, and extension can implement its own logic
> on the callback, using private state on the extended object.

I just wanted to add some review comments here. I also apologise for
not reviewing this earlier; I had misunderstood the maturity of the
patch and had assumed it was a request for comments/WIP.

Overall, I very much support the concept of providing for alternate
scans. I like the placement of calls in the optimizer and we'll be
able to do much with that. Other comments in order that I consider
them important.

* There is no declarative structure for this at all. I was expecting
to see a way to declare that a specific table might have an alternate
scan path, but we just call the plugin always and it has to separately
make a cache lookup to see if anything extra is needed. The Index AM
allows us to perform scans, yet indexes are very clearly declared and
easily and clearly identifiable. We need the same thing for alternate
plans.

* There is no mechanism at all for maintaining other data structures.
Are we supposed to use the Index AM? Triggers? Or? The lack of clarity
there is disturbing, though I could be simply missing something big
and obvious.

* There is no catalog support. Complex APIs in Postgres typically have
a structure like pg_am which allows the features to be clearly
identified. I'd be worried about our ability to keep track of so many
calls in such pivotal places without that. I want to be able to know
what a plugin is doing, especially when it will likely come in binary
form. I don't see an easy way to have plugins partially override each
other or work together. What happens when I want to use Mr.X's clever
new join plugin at the same time as Mr.Y's GPU accelerator?

* How do we control security? What stops the Custom Scan API from
overriding privileges? Shouldn't the alternate data structures be
recognised as objects so we can grant privileges? Or do we simply say
if an alternate data structure is linked to a heap then has implied
privileges. It would be a shame to implement better security in one
patch and then ignore it in another (from the same author).

All of the above I can let pass in this release, but in the longer
term we need to look for more structure around these ideas so we can
manage and control what happens. The way this is now is quite raw -
suitable for R&D but not for longer term production usage by a wider
audience, IMHO. I wouldn't like to make commitments about the
longevity of this API either; if we accept it, it should have a big
"may change radically" sign hanging on it. Having said that, I am
interested in progress here and I accept that things will look like
this at this stage of the ideas process, so these are not things to
cause delay.

Some things I would like to see change on in this release are...

* It's not clear to me when we load/create the alternate data
structures. That can't happen at _init time. I was expecting this to
look like an infrastructure for unlogged indexes, but it doesn't look
like that either.

* The name Custom makes me nervous. It sounds too generic, as if the
design or objectives for this is are a little unclear. AlternateScan
sounds like a better name since its clearer that we are scanning an
alternate data structure rather than the main heap.

* The prune hook makes me feel very uneasy. It seems weirdly specific
implementation detail, made stranger by the otherwise lack of data
maintenance API calls. Calling that for every dirty page sounds like
an issue and my patch rejection indicator is flashing red around that.



Two additional use cases I will be looking to explore will be

* Code to make Mat Views recognised as alternate scan targets
* Code to allow queries access to sampled data rather the fully
detailed data, if the result would be within acceptable tolerance for
user

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Problem with txid_snapshot_in/out() functionality
Следующее
От: David Rowley
Дата:
Сообщение: Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses