Re: [HACKERS] Faster methods for getting SPI results

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: [HACKERS] Faster methods for getting SPI results
Дата
Msg-id a9de0b47-7cbd-a30b-1a22-5b631bbd3fae@BlueTreble.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Faster methods for getting SPI results  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Ответы Re: [HACKERS] Faster methods for getting SPI results  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On 12/21/16 8:21 AM, Jim Nasby wrote:
> On 12/20/16 10:14 PM, Jim Nasby wrote:
>> It would be a lot more efficient if we could just grab datums from the
>> executor and make a single copy into plpython (or R), letting the PL
>> deal with all the memory management overhead.
>>
>> I briefly looked at using SPI cursors to do just that, but that looks
>> even worse: every fetch is executed in a subtransaction, and every fetch
>> creates an entire tuplestore even if it's just going to return a single
>> value. (But hey, we never claimed cursors were fast...)
>>
>> Is there any way to avoid all of this? I'm guessing one issue might be
>> that we don't want to call an external interpreter while potentially
>> holding page pins, but even then couldn't we just copy a single tuple at
>> a time and save a huge amount of palloc overhead?
>
> AFAICT that's exactly how DestRemote works: it grabs a raw slot from the
> executor, makes sure it's fully expanded, and sends it on it's way via
> pq_send*(). So presumably the same could be done for SPI, by creating a
> new CommandDest (ISTM none of the existing ones would do what we want).
>
> I'm not sure what the API for this should look like. One possibility is
> to have SPI_execute and friends accept a flag that indicates not to
> build a tupletable. I don't think a query needs to be read-only to allow
> for no tuplestore, so overloading read_only seems like a bad idea.
>
> Another option is to treat this as a "lightweight cursor" that only
> allows forward fetches. One nice thing about that option is it leaves
> open the possibility of using a small tuplestore for each "fetch",
> without all the overhead that a full blown cursor has. This assumes
> there are some use cases where you want to operate on relatively small
> sets of tuples at a time, but you don't need to materialize the whole
> thing in one shot.

I've looked at this some more, and ITSM that the only way to do this 
without some major surgery is to create a new type of Destination 
specifically for SPI that allows for the execution of an arbitrary C 
function for each tuple to be sent. AFAICT this should be fairly safe, 
since DestRemote can potentially block while sending a tuple and also 
runs output functions (which presumably could themselves generate errors).

_SPI_execute_plan() would need to accept an arbitrary DestReceiver 
struct, and use that (if specified) instead of creating it's own.

Once that's done, my plan is to allow plpy to use this functionality 
with a receiver function that adds tuple fields to corresponding python 
lists. This should result in significantly less overhead than going 
through a tuplestore when dealing with a large number of rows.

Before I go code this up, I'd like to know if there's some fatal flaw in 
this, or if there's an easier way to hack this up just to test my 
performance theory.

Suggestions?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Fetter
Дата:
Сообщение: Re: [HACKERS] Hooks
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity