Re: [HACKERS] Faster methods for getting SPI results

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: [HACKERS] Faster methods for getting SPI results
Дата
Msg-id a0df5cc1-d613-0039-5426-3fb115e9c5f1@BlueTreble.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Faster methods for getting SPI results  (Craig Ringer <craig@2ndquadrant.com>)
Ответы Re: [HACKERS] Faster methods for getting SPI results (460%improvement)  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Список pgsql-hackers
On 12/28/16 3:14 AM, Craig Ringer wrote:
> On 28 December 2016 at 12:32, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 12/27/16 9:10 PM, Craig Ringer wrote:
>>>
>>> On 28 December 2016 at 09:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>>
>>>> I've looked at this some more, and ITSM that the only way to do this
>>>> without
>>>> some major surgery is to create a new type of Destination specifically
>>>> for
>>>> SPI that allows for the execution of an arbitrary C function for each
>>>> tuple
>>>> to be sent.
>>>
>>>
>>> That sounds a lot more sensible than the prior proposals. Callback driven.
>>
>>
>> Are there other places this would be useful? I'm reluctant to write all of
>> this just to discover it doesn't help performance at all, but if it's useful
>> on it's own I can just submit it as a stand-alone patch.
>
> I don't have a use for it personally. In BDR and pglogical anything
> that does work with nontrivial numbers of tuples uses lower level
> scans anyway.
>
> I expect anything that uses the SPI to run arbitrary user queries
> could have a use for something like this though. Any PL, for one.

Just a quick update on this: I've gotten this working well enough in 
plpython to do some performance testing. This patch does change python 
results from being a list of dicts to a dict of lists, but I suspect the 
vast majority of the speed improvement is from not creating a tuplestore.

The attached sample (from OS X /usr/bin/sample) is interesting. The 
highlight is:

> !       3398 SPI_execute_callback  (in postgres) + 163  [0x110125793]
> !         3394 _SPI_execute_plan  (in postgres) + 1262  [0x1101253fe]
> !         : 2043 standard_ExecutorRun  (in postgres) + 288  [0x1100f9a40]
> !         : | 1990 ExecProcNode  (in postgres) + 250  [0x1100fd62a]

The top line is the entry into SPI from plpython. The bottom line is 
generate_series into a tuplestore and then reading from that tuplestore. 
Almost all the time being spent in standard_ExecutorRun is in 
PLy_CSreceive, which is appending values to a set of python lists as 
it's getting tuples.

The other noteworthy item in the sample is this:

> 535 list_dealloc  (in Python) + 116,103,...  [0x11982b1b4,0x11982b1a7,...]

that's how long it's taking python to free the 3 lists (each with 
9999999 python int objects).

In short (and at best*), this makes plpython just as fast at processing 
results as SELECT count(SELECT s, s, s FROM generate_series()).

The * on that is there's something odd going on where plpython starts 
out really fast at this, then gets 100% slower. I've reached out to some 
python folks about that. Even so, the overall results from a quick test 
on my laptop are (IMHO) impressive:

             Old Code        New Code    Improvement
Pure SQL    2 sec          2 sec
plpython    12.7-14 sec    4-10 sec     ~1.3-3x
plpython - SQL 10.7-12 sec 2-8 sec      ~1.3-6x

Pure SQL is how long an equivalent query takes to run with just SQL.
plpython - SQL is simply the raw python times minus the pure SQL time.

I suspect other PL languages that have fairly fast object alloc and 
dealloc would see a similar benefit.

BTW, the patch currently breaks on nested calls to plpython, but I don't 
think that should change the performance much.

The test function:
> CREATE OR REPLACE FUNCTION test_series(
>     iter int
> ) RETURNS int LANGUAGE plpythonu AS $body$
> d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS some_other_field_name FROM
generate_series(1,{})s'.format(iter) )
 
> return len(d['some_table_id'])
> $body$;
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Gierth
Дата:
Сообщение: [HACKERS] Hash support for grouping sets
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: [HACKERS] Logical decoding on standby