Re: [HACKERS] Faster methods for getting SPI results
От | Jim Nasby |
---|---|
Тема | Re: [HACKERS] Faster methods for getting SPI results |
Дата | |
Msg-id | a0df5cc1-d613-0039-5426-3fb115e9c5f1@BlueTreble.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Faster methods for getting SPI results (Craig Ringer <craig@2ndquadrant.com>) |
Ответы |
Re: [HACKERS] Faster methods for getting SPI results (460%improvement)
(Jim Nasby <Jim.Nasby@BlueTreble.com>)
|
Список | pgsql-hackers |
On 12/28/16 3:14 AM, Craig Ringer wrote: > On 28 December 2016 at 12:32, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: >> On 12/27/16 9:10 PM, Craig Ringer wrote: >>> >>> On 28 December 2016 at 09:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: >>> >>>> I've looked at this some more, and ITSM that the only way to do this >>>> without >>>> some major surgery is to create a new type of Destination specifically >>>> for >>>> SPI that allows for the execution of an arbitrary C function for each >>>> tuple >>>> to be sent. >>> >>> >>> That sounds a lot more sensible than the prior proposals. Callback driven. >> >> >> Are there other places this would be useful? I'm reluctant to write all of >> this just to discover it doesn't help performance at all, but if it's useful >> on it's own I can just submit it as a stand-alone patch. > > I don't have a use for it personally. In BDR and pglogical anything > that does work with nontrivial numbers of tuples uses lower level > scans anyway. > > I expect anything that uses the SPI to run arbitrary user queries > could have a use for something like this though. Any PL, for one. Just a quick update on this: I've gotten this working well enough in plpython to do some performance testing. This patch does change python results from being a list of dicts to a dict of lists, but I suspect the vast majority of the speed improvement is from not creating a tuplestore. The attached sample (from OS X /usr/bin/sample) is interesting. The highlight is: > ! 3398 SPI_execute_callback (in postgres) + 163 [0x110125793] > ! 3394 _SPI_execute_plan (in postgres) + 1262 [0x1101253fe] > ! : 2043 standard_ExecutorRun (in postgres) + 288 [0x1100f9a40] > ! : | 1990 ExecProcNode (in postgres) + 250 [0x1100fd62a] The top line is the entry into SPI from plpython. The bottom line is generate_series into a tuplestore and then reading from that tuplestore. Almost all the time being spent in standard_ExecutorRun is in PLy_CSreceive, which is appending values to a set of python lists as it's getting tuples. The other noteworthy item in the sample is this: > 535 list_dealloc (in Python) + 116,103,... [0x11982b1b4,0x11982b1a7,...] that's how long it's taking python to free the 3 lists (each with 9999999 python int objects). In short (and at best*), this makes plpython just as fast at processing results as SELECT count(SELECT s, s, s FROM generate_series()). The * on that is there's something odd going on where plpython starts out really fast at this, then gets 100% slower. I've reached out to some python folks about that. Even so, the overall results from a quick test on my laptop are (IMHO) impressive: Old Code New Code Improvement Pure SQL 2 sec 2 sec plpython 12.7-14 sec 4-10 sec ~1.3-3x plpython - SQL 10.7-12 sec 2-8 sec ~1.3-6x Pure SQL is how long an equivalent query takes to run with just SQL. plpython - SQL is simply the raw python times minus the pure SQL time. I suspect other PL languages that have fairly fast object alloc and dealloc would see a similar benefit. BTW, the patch currently breaks on nested calls to plpython, but I don't think that should change the performance much. The test function: > CREATE OR REPLACE FUNCTION test_series( > iter int > ) RETURNS int LANGUAGE plpythonu AS $body$ > d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS some_other_field_name FROM generate_series(1,{})s'.format(iter) ) > return len(d['some_table_id']) > $body$; -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления: