[HACKERS] Faster methods for getting SPI results

From Jim Nasby
Subject [HACKERS] Faster methods for getting SPI results
Date
Msg-id 015627b7-882c-390a-93d8-7b1d984001f6@BlueTreble.com
Whole thread Raw
Responses Re: [HACKERS] Faster methods for getting SPI results  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Re: [HACKERS] Faster methods for getting SPI results  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
I've been looking at the performance of SPI calls within plpython. 
There's roughly a 1.5x slowdown relative to equivalent Python code just 
in pulling data out of the SPI tuplestore. Some of that is due to an 
inefficiency in how plpython creates result dictionaries, but fixing 
that is ultimately a dead end: if you're dealing with a lot of results 
in Python, you want a tuple of arrays, not an array of tuples.
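To illustrate the two layouts (hypothetical data, nothing to do with plpython's actual API):

```python
# Hypothetical data, just to show the two layouts.
rows = [(1, 'a'), (2, 'b'), (3, 'c')]          # array of tuples (row-oriented)

# Tuple of arrays (column-oriented): one list per column, which is what
# numpy/pandas-style Python code actually wants to consume in bulk.
cols = tuple(list(col) for col in zip(*rows))

print(cols)  # ([1, 2, 3], ['a', 'b', 'c'])
```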

While we could just brute-force a tuple of arrays by plowing through the 
SPI tuplestore (this is what pl/r does), there's still a lot of extra 
work involved in doing that. AFAICT there are at least two copies that 
happen between the executor producing a tuple and it making it into the 
tuplestore, plus the tuplestore is going to consume a potentially very 
large amount of memory for a very short period of time, before all the 
data gets duplicated (again) into python objects.

It would be a lot more efficient if we could just grab datums from the 
executor and make a single copy into plpython (or R), letting the PL 
deal with all the memory management overhead.
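As a rough sketch of the shape this could take on the PL side (pure illustration; `make_column_receiver` and the per-row callback are invented names, not an existing API): the PL hands the executor a callback that appends each row's values straight into per-column storage, so the only copy is the one into the PL's own objects.

```python
# Illustration only: a hypothetical per-row receiver that builds
# column-oriented output directly, with no intermediate tuplestore.
def make_column_receiver(ncols):
    cols = tuple([] for _ in range(ncols))   # one list per column
    def receive(row):                        # called once per executor tuple
        for col_list, value in zip(cols, row):
            col_list.append(value)           # the single copy into the PL
    return cols, receive

cols, receive = make_column_receiver(2)
for row in [(1, 'a'), (2, 'b')]:             # stand-in for executor output
    receive(row)
print(cols)  # ([1, 2], ['a', 'b'])
```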

I briefly looked at using SPI cursors to do just that, but that looks 
even worse: every fetch is executed in a subtransaction, and every fetch 
creates an entire tuplestore even if it's just going to return a single 
value. (But hey, we never claimed cursors were fast...)

Is there any way to avoid all of this? I'm guessing one issue might be 
that we don't want to call an external interpreter while potentially 
holding page pins, but even then couldn't we just copy a single tuple at 
a time and save a huge amount of palloc overhead?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


