Re: [HACKERS] Faster methods for getting SPI results (460%improvement)

From	Jim Nasby
Subject	Re: [HACKERS] Faster methods for getting SPI results (460%improvement)
Date	24 January 2017, 09:23:07
Msg-id	4f11b9c9-4b2a-0552-faa7-24d255173679@BlueTreble.com
In reply to	Re: [HACKERS] Faster methods for getting SPI results (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses	Re: [HACKERS] Faster methods for getting SPI results (460% improvement) (Craig Ringer <craig@2ndquadrant.com>)
	Re: [HACKERS] Faster methods for getting SPI results (460%improvement) (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List	pgsql-hackers

On 1/5/17 9:50 PM, Jim Nasby wrote:
> The * on that is there's something odd going on where plpython starts
> out really fast at this, then gets 100% slower. I've reached out to some
> python folks about that. Even so, the overall results from a quick test
> on my laptop are (IMHO) impressive:
>
>                  Old Code       New Code    Improvement
> Pure SQL         2 sec          2 sec
> plpython         12.7-14 sec    4-10 sec    ~1.3-3x
> plpython - SQL   10.7-12 sec    2-8 sec     ~1.3-6x
>
> Pure SQL is how long an equivalent query takes to run with just SQL.
> plpython - SQL is simply the raw python times minus the pure SQL time.

I finally got all the kinks worked out and did some testing with Python
3. Performance for my test [1] improved ~460% when returning a dict of
lists (as opposed to the current list of dicts). Based on previous
testing, I expect that using this method to return a list of dicts will
be about 8% slower. The inconsistency in results on 2.7 has to do with
how Python 2 handles ints.
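
To make the two result shapes concrete, here is a minimal plain-Python
sketch. It does not use plpy or the patch; the column names are borrowed
from the test query in [1], and the helper names and sample rows are made
up for illustration.

```python
# Hypothetical stand-in for a query result: three rows, two columns.
columns = ["some_table_id", "some_field_name"]
rows = [(1, 1), (2, 2), (3, 3)]


def as_list_of_dicts(columns, rows):
    """Row-oriented shape: one dict per row."""
    return [dict(zip(columns, row)) for row in rows]


def as_dict_of_lists(columns, rows):
    """Columnar shape: one list per column, all held in a single dict."""
    result = {name: [] for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            result[name].append(value)
    return result


row_oriented = as_list_of_dicts(columns, rows)
columnar = as_dict_of_lists(columns, rows)
```

With N rows, the row-oriented form builds N dicts, while the columnar
form builds one dict plus one list per column. The row count is
len(row_oriented) in the first case and len(columnar['some_table_id'])
in the second.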

Someone who's familiar with PL/Perl should take a look at this and see
if it would apply there. I've attached the SPI portion of this patch.

I think the last step here is to figure out how to support switching 
between the current behavior and the "columnar" behavior of a dict of 
lists. I believe the best way to do that is to add two optional 
arguments to the execution functions: container=[] and members={}, and 
then copy those to produce the output objects. That means you can get 
the new behavior by doing something like:

plpy.execute('...', container={}, members=[])

Or, more interesting, you could do:

plpy.execute('...', container=pandas.DataFrame, members=pandas.Series)

since that's what a lot of people are going to want anyway.
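
A rough sketch of the "copy container and members" idea in plain Python
follows. It is only an illustration under assumptions: build_result is a
hypothetical helper, not part of plpy or the attached patch, and deciding
between row-wise and column-wise filling by checking whether the
container is a list is my guess at one possible rule. It covers only the
two built-in combinations, not the pandas case.

```python
import copy


def build_result(columns, rows, container=None, members=None):
    """Hypothetical helper: copy `container` for the outer object and
    `members` for each inner object, then fill them from `rows`."""
    # None stands in for the proposed defaults container=[] and members={}.
    container = [] if container is None else container
    members = {} if members is None else members

    out = copy.copy(container)
    if isinstance(out, list):
        # List container: one member per row (list of dicts).
        for row in rows:
            member = copy.copy(members)
            member.update(zip(columns, row))
            out.append(member)
    else:
        # Dict container: one member per column (dict of lists).
        for name in columns:
            out[name] = copy.copy(members)
        for row in rows:
            for name, value in zip(columns, row):
                out[name].append(value)
    return out


cols = ["a", "b"]
data = [(1, 2), (3, 4)]
default_shape = build_result(cols, data)
columnar_shape = build_result(cols, data, container={}, members=[])
```

Because the arguments are copied rather than used directly, the objects
the caller passes in are left untouched and can be reused across calls.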

In the future we could also add a GUC to change the default behavior.

Any concerns with that approach?

[1]:
> d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS some_other_field_name FROM generate_series(1,{}) s'.format(iter))
> return len(d['some_table_id'])
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

spi_callback.patch
