Re: short-cutting if sum()>constant

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: short-cutting if sum()>constant
Дата
Msg-id 162867790912230352v5b6e395fm1880111c04eaa626@mail.gmail.com
обсуждение исходный текст
Ответ на Re: short-cutting if sum()>constant  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Ответы Re: short-cutting if sum()>constant  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Re: short-cutting if sum()>constant  (Craig Ringer <craig@postnewspapers.com.au>)
Список pgsql-sql
2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 23 Dec 2009 08:38:52 +0100
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> > As even more unexpected... when all row are >0 and most of them
>> > are equal to 1 the generate_series performs appreciably better
>> > (roughly 15% faster).
>> > And I think your version can be further optimised:
>> > select count(*) from (select (generate_series(1,a))::int from
>> > data limit 90000000) s;
>> > This perform 30% faster.
>> >
>> > So what's so slow in the plpgsql version?
>>
>> don't forget - plpgsql is interpret - it is best as glue for SQL
>
> Yeah but how many times is it interpreted in a 1M cycle expecially
> when you ran it more than one time and the interpreter should be
> "hot"? I don't know how much "interpreter" work can be saved from a
> previous run of the function... but still there should be something
> left [1].
> plpgsql isn't duck typed and that function doesn't leave too much
> space for "interpretation".
> Unless I'm missing something, in a typed language like plpgsql that
> function could be easily turned into its C equivalent by the
> interpreter.
> I really thought the cost of running plpgsql was much lower in such
> kind of situation.

The problem is massive cycle. Plpgsql really isn't language for this.
This interpret missing own arithmetic unit. So every expression is
translated to SELECT statement

IF a > c ~ SELECT a > c
a = a + 1 ~ SELECT a + 1

these queries are executed in some special mode, but still it is more
expensive than C a = a + 1

>
> Consider that the cost doesn't come from the initial cost of
> interpreting the function but it is proportional to the numbers of
> cycles.
>
> Even hard coding the LIMIT in the plpgsql version doesn't make it
> faster enough. So it is not the planner.
>
> In a 10M rows dataset where 8332885 are =1 and the rest are =2:
>
> If I'm testing for >900000
> I get 940.580 ms vs 1302.162 ms
> If I'm testing for >9000000
> I get 8816.263 ms vs 12306.483 ms
>
> BTW
> select sum(a) from data; takes 1999.492 ms.
> select count(*) from data; takes 1612.039 ms
>

it is slower, because there is one NULL test more.

> While smart your way is definitively more convoluted and should
> require much more cpu cycles and memory.
>
> The plperl version:
> create or replace function totano(int)
> returns int as
> $$
>        my $b = 0;
>        my $row;
>        my $sth = spi_query("select a from data");
>        while(defined($row=spi_fetchrow($sth))) {
>                $b += $row->{a};
>                if($b>=$_[0]) {
>                        return $b;
>                }
>        }
>        return $b;
> $$ LANGUAGE plperl;
>
> Is 10 times slower than the plpgsql version.
>

PLpgSQL quietly uses cursors for FOR SELECT, your plperl uses normal
standard recordset - it means so your query generate full scan. In
your plperl code, you have to use explicit CURSORS. Then the speed
should be comparable. Plperlu is little bit faster, but plpgsql has
direct cursor support.


>> statement. I don't thing so plpgsql is slow - speed is similar to
>> using buildin functionality. But I am sure, rewritening your
>> function to C could help. If you need maximal speed.
>
> In my case it is not worth. I was just looking into something that
> didn't have to scan all the rows.
> With the real distribution of data and no real workload on the box
> your solution is faster but the difference while unexpectedly
> appreciable is nearly negligible.
>
> Anyway I'm not able to justify the difference in speed between
> plpgsql and your solution in such case unless plpgsql is switching
> back and forward between binary data and their text representation.
> It would be nice if someone that know plpgsql internals explain
> where the cost comes from.

I thing so most expensive part in your plpgsql code is two expression.

Regards
Pavel

>
>> I thing, so there are other trick, I am not sure if it is faster.
>> You can create own aggregate. In state function you can calculate
>> and check state value. If it is over your limit, then you can raise
>> exception. So if your query will be finished with custom exception,
>> then sum(c) > n is true.
>
> I may test it later.
>
> thanks
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


В списке pgsql-sql по дате отправления:

Предыдущее
От: Ivan Sergio Borgonovo
Дата:
Сообщение: Re: short-cutting if sum()>constant
Следующее
От: Ivan Sergio Borgonovo
Дата:
Сообщение: Re: short-cutting if sum()>constant