Обсуждение: Does FILTER in SEQSCAN short-circuit AND?

Поиск
Список
Период
Сортировка

Does FILTER in SEQSCAN short-circuit AND?

От
"Carlo Stonebanks"
Дата:
Sample code:

SELECT *
FROM MyTable
WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar'

Let's say this required a SEQSCAN because there were no indexes to support
column foo. For every row where foo <> 'bar' would the filter on the SEQSCAN
short-circuit the AND return false right away, or would it still execute
MySlowFunc('foo') ?

Thanks!

Carlo


Re: Does FILTER in SEQSCAN short-circuit AND?

От
"Kevin Grittner"
Дата:
"Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:

> SELECT *
> FROM MyTable
> WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar'
>
> Let's say this required a SEQSCAN because there were no indexes to
> support column foo. For every row where foo <> 'bar' would the
> filter on the SEQSCAN short-circuit the AND return false right
> away, or would it still execute MySlowFunc('foo') ?

For that example, I'm pretty sure it will skip the slow function for
rows which fail the first test.  A quick test confirmed that for me.
If you create a sufficiently slow function, you shouldn't have much
trouble testing that yourself.  :-)

-Kevin

Re: Does FILTER in SEQSCAN short-circuit AND?

От
Craig James
Дата:
On 5/18/10 3:28 PM, Carlo Stonebanks wrote:
> Sample code:
>
> SELECT *
> FROM MyTable
> WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar'
>
> Let's say this required a SEQSCAN because there were no indexes to
> support column foo. For every row where foo <> 'bar' would the filter on
> the SEQSCAN short-circuit the AND return false right away, or would it
> still execute MySlowFunc('foo') ?

I asked a similar question a few years back, and the answer is that the planner just makes a guess and applies it to
allfunctions.  It has no idea whether your function is super fast or incredibly slow, they're all assigned the same
cost.

In this fairly simple case, the planner might reasonably guess that "foo = 'bar'" will always be faster than
"AnyFunc(foo)= 'bar'".  But for real queries, that might not be the case. 

In my case, I have a function that is so slow that it ALWAYS is good to avoid it.  Unfortunately, there's no way to
explainthat to Postgres, so I have to use other tricks to force the planner not to use it. 

   select * from
     (select * from MyTable where foo = 'bar' offset 0)
     where MySlowFunc(foo) = 'bar';

The "offset 0" prevents the planner from collapsing this query back into your original syntax.  It will only apply
MySlowFunc()to rows where you already know that foo = 'bar'. 

It would be nice if Postgres had a way to assign a cost to every function. Until then, you have to use convoluted SQL
ifyou have a really slow function. 

Craig

Re: Does FILTER in SEQSCAN short-circuit AND?

От
Thomas Kellerer
Дата:
Craig James wrote on 27.05.2010 23:13:
> It would be nice if Postgres had a way to assign a cost to every
> function.

Isn't that what the COST parameter is intended to be:

http://www.postgresql.org/docs/current/static/sql-createfunction.html

Thomas

Re: Does FILTER in SEQSCAN short-circuit AND?

От
"Kevin Grittner"
Дата:
Craig James <craig_james@emolecules.com> wrote:

> It would be nice if Postgres had a way to assign a cost to every
> function.

The COST clause of CREATE FUNCTION doesn't do what you want?

http://www.postgresql.org/docs/8.4/interactive/sql-createfunction.html

-Kevin

Re: Does FILTER in SEQSCAN short-circuit AND?

От
Craig James
Дата:
On 5/27/10 2:28 PM, Kevin Grittner wrote:
> Craig James<craig_james@emolecules.com>  wrote:
>
>> It would be nice if Postgres had a way to assign a cost to every
>> function.
>
> The COST clause of CREATE FUNCTION doesn't do what you want?
>
> http://www.postgresql.org/docs/8.4/interactive/sql-createfunction.html

Cool ... I must have missed it when this feature was added.  Nice!

Craig

Performance costs of various PL languages

От
"Carlo Stonebanks"
Дата:
We are currently using pltclu as our PL of choice AFTER plpgSql.

I'd like to know if anyone can comment on the performance costs of the
various PL languages BESIDES C. For example, does pltclu instantiate faster
than pltcl (presumably because it uses a shared interpreter?) Is Perl more
lightweight?

I know that everything depends on context - what you are doing with it, e.g.
choose Tcl for string handling vs. Perl for number crunching - but for those
who know about this, is there a clear performance advantage for any of the
various PL languages - and if so, is it a difference so big to be worth
switching?

I ask this because I had expected to see pl/pgsql as a clear winner in terms
of performance over pltclu, but my initial test showed the opposite. I know
this may be an apples vs oranges problem and I will test further, but if
anyone has any advice or insight, I would appreciate it so I can tailor my
tests accordingly.


Thanks,

Carlo




Re: Performance costs of various PL languages

От
Pavel Stehule
Дата:
Hello

2011/12/27 Carlo Stonebanks <stonec.register@sympatico.ca>:
> We are currently using pltclu as our PL of choice AFTER plpgSql.
>
> I'd like to know if anyone can comment on the performance costs of the
> various PL languages BESIDES C. For example, does pltclu instantiate faster
> than pltcl (presumably because it uses a shared interpreter?) Is Perl more
> lightweight?
>
> I know that everything depends on context - what you are doing with it, e.g.
> choose Tcl for string handling vs. Perl for number crunching - but for those
> who know about this, is there a clear performance advantage for any of the
> various PL languages - and if so, is it a difference so big to be worth
> switching?
>
> I ask this because I had expected to see pl/pgsql as a clear winner in terms
> of performance over pltclu, but my initial test showed the opposite. I know
> this may be an apples vs oranges problem and I will test further, but if
> anyone has any advice or insight, I would appreciate it so I can tailor my
> tests accordingly.
>

A performance strongly depends on use case.

PL/pgSQL has fast start but any expression is evaluated as simple SQL
expression - and some repeated operation should be very expensive -
array update, string update. PL/pgSQL is best as SQL glue. Positive to
performance is type compatibility between plpgsql and Postgres.
Interpret plpgsql is very simply - there are +/- zero optimizations -
plpgsql code should be minimalistic, but when you don't do some really
wrong, then a speed is comparable with PHP.

http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language

PL/Perl has slower start - but string or array operations are very
fast. Perl has own expression evaluator - faster than expression
evaluation in plpgsql. On second hand - any input must be transformed
from postgres format to perl format and any result must be transformed
too. Perl and other languages doesn't use data type compatible with
Postgres.

Regards

Pavel Stehule


>
> Thanks,
>
> Carlo
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: Performance costs of various PL languages

От
Merlin Moncure
Дата:
On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> 2011/12/27 Carlo Stonebanks <stonec.register@sympatico.ca>:
>> We are currently using pltclu as our PL of choice AFTER plpgSql.
>>
>> I'd like to know if anyone can comment on the performance costs of the
>> various PL languages BESIDES C. For example, does pltclu instantiate faster
>> than pltcl (presumably because it uses a shared interpreter?) Is Perl more
>> lightweight?
>>
>> I know that everything depends on context - what you are doing with it, e.g.
>> choose Tcl for string handling vs. Perl for number crunching - but for those
>> who know about this, is there a clear performance advantage for any of the
>> various PL languages - and if so, is it a difference so big to be worth
>> switching?
>>
>> I ask this because I had expected to see pl/pgsql as a clear winner in terms
>> of performance over pltclu, but my initial test showed the opposite. I know
>> this may be an apples vs oranges problem and I will test further, but if
>> anyone has any advice or insight, I would appreciate it so I can tailor my
>> tests accordingly.
>>
>
> A performance strongly depends on use case.
>
> PL/pgSQL has fast start but any expression is evaluated as simple SQL
> expression - and some repeated operation should be very expensive -
> array update, string update. PL/pgSQL is best as SQL glue. Positive to
> performance is type compatibility between plpgsql and Postgres.
> Interpret plpgsql is very simply - there are +/- zero optimizations -
> plpgsql code should be minimalistic, but when you don't do some really
> wrong, then a speed is comparable with PHP.
>
> http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language
>
> PL/Perl has slower start - but string or array operations are very
> fast. Perl has own expression evaluator - faster than expression
> evaluation in plpgsql. On second hand - any input must be transformed
> from postgres format to perl format and any result must be transformed
> too. Perl and other languages doesn't use data type compatible with
> Postgres.

One big advantage pl/pgsql has over scripting languages is that it
understands postgresql types natively.  It knows what a postgres array
is, and can manipulate one directly.  pl/perl would typically have to
have the database convert it to a string, parse it into a perl
structure, do the manipulation, then send it to the database to be
parsed again.  If your procedure code is mainly moving data between
tables and doing minimal intermediate heavy processing, this adds up
to a big advantage.  Which pl to go with really depends on what you
need to do.  pl/pgsql is always my first choice though.

perl and tcl are not particularly fast languages in the general case
-- you are largely at the mercy of how well the language's syntax or
library features map to the particular problem you're solving.  if you
need a fast general purpose language in the backend and are (very
understandably) skeptical about C, I'd look at pl/java.

merlin

Re: Performance costs of various PL languages

От
Andrew Dunstan
Дата:

On 12/27/2011 05:54 PM, Merlin Moncure wrote:
> On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule<pavel.stehule@gmail.com>  wrote:
>> Hello
>>
>> 2011/12/27 Carlo Stonebanks<stonec.register@sympatico.ca>:
>>> We are currently using pltclu as our PL of choice AFTER plpgSql.
>>>
>>> I'd like to know if anyone can comment on the performance costs of the
>>> various PL languages BESIDES C. For example, does pltclu instantiate faster
>>> than pltcl (presumably because it uses a shared interpreter?) Is Perl more
>>> lightweight?
>>>
>>> I know that everything depends on context - what you are doing with it, e.g.
>>> choose Tcl for string handling vs. Perl for number crunching - but for those
>>> who know about this, is there a clear performance advantage for any of the
>>> various PL languages - and if so, is it a difference so big to be worth
>>> switching?
>>>
>>> I ask this because I had expected to see pl/pgsql as a clear winner in terms
>>> of performance over pltclu, but my initial test showed the opposite. I know
>>> this may be an apples vs oranges problem and I will test further, but if
>>> anyone has any advice or insight, I would appreciate it so I can tailor my
>>> tests accordingly.
>>>
>> A performance strongly depends on use case.
>>
>> PL/pgSQL has fast start but any expression is evaluated as simple SQL
>> expression - and some repeated operation should be very expensive -
>> array update, string update. PL/pgSQL is best as SQL glue. Positive to
>> performance is type compatibility between plpgsql and Postgres.
>> Interpret plpgsql is very simply - there are +/- zero optimizations -
>> plpgsql code should be minimalistic, but when you don't do some really
>> wrong, then a speed is comparable with PHP.
>>
>> http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language
>>
>> PL/Perl has slower start - but string or array operations are very
>> fast. Perl has own expression evaluator - faster than expression
>> evaluation in plpgsql. On second hand - any input must be transformed
>> from postgres format to perl format and any result must be transformed
>> too. Perl and other languages doesn't use data type compatible with
>> Postgres.
> One big advantage pl/pgsql has over scripting languages is that it
> understands postgresql types natively.  It knows what a postgres array
> is, and can manipulate one directly.  pl/perl would typically have to
> have the database convert it to a string, parse it into a perl
> structure, do the manipulation, then send it to the database to be
> parsed again.  If your procedure code is mainly moving data between
> tables and doing minimal intermediate heavy processing, this adds up
> to a big advantage.  Which pl to go with really depends on what you
> need to do.  pl/pgsql is always my first choice though.
>
> perl and tcl are not particularly fast languages in the general case
> -- you are largely at the mercy of how well the language's syntax or
> library features map to the particular problem you're solving.  if you
> need a fast general purpose language in the backend and are (very
> understandably) skeptical about C, I'd look at pl/java.
>


PLV8, which is not yet ready for prime time, maps many common Postgres
types into native JS types without the use of Input/Output functions,
which means the conversion  is very fast. It's work which could very
well do with repeating for the other PL's.

cheers

andrew

Re: Performance costs of various PL languages

От
"Carlo Stonebanks"
Дата:
Thanks guys.

Ah, Pl/java - of course. I would miss writing the code right in the SQL
script, but that would have been true of C as well.

None of these procedures really qualify as stored procs that move data;
rather they are scalar functions used for fuzzy string comparisons based on
our own domain logic - imagine something like,

   SELECT *
   FROM fathers AS f, sons AS s
   WHERE same_name(f.last_name, s.last_name)

... and same_name had business logic that corrected for O'reilly vs oreilly,
Van De Lay vs Vandelay, etc.

The point is that as we learn about the domain, we would add the rules into
the function same_name() so that all apps would benefit from the new rules.

Some of the functions are data-driven, for example a table of common
abbreviations with regex or LIKE expressions that would be run against both
strings so that each string is reduced to common abbreviations (i.e. lowest
common denominator) then compared, e.g.

   SELECT *
   FROM companies AS c
   WHERE same_business_name(s, 'ACME Business Supplies, Incorporated')

Would reduce both parameters down to the most common abbreviation and then
compare again with fuzzy logic.

Of course, even if this was written in C, the function would be data-bound
as it read from the abbreviation table - unless you guys tell that there is
a not inconsiderable cost involved in type conversion from PG to internal
vars.

Carlo


-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: December 27, 2011 5:54 PM
To: Pavel Stehule
Cc: Carlo Stonebanks; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance costs of various PL languages

On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:
> Hello
>
> 2011/12/27 Carlo Stonebanks <stonec.register@sympatico.ca>:
>> We are currently using pltclu as our PL of choice AFTER plpgSql.
>>
>> I'd like to know if anyone can comment on the performance costs of the
>> various PL languages BESIDES C. For example, does pltclu instantiate
faster
>> than pltcl (presumably because it uses a shared interpreter?) Is Perl
more
>> lightweight?
>>
>> I know that everything depends on context - what you are doing with it,
e.g.
>> choose Tcl for string handling vs. Perl for number crunching - but for
those
>> who know about this, is there a clear performance advantage for any of
the
>> various PL languages - and if so, is it a difference so big to be worth
>> switching?
>>
>> I ask this because I had expected to see pl/pgsql as a clear winner in
terms
>> of performance over pltclu, but my initial test showed the opposite. I
know
>> this may be an apples vs oranges problem and I will test further, but if
>> anyone has any advice or insight, I would appreciate it so I can tailor
my
>> tests accordingly.
>>
>
> A performance strongly depends on use case.
>
> PL/pgSQL has fast start but any expression is evaluated as simple SQL
> expression - and some repeated operation should be very expensive -
> array update, string update. PL/pgSQL is best as SQL glue. Positive to
> performance is type compatibility between plpgsql and Postgres.
> Interpret plpgsql is very simply - there are +/- zero optimizations -
> plpgsql code should be minimalistic, but when you don't do some really
> wrong, then a speed is comparable with PHP.
>
>
http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.
2FpgSQL_language
>
> PL/Perl has slower start - but string or array operations are very
> fast. Perl has own expression evaluator - faster than expression
> evaluation in plpgsql. On second hand - any input must be transformed
> from postgres format to perl format and any result must be transformed
> too. Perl and other languages doesn't use data type compatible with
> Postgres.

One big advantage pl/pgsql has over scripting languages is that it
understands postgresql types natively.  It knows what a postgres array
is, and can manipulate one directly.  pl/perl would typically have to
have the database convert it to a string, parse it into a perl
structure, do the manipulation, then send it to the database to be
parsed again.  If your procedure code is mainly moving data between
tables and doing minimal intermediate heavy processing, this adds up
to a big advantage.  Which pl to go with really depends on what you
need to do.  pl/pgsql is always my first choice though.

perl and tcl are not particularly fast languages in the general case
-- you are largely at the mercy of how well the language's syntax or
library features map to the particular problem you're solving.  if you
need a fast general purpose language in the backend and are (very
understandably) skeptical about C, I'd look at pl/java.

merlin