Обсуждение: immutable functions vs. join for lookups ?

От:
Enrico Weigelt
Дата:

Hi folks,

I like to use (immutable) functions for looking up serveral
(almost constant) things, i.e fetching a username by id.
This makes my queries more clear.

But is this really performant ?

Lets imagine:

We've got an table with user accounts (uid,name,...). Then we've
got another one which contains some items assigned to users, and
so are linked to them by an uid field.
Now want to view the items with usernames instead of just uid:

a) SELECT items.a, items.b, ..., users.username FROM items, users
    WHERE items.uid = users.uid;

c) CREATE FUNCTION id2username(oid) RETURNS text
    LANGUAGE 'SQL' IMMUTABLE AS '
    SELECT username AS RESULT FROM users WHERE uid = $1';

   SELECT items.a, items.b, ..., id2username(users.uid);


Which one is faster with
    a) only a few users (<50)
    b) many users ( >1k )
while we have several 10k of items ?


thx
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service

  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     
  cellphone: +49 174 7066481
---------------------------------------------------------------------
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------

От:
Tom Lane
Дата:

Enrico Weigelt <> writes:
> c) CREATE FUNCTION id2username(oid) RETURNS text
>     LANGUAGE 'SQL' IMMUTABLE AS '
>     SELECT username AS RESULT FROM users WHERE uid = $1';

This is simply dangerous.  The function is *NOT* immutable (it is
stable though).  When ... not if ... your application breaks because
you got the wrong answers, you'll get no sympathy from anyone.

The correct question to ask was "if I make a stable function like
this, is it likely to be faster than the join?".  The answer is
"probably not; at best it will be equal to the join".  The best the
planner is likely to be able to do with the function-based query
is equivalent to a nestloop with inner indexscan (assuming there is
an index on users.uid).  If that's the best plan then the join case
should find it too ... but if you are selecting a lot of items rows
then it won't be the best plan.

            regards, tom lane

От:
Enrico Weigelt
Дата:

* Tom Lane <> wrote:
> Enrico Weigelt <> writes:
> > c) CREATE FUNCTION id2username(oid) RETURNS text
> >     LANGUAGE 'SQL' IMMUTABLE AS '
> >     SELECT username AS RESULT FROM users WHERE uid = $1';
>
> This is simply dangerous.  The function is *NOT* immutable (it is
> stable though).  When ... not if ... your application breaks because
> you got the wrong answers, you'll get no sympathy from anyone.

In my case it is immutable. The username never changes.


cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service

  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     
  cellphone: +49 174 7066481
---------------------------------------------------------------------
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------

От:
Jaime Casanova
Дата:

On 4/17/05, Enrico Weigelt <> wrote:
> * Tom Lane <> wrote:
> > Enrico Weigelt <> writes:
> > > c) CREATE FUNCTION id2username(oid) RETURNS text
> > >     LANGUAGE 'SQL' IMMUTABLE AS '
> > >     SELECT username AS RESULT FROM users WHERE uid = $1';
> >
> > This is simply dangerous.  The function is *NOT* immutable (it is
> > stable though).  When ... not if ... your application breaks because
> > you got the wrong answers, you'll get no sympathy from anyone.
>
> In my case it is immutable. The username never changes.
>
Even if your data never changes it *can* change so the function should
be at most stable not immutable.

regards,
Jaime Casanova

От:
Dawid Kuroczko
Дата:

On 4/15/05, Enrico Weigelt <> wrote:
> a) SELECT items.a, items.b, ..., users.username FROM items, users
>         WHERE items.uid = users.uid;
>
> c) CREATE FUNCTION id2username(oid) RETURNS text
>     LANGUAGE 'SQL' IMMUTABLE AS '
>         SELECT username AS RESULT FROM users WHERE uid = $1';

You will be told that this function is not immutable but stable, and this
is quite right.  But consider such a function:

CREATE OR REPLACE FUNCTION id2username (oid int) RETURNS TEXT AS $$
  BEGIN
    IF oid = 0 THEN RETURN 'foo';
    ELSIF oid = 1 THEN RETURN 'bar';
    END IF;
  END;
$$ LANGUAGE plpgsql IMMUTABLE;

versus a lookup table with similar data.  Logic suggests it should be faster
than a table...  It got me worried when I added: "RAISE WARNING 'Called'"
after begin and I got lots of "Called" warnings when using this IMMUTABLE
function in select...  And the timings for ~6000 values in aaa table
(and two values in lookup table) are:

There is a query, output of the EXPLAIN ANALYZE, Time of EXPLAIN
ANALYZE and "Real time" of SELECT (without EXPLAIN ANALYZE):

a) simple select from temp table, and a lookup cost:
  EXPLAIN ANALYZE SELECT n FROM aaa;
 Seq Scan on aaa  (cost=0.00..87.92 rows=5992 width=4) (actual
time=0.011..24.849 rows=6144 loops=1)
 Total runtime: 51.881 ms
(2 rows)
Time: 52,882 ms
Real time: 16,261 ms

  EXPLAIN ANALYZE SELECT id2username(n) FROM aaa limit 2;
Limit  (cost=0.00..0.03 rows=2 width=4) (actual time=0.111..0.150
rows=2 loops=1)
   ->  Seq Scan on aaa  (cost=0.00..104.80 rows=6144 width=4) (actual
time=0.102..0.129 rows=2 loops=1)
 Total runtime: 0.224 ms
(3 rows)
Time: 1,308 ms
Real time: 1,380 ms

b) natural join with lookup table:
  EXPLAIN ANALYZE SELECT username FROM aaa NATURAL JOIN lookup;
 Hash Join  (cost=2.45..155.09 rows=3476 width=32) (actual
time=0.306..83.677 rows=6144 loops=1)
   Hash Cond: ("outer".n = "inner".n)
   ->  Seq Scan on aaa  (cost=0.00..87.92 rows=5992 width=4) (actual
time=0.006..25.517 rows=6144 loops=1)
   ->  Hash  (cost=2.16..2.16 rows=116 width=36) (actual
time=0.237..0.237 rows=0 loops=1)
         ->  Seq Scan on lookup  (cost=0.00..2.16 rows=116 width=36)
(actual time=0.016..0.034 rows=2 loops=1)
 Total runtime: 107.378 ms
(6 rows)
Time: 109,040 ms
Real time: 25,364 ms

c) IMMUTABLE "static" lookup function:
  EXPLAIN ANALYZE SELECT id2username(n) FROM aaa;
Seq Scan on aaa  (cost=0.00..104.80 rows=6144 width=4) (actual
time=0.089..116.397 rows=6144 loops=1)
 Total runtime: 143.800 ms
(2 rows)
Time: 144,869 ms
Real time: 102,428 ms

d) self-join with a function ;)
  EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
aaa USING (n);
 Hash Left Join  (cost=506.82..688.42 rows=6144 width=36) (actual
time=102.382..182.661 rows=6144 loops=1)
   Hash Cond: ("outer".n = "inner".n)
   ->  Seq Scan on aaa  (cost=0.00..89.44 rows=6144 width=4) (actual
time=0.012..24.360 rows=6144 loops=1)
   ->  Hash  (cost=506.82..506.82 rows=2 width=36) (actual
time=102.217..102.217 rows=0 loops=1)
         ->  Subquery Scan v_lookup  (cost=476.05..506.82 rows=2
width=36) (actual time=53.626..102.057 rows=2 loops=1)
               ->  Subquery Scan "values"  (cost=476.05..506.80 rows=2
width=4) (actual time=53.613..102.023 rows=2 loops=1)
                     ->  Unique  (cost=476.05..506.77 rows=2 width=4)
(actual time=53.456..101.772 rows=2 loops=1)
                           ->  Sort  (cost=476.05..491.41 rows=6144
width=4) (actual time=53.440..76.710 rows=6144 loops=1)
                                 Sort Key: n
                                 ->  Seq Scan on aaa
(cost=0.00..89.44 rows=6144 width=4) (actual time=0.013..26.626
rows=6144 loops=1)
 Total runtime: 209.378 ms
(11 rows)
Time: 211,460 ms
Real time:  46,682 ms

...so this IMMUTABLE is twice as slow (~100 ms) as the query joining
itself with a SELECT DISTINCT on an IMMUTABLE function (~50 ms),
which is twice as slow as JOIN against lookup table (~25 ms), and I feel
this IMMUTABLE function could be around ~20 ms (~16 ms plus
calling the function two times plus giving the values).

Ah, and this is PostgreSQL 8.0.1 running under FreeBSD on a
CPU: Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU).

   Regards,
      Dawid

PS: I have a feeling that IMMUTABLE functions worked better in 7.4,
yet I am unable to confirm this.

От:
"Merlin Moncure"
Дата:

> d) self-join with a function ;)
>   EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
> FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
> aaa USING (n);

That's pretty clever.
It sure seems like the server was not caching the results of the
function...maybe the server thought it was to small a table to bother?

Merlin

От:
Dawid Kuroczko
Дата:

On 4/18/05, Merlin Moncure <> wrote:
> > d) self-join with a function ;)
> >   EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
> > FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
> > aaa USING (n);
>
> That's pretty clever.
> It sure seems like the server was not caching the results of the
> function...maybe the server thought it was to small a table to bother?

Nah, I don't thinks so.  Having around 2 097 152 rows of 1s and 0s takes
48 seconds for id2username() query.
The "self join" you've quoted above takes 32 seconds.
SELECT n FROM aaa; takes 7 seconds.

Thinking further...
SELECT CASE n WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END FROM aaa;
takes 9 seconds.

CREATE OR REPLACE FUNCTION id2un_case(oid int) RETURNS text AS $$
BEGIN RETURN CASE oid WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END; END; $$
LANGUAGE plpgsql IMMUTABLE;
SELECT id2un_case(n) FROM aaa;
...takes 36 seconds

...and to see how it depends on flags used:
SELECT count(id2un_case(n)) FROM aaa;
...id2un_case(n) IMMUTABLE takes 29900,114 ms
...id2un_case(n) IMMUTABLE STRICT takes 30187,958 ms
...id2un_case(n) STABLE takes 31457,560 ms
...id2un_case(n) takes 33545,178 ms
...id2un_case(n) VOLATILE takes 35150,920 ms
(and a count(CASE n WHEN ... END) FROM aaa takes: 2564,188 ms


I understand that these measurements are not too accurate.  They
were done on idle system, and the queries were run couple of times
(to make sure they're cached :)).  I believe either something is minor
performance difference between IMMUTABLE STABLE and even
VOLATILE plpgsql... :(

Oh, and doing things like "ORDER BY n" or "WHERE n = 1" didn't help
either...

I still wonder whether it's only my case or is there really something
wrong with these functions?

   Regards,
       Dawid

От:
Tom Lane
Дата:

"Merlin Moncure" <> writes:
>> d) self-join with a function ;)
>> EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
>> FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
>> aaa USING (n);

> That's pretty clever.
> It sure seems like the server was not caching the results of the
> function...maybe the server thought it was to small a table to bother?

No, it probably flattened the subquery on sight (looking at the actual
EXPLAIN output would confirm or disprove that).  You could prevent the
flattening by adding OFFSET 0 in the subquery.  However, the SELECT
DISTINCT sub-sub-query is expensive enough, and the join itself is
expensive enough, that you would need an *enormously* expensive
id2username() function to make this a win.

It would be interesting sometime to try to teach the planner about
inlining SQL-language functions to become joins.  That is, given

create function id2name(int) returns text as
'select name from mytab where id = $1' language sql stable;

select uid, id2name(uid) from othertab where something;

I think that in principle this could automatically be converted to

select uid, name from othertab left join mytab on (uid = id) where something;

which is much more amenable to join optimization.  There are some
pitfalls though, particularly that you'd have to be able to prove that
the function's query couldn't return more than one row (else the join
might produce more result rows than the original query).

            regards, tom lane

От:
Greg Stark
Дата:

Tom Lane <> writes:

> It would be interesting sometime to try to teach the planner about
> inlining SQL-language functions to become joins.  That is, given
>
> create function id2name(int) returns text as
> 'select name from mytab where id = $1' language sql stable;
>
> select uid, id2name(uid) from othertab where something;
>
> I think that in principle this could automatically be converted to
>
> select uid, name from othertab left join mytab on (uid = id) where something;

The Inlining of the function is presumably a side-issue. I have tons of
queries that use subqueries in the select list for which the same behaviour
would be appropriate.

Things like

select uid, (select name from mytab where id = uid) as name from othertab ...


> There are some pitfalls though, particularly that you'd have to be able to
> prove that the function's query couldn't return more than one row (else the
> join might produce more result rows than the original query).

Or just have a special join type that has the desired behaviour in that case.
Ie, pretend the query was really

SELECT * FROM othertab LEFT SINGLE JOIN mytab ...

Where "LEFT SINGLE JOIN" is an imaginary syntax that doesn't actually have to
exist in the parser, but exists in the planner/executor and behaves
differently in the case of duplicate matches.

Actually I could see such a syntax being useful directly too.

--
greg

От:
Tom Lane
Дата:

Greg Stark <> writes:
> Tom Lane <> writes:
>> It would be interesting sometime to try to teach the planner about
>> inlining SQL-language functions to become joins.  That is, given

> The Inlining of the function is presumably a side-issue. I have tons of
> queries that use subqueries in the select list for which the same behaviour
> would be appropriate.

Yeah, I was actually thinking about a two-step process: inline the
function to produce somethig equivalent to a handwritten scalar
sub-SELECT, and then try to convert sub-SELECTs into joins.

            regards, tom lane

От:
"Jim C. Nasby"
Дата:

You should re-run the function test using SQL as the function language
instead of plpgsql. There might be some performance to be had there.
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

От:
Dawid Kuroczko
Дата:

On 4/20/05, Jim C. Nasby <> wrote:
> You should re-run the function test using SQL as the function language
> instead of plpgsql. There might be some performance to be had there.

Yay!  You're right!  I wonder why have I forgotten about LANGUAGE SQL. :)
It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
LANGUAGE SQL. :)  I.e. its almost the same as in-place entered SQL.

  Regards,
      Dawid

От:
Christopher Kings-Lynne
Дата:

> Yay!  You're right!  I wonder why have I forgotten about LANGUAGE SQL. :)
> It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
> LANGUAGE SQL. :)  I.e. its almost the same as in-place entered SQL.

Probably because simple SQL functions get inlined by the optimiser.

Chris

От:
Christopher Kings-Lynne
Дата:

> Yay!  You're right!  I wonder why have I forgotten about LANGUAGE SQL. :)
> It's 30 seconds vs 5 seconds for CASE ... END insisde PLpgsql vs CASE...END
> LANGUAGE SQL. :)  I.e. its almost the same as in-place entered SQL.
>
>   Regards,
>       Dawid
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match


От:
Enrico Weigelt
Дата:

* Jaime Casanova <> wrote:

<snip>
> Even if your data never changes it *can* change so the function should
> be at most stable not immutable.

okay, the planner sees that the table could potentionally change.
but - as the dba - I'd like to tell him, this table *never* changes
in practise (or at most there will be an insert once a year)

isnt there any way to enforce the function to be really immutable ?


cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service

  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     
  cellphone: +49 174 7066481
---------------------------------------------------------------------
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------

От:
Enrico Weigelt
Дата:

* Tom Lane <> wrote:

<snip>
> Yeah, I was actually thinking about a two-step process: inline the
> function to produce somethig equivalent to a handwritten scalar
> sub-SELECT, and then try to convert sub-SELECTs into joins.

... back to my original question ...

What kind of query should I use ?
Is a join better than a function ?


cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service

  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     
  cellphone: +49 174 7066481
---------------------------------------------------------------------
 -- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------

От:
Dawid Kuroczko
Дата:

On 4/21/05, Enrico Weigelt <> wrote:
> * Tom Lane <> wrote:
>
> <snip>
> > Yeah, I was actually thinking about a two-step process: inline the
> > function to produce somethig equivalent to a handwritten scalar
> > sub-SELECT, and then try to convert sub-SELECTs into joins.
>
> ... back to my original question ...
>
> What kind of query should I use ?
> Is a join better than a function ?

It appears that JOINs are usually faster.  So if performance is an
important issue, go with JOIN (and VIEWs probably).  Functions are nicer
(in terms off look and feel).

   Regards,
     Dawid

От:
Dawid Kuroczko
Дата:

On 4/21/05, Enrico Weigelt <> wrote:
> <snip>
> > Even if your data never changes it *can* change so the function should
> > be at most stable not immutable.
>
> okay, the planner sees that the table could potentionally change.
> but - as the dba - I'd like to tell him, this table *never* changes
> in practise (or at most there will be an insert once a year)
>
> isnt there any way to enforce the function to be really immutable ?

Never say never. :)

And to answer your question -- your IMMUTABLE function may reference
other functions (even VOLATILE).  So you may create a "caller" immutable
function which just calls your non-immutable function.  But from
performance standpoint there is not much difference (probably your
STABLE function will be faster than STABLE inside IMMUTABLE function).

Ah, and please note that some time in future PostgreSQL may require
that IMMUTABLE function calls only IMMUTABLE functions.

   Regards,
      Dawid