Re: function not called if part of aggregate

Поиск
Список
Период
Сортировка
От Craig A. James
Тема Re: function not called if part of aggregate
Дата
Msg-id 448E5A92.9050206@modgraph-usa.com
обсуждение исходный текст
Ответ на Re: function not called if part of aggregate  (Greg Stark <gsstark@mit.edu>)
Ответы Re: function not called if part of aggregate
Список pgsql-performance
Greg Stark wrote:
> However that's not enough to explain what you've shown. How about you show the
> actual query and actual plan you're working with? The plan you've shown can't
> result from the query you sent.

Mea culpa, sort of.  But ... in fact, the plan I sent *was* from query I sent, with the table/column names changed for
clarity. This time I'll send the plan "raw".  (This is PG 8.0.1.) 

chm=> explain select count(1) from (select normalize_add_salt(smiles) from
chm(>     salt_smiles order by db_no) as foo;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate  (cost=69.95..69.95 rows=1 width=0)
   ->  Subquery Scan foo  (cost=0.00..67.93 rows=806 width=0)
         ->  Index Scan using salt_smiles_pkey on salt_smiles  (cost=0.00..59.87 rows=806 width=30)
(3 rows)

As pointed out by Tom and others, this query DOES in fact call the normalize_add_salt() function.

Now here's the weird part. (And where my original posting went wrong -- sorry for the error!  I got the two queries
mixedup.) 

I originally had a more complex query, the purpose being to guarantee that the function was called on the strings in
theorder specified.  (More on this below.)  Here is the original query I used: 

chm=> explain select count(1) from (select normalize_add_salt(smiles)
chm(>   from (select smiles from salt_smiles order by db_no) as foo) as bar;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate  (cost=67.94..67.94 rows=1 width=0)
   ->  Subquery Scan foo  (cost=0.00..65.92 rows=806 width=0)
         ->  Index Scan using salt_smiles_pkey on salt_smiles  (cost=0.00..57.86 rows=806 width=30)
(3 rows)

Notice that the plans are essentially identical, yet in this one the function does NOT get called.  I proved this by
bruteforce, inserting "char **p = NULL; *p = "foo";" into the C code to guarantee a segmentation violation if the
functiongets called.  In the first case it does SIGSEGV, and in the second case it does not. 

Now the reason for this more-complex query with an additional subselect is that the SMILES (which, by the way, are a
lexicalway of representing chemical structures - see www.daylight.com), must be passed to the function in a particular
order(hence the ORDER BY).  In retrospect I realize the optimizer apparently flattens this query anyway (hence the
identicalplans, above). 

But the weird thing is that, in spite of flattening, which would appear to make the queries equivalent, the function
getscalled in one case, and not in the other. 

Steinar H. Gunderson asked:
>>  select count(1) from (select foo_init(value) from foo_init_table order by
>>  value_id) as foo;
> Why not just count(foo_init(value))?

Because the SMILES must be processed in a specific order, hence the more complex queries.

The simple answer to this whole problem is what Steinar wrote:
>>This works well, but it requires me to actually retrieve the function's
>>value 800 times.
>
> Is this actually a problem?

No, it's just a nuisance.  It occurs to me that in spite of the ORDER BY expression, Postgres is free to evaluate the
functionfirst, THEN sort the results, which means the SMILES would be processed in random order anyway.  I.e. my ORDER
BYclause is useless for the intended purpose. 

So the only way I can see to get this right is to pull the SMILES into my application with the ORDER BY to ensure I
havethem in the correct order, then send them back one at a time via a "select normalize_add_salt(smiles)", meaning
I'llretrieve 800 strings and then send them back.  

I just thought there ought to be a way to do this all on the PG server instead of sending all these strings back and
forth. I'd like to say to Postgres, "Just do it this way, OK?"  But the optimizer can't be turned off, so I guess I
haveto do it the slow way.  The good news is that this is just an initialization step, after which I typically process
thousandsof molecules, so the extra overhead won't kill me. 

Thanks to all for your help.

Craig

В списке pgsql-performance по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: 64-bit vs 32-bit performance ... backwards?
Следующее
От: "Steinar H. Gunderson"
Дата:
Сообщение: Placement of 64-bit libraries (offtopic)