"andremachado" <andremachado@techforce.com.br> writes:
> continuing the saga,
> http://archives.postgresql.org/pgsql-performance/2006-04/msg00558.php ,
> my coleague created a test database with fake data (see below).
Thanks. I played around with this a bit, and got results like these:
original query, 8.1 branch from a couple weeks back: 945 sec
original query, 8.1 branch tip: 184 sec
modified query, 8.1 branch tip: 15 sec
The first differential is because of this patch:
http://archives.postgresql.org/pgsql-committers/2006-04/msg00355.php
viz
Remove the restriction originally coded into
optimize_minmax_aggregates() that MIN/MAX not be converted to
use an index if the query WHERE clause contains any volatile
functions or subplans.
Allowing the max(DEC2.AM_REFERENCIA) subquery to be converted to an
indexscan makes for about a 5X reduction in the number of times the
EXISTS sub-subquery is executed. But the real problem is that Postgres
isn't excessively smart about EXISTS subqueries. I manually changed it
into an IN to get the 15-second runtime: instead of
(select max(DEC2.AM_REFERENCIA) from DECLARACAO DEC2
where DEC2.IN_FOI_RETIFICADA=0 and
exists (select CAD3.ID_CADASTRO from CADASTRO CAD3 where
CAD3.ID_DECLARACAO=DEC2.ID_DECLARACAO and
CAD3.ID_EMPRESA=CADASTRO.ID_EMPRESA ) )
write
(select max(DEC2.AM_REFERENCIA) from DECLARACAO DEC2
where DEC2.IN_FOI_RETIFICADA=0 and DEC2.ID_DECLARACAO in
(select CAD3.ID_DECLARACAO from CADASTRO CAD3 where
CAD3.ID_EMPRESA=CADASTRO.ID_EMPRESA ) )
I'm not clear on how Firebird is managing to do this query in under
a second --- I can believe that they know how to do EXISTS as a join
but it still seems like the subqueries need to be done many thousand
times. I thought maybe they were caching the results of the overall
subquery for specific values of CADASTRO.ID_EMPRESA, but now that I
see your test data, there are several thousand distinct values of
that, so there's not a lot of traction to be gained that way.
regards, tom lane