> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, July 20, 2012 4:47 PM
> To: David Johnston
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] A Better Way? (Multi-Left Join Lookup)
>
> "David Johnston" <polobo@yahoo.com> writes:
> > WITH
> > full_set AS ( ) -- 8,500 records
> > , sub_1 AS () -- also about 8,500
> > , sub_2 AS () -- maybe 5,000
> > , sub_3 AS () - - maybe 3,000
> > SELECT full_set.*
> > , COALESCE(sub_1.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > FROM full_set
> > LEFT JOIN sub_1
> > LEFT JOIN sub_2
> > LEFT JOIN sub_3
>
> > The performance of this query is exponential due to the fact that the
> > sub-queries/CTEs are not indexed and so each subset has to be scanned
> > completely for each record in the full set.
>
> Surely not. Neither merge nor hash joins require an index. What plan is
> getting selected? Are you sure there's at most one match in each "sub"
set
> for each row in the "full" set? If you were getting a large number of
matches
> in some cases, the size of the result could balloon to something
unfortunate
> ... but we have not got enough information to know.
>
> regards, tom lane
The final result, in this case would have 8,500 records AND
sub_1.field would be TRUE for basically all of them and FALSE for the
minimal remainder
sub_2.field would be TRUE for 5,000 of them and FALSE for 3,500 of them
sub_3.field would be TRUE for 3,000 of them and FALSE for 5,500 of them
There is never, in reality, two records in a sub-table for a single record
in the master table. It is possible a record exists in a sub-table but not
in the main table but I do not care about those (thus the LEFT instead of a
FULL OUTER JOIN).
I have attached a scrubbed query and explain/analyze. Let me know if
something more is needed.
I have included two versions of the query, one using CTE and the other using
mostly sub-selects.
I had run ANALYZE on the pertinent tables but the CTE queries all perform
quite quickly when run by themselves.
In looking at the source tables for the data I did notice that I have not
properly defined the relevant INDEXes as being UNIQUE. This applies to two
of the sub-tables. The third sub-table requires the use of "DISTINCT". The
joining columns with each set of data are unique when fed into the LEFT
JOIN. The master CTE/Query is generated via a function call and it also
generates unique keys for the LEFT JOIN.
Thank you for your help!
David J.