Re: A Better Way? (Multi-Left Join Lookup)

Поиск
Список
Период
Сортировка
От David Johnston
Тема Re: A Better Way? (Multi-Left Join Lookup)
Дата
Msg-id 00f201cd66bf$d3273080$79759180$@yahoo.com
обсуждение исходный текст
Ответ на Re: A Better Way? (Multi-Left Join Lookup)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: A Better Way? (Multi-Left Join Lookup)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, July 20, 2012 4:47 PM
> To: David Johnston
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] A Better Way? (Multi-Left Join Lookup)
>
> "David Johnston" <polobo@yahoo.com> writes:
> > WITH
> >   full_set AS ( ) -- 8,500 records
> > , sub_1 AS () -- also about 8,500
> > , sub_2 AS () -- maybe 5,000
> > , sub_3 AS () - - maybe 3,000
> > SELECT full_set.*
> > , COALESCE(sub_1.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > FROM full_set
> > LEFT JOIN sub_1
> > LEFT JOIN sub_2
> > LEFT JOIN sub_3
>
> > The performance of this query is exponential due to the fact that the
> > sub-queries/CTEs are not indexed and so each subset has to be scanned
> > completely for each record in the full set.
>
> Surely not.  Neither merge nor hash joins require an index.  What plan is
> getting selected?  Are you sure there's at most one match in each "sub"
set
> for each row in the "full" set?  If you were getting a large number of
matches
> in some cases, the size of the result could balloon to something
unfortunate
> ... but we have not got enough information to know.
>
>             regards, tom lane

The final result, in this case would have 8,500 records AND
sub_1.field would be TRUE for basically all of them and FALSE for the
minimal remainder
sub_2.field would be TRUE for 5,000 of them and FALSE for 3,500 of them
sub_3.field would be TRUE for 3,000 of them and FALSE for 5,500 of them

There is never, in reality, two records in a sub-table for a single record
in the master table.  It is possible a record exists in a sub-table but not
in the main table but I do not care about those (thus the LEFT instead of a
FULL OUTER JOIN).

I have attached a scrubbed query and explain/analyze.  Let me know if
something more is needed.

I have included two versions of the query, one using CTE and the other using
mostly sub-selects.

I had run ANALYZE on the pertinent tables but the CTE queries all perform
quite quickly when run by themselves.

In looking at the source tables for the data I did notice that I have not
properly defined the relevant INDEXes as being UNIQUE.  This applies to two
of the sub-tables.  The third sub-table requires the use of "DISTINCT".  The
joining columns with each set of data are unique when fed into the LEFT
JOIN.  The master CTE/Query is generated via a function call and it also
generates unique keys for the LEFT JOIN.

Thank you for your help!

David J.





Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: Alban Hertroys
Дата:
Сообщение: Re: A Better Way? (Multi-Left Join Lookup)
Следующее
От: "David Johnston"
Дата:
Сообщение: Re: A Better Way? (Multi-Left Join Lookup)