A Better Way? (Multi-Left Join Lookup)

Поиск
Список
Период
Сортировка
От David Johnston
Тема A Better Way? (Multi-Left Join Lookup)
Дата
Msg-id 00c701cd66b6$815191a0$83f4b4e0$@yahoo.com
обсуждение исходный текст
Ответы Re: A Better Way? (Multi-Left Join Lookup)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: A Better Way? (Multi-Left Join Lookup)  (Alban Hertroys <haramrae@gmail.com>)
Список pgsql-general

Hi!

 

Can someone please point me to a resource (or suggest a solution) that will improve the performance of this query?  I have some thoughts but figure I should avoid reinventing the wheel since this seems like something that has to have been solved already. 

 

I am working on a query where I have a list of identifiers (sample set has about 8,500 records) and I have three other queries that return a subset of these 8,500 identifiers

 

Basic query is designed as such:

 

WITH

  full_set AS ( ) -- 8,500 records

, sub_1 AS () -- also about 8,500

, sub_2 AS () -- maybe 5,000

, sub_3 AS () - - maybe 3,000

SELECT full_set.*

, COALESCE(sub_1.field, FALSE)

, COALESCE(sub_2.field, FALSE)

, COALESCE(sub_2.field, FALSE)

 

FROM full_set

LEFT JOIN sub_1

LEFT JOIN sub_2

LEFT JOIN sub_3

 

The goal is to output a boolean for each record in “full_set” specifying whether a corresponding records exists in the sub-set.  If the record exists “sub_x.field” is defined to be TRUE and thus is output otherwise sub_x.field is NULL and coalesce returns FALSE.

 

I have some explain+analyze results for this and an equivalent query that uses sub-queries in FROM instead of CTEs but I figure I’ll throw this out on general as it seems fairly basic.  I am guessing that anything more deep should be sent to the performance sub-list (which I haven’t subscribed to at this point) with the explain+analyze results attached.

 

The performance of this query is exponential due to the fact that the sub-queries/CTEs are not indexed and so each subset has to be scanned completely for each record in the full set.

 

The identifiers are sortable and so my first thought was to try and divide each dataset into sub-sets of, say, 1000, then UNION the results of the 9 passes it would take to process all 8,500 records.  Would a Recursive CTE be able to accomplish this?  The sticking point is that the identifiers not pure integers (usually, in my current dataset they are) and even when the range of values is variable.

 

I am working with 9.0

 

Thank You!

 

David J.

 

 

 

 

 

 

 

В списке pgsql-general по дате отправления:

Предыдущее
От: Manoj Govindassamy
Дата:
Сообщение: Pg_ctl promote -- wait for slave to be promoted fully ?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: A Better Way? (Multi-Left Join Lookup)