Re: Left Outer Join much faster than non-outer Join?

Поиск
Список
Период
Сортировка
От Ron Mayer
Тема Re: Left Outer Join much faster than non-outer Join?
Дата
Msg-id 424BA115.8050606@cheapcomplexdevices.com
обсуждение исходный текст
Ответ на Re: Left Outer Join much faster than non-outer Join?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Left Outer Join much faster than non-outer Join?
Список pgsql-performance
Tom Lane wrote:
> rm_pg@cheapcomplexdevices.com writes:
>
>> select *
>>     from streetname_lookup as sl
>>     join city_lookup as cl on (true)
>>     left outer join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id and
cl.geo_city_id=ts.geo_city_id)
>>     where  str_name='alamo' and  city='san antonio' and state='TX'
>>;
>
> That's a fairly odd query;

I think it's a very common type of query in data warehousing.

It's reasonably typical of a traditional star schema where
"streetname_lookup" and "city_lookup" are dimension tables
and "tlid_smaller" is the central fact table.

> why don't you have any join condition between
> streetname_lookup and city_lookup?

Those two tables shared no data.   They merely get the "id"s
for looking things up in the much larger central table.

Unique indexes on the city_lookup and street_lookup make the
cartesian join harmless (they each return only 1 value); and
the huge fact table has a multi-column index that takes both
of the ids from those lookups.


With the tables I have (shown below), how else could one
efficiently fetch the data for "Main St" "San Francisco"?

   streetname_lookup
   (for every street name used in the country)
       streetid  |  name  | type
       ----------+--------+------
       1         |  Main  | St
       2         |  1st   | St

   city_lookup
   (for every city name used in the country)
       cityid  |  name   | state
       --------+---------+------
       1       |  Boston | MA
       2       |  Alameda| CA


   tlid_smaller
   (containing a record for every city block in the country)
    city_id | street_id  | addresses | demographics, etc.
    --------+------------+-----------+----------------------
         1  |    1       | 100 block | [lots of columns]
         1  |    1       | 200 block | [lots of columns]
         1  |    1       | 300 block | [lots of columns]
         1  |    2       | 100 block | [lots of columns]
         1  |    2       | 100 block | [lots of columns]

> The planner won't consider Cartesian joins unless forced to, which is
> why it fails to consider the join order "((sl join cl) join ts)" unless
> you have an outer join in the mix.  I think that's generally a good
> heuristic, and am disinclined to remove it ...

IMHO it's a shame it doesn't even consider it when the estimated
results are very small.   I think often joins that merely look up
IDs would be useful to consider for the purpose of making potential
multi-column indexes (as shown in the previous email's explain
analyze result where the cartesian join was 30X faster than the
other approach since it could use the multi-column index on the
very large table).

     Ron

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Left Outer Join much faster than non-outer Join?
Следующее
От: Ron Mayer
Дата:
Сообщение: Re: Left Outer Join much faster than non-outer Join?