JOIN to a VIEW makes a real slow query

Поиск
Список
Период
Сортировка
От Chuck D.
Тема JOIN to a VIEW makes a real slow query
Дата
Msg-id 200702131253.55486.pgsql-performance@nullmx.com
обсуждение исходный текст
Ответы Re: JOIN to a VIEW makes a real slow query  ("Merlin Moncure" <mmoncure@gmail.com>)
Список pgsql-performance
Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem but I wanted to
check here first.  I've seen this discussed on the list before but I'm still
not sure of the solution.  Maybe my query is just structured wrong.

I recently visited an old project of mine that has a 'city', 'state,'
and 'country' tables.  The city data comes from multiple sources and totals
about 3 million rows.  I decided to split the city table up based on the
source (world_city, us_city).  This makes easier updating because the
assigned feature id's from the two sources overlap in some cases making it
impossible to update as a single merged table.

However, I decided to create a view to behave like the old 'city' table.  The
view is just a simple:

SELECT [columns]
FROM world_city
UNION
SELECT [columns]
FROM us_city
;

Selecting from the view is very quick, but JOINing to the view is slow.  About
65 seconds to select a city.  It doesn't matter wether it is joined to one
table or 6 like it is in my user_detail query - it is still slow.  It has
indexes on the city_id, state_id, country_id of each table in the view too.
Everything has been 'VACUUM ANALYZE' ed.

When using explain analyze from the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                                    QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=650146.58..751018.45 rows=10618 width=55)
(actual time=53078.261..61269.190 rows=1 loops=1)
   Join Filter: ("outer".city_id = "inner"."?column1?")
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual
time=0.010..0.022 rows=1 loops=1)
   ->  Unique  (cost=650146.58..703236.51 rows=2123597 width=62) (actual
time=49458.007..59635.140 rows=2122712 loops=1)
         ->  Sort  (cost=650146.58..655455.58 rows=2123597 width=62) (actual
time=49458.003..55405.965 rows=2122712 loops=1)
               Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat,
lon, city_name
               ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual
time=18.835..13706.395 rows=2122712 loops=1)
                     ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409
width=62) (actual time=18.832..620.553 rows=169398 loops=1)
                     ->  Seq Scan on world_city  (cost=0.00..47632.88
rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
 Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So, a sequence scan on the tables in the view, won't use the index.

Then do the same query by replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=36) (actual
time=53.854..53.871 rows=1 loops=1)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual
time=0.010..0.016 rows=1 loops=1)
   ->  Index Scan using world_city_pk on world_city ci  (cost=0.00..3.01
rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
         Index Cond: ("outer".city_id = ci.city_id)
 Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms


I'm not sure that a view on a UNION is the best idea but I don't know how to
go about keeping the tables from the data sources with the view (other than
modifying them with a source_id column).  Any ideas on what is causing the
performance lag?



В списке pgsql-performance по дате отправления:

Предыдущее
От: Alan Hodgson
Дата:
Сообщение: Re: CPU Usage
Следующее
От: "Merlin Moncure"
Дата:
Сообщение: Re: JOIN to a VIEW makes a real slow query