Query on view radically slower than query on underlying table

Поиск
Список
Период
Сортировка
От Craig James
Тема Query on view radically slower than query on underlying table
Дата
Msg-id 4D6BE95C.3090102@emolecules.com
обсуждение исходный текст
Ответ на Re: inheritance: planning time vs children number vs column number  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Ответы Re: Query on view radically slower than query on underlying table
Список pgsql-performance
We have a medium-sized catalog (about 5 million rows), but some of our customers only want to see portions of it.  I've
beenexperimenting with a customer-specific schema that contains nothing but a "join table" -- just the primary keys of
thatportion of the data that each customer wants to see, which is used to create a view that looks like the original
table.But the most important query, the one that customers use to scan page-by-page through search results, turns out
tobe far too slow (65 seconds versus 55 milliseconds). 

Below are the results of two explain/analyze statements.  The first one uses the view, the second one goes directly to
theoriginal tables.  I thought this would be a slam-dunk, that it would return results in a flash because the view is
createdfrom two tables with the same primary keys. 

My guess (and it's just a wild guess) is that the "left join" is forcing a sequence scan or something.  But we need the
leftjoin, because it's on a "hitlist" that recorded all the matches to a customer's earlier query, and if rows have
beenremoved from the tables, the customer needs to see a blank row. 

Here is the "bad" query, which is run on the view:

em=> explain analyze
select version.version_id, version.isosmiles
from hitlist_rows_reset_140
left join version on (hitlist_rows_reset_140.objectid = version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.sortorder;
                                                                         QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------
------------------------------
  Nested Loop Left Join  (cost=23687.51..215315.74 rows=1 width=54) (actual time=2682.662..63680.076 rows=10 loops=1)
    Join Filter: (hitlist_rows_reset_140.objectid = v.version_id)
    ->  Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140  (cost=0.00..8.36 rows=1 width=8)
(actualtime= 
0.015..0.049 rows=10 loops=1)
          Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
    ->  Hash Join  (cost=23687.51..204666.54 rows=851267 width=50) (actual time=31.829..6263.403 rows=851267 loops=10)
          Hash Cond: (v.version_id = mv.version_id)
          ->  Seq Scan on version v  (cost=0.00..116146.68 rows=5631968 width=50) (actual time=0.006..859.758
rows=5632191loo 
ps=10)
          ->  Hash  (cost=13046.67..13046.67 rows=851267 width=4) (actual time=317.488..317.488 rows=851267 loops=1)
                ->  Seq Scan on my_version mv  (cost=0.00..13046.67 rows=851267 width=4) (actual time=2.888..115.166
rows=8512
67 loops=1)
  Total runtime: 63680.162 ms

Here is the "good" query, which is run directly on the data tables.

em=> explain analyze
select registry.version.version_id, registry.version.isosmiles
from hitlist_rows_reset_140
left join registry.version on (hitlist_rows_reset_140.objectid = registry.version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.SortOrder;
                                                                         QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------
------------------------------
  Nested Loop Left Join  (cost=0.00..17.73 rows=1 width=54) (actual time=36.022..55.558 rows=10 loops=1)
    ->  Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140  (cost=0.00..8.36 rows=1 width=8)
(actualtime= 
0.021..0.025 rows=10 loops=1)
          Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
    ->  Index Scan using version_pkey on version  (cost=0.00..9.35 rows=1 width=50) (actual time=5.551..5.552 rows=1
loops=10)
          Index Cond: (hitlist_rows_reset_140.objectid = version.version_id)
  Total runtime: 55.608 ms
(6 rows)


The view is defined like this:

em=> \d my_version
Table "test_schema.my_version"
    Column   |  Type   | Modifiers
------------+---------+-----------
  version_id | integer | not null
Indexes:
     "my_version_pkey" PRIMARY KEY, btree (version_id)

em=> \d version
  View "test_schema.version"
    Column   |  Type   | Modifiers
------------+---------+-----------
  version_id | integer |
  parent_id  | integer |
  isosmiles  | text    |
  coord_2d   | text    |
View definition:
  SELECT v.version_id, v.parent_id, v.isosmiles, v.coord_2d
    FROM registry.version v
    JOIN my_version mv ON mv.version_id = v.version_id;

This is:

   Postgres 8.4.4
   Ubuntu Linux 2.6.32-27
   Database: 8x7200 RAID 10, LSI RAID controller with BBU
   WAL: 2x7200 RAID1

Non-default config parameters:

max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
track_activities = on
track_counts = off
track_functions = none
escape_string_warning = off

Thanks,
Craig


В списке pgsql-performance по дате отправления:

Предыдущее
От: Andreas Kretschmer
Дата:
Сообщение: Re: optimalization
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Load and Stress on PostgreSQL 9.0