От: Thomas Kellerer
Тема: Re: query optimization
Дата: ,
Msg-id: jnc860$q30$1@dough.gmane.org
(см: обсуждение, исходный текст)
Ответ на: Re: query optimization  (Tom Lane)
Ответы: Re: query optimization  (Tom Lane)
Список: pgsql-performance

Скрыть дерево обсуждения

query optimization  (Richard Kojedzinszky, )
 Re: query optimization  ("Kevin Grittner", )
 Re: query optimization  (Tom Lane, )
  Re: query optimization  (Thomas Kellerer, )
   Re: query optimization  (Tom Lane, )
    Re: query optimization  (Andrew Dunstan, )
  Re: query optimization  (Richard Kojedzinszky, )

Tom Lane wrote on 26.04.2012 21:17:
> Richard Kojedzinszky<>  writes:
>> Dear list,
>> We have a database schema, which looks the same as the attached script.
>> When filling the tables with data, and skipping analyze on the table (so
>> pg_stats contains no records for table 'a'), the first select in the
>> script runs fast, but after an analyze the planner decides to sequence
>> scan tables b and c, thus making the query much slower. Can somebody help
>> me solving this issue, or tuning our installation to not to use sequence
>> scans in this case?
> Um ... did you analyze all the tables, or just some of them?  I get
> sub-millisecond runtimes if all four tables have been analyzed, but it
> does seem to pick lousy plans if, say, only a and b have been analyzed.

Here it's similar to Richard's experience:

Before analyzing the four tables, the first statement yields this plan:

Merge Left Join  (cost=504.89..2634509.91 rows=125000000 width=16) (actual time=0.103..0.108 rows=1 loops=1)
   Merge Cond: (a.b = b.id)
   ->  Sort  (cost=504.89..506.14 rows=500 width=8) (actual time=0.043..0.043 rows=1 loops=1)
         Sort Key: a.b
         Sort Method: quicksort  Memory: 17kB
         ->  Bitmap Heap Scan on a  (cost=12.14..482.47 rows=500 width=8) (actual time=0.028..0.029 rows=1 loops=1)
               Recheck Cond: (id = 4)
               ->  Bitmap Index Scan on a_idx1  (cost=0.00..12.01 rows=500 width=0) (actual time=0.021..0.021 rows=1
                     Index Cond: (id = 4)
   ->  Materialize  (cost=0.00..884002.52 rows=50000000 width=8) (actual time=0.041..0.057 rows=5 loops=1)
         ->  Merge Join  (cost=0.00..759002.52 rows=50000000 width=8) (actual time=0.037..0.051 rows=5 loops=1)
               Merge Cond: (b.id = c.id)
               ->  Index Scan using b_idx1 on b  (cost=0.00..4376.26 rows=100000 width=4) (actual time=0.016..0.018
               ->  Materialize  (cost=0.00..4626.26 rows=100000 width=4) (actual time=0.017..0.022 rows=5 loops=1)
                     ->  Index Scan using c_idx1 on c  (cost=0.00..4376.26 rows=100000 width=4) (actual
time=0.014..0.017rows=5 loops=1) 
Total runtime: 0.209 ms

This continues to stay the plan for about 10-15 repetitions, then it turns to this plan

Hash Right Join  (cost=2701.29..6519.30 rows=1 width=16) (actual time=79.604..299.227 rows=1 loops=1)
   Hash Cond: (b.id = a.b)
   ->  Hash Join  (cost=2693.00..6136.00 rows=100000 width=8) (actual time=79.550..265.251 rows=100000 loops=1)
         Hash Cond: (b.id = c.id)
         ->  Seq Scan on b  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..36.158 rows=100000 loops=1)
         ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4) (actual time=79.461..79.461 rows=100000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 2735kB
               ->  Seq Scan on c  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.010..32.930 rows=100000
   ->  Hash  (cost=8.28..8.28 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
               Index Cond: (id = 4)
Total runtime: 299.564 ms

(I guess autovacuum kicked in, because that the same plan I get when running analyze on all four tables right after

And the second one yields this one here (Regardless of analyze or not):

Nested Loop Left Join  (cost=0.00..16.89 rows=1 width=16) (actual time=0.027..0.031 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..16.57 rows=1 width=12) (actual time=0.020..0.022 rows=1 loops=1)
         ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
               Index Cond: (id = 4)
         ->  Index Scan using b_idx1 on b  (cost=0.00..8.28 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
               Index Cond: (a.b = id)
   ->  Index Scan using c_idx1 on c  (cost=0.00..0.31 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
         Index Cond: (b.id = id)
Total runtime: 0.104 ms

My version:
PostgreSQL 9.1.3, compiled by Visual C++ build 1500, 32-bit
Running on Windows XP SP3

shared_buffers = 768MB
work_mem = 24MB
effective_cache_size = 1024MB

All other (relevant) settings are on defaults


В списке pgsql-performance по дате сообщения:

От: Tom Lane
Сообщение: Re: Weird plan variation with recursive CTEs
От: AI Rumman
Сообщение: NOT EXISTS or LEFT JOIN which one is better?