Subquery flattening causing sequential scan

Поиск
Список
Период
Сортировка
От Jim Crate
Тема Subquery flattening causing sequential scan
Дата
Msg-id 0E417F76-FE21-4FB5-8132-0C485778A041@gmail.com
обсуждение исходный текст
Ответы Re: Subquery flattening causing sequential scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Subquery flattening causing sequential scan  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-performance
PostgreSQL 9.0.2
Mac OS X Server 10.6.8
Autovacuum is on, and I have a script that runs vacuum analyze verbose every night along with the backup.

I have a situation where I'm experiencing a seq scan on a table with almost 3M rows when my condition is based on a
subquery. A google search turned up a way to prevent flattening the subquery into a join using OFFSET 0.  This does
work,reducing the query from around 1s to around 250ms, most of which is the subquery.    

My question is why does it do a seq scan when it flattens this subquery into a JOIN?  Is it because the emsg_messages
tableis around 1M rows?  Are there some guidelines to when the planner will prefer not to use an available index?  I
justhad a look through postgresql.conf and noticed that I forgot to set effective_cache_size to something reasonable
fora machine with 16GB of memory.  Would the default setting of 128MB cause this behavior?  I can't bounce the
productionserver midday to test that change. 



EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
    SELECT id
    FROM emsg_messages msg
    WHERE msg.account_id = 314 AND msg.outgoing = FALSE
      AND msg.message_type = 1 AND msg.spam_level < 2
      AND msg.deleted_at IS NULL
      AND msg.id NOT IN (
            SELECT emf.message_id
            FROM emsg_message_folders emf
            where emf.account_id = 314
    )
)


QUERY PLAN
Hash Semi Join  (cost=84522.74..147516.35 rows=49545 width=12) (actual time=677.058..1083.685 rows=2 loops=1)
  Hash Cond: (ema.message_id = msg.id)
  ->  Seq Scan on emsg_message_addresses ema  (cost=0.00..53654.78 rows=2873478 width=12) (actual time=0.020..424.241
rows=2875437loops=1)     
  ->  Hash  (cost=84475.45..84475.45 rows=3783 width=4) (actual time=273.392..273.392 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 1kB
        ->  Bitmap Heap Scan on emsg_messages msg  (cost=7979.35..84475.45 rows=3783 width=4) (actual
time=273.224..273.387rows=1 loops=1)     
              Recheck Cond: (account_id = 314)
              Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND
(message_type= 1))     
              ->  Bitmap Index Scan on index_emsg_messages_on_account_id  (cost=0.00..867.98 rows=34611 width=0)
(actualtime=9.633..9.633 rows=34997 loops=1)     
                    Index Cond: (account_id = 314)
              SubPlan 1
                ->  Bitmap Heap Scan on emsg_message_folders emf  (cost=704.90..7022.51 rows=35169 width=4) (actual
time=5.684..38.016rows=34594 loops=1)     
                      Recheck Cond: (account_id = 314)
                      ->  Bitmap Index Scan on index_emsg_message_folders_on_account_id  (cost=0.00..696.10 rows=35169
width=0)(actual time=5.175..5.175 rows=34594 loops=1)     
                            Index Cond: (account_id = 314)
Total runtime: 1083.890 ms




EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
    SELECT id
    FROM emsg_messages msg
    WHERE msg.account_id = 314 AND msg.outgoing = FALSE
      AND msg.message_type = 1 AND msg.spam_level < 2
      AND msg.deleted_at IS NULL
      AND msg.id NOT IN (
            SELECT emf.message_id
            FROM emsg_message_folders emf
            where emf.account_id = 314
    )
    OFFSET 0
)


QUERY PLAN
Nested Loop  (cost=84524.89..87496.74 rows=2619 width=12) (actual time=273.409..273.412 rows=2 loops=1)
  ->  HashAggregate  (cost=84524.89..84526.89 rows=200 width=4) (actual time=273.345..273.346 rows=1 loops=1)
        ->  Limit  (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.171..273.335 rows=1 loops=1)
              ->  Bitmap Heap Scan on emsg_messages msg  (cost=7979.36..84477.60 rows=3783 width=4) (actual
time=273.169..273.333rows=1 loops=1)     
                    Recheck Cond: (account_id = 314)
                    Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1))
AND(message_type = 1))     
                    ->  Bitmap Index Scan on index_emsg_messages_on_account_id  (cost=0.00..867.99 rows=34612 width=0)
(actualtime=9.693..9.693 rows=34998 loops=1)     
                          Index Cond: (account_id = 314)
                    SubPlan 1
                      ->  Bitmap Heap Scan on emsg_message_folders emf  (cost=704.90..7022.51 rows=35169 width=4)
(actualtime=5.795..39.420 rows=34594 loops=1)     
                            Recheck Cond: (account_id = 314)
                            ->  Bitmap Index Scan on index_emsg_message_folders_on_account_id  (cost=0.00..696.10
rows=35169width=0) (actual time=5.266..5.266 rows=34594 loops=1)     
                                  Index Cond: (account_id = 314)
  ->  Index Scan using index_emsg_message_addresses_on_message_id on emsg_message_addresses ema  (cost=0.00..14.69
rows=13width=12) (actual time=0.056..0.058 rows=2 loops=1)     
        Index Cond: (ema.message_id = msg.id)
Total runtime: 273.679 ms


Jim Crate


В списке pgsql-performance по дате отправления:

Предыдущее
От: Scott Marlowe
Дата:
Сообщение: Re: Exploring memory usage
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Subquery flattening causing sequential scan