Re: Subquery flattening causing sequential scan

Поиск
Список
Период
Сортировка
От Jim Crate
Тема Re: Subquery flattening causing sequential scan
Дата
Msg-id 4BCFFC7E-795D-4886-ABB0-D7628E583999@gmail.com
обсуждение исходный текст
Ответ на Re: Subquery flattening causing sequential scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On Dec 27, 2011, at 1:12 PM, Tom Lane wrote:

> Jim Crate <jimcfl@gmail.com> writes:
>> My question is why does it do a seq scan when it flattens this
>> subquery into a JOIN?
>
> Because it thinks there will be 3783 rows out of the msg scan, which if
> true would make your desired nestloop join a serious loser.  You need to
> see about getting that estimate to be off by less than three orders of
> magnitude.  Possibly raising the stats target on emsg_messages would
> help.  I'd also try converting the inner NOT IN into a NOT EXISTS, just
> to see if that makes the estimate any better.


The planner does choose the nested loop after converting the NOT IN to NOT EXISTS.  Using LEFT JOIN / IS NULL also
generatedthe same plan as NOT EXISTS.  I guess I really need to learn more about reading explain plans, and expand my
useof different constructs.   It's so easy to fall into the trap of using the same construct in all situations just
becauseit works well enough most of the time and is easy to read.   

As for default_statistics_target, I read the docs and I'm not sure how increasing that value would help in this case.
Thereare only a couple hundred accounts, and less than 5 values for message_type and spam_level.  In the
emsg_message_folderstable, the message_id is considered unique (pg_stats has n_distinct = -1), which would also be
correct. 



EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
    SELECT id
    FROM emsg_messages msg
    WHERE msg.account_id = 314 AND msg.outgoing = FALSE
      AND msg.message_type = 1 AND msg.spam_level < 2
      AND msg.deleted_at IS NULL
      AND NOT EXISTS (
            SELECT emf.message_id
            FROM emsg_message_folders emf
            WHERE emf.account_id = 314 AND emf.message_id = msg.id
    )
)


QUERY PLAN
Nested Loop  (cost=84785.80..84806.43 rows=100455 width=12) (actual time=262.507..262.528 rows=6 loops=1)
  ->  HashAggregate  (cost=84785.80..84785.81 rows=1 width=4) (actual time=262.445..262.446 rows=3 loops=1)
        ->  Hash Anti Join  (cost=8285.87..84785.80 rows=1 width=4) (actual time=254.363..262.426 rows=3 loops=1)
              Hash Cond: (msg.id = emf.message_id)
              ->  Bitmap Heap Scan on emsg_messages msg  (cost=869.66..77274.56 rows=7602 width=4) (actual
time=13.622..204.879rows=12387 loops=1)     
                    Recheck Cond: (account_id = 314)
                    Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (message_type = 1))
                    ->  Bitmap Index Scan on index_emsg_messages_on_account_id  (cost=0.00..867.76 rows=34582 width=0)
(actualtime=8.756..8.756 rows=35091 loops=1)     
                          Index Cond: (account_id = 314)
              ->  Hash  (cost=6990.69..6990.69 rows=34042 width=4) (actual time=45.785..45.785 rows=34647 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 1219kB
                    ->  Bitmap Heap Scan on emsg_message_folders emf  (cost=680.16..6990.69 rows=34042 width=4) (actual
time=5.465..35.842rows=34647 loops=1)     
                          Recheck Cond: (account_id = 314)
                          ->  Bitmap Index Scan on index_emsg_message_folders_on_account_id  (cost=0.00..671.65
rows=34042width=0) (actual time=4.966..4.966 rows=34647 loops=1)     
                                Index Cond: (account_id = 314)
  ->  Index Scan using index_emsg_message_addresses_on_message_id on emsg_message_addresses ema  (cost=0.00..20.45
rows=13width=12) (actual time=0.023..0.023 rows=2 loops=3)     
        Index Cond: (ema.message_id = msg.id)
Total runtime: 262.742 ms


Jim Crate


В списке pgsql-performance по дате отправления:

Предыдущее
От: Matteo Beccati
Дата:
Сообщение: Re: PostgreSQL 9.0.4 blocking in lseek?
Следующее
От: Matteo Beccati
Дата:
Сообщение: Re: PostgreSQL 9.0.4 blocking in lseek?