Обсуждение: Subquery flattening causing sequential scan
PostgreSQL 9.0.2 Mac OS X Server 10.6.8 Autovacuum is on, and I have a script that runs vacuum analyze verbose every night along with the backup. I have a situation where I'm experiencing a seq scan on a table with almost 3M rows when my condition is based on a subquery. A google search turned up a way to prevent flattening the subquery into a join using OFFSET 0. This does work,reducing the query from around 1s to around 250ms, most of which is the subquery. My question is why does it do a seq scan when it flattens this subquery into a JOIN? Is it because the emsg_messages tableis around 1M rows? Are there some guidelines to when the planner will prefer not to use an available index? I justhad a look through postgresql.conf and noticed that I forgot to set effective_cache_size to something reasonable fora machine with 16GB of memory. Would the default setting of 128MB cause this behavior? I can't bounce the productionserver midday to test that change. EXPLAIN ANALYZE SELECT ema.message_id, ema.email_address_id, ema.address_type FROM emsg_message_addresses ema WHERE ema.message_id IN ( SELECT id FROM emsg_messages msg WHERE msg.account_id = 314 AND msg.outgoing = FALSE AND msg.message_type = 1 AND msg.spam_level < 2 AND msg.deleted_at IS NULL AND msg.id NOT IN ( SELECT emf.message_id FROM emsg_message_folders emf where emf.account_id = 314 ) ) QUERY PLAN Hash Semi Join (cost=84522.74..147516.35 rows=49545 width=12) (actual time=677.058..1083.685 rows=2 loops=1) Hash Cond: (ema.message_id = msg.id) -> Seq Scan on emsg_message_addresses ema (cost=0.00..53654.78 rows=2873478 width=12) (actual time=0.020..424.241 rows=2875437loops=1) -> Hash (cost=84475.45..84475.45 rows=3783 width=4) (actual time=273.392..273.392 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Bitmap Heap Scan on emsg_messages msg (cost=7979.35..84475.45 rows=3783 width=4) (actual time=273.224..273.387rows=1 loops=1) Recheck Cond: (account_id = 314) Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND (message_type= 1)) -> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.98 rows=34611 width=0) (actualtime=9.633..9.633 rows=34997 loops=1) Index Cond: (account_id = 314) SubPlan 1 -> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4) (actual time=5.684..38.016rows=34594 loops=1) Recheck Cond: (account_id = 314) -> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169 width=0)(actual time=5.175..5.175 rows=34594 loops=1) Index Cond: (account_id = 314) Total runtime: 1083.890 ms EXPLAIN ANALYZE SELECT ema.message_id, ema.email_address_id, ema.address_type FROM emsg_message_addresses ema WHERE ema.message_id IN ( SELECT id FROM emsg_messages msg WHERE msg.account_id = 314 AND msg.outgoing = FALSE AND msg.message_type = 1 AND msg.spam_level < 2 AND msg.deleted_at IS NULL AND msg.id NOT IN ( SELECT emf.message_id FROM emsg_message_folders emf where emf.account_id = 314 ) OFFSET 0 ) QUERY PLAN Nested Loop (cost=84524.89..87496.74 rows=2619 width=12) (actual time=273.409..273.412 rows=2 loops=1) -> HashAggregate (cost=84524.89..84526.89 rows=200 width=4) (actual time=273.345..273.346 rows=1 loops=1) -> Limit (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.171..273.335 rows=1 loops=1) -> Bitmap Heap Scan on emsg_messages msg (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.169..273.333rows=1 loops=1) Recheck Cond: (account_id = 314) Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND(message_type = 1)) -> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.99 rows=34612 width=0) (actualtime=9.693..9.693 rows=34998 loops=1) Index Cond: (account_id = 314) SubPlan 1 -> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4) (actualtime=5.795..39.420 rows=34594 loops=1) Recheck Cond: (account_id = 314) -> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169width=0) (actual time=5.266..5.266 rows=34594 loops=1) Index Cond: (account_id = 314) -> Index Scan using index_emsg_message_addresses_on_message_id on emsg_message_addresses ema (cost=0.00..14.69 rows=13width=12) (actual time=0.056..0.058 rows=2 loops=1) Index Cond: (ema.message_id = msg.id) Total runtime: 273.679 ms Jim Crate
Jim Crate <jimcfl@gmail.com> writes: > My question is why does it do a seq scan when it flattens this > subquery into a JOIN? Because it thinks there will be 3783 rows out of the msg scan, which if true would make your desired nestloop join a serious loser. You need to see about getting that estimate to be off by less than three orders of magnitude. Possibly raising the stats target on emsg_messages would help. I'd also try converting the inner NOT IN into a NOT EXISTS, just to see if that makes the estimate any better. Using something newer than 9.0.2 might help too, as we fixed some outer-join estimation bugs a few months ago. regards, tom lane
Hi, On 28 December 2011 05:12, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Possibly raising the stats target on emsg_messages would help. In the function std_typanalyze() is this comment: /*-------------------- * The following choice of minrows is based on the paper * "Random sampling for histogram construction: how much is enough?" * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in * Proceedings of ACM SIGMOD International Conference on Management * of Data, 1998, Pages 436-447. Their Corollary 1 to Theorem 5 * says that for table size n, histogram size k, maximum relative * error in bin size f, and error probability gamma, the minimum * random sample size is * r = 4 * k * ln(2*n/gamma) / f^2 * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain * r = 305.82 * k * Note that because of the log function, the dependence on n is * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66 * bin size error with probability 0.99. So there's no real need to * scale for n, which is a good thing because we don't necessarily * know it at this point. *-------------------- */ The question is why the parameter f is not exposed as a GUC? Sometimes it could make sense to have few bins with better estimation (for same r). -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
=?UTF-8?Q?Ondrej_Ivani=C4=8D?= <ondrej.ivanic@gmail.com> writes: > The question is why the parameter f is not exposed as a GUC? What would that accomplish that default_statistics_target doesn't? (Other than being much harder to explain...) regards, tom lane
'
27.12.2011 20:13 пользователь "Tom Lane" <tgl@sss.pgh.pa.us> написал:
>
> Jim Crate <jimcfl@gmail.com> writes:
> > My question is why does it do a seq scan when it flattens this
> > subquery into a JOIN?
>
> Because it thinks there will be 3783 rows out of the msg scan, which if
> true would make your desired nestloop join a serious loser.
But second plan is evaluated cheapier by analyze. I thought this should make it being used unless it is not evaluated. Can it be collapse limit problem or like?
On Dec 27, 2011, at 1:12 PM, Tom Lane wrote: > Jim Crate <jimcfl@gmail.com> writes: >> My question is why does it do a seq scan when it flattens this >> subquery into a JOIN? > > Because it thinks there will be 3783 rows out of the msg scan, which if > true would make your desired nestloop join a serious loser. You need to > see about getting that estimate to be off by less than three orders of > magnitude. Possibly raising the stats target on emsg_messages would > help. I'd also try converting the inner NOT IN into a NOT EXISTS, just > to see if that makes the estimate any better. The planner does choose the nested loop after converting the NOT IN to NOT EXISTS. Using LEFT JOIN / IS NULL also generatedthe same plan as NOT EXISTS. I guess I really need to learn more about reading explain plans, and expand my useof different constructs. It's so easy to fall into the trap of using the same construct in all situations just becauseit works well enough most of the time and is easy to read. As for default_statistics_target, I read the docs and I'm not sure how increasing that value would help in this case. Thereare only a couple hundred accounts, and less than 5 values for message_type and spam_level. In the emsg_message_folderstable, the message_id is considered unique (pg_stats has n_distinct = -1), which would also be correct. EXPLAIN ANALYZE SELECT ema.message_id, ema.email_address_id, ema.address_type FROM emsg_message_addresses ema WHERE ema.message_id IN ( SELECT id FROM emsg_messages msg WHERE msg.account_id = 314 AND msg.outgoing = FALSE AND msg.message_type = 1 AND msg.spam_level < 2 AND msg.deleted_at IS NULL AND NOT EXISTS ( SELECT emf.message_id FROM emsg_message_folders emf WHERE emf.account_id = 314 AND emf.message_id = msg.id ) ) QUERY PLAN Nested Loop (cost=84785.80..84806.43 rows=100455 width=12) (actual time=262.507..262.528 rows=6 loops=1) -> HashAggregate (cost=84785.80..84785.81 rows=1 width=4) (actual time=262.445..262.446 rows=3 loops=1) -> Hash Anti Join (cost=8285.87..84785.80 rows=1 width=4) (actual time=254.363..262.426 rows=3 loops=1) Hash Cond: (msg.id = emf.message_id) -> Bitmap Heap Scan on emsg_messages msg (cost=869.66..77274.56 rows=7602 width=4) (actual time=13.622..204.879rows=12387 loops=1) Recheck Cond: (account_id = 314) Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (message_type = 1)) -> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.76 rows=34582 width=0) (actualtime=8.756..8.756 rows=35091 loops=1) Index Cond: (account_id = 314) -> Hash (cost=6990.69..6990.69 rows=34042 width=4) (actual time=45.785..45.785 rows=34647 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 1219kB -> Bitmap Heap Scan on emsg_message_folders emf (cost=680.16..6990.69 rows=34042 width=4) (actual time=5.465..35.842rows=34647 loops=1) Recheck Cond: (account_id = 314) -> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..671.65 rows=34042width=0) (actual time=4.966..4.966 rows=34647 loops=1) Index Cond: (account_id = 314) -> Index Scan using index_emsg_message_addresses_on_message_id on emsg_message_addresses ema (cost=0.00..20.45 rows=13width=12) (actual time=0.023..0.023 rows=2 loops=3) Index Cond: (ema.message_id = msg.id) Total runtime: 262.742 ms Jim Crate
On Tue, Dec 27, 2011 at 12:29 PM, Jim Crate <jimcfl@gmail.com> wrote: > My question is why does it do a seq scan when it flattens this subquery into a JOIN? Is it because the emsg_messages tableis around 1M rows? Are there some guidelines to when the planner will prefer not to use an available index? I justhad a look through postgresql.conf and noticed that I forgot to set effective_cache_size to something reasonable fora machine with 16GB of memory. Would the default setting of 128MB cause this behavior? I can't bounce the productionserver midday to test that change. You wouldn't need to bounce the production server to test that. You could just use SET in the session you were testing from. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company