Обсуждение: Is this planner choice easily explained?
This is a summary from a thread on the dbmail mailing list. I am trying to understand some apparently odd behaviour with 7.2.1 and 7.2.3. Adding a LIMIT statement causes what seems like a very bad choice of strategy. The first statement is fine: explain SELECT messageblk FROM messageblks WHERE message_idnr = 100::bigint ORDER BY messageblk_idnr ; gives: Sort (cost=5793.33..5793.33 rows=1453 width=40) -> Index Scan using messageblks_msg_idx on messageblks (cost=0.00..5716.99 rows=1453 width=40) and returns almost instantly, whereas, just adding a limit: explain SELECT messageblk FROM messageblks WHERE message_idnr = 100::bigint ORDER BY messageblk_idnr limit 1; gives: Limit (cost=0.00..777.50 rows=1 width=40) -> Index Scan using messageblks_id_idx on messageblks (cost=0.00..1129984.15 rows=1453 width=40) which takes several minutes to run. The relevant metadata is: Table "messageblks" Column | Type | Modifiers -----------------+-------- +------------------------------------------------------- messageblk_idnr | bigint | not null default nextval('messageblk_idnr_seq'::text) message_idnr | bigint | not null default '0' messageblk | text | not null blocksize | bigint | not null default '0' Indexes: messageblks_msg_idx Primary key: messageblks_pkey Unique keys: messageblks_id_idx Index "messageblks_id_idx" Column | Type -----------------+-------- messageblk_idnr | bigint unique btree Index "messageblks_msg_idx" Column | Type --------------+-------- message_idnr | bigint btree If anyone could explain the likely reasons for the choice, I would be very interested. Even given the planner estimates, they don't look like sensible choices. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > I am trying to understand some apparently odd behaviour with 7.2.1 and > 7.2.3. Adding a LIMIT statement causes what seems like a very bad choice of > strategy. The first statement is fine: Could we see the pg_stats rows for the columns used in these queries? (You have done a vacuum analyze recently, I trust...) > If anyone could explain the likely reasons for the choice, I would be very > interested. Even given the planner estimates, they don't look like sensible > choices. Why do you say that? It's taking a plan estimated at 777 cost units over one that is estimated at (presumably) 5793 units. regards, tom lane
> >Could we see the pg_stats rows for the columns used in these queries? >(You have done a vacuum analyze recently, I trust...) Strange you should ask; it is actually taking some effort to persuade them it's a good idea. They also don't believe that a LIMIT clause affects strategy choice, so it's an uphill battle. As a result, my strong suspicion is that there will be no results in pg_stat - but I have asked for the data. I am mainly interested in understanding the output from this point of view -- and to understand the choice that has been made in such a 'boundary case' (when no analyze has been done). Am I correct in my interpretation that: explain SELECT messageblk FROM messageblks WHERE message_idnr = 100::bigint ORDER BY messageblk_idnr ; Sort (cost=5793.33..5793.33 rows=1453 width=40) -> Index Scan using messageblks_msg_idx on messageblks (cost=0.00..5716.99 rows=1453 width=40) means it expects to get 1453 rows based on a search for a specific key (hence why it has a high cost)? Based on other tests I have done, I have concluded that it assumes a selectivity of 0.5% for non-unique indexes - is that right? Whereas with: explain SELECT messageblk FROM messageblks WHERE message_idnr = 100::bigint ORDER BY messageblk_idnr limit 1; Limit (cost=0.00..777.50 rows=1 width=40) -> Index Scan using messageblks_id_idx on messageblks (cost=0.00..1129984.15 rows=1453 width=40) it looks like the high cost on the last line here is based on the number of pages/tuples in the file, and that the limit is causing 1/1453th of the cost to be applied. It looks like it gets the 1453 as a basic default selectivity again. The problem I have with this is that it seems that the estimate for the LIMIT is assuming that the first row returned will be the right one; in fact it has no guarantee that the specified criteria will be satisfied by the row. Is this assumption likely to be wrong when we do have stats? ISTM that a LIMIT with a WHERE clause that is not satisfied by any index used in the search should not be deemed to reduce the result set size as drastically. In fact, it may actually have to scan the entire result set to get the row. Err...I think 8-). ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Strange you should ask; it is actually taking some effort to persuade them > it's a good idea. They also don't believe that a LIMIT clause affects > strategy choice, so it's an uphill battle. If they don't believe that, they are wrong (and pretty muleheaded, to continue disbelieving it in the face of indisputable evidence to the contrary...). The entire reason that the planner estimates both startup and total cost is so that it can make an informed choice about what to do in the presence of LIMIT. > Am I correct in my interpretation that: > explain SELECT messageblk FROM messageblks WHERE message_idnr = > 100::bigint > ORDER BY messageblk_idnr ; > Sort (cost=5793.33..5793.33 rows=1453 width=40) > -> Index Scan using messageblks_msg_idx on messageblks > (cost=0.00..5716.99 rows=1453 width=40) > means it expects to get 1453 rows based on a search for a specific key > (hence why it has a high cost)? It looks that way. It's always tricky to be sure what the index condition actually is (which is why 7.3's EXPLAIN displays conditions explicitly) --- but I'd interpret the above as an indexed search for message_idnr = 100 (estimated to yield 1453 rows) followed by a sort on messageblk_idnr. > Based on other tests I have done, I have > concluded that it assumes a selectivity of 0.5% for non-unique indexes - is > that right? Yeah, I think that's the default selectivity in the absence of pg_stats data, at least in recent releases. Look in src/backend/utils/adt/selfuncs.c. > Whereas with: > explain SELECT messageblk FROM messageblks WHERE message_idnr = > 100::bigint > ORDER BY messageblk_idnr > limit 1; > Limit (cost=0.00..777.50 rows=1 width=40) > -> Index Scan using messageblks_id_idx on messageblks > (cost=0.00..1129984.15 rows=1453 width=40) > it looks like the high cost on the last line here is based on the number of > pages/tuples in the file, and that the limit is causing 1/1453th of the > cost to be applied. It looks like it gets the 1453 as a basic default > selectivity again. What is presumably happening here is that the index is being used only to retrieve rows in messageblk_idnr order. There is no index condition per se (ie, the whole index is potentially scanned) --- but there is a filter condition ("qpqual") for message_idnr = 100 on the indexscan's output, so the indexscan node is still estimated to yield the same 1453 rows. But notice the very high cost to run the indexscan to completion; under the hood it'd be visiting all the rows. The reason this plan is chosen is that because of the LIMIT 1, it's not going to be run to completion. Essentially, the system is betting that scanning the messageblk_idnr index in order till it hits the first row with message_idnr = 100 will be quicker than finding and sorting all the rows with message_idnr = 100 to get the one with lowest messageblk_idnr. This bet is evidently wrong, but it's hard to tell whether it's wrong because no statistics are available, or because the system isn't making the right deductions from the stats it has, or because the stats aren't adequate to model the situation. (For example, we currently do not have any cross-column correlation stats. If message_idnr and messageblk_idnr are strongly correlated, which I'm suspecting is likely, the rows with message_idnr = 100 would not be randomly scattered in the messageblk_idnr index --- but the system is assuming they will be in order to estimate how long it will take to find the first one.) regards, tom lane
At 10:38 AM 22/11/2002 -0500, Tom Lane wrote: >If they don't believe that, they are wrong (and pretty muleheaded, to >continue disbelieving it in the face of indisputable evidence to the >contrary...). I am not at all sure what their mindset is -- I may have just misunderstood their responses. Very odd. Up until now I thought I held the record for mule-headedness on this sort of thing. >This bet is evidently wrong, but it's hard to tell whether it's wrong >because no statistics are available, or because the system isn't making >the right deductions from the stats it has, or because the stats aren't >adequate to model the situation. Based on my own database with about 30GB of mail, the planner is easily capable of coming up with the right strategy here -- so long as an analyze is done. I have managed to construct trivial examples that exactly mirror the behaviour above, but which work properly after an analyze. >(For example, we currently do not have >any cross-column correlation stats. If message_idnr and messageblk_idnr >are strongly correlated, which I'm suspecting is likely, the rows with >message_idnr = 100 would not be randomly scattered in the >messageblk_idnr index --- but the system is assuming they will be in >order to estimate how long it will take to find the first one.) Indeed; I think this is exactly what we are seeing. Thanks very much for the insights; I have relayed this to the other list -- I'll let you know if anything unexpected happens. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/