Обсуждение: Inconsistant SQL results - Suspected error with query planing or query optimisation.
Inconsistant SQL results - Suspected error with query planing or query optimisation.
От
adam terrey
Дата:
Hay all :)
Firstly, just wanted to say how much I appreceate all the hard work that has gorn into postgres. It really is a fine system.
Anyway, mybug: I have a test SELECT statement (Listing A - see "sql listings.txt") wich produces different results under two simular setups (Listing B and Listing C). Each setup should product the same result for the given SELECT statement.
In the first setup (Listing B) the table "items" has 10,000 records id'ed from 1 to 10,000. The field "number" for every record except two records (id:500 and 600) has the value NULL., the two exceptions (id: 500 and 600) have the value 1.
This first setup produces the correct results for the given SELECT statement.
The SELECT statement is essentlually a dubble negitive - using two LEFT JOINS.
- The most inner nested query selects WHERE number = 1
- Then next most inner selects everything but, and
- The most outer selects every thing but everything but WHERE number = 1
The second setup (Listing C) is identicle to the first execpt that the table "items" has an extra field and a primary key index. The goal of this setup is to produce a cirtian query plan that I beleive is broken, where it seems that the "Nested Loop Left Join" has forced the filter for "WHERE number = 1" outside or (perhaps after) a join one of the more nested joins causeing that more nested join to cancel it self out.
Well.... that's what I make of it. I really hope you are able to reproduce this, it took me ages to find a setup that would be reproduce what I was observing in my developement system.
See the attached output.txt for the results and the EXPLAIN statement for the SELECT statement under each setup.
Postgres:
"PostgreSQL 8.2.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)"
Hardware:
AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
2GB Mem
OS:
Kernel: Linux version 2.6.18-4-amd64 (Debian 2.6.18.dfsg.1-12)
OS: GNU/Linux Debian "Etch" - AMD64
Kind regards,
Adam Terrey
Data Systems - Marketing Strategy and Planning
University Relations Directorate
Australian Catholic University Limited (ABN 15 050 192 660)
42 Edward Street, North Sydney NSW 2060
PO Box 968 North Sydney NSW 2059
Telephone 02 9739 2517
Facsimile 02 9739 2960
Mobile 0421 430 824
Email: adam.terrey@acu.edu.au
http://www.acu.edu.au
Australian Catholic University and the courses offered by the University are
registered on the Commonwealth Register of Institutions and Courses for
Overseas Students (CRICOS). Provider registration codes: 00004G, 0012C,
Вложения
Re: Inconsistant SQL results - Suspected error with query planing or query optimisation.
От
Alvaro Herrera
Дата:
adam terrey wrote: > The second setup (Listing C) is identicle to the first execpt that the > table "items" has an extra field and a primary key index. The goal of this > setup is to produce a cirtian query plan that I beleive is broken, where > it seems that the "Nested Loop Left Join" has forced the filter for "WHERE > number = 1" outside or (perhaps after) a join one of the more nested joins > causeing that more nested join to cancel it self out. It's easy to confirm that the nested loop is the culprit here: if you SET enable_nestloop to off, the query returns different results (the expected two tuples). 8.1 seems to work OK, but both 8.2 and HEAD don't. alvherre=# set enable_nestloop to off; SET alvherre=# SELECT items.id FROM items LEFT JOIN ( -- Query i. SELECT items.id FROM items LEFT JOIN ( -- Query ii. SELECT id FROM items WHERE number = 1 ) AS moded_items USING (id) WHERE moded_items.id IS NULL ) AS sub_items USING (id) WHERE sub_items.id IS NULL; id ----- 500 600 (2 rows) alvherre=# set enable_nestloop to on; SET alvherre=# SELECT items.id FROM items LEFT JOIN ( -- Query i. SELECT items.id FROM items LEFT JOIN ( -- Query ii. SELECT id FROM items WHERE number = 1 ) AS moded_items USING (id) WHERE moded_items.id IS NULL ) AS sub_items USING (id) WHERE sub_items.id IS NULL; id ---- (0 rows) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
adam terrey <a.terrey@mackillop.acu.edu.au> writes at http://archives.postgresql.org/pgsql-bugs/2007-05/msg00187.php > Anyway, mybug: I have a test SELECT statement (Listing A - see "sql > listings.txt") wich produces different results under two simular setups > (Listing B and Listing C). Each setup should product the same result for > the given SELECT statement. The problem here is that 8.2 is incorrectly concluding that it can rearrange the order of the two LEFT JOIN steps in the query: SELECT a.id FROM items a LEFT JOIN (SELECT b.idFROM items bLEFT JOIN ( SELECT c.id FROM items c WHERE number = 1) AS moded_items USING (id)WHEREmoded_items.id IS NULL ) AS sub_items USING (id) WHERE sub_items.id IS NULL; The plan it comes up with is: Nested Loop Left Join (cost=469.00..1063.39 rows=1 width=4) (actual time=288.962..288.962 rows=0 loops=1) Filter: (c.idIS NULL) -> Hash Left Join (cost=469.00..1063.00 rows=1 width=8) (actual time=288.946..288.946 rows=0 loops=1) Hash Cond: (a.id = b.id) Filter: (b.id IS NULL) -> Seq Scan on items a (cost=0.00..344.00 rows=10000width=4) (actual time=0.080..50.973 rows=10000 loops=1) -> Hash (cost=344.00..344.00 rows=10000 width=4)(actual time=140.880..140.880 rows=10000 loops=1) -> Seq Scan on items b (cost=0.00..344.00 rows=10000width=4) (actual time=0.046..69.395 rows=10000 loops=1) -> Index Scan using items_pkey on items c (cost=0.00..0.38rows=1 width=4) (never executed) Index Cond: (b.id = c.id) Filter: (c.number = 1) After reducing join_collapse_limit to 1, we get the right join order and the right answers: Hash Left Join (cost=750.54..1132.05 rows=1 width=4) (actual time=409.712..409.740 rows=2 loops=1) Hash Cond: (a.id = b.id) Filter: (b.id IS NULL) -> Seq Scan on items a (cost=0.00..344.00 rows=10000 width=4) (actual time=0.100..51.052rows=10000 loops=1) -> Hash (cost=750.52..750.52 rows=1 width=4) (actual time=264.978..264.978 rows=9998loops=1) -> Hash Left Join (cost=369.01..750.52 rows=1 width=4) (actual time=30.074..192.023 rows=9998loops=1) Hash Cond: (b.id = c.id) Filter: (c.id IS NULL) -> Seq Scan onitems b (cost=0.00..344.00 rows=10000 width=4) (actual time=0.030..50.913 rows=10000 loops=1) -> Hash (cost=369.00..369.00rows=1 width=4) (actual time=29.976..29.976 rows=2 loops=1) -> Seq Scan on itemsc (cost=0.00..369.00 rows=1 width=4) (actual time=29.896..29.916 rows=2 loops=1) Filter: (number= 1) So there is something wrong with the rule used for deciding whether two LEFT JOINs can commute. Per the planner README: : The planner's treatment of outer join reordering is based on the following : identities: : : 1. (A leftjoin B on (Pab)) innerjoin C on (Pac) : = (A innerjoin C on (Pac)) leftjoin B on (Pab) : : where Pac is a predicate referencing A and C, etc (in this case, clearly : Pac cannot reference B, or the transformation is nonsensical). : : 2. (A leftjoin B on (Pab)) leftjoin C on (Pac) : = (A leftjoin C on (Pac)) leftjoin B on (Pab) : : 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc) : = A leftjoin (B leftjoin C on (Pbc)) on (Pab) : : Identity 3 only holds if predicate Pbc must fail for all-null B rows : (that is, Pbc is strict for at least one column of B). If Pbc is not : strict, the first form might produce some rows with nonnull C columns : where the second form would make those entries null. What we have here is an invocation of rule 3 in a situation where it's not appropriate. The difficulty is that the code is only paying attention to the syntactical JOIN/ON clauses and has neglected the intermediate-level WHERE clause. After a bit of reflection it seems that a WHERE that is semantically just below a left-join's right side can be treated as if it were part of that left-join's ON clause. It will have the same effect as if it had been written there: any rows rejected by the WHERE will fail to be joined to the left side and will contribute nothing to the result. Had we been following this rule, we'd have concluded that c.id IS NULL is part of the upper join qual, and therefore that it has a predicate Pabc not just Pab and cannot be commuted with the lower join. Teaching initsplan.c to do things this way seems possible but less than trivial. Before I start worrying about that, does anyone see any flaws in the reasoning at this level of detail? regards, tom lane