Обсуждение: WHERE column = X AND column = Y will always be zero matching rows
Hello PostgreSQL Developers,
I am using PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit.
(In the following X and Y are literals; X <> Y)
I noticed is that…
EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y
…says that PostgreSQL actually wants to perform a Scan!
I wonder why squandering any resources into a Scan here, as it is pretty obvious that the result is guaranteed to be always, under any conditions, and will always be: zero matching rows – at least in a universe where a single value cannot be X AND Y at the same time.
This can be seen as a chance to add an optimization („col = X AND col = Y always is zero matching rows), or in case such an optimization already exists in PostgreSQL, as a bug.
BTW the same happens for column IS NULL AND column = Y.
Regards,
-Markus
EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y
…says that PostgreSQL actually wants to perform a Scan!
I wonder why squandering any resources into a Scan here
On Thu, 2023-08-03 at 14:20 +0000, [Quipsy] Markus Karg wrote: > I am using PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on aarch64-unknown-linux-gnu, > compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit. > > (In the following X and Y are literals; X <> Y) > > I noticed is that… > > EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y > > …says that PostgreSQL actually wants to perform a Scan! This is not a bug. > I wonder why squandering any resources into a Scan here, as it is pretty obvious that the > result is guaranteed to be always, under any conditions, and will always be: zero matching > rows – at least in a universe where a single value cannot be X AND Y at the same time. Such an optimization, while certainly possible, would not be free, and each query with more than one WHERE condition would have to pay the price. On the other hand, only ill-written queries would benefit. Since PostgreSQL only caches execution plans in special cases, anything that slows down the optimizer should improve enough statements that it is a net win. I don't see that here (but that is of course a matter of opinion). Yours, Laurenz Albe
AW: WHERE column = X AND column = Y will always be zero matching rows
David,
thank you for your opinion, so it is a chance for future optimization.
Whether or not this „should not happen“ is a different discussion. 😉
-Markus
Von: David G. Johnston <david.g.johnston@gmail.com>
Gesendet: Donnerstag, 3. August 2023 16:48
An: [Quipsy] Markus Karg <karg@quipsy.de>
Cc: pgsql-bugs@lists.postgresql.org
Betreff: Re: WHERE column = X AND column = Y will always be zero matching rows
On Thursday, August 3, 2023, [Quipsy] Markus Karg <karg@quipsy.de> wrote:
EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y
…says that PostgreSQL actually wants to perform a Scan!
I wonder why squandering any resources into a Scan here
Mostly because no one wants to volunteer to write code to deal with poorly written queries, nor, more technically, expend compute resources looking for situations that should not happen.
David J.
David,
thank you for your opinion, so it is a chance for future optimization.
Whether or not this „should not happen“ is a different discussion. 😉
On Thu, Aug 3, 2023 at 9:21 AM [Quipsy] Markus Karg <karg@quipsy.de> wrote: > > Hello PostgreSQL Developers, > > I am using PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0,64-bit. > > (In the following X and Y are literals; X <> Y) > I noticed is that… > > EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y > > …says that PostgreSQL actually wants to perform a Scan! > Actually no... you're original complain, never execute the scan (because of the filter is recognized as false) the one, with "col is null and col = Y" on the other side do execute the scan postgres=# create table t1 (id int); CREATE TABLE postgres=# insert into t1 select generate_series(1, 100000); INSERT 0 100000 postgres=# explain select * from t1 where id =5 and id =6; QUERY PLAN ----------------------------------------------------------- Result (cost=0.00..1855.06 rows=1 width=4) One-Time Filter: false -> Seq Scan on t1 (cost=0.00..1855.06 rows=1 width=4) Filter: (id = 5) (4 filas) postgres=# explain analyze select * from t1 where id =5 and id =6; QUERY PLAN --------------------------------------------------------------------------------------- Result (cost=0.00..1693.00 rows=1 width=4) (actual time=0.005..0.007 rows=0 loops=1) One-Time Filter: false -> Seq Scan on t1 (cost=0.00..1693.00 rows=1 width=4) (never executed) Filter: (id = 5) Planning Time: 0.200 ms Execution Time: 0.056 ms (6 filas) -- Jaime Casanova SYSTEMGUARDS
"[Quipsy] Markus Karg" <karg@quipsy.de> writes: > (In the following X and Y are literals; X <> Y) > I noticed is that... > EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y > ...says that PostgreSQL actually wants to perform a Scan! You should learn to read plans before complaining about them. What you actually get is something like regression=# explain analyze select * from tenk1 where hundred = 42 and hundred = 99; QUERY PLAN ----------------------------------------------------------------------------------------------------- Result (cost=5.04..224.95 rows=1 width=244) (actual time=0.001..0.002 rows=0 loops=1) One-Time Filter: false -> Bitmap Heap Scan on tenk1 (cost=5.04..224.95 rows=1 width=244) (never executed) Recheck Cond: (hundred = 42) -> Bitmap Index Scan on tenk1_hundred (cost=0.00..5.04 rows=100 width=0) (never executed) Index Cond: (hundred = 42) Planning Time: 0.148 ms Execution Time: 0.046 ms (8 rows) See the "never executed" markings? The one-time filter prevents anything below the Result from actually doing anything. From memory, the reason it's like this is that it's the most straightforward extension of cases involving pseudoconstant conditions, that is parameters not known at plan time. If you had something like "WHERE var = param1 AND var = const1" then you might get something involving a "param1 = const1" one-time filter, and below that a plan to retrieve rows with "var = const1", which would only be run when the run-time-supplied parameter happens to match the constant. David's statement that we don't want to write code for such cases isn't completely true, but it is true that we don't want to expend code on producing a pretty-looking plan for them. > BTW the same happens for column IS NULL AND column = Y. That indeed is a case we haven't bothered with, and probably never will. The multiple-constants-equated-to-the-same-thing situation is dealt with by the equivalence class machinery, which has many other purposes but happens to be able to detect that situation cheaply. But IS NULL is not an equivalence condition (as we define those, anyway). Likewise, we won't reduce something like "var > C1 AND var < C1" to constant-false; there's just not enough win probability to justify spending planner cycles looking for such cases. regards, tom lane
AW: WHERE column = X AND column = Y will always be zero matching rows
Tom, thanks a lot for your kind explanation, this was very interesting to read! In fact, I have missed the "never executed" marks. BTW, I did not "complain". I just noticed something and was unclear if this is a bug or not, and I clearly told so in theopening of my posting. It was definitively not my intention to bother or annoy anybody, actually. Sorry to have troubledso many in this mailing list. -Markus -----Ursprüngliche Nachricht----- Von: Tom Lane <tgl@sss.pgh.pa.us> Gesendet: Donnerstag, 3. August 2023 19:59 An: [Quipsy] Markus Karg <karg@quipsy.de> Cc: pgsql-bugs@lists.postgresql.org Betreff: Re: WHERE column = X AND column = Y will always be zero matching rows "[Quipsy] Markus Karg" <karg@quipsy.de> writes: > (In the following X and Y are literals; X <> Y) I noticed is that... > EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y > ...says that PostgreSQL actually wants to perform a Scan! You should learn to read plans before complaining about them. What you actually get is something like regression=# explain analyze select * from tenk1 where hundred = 42 and hundred = 99; QUERY PLAN ----------------------------------------------------------------------------------------------------- Result (cost=5.04..224.95 rows=1 width=244) (actual time=0.001..0.002 rows=0 loops=1) One-Time Filter: false -> Bitmap Heap Scan on tenk1 (cost=5.04..224.95 rows=1 width=244) (never executed) Recheck Cond: (hundred = 42) -> Bitmap Index Scan on tenk1_hundred (cost=0.00..5.04 rows=100 width=0) (never executed) Index Cond: (hundred = 42) Planning Time: 0.148 ms Execution Time: 0.046 ms (8 rows) See the "never executed" markings? The one-time filter prevents anything below the Result from actually doing anything. From memory, the reason it's like this is that it's the most straightforward extension of cases involving pseudoconstantconditions, that is parameters not known at plan time. If you had something like "WHERE var = param1 AND var = const1" then you might get something involving a "param1 = const1" one-time filter, and below that a plan to retrieve rows with "var = const1", which would only be run when the run-time-suppliedparameter happens to match the constant. David's statement that we don't want to write code for such cases isn't completely true, but it is true that we don't wantto expend code on producing a pretty-looking plan for them. > BTW the same happens for column IS NULL AND column = Y. That indeed is a case we haven't bothered with, and probably never will. The multiple-constants-equated-to-the-same-thing situation is dealt with by the equivalence class machinery, which has many other purposes but happens to be able to detectthat situation cheaply. But IS NULL is not an equivalence condition (as we define those, anyway). Likewise, we won'treduce something like "var > C1 AND var < C1" to constant-false; there's just not enough win probability to justifyspending planner cycles looking for such cases. regards, tom lane