Обсуждение: missing optimization - column <> column
Hi
I found some crazy queries in one customer application. These queries are stupid, but it was surprise for me so there are not some simple optimizationcreate table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;
Pavel Stehule <pavel.stehule@gmail.com> writes: > I found some crazy queries in one customer application. These queries are > stupid, but it was surprise for me so there are not some simple optimization > create table foo(a int); > insert into foo select generate_series(1,100000); > analyze foo; > explain select * from foo where a <> a; > It does full scan of foo, although it should be replaced by false in > planner time. > Same issue is a expression a = a .. can be replaced by true Wrong; those expressions yield NULL for NULL input. You could perhaps optimize them slightly into some form of is-null test, but it hardly seems worth the planner cycles to check for. If you write something like "1 <> 1", it will be folded. regards, tom lane
Pavel, * Pavel Stehule (pavel.stehule@gmail.com) wrote: > I found some crazy queries in one customer application. These queries are > stupid, but it was surprise for me so there are not some simple optimization > > create table foo(a int); > insert into foo select generate_series(1,100000); > analyze foo; > explain select * from foo where a <> a; > > It does full scan of foo, although it should be replaced by false in > planner time. a <> a could go to NULL. Obviously, that'll be false for such a simple case, but it might not work out that way in a more complicated WHERE clause. > Same issue is a expression a = a .. can be replaced by true a = a can't be replaced unless you know that 'a' can't be NULL. In short, fix the application. Thanks! Stephen
2016-12-05 16:24 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
> It does full scan of foo, although it should be replaced by false in
> planner time.
> Same issue is a expression a = a .. can be replaced by true
Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.
understand
If you write something like "1 <> 1", it will be folded.
it works, but a <> a not
Regards
Pavel
regards, tom lane
2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
Pavel,
* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
>
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
>
> It does full scan of foo, although it should be replaced by false in
> planner time.
a <> a could go to NULL. Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.
it should be false everywhere
I don't defend a design of the application - it is exactly wrong. But sometimes, it can be generated by some tool or it can be a human error. And bad performance can be a big problems on systems, where you cannot to deploy fix simply.
It is hard to say what should be good design - because these queries are slow, I know so these queries are wrong, but these queries does significant IO utilization - and I cannot to fix the application, because I am not a author and the fix will not be available in next week.
Regards
Pavel
> Same issue is a expression a = a .. can be replaced by true
a = a can't be replaced unless you know that 'a' can't be NULL.
In short, fix the application.
Thanks!
Stephen
Pavel, * Pavel Stehule (pavel.stehule@gmail.com) wrote: > 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>: > > * Pavel Stehule (pavel.stehule@gmail.com) wrote: > > > I found some crazy queries in one customer application. These queries are > > > stupid, but it was surprise for me so there are not some simple > > optimization > > > > > > create table foo(a int); > > > insert into foo select generate_series(1,100000); > > > analyze foo; > > > explain select * from foo where a <> a; > > > > > > It does full scan of foo, although it should be replaced by false in > > > planner time. > > > > a <> a could go to NULL. Obviously, that'll be false for such a simple > > case, but it might not work out that way in a more complicated WHERE > > clause. > > > > it should be false everywhere No, it's NULL, not false, if 'a' is NULL: =# SELECT 1 WHERE (NULL <> NULL) IS NULL;?column? ---------- 1 (1 row) =*# SELECT 1 WHERE (FALSE) IS NULL;?column? ---------- (0 rows) You can not make the assumption that 'a <> a' is always false. Thanks! Stephen
2016-12-05 16:41 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
Pavel,
* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> > * Pavel Stehule (pavel.stehule@gmail.com) wrote:
> > > I found some crazy queries in one customer application. These queries are
> > > stupid, but it was surprise for me so there are not some simple
> > optimization
> > >
> > > create table foo(a int);
> > > insert into foo select generate_series(1,100000);
> > > analyze foo;
> > > explain select * from foo where a <> a;
> > >
> > > It does full scan of foo, although it should be replaced by false in
> > > planner time.
> >
> > a <> a could go to NULL. Obviously, that'll be false for such a simple
> > case, but it might not work out that way in a more complicated WHERE
> > clause.
> >
>
> it should be false everywhere
No, it's NULL, not false, if 'a' is NULL:
=# SELECT 1 WHERE (NULL <> NULL) IS NULL;
?column?
----------
1
(1 row)
=*# SELECT 1 WHERE (FALSE) IS NULL;
?column?
----------
(0 rows)
You can not make the assumption that 'a <> a' is always false.
ok - when the expression is tested on NULL, then a <> a should not be reduced. But when there are not this test, then I can replace this expression by false.
Regards
Pavel
Thanks!
Stephen
On Mon, Dec 5, 2016 at 3:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Would it be worth replacing the condition with the equivalent?
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
> It does full scan of foo, although it should be replaced by false in
> planner time.
> Same issue is a expression a = a .. can be replaced by true
Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.
If you write something like "1 <> 1", it will be folded.
regards, tom lane
Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a is (not) nullable or when "a" is more complicated expression?
a <> a : (a IS NULL) AND NULL
a = a : (a IS NOT NULL) OR NULL
Pantelis Theodosiou
Would it be worth replacing the condition with the equivalent?I mean would that help optimizing better some queries when it knows that a is (not) nullable or when "a" is more complicated expression?a <> a : (a IS NULL) AND NULLa = a : (a IS NOT NULL) OR NULL
I think you're looking for
a IS DISTINCT FROM a
And that will work for cases where a might be null.
I have no opinion about whether adding such a test to the planner is worth it.
On Mon, Dec 5, 2016 at 7:02 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
No, (a IS DISTINCT FROM a) will evaluate to FALSE when a is NULL. The other conditions (a <> a) , ((a IS NULL) AND NULL) will evaluate to NULL.
Would it be worth replacing the condition with the equivalent?I mean would that help optimizing better some queries when it knows that a is (not) nullable or when "a" is more complicated expression?a <> a : (a IS NULL) AND NULLa = a : (a IS NOT NULL) OR NULLI think you're looking fora IS DISTINCT FROM aAnd that will work for cases where a might be null.I have no opinion about whether adding such a test to the planner is worth it.
No, (a IS DISTINCT FROM a) will evaluate to FALSE when a is NULL. The other conditions (a <> a) , ((a IS NULL) AND NULL) will evaluate to NULL.
Actually there are lots of things that can be done with this sort of theorem proving.
And NULL is a plenty good answer for a filter, just not for a check constraint.
Amongst them INSERT through UNION ALL for symmetric views which can be handy for FDW partitioned tables.
One such implementation an be found here:
https://www.google.com/patents/US6728952 (apparently expired)
Cheers
Serge
On Dec 5, 2016, at 7:28 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2016-12-05 16:24 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:Pavel Stehule <pavel.stehule@gmail.com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
> It does full scan of foo, although it should be replaced by false in
> planner time.
> Same issue is a expression a = a .. can be replaced by true
Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.understand
If you write something like "1 <> 1", it will be folded.it works, but a <> a notRegardsPavel
regards, tom lane