Обсуждение: missing optimization - column <> column

Поиск
Список
Период
Сортировка

missing optimization - column <> column

От
Pavel Stehule
Дата:
Hi

I found some crazy queries in one customer application. These queries are stupid, but it was surprise for me so there are not some simple optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in planner time.

Same issue is a expression a = a .. can be replaced by true

Regards

Pavel

Re: missing optimization - column <> column

От
Tom Lane
Дата:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization

> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;

> It does full scan of foo, although it should be replaced by false in
> planner time.

> Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input.  You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

If you write something like "1 <> 1", it will be folded.
        regards, tom lane



Re: missing optimization - column <> column

От
Stephen Frost
Дата:
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
>
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
>
> It does full scan of foo, although it should be replaced by false in
> planner time.

a <> a could go to NULL.  Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.

> Same issue is a expression a = a .. can be replaced by true

a = a can't be replaced unless you know that 'a' can't be NULL.

In short, fix the application.

Thanks!

Stephen

Re: missing optimization - column <> column

От
Pavel Stehule
Дата:


2016-12-05 16:24 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization

> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;

> It does full scan of foo, although it should be replaced by false in
> planner time.

> Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input.  You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

understand
 

If you write something like "1 <> 1", it will be folded.

it works, but a <> a not

Regards

Pavel

                        regards, tom lane

Re: missing optimization - column <> column

От
Pavel Stehule
Дата:


2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
>
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
>
> It does full scan of foo, although it should be replaced by false in
> planner time.

a <> a could go to NULL.  Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.

it should be false everywhere

I don't defend a design of the application - it is exactly wrong. But sometimes, it can be generated by some tool or it can be a human error. And bad performance can be a big problems on systems, where you cannot to deploy fix simply.

It is hard to say what should be good design - because these queries are slow, I know so these queries are wrong, but these queries does significant IO utilization - and I cannot to fix the application, because I am not a author and the fix will not be available in next week.

Regards

Pavel
 

> Same issue is a expression a = a .. can be replaced by true

a = a can't be replaced unless you know that 'a' can't be NULL.

In short, fix the application.

Thanks!

Stephen

Re: missing optimization - column <> column

От
Stephen Frost
Дата:
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> > * Pavel Stehule (pavel.stehule@gmail.com) wrote:
> > > I found some crazy queries in one customer application. These queries are
> > > stupid, but it was surprise for me so there are not some simple
> > optimization
> > >
> > > create table foo(a int);
> > > insert into foo select generate_series(1,100000);
> > > analyze foo;
> > > explain select * from foo where a <> a;
> > >
> > > It does full scan of foo, although it should be replaced by false in
> > > planner time.
> >
> > a <> a could go to NULL.  Obviously, that'll be false for such a simple
> > case, but it might not work out that way in a more complicated WHERE
> > clause.
> >
>
> it should be false everywhere

No, it's NULL, not false, if 'a' is NULL:

=# SELECT 1 WHERE (NULL <> NULL) IS NULL;?column?
----------       1
(1 row)

=*# SELECT 1 WHERE (FALSE) IS NULL;?column?
----------
(0 rows)

You can not make the assumption that 'a <> a' is always false.

Thanks!

Stephen

Re: missing optimization - column <> column

От
Pavel Stehule
Дата:


2016-12-05 16:41 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost@snowman.net>:
> > * Pavel Stehule (pavel.stehule@gmail.com) wrote:
> > > I found some crazy queries in one customer application. These queries are
> > > stupid, but it was surprise for me so there are not some simple
> > optimization
> > >
> > > create table foo(a int);
> > > insert into foo select generate_series(1,100000);
> > > analyze foo;
> > > explain select * from foo where a <> a;
> > >
> > > It does full scan of foo, although it should be replaced by false in
> > > planner time.
> >
> > a <> a could go to NULL.  Obviously, that'll be false for such a simple
> > case, but it might not work out that way in a more complicated WHERE
> > clause.
> >
>
> it should be false everywhere

No, it's NULL, not false, if 'a' is NULL:

=# SELECT 1 WHERE (NULL <> NULL) IS NULL;
 ?column?
----------
        1
(1 row)

=*# SELECT 1 WHERE (FALSE) IS NULL;
 ?column?
----------
(0 rows)

You can not make the assumption that 'a <> a' is always false.

ok - when the expression is tested on NULL, then a <> a should not be reduced. But when there are not this test, then I can replace this expression by false.

Regards

Pavel
 

Thanks!

Stephen

Re: missing optimization - column <> column

От
Pantelis Theodosiou
Дата:


On Mon, Dec 5, 2016 at 3:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization

> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;

> It does full scan of foo, although it should be replaced by false in
> planner time.

> Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input.  You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

If you write something like "1 <> 1", it will be folded.

                        regards, tom lane




Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a is (not) nullable or when "a" is more complicated expression?

a <> a    :   (a IS NULL) AND NULL
a = a     :   (a IS NOT NULL) OR NULL

Pantelis Theodosiou
 

Re: missing optimization - column <> column

От
Corey Huinker
Дата:

Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a is (not) nullable or when "a" is more complicated expression?

a <> a    :   (a IS NULL) AND NULL
a = a     :   (a IS NOT NULL) OR NULL


I think you're looking for

a IS DISTINCT FROM a 

And that will work for cases where a might be null.

I have no opinion about whether adding such a test to the planner is worth it.

Re: missing optimization - column <> column

От
Pantelis Theodosiou
Дата:


On Mon, Dec 5, 2016 at 7:02 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a is (not) nullable or when "a" is more complicated expression?

a <> a    :   (a IS NULL) AND NULL
a = a     :   (a IS NOT NULL) OR NULL


I think you're looking for

a IS DISTINCT FROM a 

And that will work for cases where a might be null.

I have no opinion about whether adding such a test to the planner is worth it.

No,  (a IS DISTINCT FROM a) will evaluate to FALSE when a is NULL. The other conditions (a <> a) , ((a IS NULL) AND NULL) will evaluate to NULL.

Re: missing optimization - column <> column

От
Serge Rielau
Дата:
Actually there are lots of things that can be done with this sort of theorem proving.
And NULL is a plenty good answer for a filter, just not for a check constraint.
Amongst them INSERT through UNION ALL for symmetric views which can be handy for FDW partitioned tables.

One such implementation an be found here:


Cheers
Serge


 
On Dec 5, 2016, at 7:28 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:



2016-12-05 16:24 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization

> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;

> It does full scan of foo, although it should be replaced by false in
> planner time.

> Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input.  You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

understand
 

If you write something like "1 <> 1", it will be folded.

it works, but a <> a not

Regards

Pavel

                        regards, tom lane