Обсуждение: Should the optimiser convert a CASE into a WHERE if it can?

Поиск
Список
Период
Сортировка

Should the optimiser convert a CASE into a WHERE if it can?

От
Richard Neill
Дата:
Dear All,

Just wondering whether there is a missing scope for the query planner
(on 8.4.2) to be cleverer than it currently is.

Specifically, I wonder whether the optimiser should know that by
converting a CASE condition into a WHERE condition, it can use an index.

Have I found a possible enhancement, or is this simply too hard to do?

Best wishes,

Richard



Example:
--------

In this example, tbl_tracker has 255751 rows, with a primary key "id",
whose values lie uniformly in the range 1...1255750.

If one is trying to count multiple conditions, the following query seems
to be the most obvious way to do it:

SELECT
   SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
   SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
   SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
   SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
   SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
FROM tbl_tracker;


   c1  |  c2  |  c3  |  c4  |  c5
------+------+------+------+------
  2009 | 2018 | 2099 | 2051 | 2030

Time: 361.666 ms



This can be manually optimised into a far uglier (but much much faster)
query:

SELECT * FROM
  (SELECT COUNT (1) AS c1 FROM tbl_tracker
     WHERE id > 1200000 and id < 1210000) AS s1,
  (SELECT COUNT (1) AS c2 FROM tbl_tracker
     WHERE id > 1210000 and id < 1220000) AS s2,
  (SELECT COUNT (1) AS c3 FROM tbl_tracker
     WHERE id > 1220000 and id < 1230000) AS s3,
  (SELECT COUNT (1) AS c4 FROM tbl_tracker
     WHERE id > 1230000 and id < 1240000) AS s4,
  (SELECT COUNT (1) AS c5 FROM tbl_tracker
     WHERE id > 1240000 and id < 1250000) AS s5

   c1  |  c2  |  c3  |  c4  |  c5
------+------+------+------+------
  2009 | 2018 | 2099 | 2051 | 2030
(1 row)

Time: 21.091 ms





Debugging
---------

The simple queries are:

SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end)
from tbl_tracker;

Time: 174.804 ms

Explain shows that this does a sequential scan.



SELECT COUNT(1) from tbl_tracker WHERE id > 1200000 and id < 1210000;

Time: 4.153 ms

Explain shows that this uses the index, as expected.



Re: Should the optimiser convert a CASE into a WHERE if it can?

От
Tom Lane
Дата:
Richard Neill <rn214@cam.ac.uk> writes:
> SELECT
>    SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
>    SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
>    ...
> FROM tbl_tracker;

> This can be manually optimised into a far uglier (but much much faster)
> query:

> SELECT * FROM
>   (SELECT COUNT (1) AS c1 FROM tbl_tracker
>      WHERE id > 1200000 and id < 1210000) AS s1,
>   (SELECT COUNT (1) AS c2 FROM tbl_tracker
>      WHERE id > 1210000 and id < 1220000) AS s2,
>   ...

We're unlikely to consider doing this, for a couple of reasons:
it's unlikely to come up often enough to justify the cycles the planner
would spend looking for the case *on every query*, and it requires very
special knowledge about the behavior of two specific aggregate functions,
which is something the planner tends to avoid using.

            regards, tom lane

Re: Should the optimiser convert a CASE into a WHERE if it can?

От
Matthew Wakeling
Дата:
On Tue, 26 Jan 2010, Richard Neill wrote:
> SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) from
> tbl_tracker;
>
> Explain shows that this does a sequential scan.

I'd defer to Tom on this one, but really, for Postgres to work this out,
it would have to peer deep into the mysterious SUM function, and realise
that the number zero is a noop. I suppose it would be possible, but you'd
have to define noops for each of the different possible functions, *and*
make the planner clever enough to spot the noop-matching number in the
else and convert the WHEN into a WHERE.

In my mind, this is quite a lot of work for the planner to do to solve
this one. That translates into quite a lot of work for some poor
programmer to do to achieve it. If you have the money, then hire someone
to do it!

Matthew

--
 I don't want the truth. I want something I can tell parliament!
                                              -- Rt. Hon. Jim Hacker MP

Re: Should the optimiser convert a CASE into a WHERE if it can?

От
Richard Neill
Дата:
Thanks for your answers.


David Wilson wrote:

 > Why not simply add the where clause to the original query?
 >
 > SELECT
 > SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
 > SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
 > SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
 > SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
 > SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
 > FROM tbl_tracker WHERE (id>1200000) AND (id<1250000);
 >
 > I didn't populate any test tables, but I'd expect that to do just as
 > well without being any uglier than the original query is.

You're absolutely right, but I'm afraid this won't help. I'd simplified
the original example query, but in real life, I've got about 50
different sub-ranges, which cover virtually all the id-space.

----------

Tom Lane wrote:
> Richard Neill <rn214@cam.ac.uk> writes:
>> SELECT
>>    SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
>>    SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
>>    ...
>> FROM tbl_tracker;
>
>> This can be manually optimised into a far uglier (but much much faster)
>> query:
>
>> SELECT * FROM
>>   (SELECT COUNT (1) AS c1 FROM tbl_tracker
>>      WHERE id > 1200000 and id < 1210000) AS s1,
>>   (SELECT COUNT (1) AS c2 FROM tbl_tracker
>>      WHERE id > 1210000 and id < 1220000) AS s2,
>>   ...
>
> We're unlikely to consider doing this, for a couple of reasons:
> it's unlikely to come up often enough to justify the cycles the planner
> would spend looking for the case *on every query*, and it requires very
> special knowledge about the behavior of two specific aggregate functions,
> which is something the planner tends to avoid using.
>

OK - that's all I was wondering. I thought I'd raise this in case it
might be helpful.

I'll add a note to:
http://www.postgresql.org/docs/8.4/interactive/functions-conditional.html
to point out that this is something of a trap for the unwary

Regards,

Richard

Re: Should the optimiser convert a CASE into a WHERE if it can?

От
Scott Carey
Дата:
On Jan 26, 2010, at 9:41 AM, Richard Neill wrote:

> Thanks for your answers.
>
>
> David Wilson wrote:
>
>> Why not simply add the where clause to the original query?
>>
>> SELECT
>> SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
>> SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
>> SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
>> SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
>> SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
>> FROM tbl_tracker WHERE (id>1200000) AND (id<1250000);
>>
>> I didn't populate any test tables, but I'd expect that to do just as
>> well without being any uglier than the original query is.
>
> You're absolutely right, but I'm afraid this won't help. I'd simplified
> the original example query, but in real life, I've got about 50
> different sub-ranges, which cover virtually all the id-space.
>

Well, it probably shouldn't use the index if it covers the vast majority of the table.  I wonder if it is actually
fasterto reformulate with WHERE or not at that point -- it might be slower. 

Re: Should the optimiser convert a CASE into a WHERE if it can?

От
Віталій Тимчишин
Дата:


2010/1/26 Matthew Wakeling <matthew@flymine.org>
On Tue, 26 Jan 2010, Richard Neill wrote:
SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) from tbl_tracker;

Explain shows that this does a sequential scan.

I'd defer to Tom on this one, but really, for Postgres to work this out, it would have to peer deep into the mysterious SUM function, and realise that the number zero is a noop. I suppose it would be possible, but you'd have to define noops for each of the different possible functions, *and* make the planner clever enough to spot the noop-matching number in the else and convert the WHEN into a WHERE.

Hello.

How  about SELECT SUM (case when id > 1200000 and id < 1210000 then 1 end) from tbl_tracker;
It gives same result (may be unless there are no records at all) and optimizer already knows it need not to call function for null input. Such an optimization would cover much more cases. It would look like:
 * Check only for aggregate subselects
 * All the functions should be noop for null input
 * Add ORed constraint for every function input is not null (in this example (case when id > A1 and id < B1 then 1 end is not null) or (case when id > A2 and id < B2 then 1 end is not null) or ... or (case when id > An and id < Bn then 1 end is not null)
 * Know special "case" (case when id > A1 and id < B1 then 1 end is not null) <=> (id > A1 and id < B1)
by ORing all the "when" conditions case when C1 then D1 when C2 then D2 ... when Cm then Dm end is not null <=> C1 or C2 or ... or Cm.
Event without last part it may give bonuses even for "select count(field) from table" transformed into "select count(field) from table where field is not null" and using [partial] indexes. 
As of last "*", replacing COUNT with SUM(CASE()) is used often enough when multiple count calculations are needed.

Best regards, Vitalii Tymchyshyn

Re: Should the optimiser convert a CASE into a WHERE if it can?

От
Matthew Wakeling
Дата:
On Wed, 27 Jan 2010, Віталій Тимчишин wrote:
> How  about SELECT SUM (case when id > 1200000 and id < 1210000 then 1 end)
> from tbl_tracker;

That is very interesting.

> * All the functions should be noop for null input

Alas, not true for COUNT(*), AVG(), etc.

Matthew

--
 An optimist sees the glass as half full, a pessimist as half empty,
 and an engineer as having redundant storage capacity.

Re: Should the optimiser convert a CASE into a WHERE if it can?

От
Віталій Тимчишин
Дата:


27 січня 2010 р. 19:01 Matthew Wakeling <matthew@flymine.org> написав:
On Wed, 27 Jan 2010, Віталій Тимчишин wrote:
How  about SELECT SUM (case when id > 1200000 and id < 1210000 then 1 end)
from tbl_tracker;

That is very interesting.


* All the functions should be noop for null input

Alas, not true for COUNT(*), AVG(), etc.

select avg(b), count(b), count(*) from (values (2),(null))a(b)
gives  (2.0, 1, 2) for me, so AVG is in game. Sure, it won't work for count(*), but optimizer already knows which aggregates are strict and which are not, so no new information is needed.

Best regards, Vitalii Tymchyshyn