Re: [PERFORM]

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: [PERFORM]
Дата
Msg-id CAFj8pRA-0Ea9u-N3BzqQk-KwnVD6_GsWmhEJVpkpaTKZfsyqfg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PERFORM]  (Yevhenii Kurtov <yevhenii.kurtov@gmail.com>)
Ответы Re: [PERFORM]
Список pgsql-performance


2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov <yevhenii.kurtov@gmail.com>:
Hello folks,

Thank you very much for analysis and suggested - there is a lot to learn here. I just  tried UNION queries and got following error:

ERROR:  FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT

it is sad :(

maybe bitmap index scan can work

postgres=# create table test(id int, started date, failed date, status int);
CREATE TABLE
postgres=# create index on test(id) where status = 0;
CREATE INDEX
postgres=# create index on test(started) where status = 1;
CREATE INDEX
postgres=# create index on test(failed ) where status = 2;
CREATE INDEX
postgres=# explain select id from test where (status = 0 and id in (1,2,3,4,5)) or (status = 1 and started < current_date) or (status = 2 and failed > current_date);
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│                                                                               QUERY PLAN                                           
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
│ Bitmap Heap Scan on test  (cost=12.93..22.50 rows=6 width=4)                                                                       
│   Recheck Cond: (((id = ANY ('{1,2,3,4,5}'::integer[])) AND (status = 0)) OR ((started < CURRENT_DATE) AND (status = 1)) OR ((faile
│   Filter: (((status = 0) AND (id = ANY ('{1,2,3,4,5}'::integer[]))) OR ((status = 1) AND (started < CURRENT_DATE)) OR ((status = 2)
│   ->  BitmapOr  (cost=12.93..12.93 rows=6 width=0)                                                                                 
│         ->  Bitmap Index Scan on test_id_idx  (cost=0.00..4.66 rows=1 width=0)                                                     
│               Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))                                                                    
│         ->  Bitmap Index Scan on test_started_idx  (cost=0.00..4.13 rows=3 width=0)                                                
│               Index Cond: (started < CURRENT_DATE)                                                                                 
│         ->  Bitmap Index Scan on test_failed_idx  (cost=0.00..4.13 rows=3 width=0)                                                 
│               Index Cond: (failed > CURRENT_DATE)                                                                                  
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(10 rows)

  

I made a table dump for anyone who wants to give it a spin https://app.box.com/s/464b12glmlk5o4gvzz7krc4c8s2fxlwr
and here is the gist for the original commands https://gist.github.com/lessless/33215d0c147645db721e74e07498ac53

On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong <Brad.Dejong@infor.com> wrote:


On 2017-06-28, Pavel Stehule wrote ...
> On 2017-06-28, Yevhenii Kurtov wrote ...
>> On 2017-06-28, Pavel Stehule wrote ...
>>> On 2017-06-28, Yevhenii Kurtov wrote ...
>>>> We have a query that is run almost each second and it's very important to squeeze every other ms out of it. The query is:
>>>> ...
>>>> I added following index: CREATE INDEX ON campaign_jobs(id, status, failed_at, started_at, priority DESC, times_failed);
>>>> ...
>>> There are few issues
>>> a) parametrized LIMIT
>>> b) complex predicate with lot of OR
>>> c) slow external sort
>>>
>>> b) signalize maybe some strange in design .. try to replace "OR" by "UNION" query
>>> c) if you can and you have good enough memory .. try to increase work_mem .. maybe 20MB
>>>
>>> if you change query to union queries, then you can use conditional indexes
>>>
>>> create index(id) where status = 0;
>>> create index(failed_at) where status = 2;
>>> create index(started_at) where status = 1;
>>
>> Can you please give a tip how to rewrite the query with UNION clause?
>
> SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2))))
> UNION SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE ((c0."status" = $3) AND (c0."failed_at" > $4))
> UNION SELECT c0."id" FROM "campaign_jobs" AS c0
> WHERE ((c0."status" = $5) AND (c0."started_at" < $6))
> ORDER BY c0."priority" DESC, c0."times_failed"
> LIMIT $7
> FOR UPDATE SKIP LOCKED


Normally (at least for developers I've worked with), that kind of query structure is used when the "status" values don't overlap and don't change from query to query. Judging from Pavel's suggested conditional indexes (i.e. "where status = <constant>"), he also thinks that is likely.

Give the optimizer that information so that it can use it. Assuming $1 = 0 and $3 = 2 and $5 = 1, substitute literals. Substitute literal for $7 in limit. Push order by and limit to each branch of the union all (or does Postgres figure that out automatically?) Replace union with union all (not sure about Postgres, but allows other dbms to avoid sorting and merging result sets to eliminate duplicates). (Use of UNION ALL assumes that "id" is unique across rows as implied by only "id" being selected with FOR UPDATE. If multiple rows can have the same "id", then use UNION to eliminate the duplicates.)

SELECT "id" FROM "campaign_jobs" WHERE "status" = 0 AND NOT "id" = ANY($1)
  UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status" = 2 AND "failed_at" > $2
  UNION ALL
SELECT "id" FROM "campaign_jobs" WHERE "status" = 1 AND "started_at" < $3
ORDER BY "priority" DESC, "times_failed"
LIMIT 100
FOR UPDATE SKIP LOCKED


Another thing that you could try is to push the ORDER BY and LIMIT to the branches of the UNION (or does Postgres figure that out automatically?) and use slightly different indexes. This may not make sense for all the branches but one nice thing about UNION is that each branch can be tweaked independently. Also, there are probably unmentioned functional dependencies that you can use to reduce the index size and/or improve your match rate. Example - if status = 1 means that the campaign_job has started but not failed or completed, then you may know that started_at is set, but failed_at and ended_at are null. The < comparison in and of itself implies that only rows where "started_at" is not null will match the condition.

SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE (((c0."status" = 0) AND NOT (c0."id" = ANY($1)))) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 2) AND (c0."failed_at" > $2)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
UNION ALL
SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 1) AND (c0."started_at" < $3)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT 100
ORDER BY c0."priority" DESC, c0."times_failed"
LIMIT 100
FOR UPDATE SKIP LOCKED

Including the "priority", "times_failed" and "id" columns in the indexes along with "failed_at"/"started_at" allows the optimizer to do index only scans. (May still have to do random I/O to the data page to determine tuple version visibility but I don't think that can be eliminated.)

create index ... ("priority" desc, "times_failed", "id")               where "status" = 0;
create index ... ("priority" desc, "times_failed", "id", "failed_at")  where "status" = 2 and "failed_at" is not null;
create index ... ("priority" desc, "times_failed", "id", "started_at") where "status" = 1 and "started_at" is not null; -- and ended_at is null and ...


I'm assuming that the optimizer knows that "where status = 1 and started_at < $3" implies "and started_at is not null" and will consider the conditional index. If not, then the "and started_at is not null" needs to be explicit.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


В списке pgsql-performance по дате отправления:

Предыдущее
От: Yevhenii Kurtov
Дата:
Сообщение: Re: [PERFORM]
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: [PERFORM]