Re: Query is slow when order by and limit clause are used in the query

Поиск
Список
Период
Сортировка
От sreekanth vajrapu
Тема Re: Query is slow when order by and limit clause are used in the query
Дата
Msg-id CAKPbTYgnMgj8tjTuPU1aRazH3NCXePWgvb1AXGPWDJh1k4h0Wg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Query is slow when order by and limit clause are used in the query  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Query is slow when order by and limit clause are used in the query  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-bugs
Hi Team,

Thank you so much for the quick response on my issue. 

I minimized the query to the problematic part, Also I tried with all the options that you suggested, Still I am seeing the slow when I USE LIMIT 30.

Answers to your questions.

1) Statistics are up to date.
2) Version of the postgres is "PostgreSQL 9.5.21".

Below are the execution plans along with the query and index definitions. Kindly help me to resolve this issue.


SCENARIO 1: QUERY PLAN WITH ORDER BY ASC AND WITH LIMIT 31 (This is actual one which is being used by application)
explain analyze
SELECT item.*
FROM (
SELECT
          t1.name,
          t1.deleted
     FROM t1
     JOIN t2 USING (id)
     JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name LIMIT 31 OFFSET 0;

QUERY PLAN WITH ORDER BY ASC AND WITH LIMIT 31
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.14..1034.75 rows=31 width=26) (actual time=1206.826..1207.108 rows=31 loops=1)
   ->  Nested Loop  (cost=1.14..339827.40 rows=10192 width=26) (actual time=1206.824..1207.102 rows=31 loops=1)
         ->  Nested Loop  (cost=0.71..334957.35 rows=10415 width=58) (actual time=1206.811..1206.951 rows=31 loops=1)
               ->  Index Scan using t1_name_index on t1  (cost=0.42..165174.97 rows=542766 width=42) (actual time=0.012..636.150 rows=521998 loops=1)
                     Filter: ((NOT deleted) OR (deleted IS NULL))
               ->  Index Only Scan using t2_pkey on t2  (cost=0.29..0.30 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=521998)
                     Index Cond: (id = t1.id)
                     Heap Fetches: 0
         ->  Index Only Scan using t3_pkey on t3  (cost=0.42..0.46 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=31)
               Index Cond: (id = t1.id)
               Heap Fetches: 0
 Planning time: 0.624 ms
 Execution time: 1207.162 ms
(13 rows)

SCENARIO 2: QUERY PLAN WITH ORDER BY ASC AND WITHOUT LIMIT
explain analyze
SELECT item.*
FROM (
SELECT
          t1.name,
          t1.deleted
     FROM t1
     JOIN t2 USING (id)
     JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name;

QUERY PLAN WITH ORDER BY ASC AND WITHOUT LIMIT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18989.88..19015.36 rows=10192 width=26) (actual time=166.891..167.575 rows=10415 loops=1)
   Sort Key: t1.name
   Sort Method: quicksort  Memory: 1198kB
   ->  Nested Loop  (cost=291.76..18311.34 rows=10192 width=26) (actual time=1.912..139.370 rows=10415 loops=1)
         Join Filter: (t2.id = t1.id)
         ->  Hash Join  (cost=291.34..10571.03 rows=10415 width=32) (actual time=1.897..93.228 rows=10415 loops=1)
               Hash Cond: (t3.id = t2.id)
               ->  Seq Scan on t3  (cost=0.00..8183.67 rows=531167 width=16) (actual time=0.005..30.359 rows=531167 loops=1)
               ->  Hash  (cost=161.15..161.15 rows=10415 width=16) (actual time=1.861..1.861 rows=10415 loops=1)
                     Buckets: 16384  Batches: 1  Memory Usage: 617kB
                     ->  Seq Scan on t2  (cost=0.00..161.15 rows=10415 width=16) (actual time=0.002..0.753 rows=10415 loops=1)
         ->  Index Scan using t1_pkey on t1  (cost=0.42..0.73 rows=1 width=42) (actual time=0.004..0.004 rows=1 loops=10415)
               Index Cond: (id = t3.id)
               Filter: ((NOT deleted) OR (deleted IS NULL))
 Planning time: 0.562 ms
 Execution time: 167.973 ms
(16 rows)

SCENARIO 3: QUERY PLAN WITH ORDER BY DESC AND WITH LIMIT 31
explain analyze
SELECT item.*
FROM (
SELECT
          t1.name,
          t1.deleted
     FROM t1
     JOIN t2 USING (id)
     JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name DESC LIMIT 31 OFFSET 0;

QUERY PLAN WITH ORDER BY DESC AND WITH LIMIT 31
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.14..1034.75 rows=31 width=26) (actual time=0.039..0.431 rows=31 loops=1)
   ->  Nested Loop  (cost=1.14..339827.40 rows=10192 width=26) (actual time=0.038..0.428 rows=31 loops=1)
         ->  Nested Loop  (cost=0.71..334957.35 rows=10415 width=58) (actual time=0.021..0.261 rows=31 loops=1)
               ->  Index Scan Backward using t1_name_index on t1  (cost=0.42..165174.97 rows=542766 width=42) (actual time=0.012..0.091 rows=61 loops=1)
                     Filter: ((NOT deleted) OR (deleted IS NULL))
               ->  Index Only Scan using t2_pkey on t2  (cost=0.29..0.30 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=61)
                     Index Cond: (id = t1.id)
                     Heap Fetches: 0
         ->  Index Only Scan using t3_pkey on t3  (cost=0.42..0.46 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=31)
               Index Cond: (id = t1.id)
               Heap Fetches: 0
 Planning time: 0.663 ms
 Execution time: 0.491 ms
(13 rows)


SCENARIO 4: QUERY PLAN WITH ORDER BY DESC AND WITHOUT LIMIT
explain analyze
SELECT item.*
FROM (
SELECT
          t1.name,
          t1.deleted
     FROM t1
     JOIN t2 USING (id)
     JOIN t3 USING (id)
) item
WHERE (
deleted = FALSE
OR deleted IS NULL
)
ORDER BY item.name DESC


QUERY PLAN WITH ORDER BY DESC AND WITHOUT LIMIT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18989.88..19015.36 rows=10192 width=26) (actual time=177.915..178.513 rows=10415 loops=1)
   Sort Key: t1.name DESC
   Sort Method: quicksort  Memory: 1198kB
   ->  Nested Loop  (cost=291.76..18311.34 rows=10192 width=26) (actual time=2.052..149.175 rows=10415 loops=1)
         Join Filter: (t2.id = t1.id)
         ->  Hash Join  (cost=291.34..10571.03 rows=10415 width=32) (actual time=2.038..96.798 rows=10415 loops=1)
               Hash Cond: (t3.id = t2.id)
               ->  Seq Scan on t3  (cost=0.00..8183.67 rows=531167 width=16) (actual time=0.005..31.440 rows=531167 loops=1)
               ->  Hash  (cost=161.15..161.15 rows=10415 width=16) (actual time=1.993..1.993 rows=10415 loops=1)
                     Buckets: 16384  Batches: 1  Memory Usage: 617kB
                     ->  Seq Scan on t2  (cost=0.00..161.15 rows=10415 width=16) (actual time=0.004..0.822 rows=10415 loops=1)
         ->  Index Scan using t1_pkey on t1  (cost=0.42..0.73 rows=1 width=42) (actual time=0.005..0.005 rows=1 loops=10415)
               Index Cond: (id = t3.id)
               Filter: ((NOT deleted) OR (deleted IS NULL))
 Planning time: 0.585 ms
 Execution time: 178.915 ms
(16 rows)

SCENARIO 5:  QUERY PLAN WITH ORDER BY ASC AND LIMIT 30  BY CHAING QUERY TO (NOT COALESCE(deleted, false))

explain analyze
SELECT item.*
FROM (
SELECT
          t1.name,
          t1.deleted
     FROM t1
     JOIN t2 USING (id)
     JOIN t3 USING (id)
) item
WHERE ( NOT COALESCE(deleted,false)
--deleted = FALSE
--OR deleted IS NULL
)
ORDER BY item.name LIMIT 31 OFFSET 0;


QUERY PLAN WITH ORDER BY ASC AND LIMIT 30  BY CHAING QUERY TO (NOT COALESCE(deleted, false))
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.14..1034.75 rows=31 width=26) (actual time=1230.705..1230.983 rows=31 loops=1)
   ->  Nested Loop  (cost=1.14..339827.40 rows=10192 width=26) (actual time=1230.704..1230.980 rows=31 loops=1)
         ->  Nested Loop  (cost=0.71..334957.35 rows=10415 width=58) (actual time=1230.693..1230.835 rows=31 loops=1)
               ->  Index Scan using t1_name_index on t1  (cost=0.42..165174.97 rows=542766 width=42) (actual time=0.017..666.816 rows=521998 loops=1)
                     Filter: (NOT COALESCE(deleted, false))
               ->  Index Only Scan using t2_pkey on t2  (cost=0.29..0.30 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=521998)
                     Index Cond: (id = t1.id)
                     Heap Fetches: 0
         ->  Index Only Scan using t3_pkey on t3  (cost=0.42..0.46 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=31)
               Index Cond: (id = t1.id)
               Heap Fetches: 0
 Planning time: 0.848 ms
 Execution time: 1231.039 ms
(13 rows)

Time: 1255.381 ms (00:01.255)

Index Definitions;

ALTER TABLE t1 ADD CONSTRAINT t1_pkey PRIMARY KEY (id);

CREATE INDEX t1_name_index ON t1 USING btree (name COLLATE pg_catalog."default" ASC NULLS LAST);
CREATE INDEX t1_name_deleted_peteam1 ON t1 USING btree ((COALESCE(deleted, false)), name);
CREATE INDEX t1_name_deleted_peteam2 ON t1 USING btree (name, (COALESCE(deleted, false)));
CREATE INDEX t1_name_deleted_peteam3 ON t1 USING btree ((COALESCE(deleted, false)), name DESC);
CREATE INDEX t1_name_deleted_peteam4 ON t1 USING btree (name DESC, (COALESCE(deleted, false)));
CREATE INDEX t1_name_index_peteam5   ON t1 USING btree (name, deleted);

ALTER TABLE t2 ADD CONSTRAINT t2_pkey PRIMARY KEY (id);

ALTER TABLE t3 ADD CONSTRAINT t3_pkey PRIMARY KEY (id);




On Tue, May 25, 2021 at 5:35 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 25 May 2021 at 02:19, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> If I had to guess, I'd say this is a case of the usual LIMIT problem,
> where the optimizer assumes the matching rows are uniformly distributed
> in the input relation, when in reality it's "concentrated" at the end.

I'd guess that too. But hard to say due to the inconsistent
anonymisation of the plan,

> Hard to say, though, confirming it would require looking at the data
> more closely. The one thing I'd suggest is changing the xxxx_index to
> also include the "deleted" column, but it's a stab in the dark.

I'd say, providing xxxx_item and xxxxx_item are actually the same
table but just anonymised poorly, then an index such as:

create index on xxxx_item(COALESCE(deleted,false), name);

then change the query so instead of doing WHERE NOT deleted or deleted
is null;   do instead WHERE NOT COALESCE(deleted,false);

Without the query change then there's no hope of that index being used.

I think this would improve the situation as the LIMIT 30 plan is using
xxxxx_index to provide presorted results for the ORDER BY but can only
do index filtering on: (((NOT deleted) OR (deleted IS NULL)) AND
(SubPlan 6)).  So providing not too many rows are filtered out by
SubPlan 6, then that should reduce the Rows Removed by Filter.
However, if the majority of those rows are filtered out by Subplan 6,
then the index won't help much.

It would be nice if the schema was better designed so the deleted
column could only be true or false though.

sreekanth, for the future, you can use https://explain.depesz.com/ to
anonymise your queries. It'll do it in a consistent way that changes
the names of things in a consistent way that people can still follow.

David


--
Thanks & Regards,
Sreekanth

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17039: Won't Upgrade - Repo Error
Следующее
От: "okano.naoki@fujitsu.com"
Дата:
Сообщение: RE: CR is not removed with psql -f command on Windows.