Обсуждение: [GENERAL] Slow index scan - Pgsql 9.2

Поиск
Список
Период
Сортировка

[GENERAL] Slow index scan - Pgsql 9.2

От
Patrick B
Дата:
Hi guys,

I've got the following Query:
WITH
                                       query_p AS (
                                               SELECT CAST(6667176 AS BIGINT) AS client_id),
                                                                                               
 clients AS (
                                                   SELECT
                                                       client.id,client.job_share_mode
                                                       FROM
                                                           customers AS client
                                                       WHERE
                                                           (client.clientid = (SELECT qp.client_id FROM query_p AS qp))
                                                       AND
                                                           NOT client.is_demo
                                                       AND
                                                           NOT client.deleted
                                                   )
Select qp.client_id, (SELECT COUNT(0) FROM customers AS c WHERE (c.clientid = qp.client_id) AND NOT c.deleted) AS client_count
FROM query_p AS qp

Explain Analyze:
CTE Scan on "query_p" "qp"  (cost=0.01..1060.57 rows=1 width=8) (actual time=4065.244..4065.246 rows=1 loops=1)
  CTE query_p
    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
  SubPlan 2
    ->  Aggregate  (cost=1060.53..1060.54 rows=1 width=0) (actual time=4065.229..4065.229 rows=1 loops=1)
          ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
                Index Cond: ("clientid" = "qp"."client_id")
                Filter: (NOT "deleted")
                Rows Removed by Filter: 1068
Total runtime: 4075.753 ms


Why a search for "client_id" is so slow??


Table customers:

                                              Table "public.customers"

         Column         |            Type             |                            Modifiers                            

------------------------+-----------------------------+-----------------------------------------------------------------

 id                     | bigint                      | not null default "nextval"('"customers_seq"'::"regclass")

 clientid               | bigint                      | not null default 0

 name_first             | character varying(80)       | default ''::character varying

 name_last              | character varying(80)       | default ''::character varying

 company                | character varying(255)      | default ''::character varying


Index clientid_customers:

CREATE INDEX

    clientid_customers

ON

    customers

    (

        "clientid"

    );



Thanks!

Patrick

Re: [GENERAL] Slow index scan - Pgsql 9.2

От
Jan de Visser
Дата:
Hi guys,

I've got the following Query:
WITH
                                       query_p AS (
                                               SELECT CAST(6667176 AS BIGINT) AS client_id),
                                                                                               
 clients AS (
                                                   SELECT
                                                       client.id,client.job_share_mode
                                                       FROM
                                                           customers AS client
                                                       WHERE
                                                           (client.clientid = (SELECT qp.client_id FROM query_p AS qp))
                                                       AND
                                                           NOT client.is_demo
                                                       AND
                                                           NOT client.deleted
                                                   )
Select qp.client_id, (SELECT COUNT(0) FROM customers AS c WHERE (c.clientid = qp.client_id) AND NOT c.deleted) AS client_count
FROM query_p AS qp

Explain Analyze:
CTE Scan on "query_p" "qp"  (cost=0.01..1060.57 rows=1 width=8) (actual time=4065.244..4065.246 rows=1 loops=1)
  CTE query_p
    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
  SubPlan 2
    ->  Aggregate  (cost=1060.53..1060.54 rows=1 width=0) (actual time=4065.229..4065.229 rows=1 loops=1)
          ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
                Index Cond: ("clientid" = "qp"."client_id")
                Filter: (NOT "deleted")
                Rows Removed by Filter: 1068
Total runtime: 4075.753 ms


Why a search for "client_id" is so slow??

I would think because of the NOT "deleted" clause. Which is interesting, because that's a column which you conveniently didn't include in the definition below.
 


Table customers:

                                              Table "public.customers"

         Column         |            Type             |                            Modifiers                            

------------------------+-----------------------------+-----------------------------------------------------------------

 id                     | bigint                      | not null default "nextval"('"customers_seq"'::"regclass")

 clientid               | bigint                      | not null default 0

 name_first             | character varying(80)       | default ''::character varying

 name_last              | character varying(80)       | default ''::character varying

 company                | character varying(255)      | default ''::character varying


Index clientid_customers:

CREATE INDEX

    clientid_customers

ON

    customers

    (

        "clientid"

    );



Thanks!

Patrick


Re: [GENERAL] Slow index scan - Pgsql 9.2

От
"David G. Johnston"
Дата:
On Mon, Jan 9, 2017 at 6:06 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
Explain Analyze:
CTE Scan on "query_p" "qp"  (cost=0.01..1060.57 rows=1 width=8) (actual time=4065.244..4065.246 rows=1 loops=1)
  CTE query_p
    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
  SubPlan 2
    ->  Aggregate  (cost=1060.53..1060.54 rows=1 width=0) (actual time=4065.229..4065.229 rows=1 loops=1)
          ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
                Index Cond: ("clientid" = "qp"."client_id")
                Filter: (NOT "deleted")
                Rows Removed by Filter: 1068
Total runtime: 4075.753 ms


Why a search for "client_id" is so slow??


​3,581​ individual pokes into the heap to confirm tuple visibility and apply the deleted filter - that could indeed take a while.

David J.

Re: [GENERAL] Slow index scan - Pgsql 9.2

От
David Rowley
Дата:
On 10 January 2017 at 14:06, Patrick B <patrickbakerbr@gmail.com> wrote:
>           ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual
time=9.105..4063.728rows=2513 loops=1) 
>                 Index Cond: ("clientid" = "qp"."client_id")
>                 Filter: (NOT "deleted")
>                 Rows Removed by Filter: 1068
> Total runtime: 4075.753 ms
>
> Why a search for "client_id" is so slow??

EXPLAIN (ANALYZE, BUFFERS) might reveal something.

Perhaps each of the 2513 found rows, plus the 1068 filtered out rows
were spread over the table. Perhaps each on their own heap page, and
all those pages had to be read from disk. The BUFFERS option might
help show if this is the case.

Does it execute as slowly when you run it for a 2nd time?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] Slow index scan - Pgsql 9.2

От
Patrick B
Дата:
​3,581​ individual pokes into the heap to confirm tuple visibility and apply the deleted filter - that could indeed take a while.
David J.

I see.. The deleted column is: 

deleted boolean

Should I create an index for that? How could I improve this query?


Does it execute as slowly when you run it for a 2nd time?

No, it doesn't. I think it's because of cache?
 

I would think because of the NOT "deleted" clause. Which is interesting, because that's a column which you conveniently didn't include in the definition below.

My mistake.


Would an Index be sufficient to solve the problem?

Patrick

Re: [GENERAL] Slow index scan - Pgsql 9.2

От
"David G. Johnston"
Дата:
On Mon, Jan 9, 2017 at 8:05 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
​3,581​ individual pokes into the heap to confirm tuple visibility and apply the deleted filter - that could indeed take a while.
David J.

I see.. The deleted column is: 

deleted boolean

Should I create an index for that? How could I improve this query?


Does it execute as slowly when you run it for a 2nd time?

No, it doesn't. I think it's because of cache?

​Correct - your first execution swallows disk I/O​.
 
 

I would think because of the NOT "deleted" clause. Which is interesting, because that's a column which you conveniently didn't include in the definition below.

My mistake.


Would an Index be sufficient to solve the problem?


​A lot would depend on the selectivity of "deleted"... you are tossing less than 1/3rd of the rows here so probably not that helpful - though a partial index over deleted = false would benefit this specific query.

If this query dominates your non-PK usage of this table you could consider clustering the table on client_id - that would reduce the I/O hit at the expense of increased maintenance.

Really, though, the query you are showing is poorly written - so I am assuming that it is standing in for something else.  If that's so you have provided zero context for meaningful opinions to be rendered.

Queries against the supposedly large customer table, on the high-cardinality client_id field, which result in many matching rows is going to be a physical problem if the data doesn't remain in cache.  CLUSTER can help a bit in that situation.  Otherwise your solutions are more likely to by physical and not logical (model/SQL).

David J.

Re: [GENERAL] Slow index scan - Pgsql 9.2

От
Tomas Vondra
Дата:
On 01/10/2017 04:05 AM, Patrick B wrote:
>     ​3,581​ individual pokes into the heap to confirm tuple visibility
>     and apply the deleted filter - that could indeed take a while.
>     David J.
>
>
> I see.. The deleted column is:
>
> deleted boolean
>
> Should I create an index for that? How could I improve this query?
>
>
>     Does it execute as slowly when you run it for a 2nd time?
>
>
> No, it doesn't. I think it's because of cache?
>
>
>     I would think because of the NOT "deleted" clause. Which is
>     interesting, because that's a column which you conveniently didn't
>     include in the definition below.
>
>
> My mistake.
>
>
> Would an Index be sufficient to solve the problem?
>

Not a separate index - the query probably would not benefit from two
separate indexes. But you can amend the existing index, to allow
index-only scans, i.e. creating an index like this:

   CREATE INDEX ON (clientid, is_demo, deleted, id, job, job_share_mode)

This will make the index larger, but it should allow index-only scans.

The other thing you could try is partial index, i.e.

   CREATE INDEX ON (clientid) WHERE NOT is_demo AND NOT deleted;

You can also combine those approaches, but you'll have to include all
columns into the index, even those in the index predicate:

   CREATE INDEX ON (clientid, is_demo, deleted, id, job, job_share_mode)
   WHERE NOT is_demo AND NOT deleted;

I'd bet all of those will outperform the current plan.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] Slow index scan - Pgsql 9.2

От
Patrick B
Дата:


2017-01-11 4:05 GMT+13:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
On 01/10/2017 04:05 AM, Patrick B wrote:
    ​3,581​ individual pokes into the heap to confirm tuple visibility
    and apply the deleted filter - that could indeed take a while.
    David J.


I see.. The deleted column is:

deleted boolean

Should I create an index for that? How could I improve this query?


    Does it execute as slowly when you run it for a 2nd time?


No, it doesn't. I think it's because of cache?


    I would think because of the NOT "deleted" clause. Which is
    interesting, because that's a column which you conveniently didn't
    include in the definition below.


My mistake.


Would an Index be sufficient to solve the problem?


Not a separate index - the query probably would not benefit from two separate indexes. But you can amend the existing index, to allow index-only scans, i.e. creating an index like this:

  CREATE INDEX ON (clientid, is_demo, deleted, id, job, job_share_mode)

This will make the index larger, but it should allow index-only scans.

The other thing you could try is partial index, i.e.

  CREATE INDEX ON (clientid) WHERE NOT is_demo AND NOT deleted;

You can also combine those approaches, but you'll have to include all columns into the index, even those in the index predicate:

  CREATE INDEX ON (clientid, is_demo, deleted, id, job, job_share_mode)
  WHERE NOT is_demo AND NOT deleted;

I'd bet all of those will outperform the current plan.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Thanks for the reply!

I decided to create a partial index for that query, as it is part of a much bigger one and it is run at all the time.

Patrick