Обсуждение: Query optimization problem

Поиск
Список
Период
Сортировка

Query optimization problem

От
Zotov
Дата:
I have a query:

  SELECT d1.ID, d2.ID
  FROM DocPrimary d1
    JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d1.ID=234409763) or (d2.ID=234409763)

i think what QO(Query Optimizer) can make it faster (now it seq scan and on
million records works 7 sec)

  SELECT d1.ID, d2.ID
  FROM DocPrimary d1
    JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d2.BasedOn=234409763) or (d2.ID=234409763)


  ----------------------
  Slow Query
  ----------------------
  test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT d1.ID,
  d2.ID
  test-# FROM DocPrimary d1
  test-#   JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  test-# WHERE (d1.ID=234409763) or (d2.ID=234409763);
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
    Hash Join  (cost=58.15..132.35 rows=2 width=8) (actual time=0.007..0.007
  rows=0 loops=1)
      Output: d1.id, d2.id
      Hash Cond: (d2.basedon = d1.id)
      Join Filter: ((d1.id = 234409763) OR (d2.id = 234409763))
      ->   Seq Scan on public.docprimary d2  (cost=0.00..31.40 rows=2140
  width=8) (actual time=0.002..0.002 rows=0 loops=1)
            Output: d2.id, d2.basedon
      ->   Hash  (cost=31.40..31.40 rows=2140 width=4) (never executed)
            Output: d1.id
            ->   Seq Scan on public.docprimary d1  (cost=0.00..31.40 rows=2140
  width=4) (never executed)
                  Output: d1.id

  ------------------
  Fast Query
  ------------------
  test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT d1.ID,
  d2.ID
  test-# FROM DocPrimary d1
  test-#   JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  test-# WHERE (d2.BasedOn=234409763) or (d2.ID=234409763);
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
    Nested Loop  (cost=8.60..58.67 rows=12 width=8) (actual time=0.026..0.026
  rows=0 loops=1)
      Output: d1.id, d2.id
      ->   Bitmap Heap Scan on public.docprimary d2  (cost=8.60..19.31 rows=12
  width=8) (actual time=0.023..0.023 rows=0 loops=1)
            Output: d2.id, d2.basedon
            Recheck Cond: ((d2.basedon = 234409763) OR (d2.id = 234409763))
            ->   BitmapOr  (cost=8.60..8.60 rows=12 width=0) (actual
  time=0.018..0.018 rows=0 loops=1)
                  ->   Bitmap Index Scan on basedon_idx  (cost=0.00..4.33
  rows=11 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                        Index Cond: (d2.basedon = 234409763)
                  ->   Bitmap Index Scan on id_pk  (cost=0.00..4.26 rows=1
  width=0) (actual time=0.003..0.003 rows=0 loops=1)
                        Index Cond: (d2.id = 234409763)
      ->   Index Scan using id_pk on public.docprimary d1  (cost=0.00..3.27
  rows=1 width=4) (never executed)
            Output: d1.id, d1.basedon
            Index Cond: (d1.id = d2.basedon)


--------------------------------------------
PGver: PostgreSQL 9.0b x86
OS: Win7 x64

---------------------
Create table query:
---------------------

CREATE TABLE docprimary
(
   id integer NOT NULL,
   basedon integer,
   CONSTRAINT id_pk PRIMARY KEY (id)
);
CREATE INDEX basedon_idx
   ON docprimary
   USING btree
   (basedon);


Re: Query optimization problem

От
Yeb Havinga
Дата:
Hello Zotov,

Somehow the equivalence d2.basedon=d1.id is not used in the slow query,
probably because the equivalence constant value would be used inside a
not-base expression (the OR). You can see that the equivalence values
*are* used by changing the or to an and and compare both queries. The
only thing you can do to guarantee the planner has all information to in
cases like this it explicitly name the equivalence inside OR
expressions, e.g.

SELECT d1.ID, d2.ID
 FROM DocPrimary d1
   JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
 WHERE (d1.ID=234409763 and d2.basedon=234409763) or (d2.ID=234409763) ;

regards,
Yeb Havinga

PS: the analyze time of the slow query showed 0.007ms?

Zotov wrote:
> I have a query:
>
>  SELECT d1.ID, d2.ID
>  FROM DocPrimary d1
>    JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>  WHERE (d1.ID=234409763) or (d2.ID=234409763)
>
> i think what QO(Query Optimizer) can make it faster (now it seq scan
> and on
> million records works 7 sec)
>
>  SELECT d1.ID, d2.ID
>  FROM DocPrimary d1
>    JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>  WHERE (d2.BasedOn=234409763) or (d2.ID=234409763)
>
>
>  ----------------------
>  Slow Query
>  ----------------------
>  test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT
> d1.ID,
>  d2.ID
>  test-# FROM DocPrimary d1
>  test-#   JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>  test-# WHERE (d1.ID=234409763) or (d2.ID=234409763);
>                                                          QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------

>
>    Hash Join  (cost=58.15..132.35 rows=2 width=8) (actual
> time=0.007..0.007
>  rows=0 loops=1)
>      Output: d1.id, d2.id
>      Hash Cond: (d2.basedon = d1.id)
>      Join Filter: ((d1.id = 234409763) OR (d2.id = 234409763))
>      ->   Seq Scan on public.docprimary d2  (cost=0.00..31.40 rows=2140
>  width=8) (actual time=0.002..0.002 rows=0 loops=1)
>            Output: d2.id, d2.basedon
>      ->   Hash  (cost=31.40..31.40 rows=2140 width=4) (never executed)
>            Output: d1.id
>            ->   Seq Scan on public.docprimary d1  (cost=0.00..31.40
> rows=2140
>  width=4) (never executed)
>                  Output: d1.id
>
>  ------------------
>  Fast Query
>  ------------------
>  test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT
> d1.ID,
>  d2.ID
>  test-# FROM DocPrimary d1
>  test-#   JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>  test-# WHERE (d2.BasedOn=234409763) or (d2.ID=234409763);
>                                                              QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------

>
>    Nested Loop  (cost=8.60..58.67 rows=12 width=8) (actual
> time=0.026..0.026
>  rows=0 loops=1)
>      Output: d1.id, d2.id
>      ->   Bitmap Heap Scan on public.docprimary d2  (cost=8.60..19.31
> rows=12
>  width=8) (actual time=0.023..0.023 rows=0 loops=1)
>            Output: d2.id, d2.basedon
>            Recheck Cond: ((d2.basedon = 234409763) OR (d2.id =
> 234409763))
>            ->   BitmapOr  (cost=8.60..8.60 rows=12 width=0) (actual
>  time=0.018..0.018 rows=0 loops=1)
>                  ->   Bitmap Index Scan on basedon_idx  (cost=0.00..4.33
>  rows=11 width=0) (actual time=0.008..0.008 rows=0 loops=1)
>                        Index Cond: (d2.basedon = 234409763)
>                  ->   Bitmap Index Scan on id_pk  (cost=0.00..4.26 rows=1
>  width=0) (actual time=0.003..0.003 rows=0 loops=1)
>                        Index Cond: (d2.id = 234409763)
>      ->   Index Scan using id_pk on public.docprimary d1
> (cost=0.00..3.27
>  rows=1 width=4) (never executed)
>            Output: d1.id, d1.basedon
>            Index Cond: (d1.id = d2.basedon)
>
>
> --------------------------------------------
> PGver: PostgreSQL 9.0b x86
> OS: Win7 x64
>
> ---------------------
> Create table query:
> ---------------------
>
> CREATE TABLE docprimary
> (
>   id integer NOT NULL,
>   basedon integer,
>   CONSTRAINT id_pk PRIMARY KEY (id)
> );
> CREATE INDEX basedon_idx
>   ON docprimary
>   USING btree
>   (basedon);
>
>