Re: which is better: using OR clauses or UNION?

Поиск
Список
Период
Сортировка
От Igor Neyman
Тема Re: which is better: using OR clauses or UNION?
Дата
Msg-id F4C27E77F7A33E4CA98C19A9DC6722A207F1851F@EXCHANGE.corp.perceptron.com
обсуждение исходный текст
Ответ на which is better: using OR clauses or UNION?  (adam_pgsql <adam_pgsql@witneyweb.org>)
Список pgsql-sql
> -----Original Message-----
> From: adam_pgsql [mailto:adam_pgsql@witneyweb.org]
> Sent: Tuesday, August 16, 2011 7:39 AM
> To: pgsql-sql
> Subject: which is better: using OR clauses or UNION?
>
>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a
text
> field ('identifier') which i need to query for matching rows. The
> question is if i have multiple strings to match against this field I
> can use multiple OR sub-statements or multiple statements in a UNION.
> The UNION seems to run quicker.... is this to be expected? or is there
> anything else I can do improve the speed of this query? Some query
> details:
>
>
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier)
> varchar_pattern_ops)
>
>
> Query 1
> -------
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
>  FROM
>      dba_data_base a
>  WHERE
>      ( lower(identifier) LIKE lower('BUGS0000001884677') OR
>        lower(identifier) LIKE lower('BUGS0000001884678') OR
>        lower(identifier) LIKE lower('BUGS0000001884679') OR
>        lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
>  ORDER BY a.identifier;
>
> QUERY PLAN
>
>
-----------------------------------------------------------------------
>
-----------------------------------------------------------------------
> ----------------------------------------------------------
> --------------------------------
> Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual
> time=3208.466..3208.652 rows=318 loops=1)
>   Sort Key: identifier
>   ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38
> rows=3857 width=62) (actual time=81.106..3207.721 rows=318 loops=1)
>         Recheck Cond: ((lower(identifier) ~~
'bugs0000001884677'::text)
> OR (lower(identifier) ~~ 'bugs0000001884678'::text) OR
> (lower(identifier) ~~ 'bugs0000001884679'::text) OR (lower(identifier)
> ~
> ~ 'sptigr4-2210 (6f24)'::text))
>         Filter: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR
> (lower(identifier) ~~ 'bugs0000001884678'::text) OR (lower(identifier)
> ~~ 'bugs0000001884679'::text) OR (lower(identifier) ~~ 'spt
> igr4-2210 (6f24)'::text))
>         ->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual
> time=71.397..71.397 rows=0 loops=1)
>               ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0
> loops=1)
>                     Index Cond: (lower(identifier) ~=~
> 'bugs0000001884677'::character varying)
>               ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
>                     Index Cond: (lower(identifier) ~=~
> 'bugs0000001884678'::character varying)
>               ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
>                     Index Cond: (lower(identifier) ~=~
> 'bugs0000001884679'::character varying)
>               ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347
> rows=318 loops=1)
>                     Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> Total runtime: 3208.904 ms
>
>
> Query 2
> -------
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
>  FROM
>      dba_data_base a
>  WHERE
>      lower(identifier) LIKE lower('BUGS0000001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>      dba_data_base a
>  WHERE
>        lower(identifier) LIKE lower('BUGS0000001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>      dba_data_base a
>  WHERE
>        lower(identifier) LIKE lower('BUGS0000001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>      dba_data_base a
>  WHERE
>        lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
>  ORDER BY identifier;
>
> Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual
> time=3.688..3.886 rows=317 loops=1)
>   Sort Key: identifier
>   ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual
> time=2.663..3.387 rows=317 loops=1)
>         ->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual
> time=2.660..2.834 rows=318 loops=1)
>               Sort Key: bioassay_id, identifier, ratio, log_ratio,
> p_value
>               ->  Append  (cost=32.88..15185.06 rows=3856 width=62)
> (actual time=0.320..2.131 rows=318 loops=1)
>                     ->  Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041
> rows=0 loops=1)
>                           Filter: (lower(identifier) ~~
> 'bugs0000001884677'::text)
>                           ->  Bitmap Index Scan on
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
> (actual time=0.036..0.036 rows=0 loops=1)
>                                 Index Cond: (lower(identifier) ~=~
> 'bugs0000001884677'::character varying)
>                     ->  Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010
> rows=0 loops=1)
>                           Filter: (lower(identifier) ~~
> 'bugs0000001884678'::text)
>                           ->  Bitmap Index Scan on
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
> (actual time=0.008..0.008 rows=0 loops=1)
>                                 Index Cond: (lower(identifier) ~=~
> 'bugs0000001884678'::character varying)
>                     ->  Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010
> rows=0 loops=1)
>                           Filter: (lower(identifier) ~~
> 'bugs0000001884679'::text)
>                           ->  Bitmap Index Scan on
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
> (actual time=0.008..0.008 rows=0 loops=1)
>                                 Index Cond: (lower(identifier) ~=~
> 'bugs0000001884679'::character varying)
>                     ->  Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.255..1.676
> rows=318 loops=1)
>                           Filter: (lower(identifier) ~~ 'sptigr4-2210
> (6f24)'::text)
>                           ->  Bitmap Index Scan on
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
> (actual time=0.178..0.178 rows=318 loops=1)
>                                 Index Cond: (lower(identifier) ~=~
> 'sptigr4-2210 (6f24)'::character varying)
> Total runtime: 4.174 ms
>
> Also which should scale better if I add more strings to match? would
> there be any better design patterns for this problem?
>
> Thanks for any help
>
> Adam
>
> select version();
>                            version
> ----------------------------------------------------------------
> PostgreSQL 8.2.12 on i686-pc-linux-gnu, compiled by GCC 2.95.4
>

Adam,

There is something strange in your 2 execution plans.
Exactly the same operation:

-- first plan Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64
rows=964 width=0) (actual time=71.347..71.347 rows=318 loops=1)                   Index Cond: (lower(identifier) ~=~
'sptigr4-2210
(6f24)'::character varying)
-- second plan
Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64
rows=964 width=0) (actual time=0.178..0.178 rows=318 loops=1)                   Index Cond: (lower(identifier) ~=~
'sptigr4-2210
(6f24)'::character varying)

takes quite different time: 71.347 versus 0.178 and basically makes all
the difference between duration of your first and second statement.

I think, what you are seeing here is data being cached in memory (when
you executed "union" statement after "or" statement).

Other than that, looking at 2 execution plans, I'd say that in general
"or" should run faster than "union", at least because it does "Bitmap
Heap Scan on dba_data_base" only once, while "union" statement does this
heap scan 4 times (once per "unionized" select).

HTH,
Igor Neyman



В списке pgsql-sql по дате отправления:

Предыдущее
От: adam_pgsql
Дата:
Сообщение: Re: which is better: using OR clauses or UNION?
Следующее
От: "M. D."
Дата:
Сообщение: parsing audit table