Re: which is better: using OR clauses or UNION?

Поиск
Список
Период
Сортировка
От adam_pgsql
Тема Re: which is better: using OR clauses or UNION?
Дата
Msg-id DFC8B07C-5D79-4F52-A7F7-1CC3E16627F3@witneyweb.org
обсуждение исходный текст
Ответ на Re: which is better: using OR clauses or UNION?  (Viktor Bojović <viktor.bojovic@gmail.com>)
Список pgsql-sql
Hi Viktor,

thanks for your email, gave that a try, but the

lower(identifier) LIKE lower('BUGS0000001884677') OR

still comes in quicker than the IN approach

thanks

adam



On 16 Aug 2011, at 12:56, Viktor Bojović wrote:

> hi Adam,
> im not sure which is faster/slower but, possibly you can speed it up by using "in" operator
>
> ...where lower(identifier) in (lower('BUGS0000001884677') , lower('BUGS0000001884678'),....);
>
> if you create function based index:
> CREATE INDEX idx_table_lower_text ON table(lower(text_field));
> (taken from: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php)
>
>
>
>
> On Tue, Aug 16, 2011 at 1:39 PM, adam_pgsql <adam_pgsql@witneyweb.org> wrote:
>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a text field ('identifier') which i need to query
formatching rows. The question is if i have multiple strings to match against this field I can use multiple OR
sub-statementsor multiple statements in a UNION. The UNION seems to run quicker.... is this to be expected? or is there
anythingelse I can do improve the speed of this query? Some query details: 
>
>
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
>
>
> Query 1
> -------
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>     dba_data_base a
>  WHERE
>     ( lower(identifier) LIKE lower('BUGS0000001884677') OR
>       lower(identifier) LIKE lower('BUGS0000001884678') OR
>       lower(identifier) LIKE lower('BUGS0000001884679') OR
>       lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
>  ORDER BY a.identifier;
>                                                                                                              QUERY
PLAN
>
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> --------------------------------
> Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual time=3208.466..3208.652 rows=318 loops=1)
>  Sort Key: identifier
>  ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38 rows=3857 width=62) (actual time=81.106..3207.721
rows=318loops=1) 
>        Recheck Cond: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR (lower(identifier) ~~
'bugs0000001884678'::text)OR (lower(identifier) ~~ 'bugs0000001884679'::text) OR (lower(identifier) ~ 
> ~ 'sptigr4-2210 (6f24)'::text))
>        Filter: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR (lower(identifier) ~~ 'bugs0000001884678'::text)
OR(lower(identifier) ~~ 'bugs0000001884679'::text) OR (lower(identifier) ~~ 'spt 
> igr4-2210 (6f24)'::text))
>        ->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual time=71.397..71.397 rows=0 loops=1)
>              ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0) (actual
time=0.029..0.029rows=0 loops=1) 
>                    Index Cond: (lower(identifier) ~=~ 'bugs0000001884677'::character varying)
>              ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0) (actual
time=0.008..0.008rows=0 loops=1) 
>                    Index Cond: (lower(identifier) ~=~ 'bugs0000001884678'::character varying)
>              ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0) (actual
time=0.008..0.008rows=0 loops=1) 
>                    Index Cond: (lower(identifier) ~=~ 'bugs0000001884679'::character varying)
>              ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0) (actual
time=71.347..71.347rows=318 loops=1) 
>                    Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 (6f24)'::character varying)
> Total runtime: 3208.904 ms
>
>
> Query 2
> -------
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>     dba_data_base a
>  WHERE
>     lower(identifier) LIKE lower('BUGS0000001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>     dba_data_base a
>  WHERE
>       lower(identifier) LIKE lower('BUGS0000001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>     dba_data_base a
>  WHERE
>       lower(identifier) LIKE lower('BUGS0000001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>     dba_data_base a
>  WHERE
>       lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
>  ORDER BY identifier;
>
> Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886 rows=317 loops=1)
>  Sort Key: identifier
>  ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual time=2.663..3.387 rows=317 loops=1)
>        ->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual time=2.660..2.834 rows=318 loops=1)
>              Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
>              ->  Append  (cost=32.88..15185.06 rows=3856 width=62) (actual time=0.320..2.131 rows=318 loops=1)
>                    ->  Bitmap Heap Scan on dba_data_base a  (cost=32.88..3786.62 rows=964 width=62) (actual
time=0.041..0.041rows=0 loops=1) 
>                          Filter: (lower(identifier) ~~ 'bugs0000001884677'::text)
>                          ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
(actualtime=0.036..0.036 rows=0 loops=1) 
>                                Index Cond: (lower(identifier) ~=~ 'bugs0000001884677'::character varying)
>                    ->  Bitmap Heap Scan on dba_data_base a  (cost=32.88..3786.62 rows=964 width=62) (actual
time=0.010..0.010rows=0 loops=1) 
>                          Filter: (lower(identifier) ~~ 'bugs0000001884678'::text)
>                          ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
(actualtime=0.008..0.008 rows=0 loops=1) 
>                                Index Cond: (lower(identifier) ~=~ 'bugs0000001884678'::character varying)
>                    ->  Bitmap Heap Scan on dba_data_base a  (cost=32.88..3786.62 rows=964 width=62) (actual
time=0.010..0.010rows=0 loops=1) 
>                          Filter: (lower(identifier) ~~ 'bugs0000001884679'::text)
>                          ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
(actualtime=0.008..0.008 rows=0 loops=1) 
>                                Index Cond: (lower(identifier) ~=~ 'bugs0000001884679'::character varying)
>                    ->  Bitmap Heap Scan on dba_data_base a  (cost=32.88..3786.62 rows=964 width=62) (actual
time=0.255..1.676rows=318 loops=1) 
>                          Filter: (lower(identifier) ~~ 'sptigr4-2210 (6f24)'::text)
>                          ->  Bitmap Index Scan on in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
(actualtime=0.178..0.178 rows=318 loops=1) 
>                                Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 (6f24)'::character varying)
> Total runtime: 4.174 ms
>
> Also which should scale better if I add more strings to match? would there be any better design patterns for this
problem?
>
> Thanks for any help
>
> Adam
>
> select version();
>                           version
> ----------------------------------------------------------------
> PostgreSQL 8.2.12 on i686-pc-linux-gnu, compiled by GCC 2.95.4
>
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>
>
>
> --
> ---------------------------------------
> Viktor Bojović
> ---------------------------------------
> Wherever I go, Murphy goes with me



В списке pgsql-sql по дате отправления:

Предыдущее
От: Jasen Betts
Дата:
Сообщение: Re: which is better: using OR clauses or UNION?
Следующее
От: Janiv Ratson
Дата:
Сообщение: Re: bigint and unix time