Re: which is better: using OR clauses or UNION?

Поиск
Список
Период
Сортировка
От Igor Neyman
Тема Re: which is better: using OR clauses or UNION?
Дата
Msg-id F4C27E77F7A33E4CA98C19A9DC6722A207F18A52@EXCHANGE.corp.perceptron.com
обсуждение исходный текст
Ответ на Re: which is better: using OR clauses or UNION?  (adam_pgsql <adam_pgsql@witneyweb.org>)
Список pgsql-sql
> -----Original Message-----
> From: adam_pgsql [mailto:adam_pgsql@witneyweb.org]
> Sent: Tuesday, August 16, 2011 11:56 AM
> To: Tom Lane
> Cc: pgsql-sql
> Subject: Re: which is better: using OR clauses or UNION?
>
>
> On 16 Aug 2011, at 15:09, Tom Lane wrote:
>
> > adam_pgsql <adam_pgsql@witneyweb.org> writes:
> >> I have a query hitting a table of 25 million rows. The table has a
> >> text field ('identifier') which i need to query for matching rows.
> The
> >> question is if i have multiple strings to match against this field
I
> >> can use multiple OR sub-statements or multiple statements in a
> >> UNION. The UNION seems to run quicker.... is this to be expected?
> >
> > Your test cases don't seem exactly comparable; in particular I think
> the
> > second one is benefiting from the first one having already read and
> > cached the relevant disk blocks.  Notice how you've got, eg,
> >
> >>              ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347
> rows=318 loops=1)
> >>                    Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> >
> > versus
> >
> >>                          ->  Bitmap Index Scan on
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
> (actual time=0.178..0.178 rows=318 loops=1)
> >>                                Index Cond: (lower(identifier) ~=~
> 'sptigr4-2210 (6f24)'::character varying)
> >
> > Those are the exact same subplan, so any honest comparison should be
> > finding them to take the same amount of time.  When the actual
> readings
> > are different by a factor of several hundred, there's something
wrong
> > with your measurement process.
> >
> > In the end this comes down to whether duplicates will be eliminated
> more
> > efficiently by a BitmapOr step or by sort/uniq on the resulting
rows.
> > I'd have to bet on the BitmapOr myself, but it's likely that this is
> > down in the noise compared to the actual disk accesses in any
> > not-fully-cached scenario.  Also, if you don't expect the sub-
> statements
> > to yield any duplicates, or don't care about seeing the same row
> twice
> > in the output, you should consider UNION ALL instead of UNION.
>
>
> Thanks guys, I'll give some of those options a try and see which ones
> improve performance
>
> (Tom, yes i ran those queries after each other so there was caching
> going on. However, I had noticed a difference in performance when
> spacing the queries before and after a few other big queries to help
> clear the cache).
>
> adam

Adam,

Did you verify that your cache is "cleared"? Like using pg_buffercache
contrib. module?
Besides, there is also OS cache...

Regards,
Igor Neyman



В списке pgsql-sql по дате отправления:

Предыдущее
От: Jaime Casanova
Дата:
Сообщение: Re: parsing audit table
Следующее
От: "Kidd, David M"
Дата:
Сообщение: Cursor names in a self-nested function