Re: which is better: using OR clauses or UNION?
От | Igor Neyman |
---|---|
Тема | Re: which is better: using OR clauses or UNION? |
Дата | |
Msg-id | F4C27E77F7A33E4CA98C19A9DC6722A207F18A52@EXCHANGE.corp.perceptron.com обсуждение исходный текст |
Ответ на | Re: which is better: using OR clauses or UNION? (adam_pgsql <adam_pgsql@witneyweb.org>) |
Список | pgsql-sql |
> -----Original Message----- > From: adam_pgsql [mailto:adam_pgsql@witneyweb.org] > Sent: Tuesday, August 16, 2011 11:56 AM > To: Tom Lane > Cc: pgsql-sql > Subject: Re: which is better: using OR clauses or UNION? > > > On 16 Aug 2011, at 15:09, Tom Lane wrote: > > > adam_pgsql <adam_pgsql@witneyweb.org> writes: > >> I have a query hitting a table of 25 million rows. The table has a > >> text field ('identifier') which i need to query for matching rows. > The > >> question is if i have multiple strings to match against this field I > >> can use multiple OR sub-statements or multiple statements in a > >> UNION. The UNION seems to run quicker.... is this to be expected? > > > > Your test cases don't seem exactly comparable; in particular I think > the > > second one is benefiting from the first one having already read and > > cached the relevant disk blocks. Notice how you've got, eg, > > > >> -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 > rows=318 loops=1) > >> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 > (6f24)'::character varying) > > > > versus > > > >> -> Bitmap Index Scan on > in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) > (actual time=0.178..0.178 rows=318 loops=1) > >> Index Cond: (lower(identifier) ~=~ > 'sptigr4-2210 (6f24)'::character varying) > > > > Those are the exact same subplan, so any honest comparison should be > > finding them to take the same amount of time. When the actual > readings > > are different by a factor of several hundred, there's something wrong > > with your measurement process. > > > > In the end this comes down to whether duplicates will be eliminated > more > > efficiently by a BitmapOr step or by sort/uniq on the resulting rows. > > I'd have to bet on the BitmapOr myself, but it's likely that this is > > down in the noise compared to the actual disk accesses in any > > not-fully-cached scenario. Also, if you don't expect the sub- > statements > > to yield any duplicates, or don't care about seeing the same row > twice > > in the output, you should consider UNION ALL instead of UNION. > > > Thanks guys, I'll give some of those options a try and see which ones > improve performance > > (Tom, yes i ran those queries after each other so there was caching > going on. However, I had noticed a difference in performance when > spacing the queries before and after a few other big queries to help > clear the cache). > > adam Adam, Did you verify that your cache is "cleared"? Like using pg_buffercache contrib. module? Besides, there is also OS cache... Regards, Igor Neyman
В списке pgsql-sql по дате отправления: