Re: Planning large IN lists

Поиск

Список

Период

Сортировка

От	Atul Deopujari
Тема	Re: Planning large IN lists
Дата	17 мая 2007 г. 06:15:07
Msg-id	464C1CA0.1060705@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Planning large IN lists (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Planning large IN lists Re: Planning large IN lists
Список	pgsql-hackers

Дерево обсуждения

Hi,

Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
>> When planning queries with a large IN expression in the WHERE clause,
>> the planner transforms the IN list into a scalar array expression. In
>> clause_selectivity(), we estimate the selectivity of the ScalarArrayExpr
>> by calling scalararraysel(), which in turn estimates the selectivity of
>> *each* array element in order to determine the selectivity of the array
>> expression as a whole.
> 
>> This is quite inefficient when the IN list is large.
> 
> That's the least of the problems.  We really ought to convert such cases
> into an IN (VALUES(...)) type of query, since often repeated indexscans
> aren't the best implementation.
> 
I thought of giving this a shot and while I was working on it, it 
occurred to me that we need to decide on a threshold value of the IN 
list size above which such transformation should take place. For small 
sizes of the IN list, scalararraysel() of IN list wins over the hash 
join involved in IN (VALUES(...)). But for larger sizes of the IN list, 
IN (VALUES(...)) comes out to be a clear winner.
I would like to know what does the community think should be a heuristic 
value of the IN list size beyond which this transformation should take 
place.
I was thinking of a GUC variable (or a hard coded value) which defaults 
to say 30. This is based on numbers from the following test:

postgres=# create table w (w text);
CREATE TABLE

postgres=# \copy w from '/usr/share/dict/words'

And run the following query with different IN list sizes
explain analyze select * from w where w in ('one', 'two', ...);

I got the following runtimes:
------------------------------------
IN list  IN (VALUES(...))     IN
size
------------------------------------
150     ~2000 ms           ~5500 ms
100     ~1500 ms           ~4000 ms
80      ~1400 ms           ~3000 ms
50      ~1400 ms           ~2500 ms
30      ~1500 ms           ~1500 ms
20      ~1400 ms           ~1200 ms
10      ~1400 ms           ~1200 ms
------------------------------------

The IN (VALUES(...)) gives an almost steady state behavior, while the IN  runtimes deteriorate with growing list size.

There would obviously be different conditions on which to base this 
value. I seek community opinion on this.

-- 
Atul

EnterpriseDB
www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Planning large IN lists