Re: Planning large IN lists
От | Atul Deopujari |
---|---|
Тема | Re: Planning large IN lists |
Дата | |
Msg-id | 464C1CA0.1060705@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Planning large IN lists (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Planning large IN lists
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: Planning large IN lists (Bruce Momjian <bruce@momjian.us>) |
Список | pgsql-hackers |
Hi, Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: >> When planning queries with a large IN expression in the WHERE clause, >> the planner transforms the IN list into a scalar array expression. In >> clause_selectivity(), we estimate the selectivity of the ScalarArrayExpr >> by calling scalararraysel(), which in turn estimates the selectivity of >> *each* array element in order to determine the selectivity of the array >> expression as a whole. > >> This is quite inefficient when the IN list is large. > > That's the least of the problems. We really ought to convert such cases > into an IN (VALUES(...)) type of query, since often repeated indexscans > aren't the best implementation. > I thought of giving this a shot and while I was working on it, it occurred to me that we need to decide on a threshold value of the IN list size above which such transformation should take place. For small sizes of the IN list, scalararraysel() of IN list wins over the hash join involved in IN (VALUES(...)). But for larger sizes of the IN list, IN (VALUES(...)) comes out to be a clear winner. I would like to know what does the community think should be a heuristic value of the IN list size beyond which this transformation should take place. I was thinking of a GUC variable (or a hard coded value) which defaults to say 30. This is based on numbers from the following test: postgres=# create table w (w text); CREATE TABLE postgres=# \copy w from '/usr/share/dict/words' And run the following query with different IN list sizes explain analyze select * from w where w in ('one', 'two', ...); I got the following runtimes: ------------------------------------ IN list IN (VALUES(...)) IN size ------------------------------------ 150 ~2000 ms ~5500 ms 100 ~1500 ms ~4000 ms 80 ~1400 ms ~3000 ms 50 ~1400 ms ~2500 ms 30 ~1500 ms ~1500 ms 20 ~1400 ms ~1200 ms 10 ~1400 ms ~1200 ms ------------------------------------ The IN (VALUES(...)) gives an almost steady state behavior, while the IN runtimes deteriorate with growing list size. There would obviously be different conditions on which to base this value. I seek community opinion on this. -- Atul EnterpriseDB www.enterprisedb.com
В списке pgsql-hackers по дате отправления: