Re: slow IN() clause for many cases
От | Andrew - Supernews |
---|---|
Тема | Re: slow IN() clause for many cases |
Дата | |
Msg-id | slrndkpmgt.2db7.andrew+nonews@trinity.supernews.net обсуждение исходный текст |
Ответ на | Re: slower merge join on sorted data chosen over (Simon Riggs <simon@2ndquadrant.com>) |
Ответы |
Re: slow IN() clause for many cases
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: slow IN() clause for many cases ("Ilia Kantor" <ilia@obnovlenie.ru>) |
Список | pgsql-hackers |
On 2005-10-11, "Ilia Kantor" <ilia@obnovlenie.ru> wrote: > When in clause becomes large enough (>20-30 cases), > It is much better to use "join" way of processing.. or even a different way of writing the IN clause. This one is one I've used after considerable research: select * from tablewhere field in (select (some_array_of_N_items)[i] from generate_series(1,N) as s(i)); This generally plans as a nestloop, with a HashAggregate of the function scan (of generate_series) on the outer path, and index lookups on the inner path. It's worth noting that EXPLAIN ANALYZE doesn't tell the whole story when comparing queries of this kind. The IN (1,2,...30) form is much slower to plan, and usually can't usefully be used in prepared form (you'd need to prepare it separately for every different number of items); in contrast, the array version can be prepared once and reused. As the number of items in the IN clause increases, the planning time grows rather radically. For example with 1000 items (here stashed in a psql convenience variable for brevity), using 8.1beta2: test=# prepare tstplan1 as select * from test where id in (:v); Time: 4881.702 ms compare: test=# prepare tstplan2 as select * from test where id in (select (ARRAY[:v])[i] from generate_series(1,1000) s(i)); Time: 10.889 ms (on my machine the break-even point for these two is less than 20 items, or even less if the array is passed in as a literal or a single parameter rather than constructed with ARRAY[].) The actual execution time of these two is very close, with the second being about 10% slower on my system (31ms vs 34ms, based on \timing values from psql and averaged over several goes). However, the timings returned from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the "total runtime" line. So not only is the planning time different, but also the instrumentation overhead of EXPLAIN ANALYZE is wildly different between the two forms. What this means is that unless you're going to prepare in advance every possible number of parameters to IN that your app is ever going to use, the only way to get useful performance for IN queries with more than a handful of literal values is to use an array method, in spite of the fact that the bitmap-OR execution plan is actually at least as fast. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
В списке pgsql-hackers по дате отправления: