Re: slow IN() clause for many cases
From | Tom Lane |
---|---|
Subject | Re: slow IN() clause for many cases |
Date | |
Msg-id | 7976.1129330510@sss.pgh.pa.us |
In reply to | Re: slow IN() clause for many cases (Andrew - Supernews <andrew+nonews@supernews.com>) |
Responses | Re: slow IN() clause for many cases (Tom Lane <tgl@sss.pgh.pa.us>), Re: slow IN() clause for many cases (Greg Stark <gsstark@mit.edu>) |
List | pgsql-hackers |
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> With everything in cache, selecting 1000 random rows from a 200k row table,
> I get:
>
> for IN (list): planning time 47ms, execution time 27ms
> array/nestloop: planning time 8ms, execution time 34ms

The reason for the slow planning time is that IN (list) is currently converted (by the grammar) into an OR structure:

    (x = val1) OR (x = val2) OR (x = val3) OR (x = val4) OR ...

which means that the planner has to consider each subclause separately: discover that it can be matched to an index, build the indexscan plan, etc. Even just looking up all the "=" operators during parsing takes a fair amount of time :-( The large number of plan nodes also implies relatively slow executor startup.

It strikes me that now that we have the bitmap indexscan mechanism, it wouldn't be hard to do better. I'm thinking that IN should be converted to a ScalarArrayOpExpr, ie

    x = ANY (ARRAY[val1,val2,val3,val4,...])

and then we could treat both OpExpr and ScalarArrayOpExpr as matching an index when the LHS is the index key and the operator is in the index's opclass. This wouldn't fit comfortably into a plain indexscan, but a bitmap indexscan has already got the mechanism for ORing together the results of several primitive indexscans (one per array element). You just do the scans and insert all the results into your output bitmap.

This approach would reduce the planner and executor-startup costs of even a long IN list to be pretty comparable to a single index key comparison. The actual runtime of the plan wouldn't change much, I think, but it's the overhead that's hurting us here.

This also solves the problem mentioned earlier in the thread that you can't make a prepared plan for varying lengths of IN-lists: instead of using IN, you do something like

    x = ANY($1::int[])

and then ship your lookup keys as a single array parameter instead of multiple scalars.
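To make the proposed rewrite concrete, here is a sketch of the three equivalent spellings side by side. The table and column names (`tab`, `x`) are hypothetical, used only for illustration:

```sql
-- Hypothetical table with an indexed integer column.
CREATE TABLE tab (x integer PRIMARY KEY, payload text);

-- What the grammar historically produces from "x IN (1, 2, 3, 4)":
-- one OR arm per value, each planned separately.
SELECT * FROM tab WHERE x = 1 OR x = 2 OR x = 3 OR x = 4;

-- The proposed ScalarArrayOpExpr form: a single clause the planner
-- can match to the index, with the bitmap scan ORing the per-element
-- results internally.
SELECT * FROM tab WHERE x = ANY (ARRAY[1, 2, 3, 4]);

-- The prepared-plan variant: one plan serves any number of lookup
-- keys, shipped as a single array parameter.
PREPARE lookup(int[]) AS SELECT * FROM tab WHERE x = ANY ($1);
EXECUTE lookup(ARRAY[1, 2, 3, 4]);
```

The last form is what solves the varying-length-IN-list problem: the plan shape no longer depends on how many keys are supplied.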
The main objection I can see is that this technique requires the elements of the IN list to be converted to a common datatype (ie, the element datatype of the ARRAY construct). Historically our code has made no such assumption. However, I believe that the SQL standard expects that conversion to happen: "x IN (list)" is defined as "x = ANY (VALUES list)", and <table value constructor> expects all its rows to be converted to a common rowtype. So we could point to the standard if anyone complains that the behavior changed.

Obviously this is not happening for 8.1, but it seems very do-able for 8.2. Comments?

			regards, tom lane
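A small sketch of the common-datatype point: an ARRAY constructor resolves all of its elements to one element type, so a mixed-type list is coerced as a whole (this example assumes PostgreSQL's standard numeric type-resolution rules):

```sql
-- The array constructor forces a common element type: the integer
-- elements are coerced along with 2.0, so the list behaves as
-- numeric[], and the comparison uses the numeric "=" operator.
SELECT 2 = ANY (ARRAY[1, 2.0, 3]);
```

Under the old OR expansion, each `x = val` pair could in principle resolve its operator independently; the array form makes the one-common-type behavior explicit, which is the behavior change the standard can be cited to justify.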