Re: BUG #17964: Missed query planner optimization
От | Mathias Kunter |
---|---|
Тема | Re: BUG #17964: Missed query planner optimization |
Дата | |
Msg-id | a29936a0-e446-d4d1-6bd1-42d7212c26a0@gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17964: Missed query planner optimization (David Rowley <dgrowleyml@gmail.com>) |
Список | pgsql-bugs |
> We'd need a way to determine which is cheaper based on estimated costs Yes, that decision must of course be based on estimated costs, and also the cost estimation must be reasonably accurate. In all examples which I've seen so far, the cost estimations were adequate. > that might be, in my opinion, fairly tricky to do based on the order of > how things currently are done in the query planner. I don't know the details about the current planner implementation, but the planner obviously can already correctly estimate costs for both the IN and ANY(ARRAY()) variants. > I was just trying to show that it's quite a tricky problem to > fix properly. Yes, the tricky part obviously is to decide whether or not an IN clause should be converted to a semi-join, because that decision depends on the other IN clauses of the query as well. So, it's unfortunately not possible to decide independently for each IN clause. As you've already pointed out, doing an exhaustive search of all possible combinations is obviously problematic, since we'd have to check 2^N possible query plans (with N being the number of IN clauses of the query). So, this is certainly NOT the way to go, unless maybe N<4 or so. > Having the planner consider the costs of converting the IN to a semi- > join and converting or not based on cost seems like a worthy goal. It obviously is. As shown, the performance improvement can be gigantic for certain queries. > If you think it's quite an easy project, then we'll welcome > contributions to improve this. I'm not familiar with the current implementation, but my understanding is that this should be quite easy to implement for someone who is currently working on planner-related issues. After all, a reasonable heuristic solution can be as simple as this: 1) Create the query plan for the original query, using the current planner implementation. (That's the status quo.) 2) Create a modified version of the original input query where all of the IN clauses are replaced with ANY(ARRAY()). Create the query plan for that modified query version, using the current planner implementation. 3) Compare the two previously created query plans and execute the one with the lower estimated total cost. If you don't trust the planner's cost estimations for some reason, then only execute the modified version if its estimated cost is SIGNIFICANTLY lower. A simple solution like this would already be good enough to avoid many of the horrifying worst case scenarios, where a query takes minutes when it should actually only take milliseconds. It would of course be possible (and desirable) to come up with a more sophisticated heuristic than the one given above. For example, create a third version of the original query where only those IN clauses are replaced with ANY(ARRAY()) where the estimated number of rows returned by the subselect is below a certain threshold, and estimate costs for that modified query as well. > The pgsql-hackers as a good emailing list to bring up the topic Thank you for the hint. I'll continue the discussion over there. Mathias
В списке pgsql-bugs по дате отправления:
Следующее
От: Jelte FennemaДата:
Сообщение: Re: Internal error with types changes and prepared statements