Re: Changing SQL Inlining Behaviour (or...?)

Поиск
Список
Период
Сортировка
От Darafei "Komяpa" Praliaskouski
Тема Re: Changing SQL Inlining Behaviour (or...?)
Дата
Msg-id CAC8Q8tLjt0td4Jpt672T-RQ+UU-s2rxUVRWH4p8OJOn85hHUYQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Changing SQL Inlining Behaviour (or...?)  (Paul Ramsey <pramsey@cleverelephant.ca>)
Список pgsql-hackers

At pgconf.eu, I canvassed this problem and some potential solutions:


I wonder if there is a middle ground between #2 and #3. A proper mechanism for deduplicating entries might be hard, but on the inlining stage we already know they're going to get duplicated. Can we make a subplan/lateral join/whatever for arguments and deduplicate just them, as we know about that duplication from structure already?

Another thing I see here is PostGIS using indexes. Can we just always inline if we see an index-accelerated operator (or just an operator) on top level AND inside inlined function?

 
* Solution #1 - Quick and dirty and visible: Add an 'INLINE' function decorator, which tells PostgreSQL to just ignore costs and inline the function regardless. Pros: it's not too hard to implement and I'm happy to contribute this. Cons: it adds very specific single-purpose syntax to CREATE FUNCTION.

* Solution #2 - Quick and dirty and invisible. Tom suggested a hack that achieves the aims of #1 but without adding syntax to CREATE FUNCTION: have the inlining logic look at the cost of the wrapper and the cost of parameters, and if the cost of the wrapper "greatly exceeded" the cost of the parameters, then inline. So the PostGIS project would just set the cost of our wrappers very high, and we'd get the behaviour we want, while other users who want to use wrappers to force caching of calculations would have zero coded wrapper functions.  Pros: Solves the problem and easy to implement, I'm happy to contribute. Cons: it's so clearly a hack involving hidden (from users) magic.

* Solution #3 - Correct and globally helpful. When first presented with this problem last year, both Andres and Tom said [1] "but the right fix is to avoid the double-calculation of identical entries in the target list" because then it would be safe to inline functions with duplicate expensive parameters. This would not only address the proximate PostGIS problem but make a whole class of queries faster. There was some discussion of this approach last week [2]. Pros: The right thing! Improves a whole pile of other performance cases. Cons: Hard! Only experienced PgSQL developers need apply.

Naturally, I would love to see #3 implemented, but there's only so much experienced developer time to go around, and it's beyond my current skill set. I would like to be able to start to improve PostGIS parallelism with PgSQL 12, so in order to make that not impossible, I'd like to implement either #1 or #2 in case #3 doesn't happen for PgSQL 12. 

So my question to hackers is: which is less worse, #1 or #2, to implement and submit to commitfest, in case #3 does not materialize in time for PgSQL 12?

Absent any preferences, I would be inclined to go with #2, having a high personal tolerance for ugly hacks... :)

P
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: simple query profile and tracing API
Следующее
От: Ronan Dunklau
Дата:
Сообщение: Parametrization and UNION in view