Changing SQL Inlining Behaviour (or...?)

Поиск
Список
Период
Сортировка
От Paul Ramsey
Тема Changing SQL Inlining Behaviour (or...?)
Дата
Msg-id CACowWR0TXXL0NfPMW2afCKzX++nHHBZLW3-BLusu_B0WjBB1=A@mail.gmail.com
обсуждение исходный текст
Ответы Re: Changing SQL Inlining Behaviour (or...?)  (Paul Ramsey <pramsey@cleverelephant.ca>)
Re: Changing SQL Inlining Behaviour (or...?)  (Adam Brightwell <adam.brightwell@crunchydata.com>)
Список pgsql-hackers
As I promised at PgConf.eu, here's a follow-up email about PostGIS parallelism and function inlining (an odd combination, it is true).

So, context:

- We want PostGIS to parallelize more. In order to achieve that we need to mark our functions with more realistic COSTs. Much much higher COSTs.
- When we do that, we hit a different problem. Our most commonly used functions, ST_Intersects(), ST_DWithin() are actually SQL wrapper functions are more complex combinations of index operators and exact computational geometry functions.
- In the presence of high cost parameters that are used multiple times in SQL functions, PostgreSQL will stop inlining those functions, in an attempt to save the costs of double-calculating the parameters.
- For us, that's the wrong choice, because we lose the index operators at the same time as we "save" the cost of double calculation.
- We need our wrapper functions inlined, even when they are carrying a high COST.

At pgconf.eu, I canvassed this problem and some potential solutions:

* Solution #1 - Quick and dirty and visible: Add an 'INLINE' function decorator, which tells PostgreSQL to just ignore costs and inline the function regardless. Pros: it's not too hard to implement and I'm happy to contribute this. Cons: it adds very specific single-purpose syntax to CREATE FUNCTION.

* Solution #2 - Quick and dirty and invisible. Tom suggested a hack that achieves the aims of #1 but without adding syntax to CREATE FUNCTION: have the inlining logic look at the cost of the wrapper and the cost of parameters, and if the cost of the wrapper "greatly exceeded" the cost of the parameters, then inline. So the PostGIS project would just set the cost of our wrappers very high, and we'd get the behaviour we want, while other users who want to use wrappers to force caching of calculations would have zero coded wrapper functions.  Pros: Solves the problem and easy to implement, I'm happy to contribute. Cons: it's so clearly a hack involving hidden (from users) magic.

* Solution #3 - Correct and globally helpful. When first presented with this problem last year, both Andres and Tom said [1] "but the right fix is to avoid the double-calculation of identical entries in the target list" because then it would be safe to inline functions with duplicate expensive parameters. This would not only address the proximate PostGIS problem but make a whole class of queries faster. There was some discussion of this approach last week [2]. Pros: The right thing! Improves a whole pile of other performance cases. Cons: Hard! Only experienced PgSQL developers need apply.

Naturally, I would love to see #3 implemented, but there's only so much experienced developer time to go around, and it's beyond my current skill set. I would like to be able to start to improve PostGIS parallelism with PgSQL 12, so in order to make that not impossible, I'd like to implement either #1 or #2 in case #3 doesn't happen for PgSQL 12. 

So my question to hackers is: which is less worse, #1 or #2, to implement and submit to commitfest, in case #3 does not materialize in time for PgSQL 12?

Thanks!

Paul

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: ALTER DEFAULT PRIVILEGES is buggy, and so is its testing
Следующее
От: Pavel Stehule
Дата:
Сообщение: repeated procedure call error