Re: Cached/global query plans, autopreparation

From: Jorge Solórzano
Subject: Re: Cached/global query plans, autopreparation
Date:
Msg-id: CA+cVU8PdUD4O7ntwfU2ZASiFTfVvF2BJngYVj9tDY4p+CCLCVA@mail.gmail.com
In reply to: Re: Cached/global query plans, autopreparation (Shay Rojansky <roji@roji.org>)
List: pgsql-hackers

On Thu, Feb 15, 2018 at 8:00 AM, Shay Rojansky <roji@roji.org> wrote:
I am the author of one of the proposals (autoprepare, which is in the commitfest now), but I think that sooner or later Postgres has to come to a solution with shared DB caches/prepared plans.
Please correct me if I am wrong, but it seems to me that most of the other top DBMSes have something like this.
Such a solution can provide a lot of different advantages:
1. Better memory utilization: no need to store the same data N times, where N is the number of backends, or to spend time warming the cache.
2. The optimizer can spend more time choosing a better plan, which can then be used by all clients. Even now, the planning time of some queries exceeds their execution time several times over.
3. It is simpler to add facilities for query plan tuning and maintenance (storing, comparing, ...).
4. It makes it possible to control the amount of memory used by caches. Right now, the catalog cache for a database with hundreds of thousands of tables and indexes, multiplied by hundreds of backends, can consume terabytes of memory.
5. Shared caches can simplify the invalidation mechanism.
6. Almost all enterprise systems working with Postgres have to use some kind of connection pooling (pgbouncer, pgpool, ...), which almost excludes the possibility of using prepared statements (see the sketch below). That can slow down performance by up to a factor of two.
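To make points 2 and 6 concrete, here is a minimal sketch of the per-session prepared statements the discussion is about, using the standard pgbench_accounts table (the statement name and parameter values are just for illustration):

```sql
-- Parse and plan once; the cached statement lives only in this backend's memory.
PREPARE get_account (int) AS
    SELECT abalance FROM pgbench_accounts WHERE aid = $1;

-- Subsequent executions skip parsing (and, once the server settles on a
-- generic plan, most planning as well), paying mainly the execution cost.
EXECUTE get_account(1);
EXECUTE get_account(2);

-- The statement is private to the session and disappears when it ends, which
-- is why transaction-level poolers such as pgbouncer effectively rule
-- prepared statements out (point 6 above).
DEALLOCATE get_account;
```

A shared/global plan cache would let all backends, including pooled connections, reuse that parse/plan work instead of each session repeating it.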

Just wanted to say I didn't see this email before my previous response, but I agree with all of the above. The last point is particularly important, especially for short-lived connection scenarios, the most typical of which is the web.
 
There is just one (but very important) problem which needs to be solved: access to the shared cache has to be synchronized.
But there are a lot of other shared resources in Postgres (procarray, shared buffers, ...), so I do not think that this is an unsolvable problem or that it has to cause a degradation of performance.

So it seems obvious that shared caches/plans can provide a lot of advantages, but the value of these advantages for real customers is still not clear to me.
Using the -M prepared protocol in a pgbench workload can improve speed by up to a factor of two. But I have asked real Postgres users at Avito, Yandex and MyOffice, and they told me
that on their workloads the advantage of prepared statements is about 10%. A 10% performance improvement is definitely not good compensation for rewriting a substantial part of the Postgres core...
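For anyone trying to estimate what their own workload would gain, one rough approach (my suggestion, not a measurement from this thread) is to compare the planning and execution times that EXPLAIN ANALYZE reports for a hot query; where planning dominates, prepared statements or autoprepare should help the most:

```sql
-- EXPLAIN ANALYZE reports planning time and execution time separately.
-- A short point lookup whose planning time rivals its execution time is the
-- kind of query where (auto)preparation pays off. Table from pgbench.
EXPLAIN ANALYZE
SELECT abalance FROM pgbench_accounts WHERE aid = 42;
```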

Just wanted to say that I've seen more than a 10% improvement in some real-world applications when preparation was done properly. Also, I'm assuming that implementing this wouldn't involve "rewriting a substantial part of the Postgres core", and even 10% is quite a big gain, especially if it's a transparent/free one as far as the user is concerned (no application changes).

 
A 10% improvement in the real world can be pretty significant. I don't know how complicated it would be to implement this in the Postgres core, but how about adding it to the GSoC 2018 ideas[1]?

[1] https://wiki.postgresql.org/wiki/GSoC_2018
