Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

From: Tomas Vondra
Subject: Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)
Date:
Msg-id: 5684A1D2.1000601@2ndquadrant.com
In response to: Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)  (David Rowley <david.rowley@2ndquadrant.com>)
List: pgsql-hackers

On 12/30/2015 10:30 PM, David Rowley wrote:
> On 31 December 2015 at 01:24, Tomas Vondra <tomas.vondra@2ndquadrant.com
> <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
> Personally I'd like to see automatic plan caching occur in
> PostgreSQL. There is a bit of a problem with it in regards to a query
> such as: select * from t where mySkewedColumn = 1; where the value 1
> appears 99% of the time. Initially we may plan to seqscan, where with
> other values we'd likely prefer to index scan. I imagine with my
> unique joins patch, that it could be expanded to test
> baserestrictinfos to see if they contain quals with a unique index.
> This knowledge could later permit plan caching to occur on queries
> which are safe from having any skew in the results. It might sound
> rather restrictive, but likely this would cover 99% of UPDATE and
> DELETE operations in an OLTP workload, and likely a good deal of the
> SELECTs too. The safety of caching could be analyzed during planning,
> and a flag could be set somewhere, perhaps in PlannedStmt to mark if
> the plan is safe to cache. The planner() function could initially
> hash the query string and check if any cached plan exists with that
> hash, if not, plan the statement, and then check if the "safeToCache"
> flag is set, and if so, stick that plan into a hash table. Also plans
> with no baserestrictinfos could be "safeToCache" too.
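The caching flow sketched above could look roughly like the following toy Python model. This is not PostgreSQL internals; plan_query() and its "safety" test are hypothetical stand-ins for the real planner and the unique-index check on baserestrictinfos.

```python
# Toy model of the described flow: hash the query string, reuse a cached
# plan if one exists, otherwise plan it and cache it only when flagged
# "safeToCache". plan_query() is a hypothetical stand-in for the planner.
import hashlib

plan_cache = {}

def plan_query(query):
    # Stand-in safety test: a plan is "safe" e.g. when its quals match a
    # unique index (no value skew possible), or when it has no quals at all.
    unique_lookup = "id =" in query            # crude illustration only
    no_quals = "where" not in query.lower()
    return ("PLAN(%s)" % query, unique_lookup or no_quals)

def planner(query):
    key = hashlib.sha256(query.encode()).hexdigest()
    if key in plan_cache:
        return plan_cache[key]                 # reuse the cached plan
    plan, safe_to_cache = plan_query(query)
    if safe_to_cache:
        plan_cache[key] = plan                 # only cache skew-proof plans
    return plan
```

A skewed-column query like the mySkewedColumn example would fail the safety test and never enter the cache, while unique-key UPDATE/DELETE lookups would be cached on first planning.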

Yeah, that's what I meant by "non-trivial". I don't know a good approach 
to this problem, or if such a "good" approach even exists, but I'd say 
being able to decide whether to rebuild a plan in such cases is a 
"must-have" feature. Otherwise we could easily lose any gains from the 
more thorough optimization because of poor plans.

In other words, we'd have to come up with a way to decide whether to use 
the same plan as before, or try building another plan (for the same 
query with different parameter values). I can think of two approaches:

(1) Try to measure how "different" the parameter values used in the new
    query are, compared to the existing plan(s). This probably means
    differences in terms of probability / frequencies etc.

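Approach (1) might be sketched like this: compare the estimated frequency of the new parameter value against that of the value the plan was built for, using something like the column's most-common-values statistics. The mcv_freqs table and the ratio threshold here are illustrative assumptions, not anything PostgreSQL currently does.

```python
# Illustrative sketch of approach (1): reuse the plan only when the new
# value's estimated frequency is "close" to the planned value's frequency.
mcv_freqs = {1: 0.99}      # e.g. the skewed value 1 appears 99% of the time
default_freq = 0.001       # assumed frequency for values outside the MCV list

def freq(value):
    return mcv_freqs.get(value, default_freq)

def similar_enough(planned_value, new_value, ratio=10.0):
    # "Close" = frequencies differ by at most a fixed factor (assumed 10x).
    f1, f2 = freq(planned_value), freq(new_value)
    return max(f1, f2) / min(f1, f2) <= ratio
```

With these numbers, two non-MCV values compare as similar, while the 99%-frequency value against a rare one does not, so a replan would be triggered.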

(2) Compute the cost of the existing plan for the new parameters. I.e.
    don't perform the whole optimization, just the costing for the
    single plan. If the costs are "close" then use the existing plan.

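Approach (2) could be sketched as re-running only the costing step on the one fixed plan shape and comparing against the cost it was planned with. cost_plan() and the slack factor are hypothetical stand-ins for brevity.

```python
# Illustrative sketch of approach (2): re-cost the existing plan for the
# new parameters; reuse it only if the cost stays within a slack factor
# of the cost it was originally planned with.
def cost_plan(plan, params):
    # Stand-in costing: cost grows with the expected row count for params.
    return plan["startup"] + plan["per_row"] * plan["rows"](params)

def reuse_existing(plan, new_params, slack=1.5):
    return cost_plan(plan, new_params) <= plan["planned_cost"] * slack
```

The attraction of this variant is that it skips join-order search entirely; the risk is that a plan costed as "close" may still be far from the plan a full replan would have chosen.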

Of course, none of this is trivial and still may fail for some cases.

I wonder what the other databases do, or if there are papers about this 
topic (I'd guess there are, but I haven't looked too much).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


