Re: [PATCH] Equivalence Class Filters

From: David Rowley
Subject: Re: [PATCH] Equivalence Class Filters
Date:
Msg-id: CAKJS1f8RJLCM44+WZP_YdOrg3v2kusaMbb-Bn8AFYuk-9oePOg@mail.gmail.com
In reply to: Re: [PATCH] Equivalence Class Filters  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses: Re: [PATCH] Equivalence Class Filters  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Re: [PATCH] Equivalence Class Filters  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 8 December 2015 at 04:35, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 12/6/15 10:38 AM, Tom Lane wrote:
>> I said "in most cases".  You can find example cases to support almost any
>> weird planner optimization no matter how expensive and single-purpose;
>> but that is the wrong way to think about it.  What you have to think about
>> is average cases, and in particular, not putting a drag on planning time
>> in cases where no benefit ensues.  We're not committing any patches that
>> give one uncommon case an 1100X speedup by penalizing every other query 10%,
>> or even 1%; especially not when there may be other ways to fix it.
>
> This is a problem that seriously hurts Postgres in data warehousing applications. We can't keep ignoring optimizations that provide even as little as 10% execution improvements for 10x worse planner performance, because in a warehouse it's next to impossible for planning time to matter.
>
> Obviously it'd be great if there was a fast, easy way to figure out whether a query would be expensive enough to go the whole 9 yards on planning it but at this point I suspect a simple GUC would be a big improvement.
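
For what it's worth, we already have settings that trade planning effort for plan quality, so a catch-all knob wouldn't be without precedent. A rough sketch of how it might look from the user's side; note that planner_effort is an invented name for illustration here, not an existing setting:

-- These existing settings already trade planning effort for plan quality:
SET join_collapse_limit = 12;   -- search more join orders exhaustively
SET geqo_threshold = 15;        -- put off the switch to genetic search

-- Hypothetical sketch only; "planner_effort" is an invented name:
SET planner_effort = 'high';    -- opt in to the expensive optimizations

A single session-level switch like that would at least let warehouse users opt in without penalizing OLTP workloads.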

I've certainly been here before [1], but the idea fell on deaf ears.

The biggest frustration for me is that Tom always seems to argue his point as if planning time were roughly as expensive as, or more expensive than, execution time. Likely that's true in many cases, but I would imagine that in more typical cases execution time is longer, although I've never had the guts to stand up and argue this, as I don't have any stats to back me up.

I was talking to Thomas Munro yesterday about this, and he mentioned that it would be quite nice to have some stats on how much time is spent in the planner versus the executor. He came up with the idea of adding a column to pg_stat_statements to record the planning time.
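
For a single query, EXPLAIN ANALYZE (9.4 and later) already reports both figures, so the pg_stat_statements column would essentially aggregate the same numbers over the whole workload. For example (timings purely illustrative):

explain analyze select count(*) from pg_class;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14.15..14.16 rows=1 width=0) (actual time=0.136..0.136 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..13.32 rows=332 width=0) (actual time=0.011..0.089 rows=332 loops=1)
 Planning time: 0.280 ms
 Execution time: 0.181 ms
(4 rows)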

If we could get real statistics on real-world cases and discovered, for example, that on average the total CPU time of planning was only 1% of execution time, then adding 2% overhead to the planner would be a bit less of a worry, as that would amount to just 2% of 1% (0.02%) overall. Such information, fed back into the community, might drive us in the right direction when deciding what needs to be done to resolve this recurring difficulty with accepting planner improvement patches.
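
If that column existed, computing the workload-wide ratio would be trivial. A sketch, assuming hypothetical total_plan_time and total_exec_time columns that pg_stat_statements does not have today:

SELECT sum(total_plan_time) / sum(total_exec_time) AS plan_exec_ratio
FROM pg_stat_statements;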

I believe that with parallel query on the horizon for 9.6, we're now aiming to support bigger OLAP-type databases than ever before. So if we ignore patches like this one, it appears we have some conflicting goals in the community: we're willing to add the brawn, but we're not willing to add the brain. If that's the case then it's a shame, as I think we can have both. That said, I very much agree that we must also find a way to maintain support for, and the high performance of, small OLTP databases.

[1] http://www.postgresql.org/message-id/CAApHDvrJrz-0xinyiqTiWs0mFX17GWD2Y8VZ+i92nuZsha8ocw@mail.gmail.com

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
