Re: from_collapse_limit vs. geqo_threshold

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: from_collapse_limit vs. geqo_threshold
Дата
Msg-id 603c8f070905251800g5b86d2dav26eca7f417d15dbf@mail.gmail.com
обсуждение исходный текст
Ответ на Re: from_collapse_limit vs. geqo_threshold  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: from_collapse_limit vs. geqo_threshold  (Selena Deckelmann <selena@endpoint.com>)
Список pgsql-hackers
On Mon, May 25, 2009 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now I'm still not exactly happy with GEQO, but it's surely a lot better
> than it was in the fall of 2000.  So on the whole it does seem that the
> current relationships between from_collapse_limit, join_collapse_limit,
> and geqo_threshold are based on obsolete information and should be
> revisited.  I don't have any data at hand to suggest specific new
> default values, though.

For 8.4, I'd be happy to just improve the documentation.  I think this
sentence could just be deleted from the section on
from_collapse_limit:

It is usually wise to keep this less than <xref linkend="guc-geqo-threshold">.

We could put some other explanation in place of that sentence, but I'm
not exactly sure what that explanation would say.  I guess the point
is that setting from_collapse_limit < geqo_threshold may delay GEQO
planning considerably in the face of complex subqueries, because
pulling up subqueries increases the size of the FROM list (I think).
That could be good if you want your query plans to be more
deterministic, but there's no guarantee they'll be good. Setting
from_collapse_limit > geqo_threshold is basically saying that the
standard planner will always have subqueries pulled up, so
from_collapse_limit should be based on what the pain point will be for
GEQO.

I'm not sure there's a lot of point in spelling all that out, though.
It more or less follows from the definition of the parameters.  So,
I'd be just as happy to delete the misleading hint and call it good.
But I could go either way.

For 8.5, it sounds like we need to do some testing to determine an
appropriate set of values, but I'm not exactly sure what to test.   As
a practical matter, the correct level of effort depends a lot on how
long the query figures to run.  For OLAP queries, planning times of
more than 50 ms or so start to add noticeably to the overall runtime
of the query, but if the query is expected to run for several minutes,
we'd presumably be happy to spend several seconds planning it, which
might make it feasible to use the standard planner even for very, very
big queries.

I'm not 100% convinced of the value of join_collapse_limit for
anything other than explicit control over the join order.  I have yet
to meet a PostgreSQL who thought that it was intuitive that it might
matter whether you wrote A JOIN B ON P1 JOIN C ON P2 JOIN D ON P3
[etc] or A, B, C, D, [etc] WHERE P1, P2, P3.  I suspect there are many
people who, if they knew that the latter might optimize better than
the former in some circumstances, would simply always write it in the
latter fashion, which makes the whole thing look a lot like a
concealed foot-gun, since whether or not it actually protects you
against exponential planning-time growth has a lot to do with how you
happen to like to write your queries (myself, I've switched styles in
the last few years).

...Robert


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Gevik Babakhani
Дата:
Сообщение: Re: usability of pg_get_function_arguments
Следующее
От: Robert Haas
Дата:
Сообщение: Re: generic options for explain