Re: [HACKERS] Removing LEFT JOINs in more cases

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Removing LEFT JOINs in more cases
Дата
Msg-id 7760.1515527095@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Removing LEFT JOINs in more cases  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: [HACKERS] Removing LEFT JOINs in more cases  (David Rowley <david.rowley@2ndquadrant.com>)
Re: [HACKERS] Removing LEFT JOINs in more cases  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ remove_left_join_distinct_2017-11-30.patch ]

I looked at this, and it seems like an interesting idea, but I'm
unconvinced that the specific conditions here are OK.

One clear problem is that you consider a DISTINCT to be an automatic
get-out-of-jail-free card, but that doesn't work in the presence of
aggregates:

     select distinct count(*) from a left join b on ...

So at the very least it needs to be more like

    if ((root->parse->distinctClause != NIL ||
         root->parse->groupClause != NIL) && !root->parse->hasAggs)

I believe it is probably necessary to reject this optimization if window
functions are present, as well, because DISTINCT would not happen till
after the wfunc calculation and wfuncs are definitely sensitive to the
number of rows they see.  (But maybe wfuncs + GROUP BY are OK.)

Another case that seems less than safe is if the proposed-to-be-dropped
left join is within the LHS of any LATERAL join.  For instance,

    select distinct ... from (a left join b ...), lateral nextval('foo');

The row duplication from b would affect how often nextval gets called,
and the presence of the DISTINCT later on doesn't entitle us to change
that.  (Maybe it'd be all right to perform the optimization if the
lateral function is known stable/immutable, but I think that's getting
too far afield; I'd be inclined to just drop the idea as soon as we
see a lateral dependency.)

Actually, that same case occurs for volatile functions in the tlist,
ie

    select distinct nextval('foo') from a left join b ...

The presence of the DISTINCT again doesn't excuse changing how often
nextval() gets called.

I kinda doubt this list of counterexamples is exhaustive, either;
it's just what occurred to me in five or ten minutes thought.
So maybe you can make this idea work, but you need to think much
harder about what the counterexamples are.

As far as code structure goes, it's surely not going to be sane to
have two copies of the checking logic, much less try to cram one
of them into a single if.  I'd suggest factoring it out into a
separate function that can be called from both places.

I'll set the patch back to Waiting on Author.

            regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: to_timestamp TZH and TZM format specifiers
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: Challenges preventing us moving to 64 bit transaction id (XID)?