Re: [HACKERS] Removing useless DISTINCT clauses

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: [HACKERS] Removing useless DISTINCT clauses
Дата	10 января 2018 г. 01:12:17
Msg-id	CAKJS1f8=KepAZ-UOYnqRfAdwOKWEvt=3vbe4_3QBCmQuHbYKmg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] Removing useless DISTINCT clauses (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: [HACKERS] Removing useless DISTINCT clauses
Список	pgsql-hackers

Дерево обсуждения

Hi Tom,

Thanks for looking at this.

On 10 January 2018 at 09:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This is a cute idea, but I'm troubled by a couple of points:
>
> 1. Once you don't have all the tlist items shown in DISTINCT, it really is
> more like DISTINCT ON, seems like.  I am not sure it's a good idea to set
> hasDistinctOn, because that engages some planner behaviors we probably
> don't want, but I'm also not sure we can get away with just ignoring the
> difference.  As an example, in allpaths.c there are assorted assumptions
> that having a distinctClause but !hasDistinctOn means all output columns
> of a subquery are listed in the distinctClause.

hmm, I see:

/*
* If subquery has regular DISTINCT (not DISTINCT ON), we're wasting our
* time: all its output columns must be used in the distinctClause.
*/
if (subquery->distinctClause && !subquery->hasDistinctOn)
return;

I think that particular one would just fail to remove columns from the
subquery if there's a distinct clause (not distinct on). That seems
more like a missed optimisation rather than a future bug. Although it
probably would be a good idea to get rid of the assumption that a
non-NIL distinctClause && !hasDistinctOn means that all output target
items are part of that distinct clause.

I don't see anything else in allpaths.c that would be affected.
Keeping in mind we'll never completely remove a distinctClause, just
possibly remove some items from the list. Everything else just seems
to be checking distinctClause != NIL, or is only doing something if
hasDistinctOn == true.

I'll do some more analysis on places that distinctClause is being used
to check what's safe.

> 2. There's a comment in planner.c to the effect that
>
>          * When we have DISTINCT ON, we must sort by the more rigorous of
>          * DISTINCT and ORDER BY, else it won't have the desired behavior.
>          * Also, if we do have to do an explicit sort, we might as well use
>          * the more rigorous ordering to avoid a second sort later.  (Note
>          * that the parser will have ensured that one clause is a prefix of
>          * the other.)
>
> Removing random elements of the distinctClause will break its
> correspondence with the sortClause, with probably bad results.
>
> I do not remember for sure at the moment, but it may be that this
> correspondence is only important for the case of DISTINCT ON, in which
> case we could dodge the problem by not applying the optimization unless
> it's plain DISTINCT.  That doesn't help us with point 1 though.

Yeah, it seems better to only try and apply this for plain DISTINCT.

> BTW, my dictionary says it's "dependent" not "dependant".

Oops. Thanks.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Removing useless DISTINCT clauses