Re: Removing useless DISTINCT clauses

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Removing useless DISTINCT clauses
Дата
Msg-id 152160295559.12147.1676278544502314555.pgcf@coridan.postgresql.org
обсуждение исходный текст
Ответ на Re: [HACKERS] Removing useless DISTINCT clauses  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Removing useless DISTINCT clauses  (David Rowley <david.rowley@2ndquadrant.com>)
Re: Removing useless DISTINCT clauses  (Laurenz Albe <laurenz.albe@cybertec.at>)
Список pgsql-hackers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

This is a review of the patch to remove useless DISTINCT columns from the DISTINCT clause
https://www.postgresql.org/message-id/CAKJS1f8UMJ137sRuVSnEMDDpa57Q71JuLZi4yLCFMekNYVYqaQ%40mail.gmail.com


Contents & Purpose
==================

This patch removes any additional columns in the DISTINCT clause when the
table's primary key columns are entirely present in the DISTINCT clause. This
optimization works because the PK columns functionally determine the other
columns in the relation. The patch includes regression test cases.

Initial Run
===========
The patch applies cleanly to HEAD. All tests which are run as part of the
`installcheck` target pass correctly (all existing tests pass, all new tests
pass with the patch applied and fail without it). All TAP tests pass. All tests
which are run as part of the `installcheck-world` target pass except one of the
Bloom contrib module tests in `contrib/bloom/sql/bloom.sql`,
 `CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);`
which is currently also failing (crashing) for me on master, so it is unrelated to the
patch. The test cases seem to cover the new behavior.

Functionality
=============
Given that this optimization is safe for the GROUP BY clause (you can remove
functionally dependent attributes from the clause there as well), the logic
does seem to follow for DISTINCT. It seems semantically correct to do this. As
for the implementation, the author seems to have reasonably addressed concerns
expressed over the distinction between DISTINCT and DISTINCT ON. As far as I
can see, this should be fine.

Code Review
===========
For a small performance hit but an improvement in readability, the length check
can be moved from the individual group by and distinct clause checks into the
helper function

    if (list_length(parse->distinctClause) < 2)
      return;

    and

     if (list_length(parse->groupClause) < 2)
      return;

can be moved into `remove_functionally_dependent_clauses`


Comments/Documentation
=======================

The main helper function that is added `remove_functionally_dependent_clauses`
uses one style of comment--with the name of the function and then the rest of
the description indented which is found some places in the code, 
/*
 * remove_functionally_dependent_clauses
 *        Processes clauselist and removes any items which are deemed to be
 *        functionally dependent on other clauselist items.
 *
 * If any item from the list can be removed, then a new list is built which
 * does not contain the removed items.  If no item can be removed then the
 * original list is returned.
 */

while other helper functions in the same file use a different style--all lines
flush to the side with a space. I'm not sure which is preferred

/*
 * limit_needed - do we actually need a Limit plan node?
 *
 * If we have constant-zero OFFSET and constant-null LIMIT, we can skip adding
 * a Limit node.  This is worth checking for because "OFFSET 0" is a common
 * locution for an optimization fence.  (Because other places in the planner
 ...

it looks like the non-indented ones are from older commits (2013 vs 2016).

The list length change is totally optional, but I'll leave it as Waiting On Author in case the author wants to make
thatchange.
 

Overall, LGTM.

The new status of this patch is: Waiting on Author

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] taking stdbool.h into use
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] taking stdbool.h into use