Removing Functionally Dependent GROUP BY Columns

Поиск
Список
Период
Сортировка
От David Rowley
Тема Removing Functionally Dependent GROUP BY Columns
Дата
Msg-id CAKJS1f_UZ_MXtpot6EPXsgHSujoUCrKuXYHLH06h072rDXsCzw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Removing Functionally Dependent GROUP BY Columns  (Marko Tiikkaja <marko@joh.to>)
Re: Removing Functionally Dependent GROUP BY Columns  (Robert Haas <robertmhaas@gmail.com>)
Re: Removing Functionally Dependent GROUP BY Columns  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers
We already allow a SELECT's target list to contain non-aggregated columns in a GROUP BY query in cases where the non-aggregated column is functionally dependent on the GROUP BY clause.

For example a query such as;

SELECT p.product_id,p.description, SUM(s.quantity)
FROM product p
INNER JOIN sale s ON p.product_id = s.product_id
GROUP BY p.product_id;

is perfectly fine in PostgreSQL, as p.description is functionally dependent on p.product_id (assuming product_id is the PRIMARY KEY of product).

It seems that there's no shortage of relational databases in existence today which don't support this. These databases would require the GROUP BY clause to include the p.description column too. 

It seems rather unfortunate that people who migrate applications to PostgreSQL may not be aware that we support this, as currently if we needlessly include the p.description column, PostgreSQL naively includes this column while grouping. These people could well be incurring a performance penalty due to our planner not removing the useless items from the list, as if the primary key is present, then including any other columns won't cause splitting of the groups any further, all other columns from the *same relation* can simply be removed from the GROUP BY clause.

There are in fact also two queries in TPC-H (Q10 and Q18) which are written to include all of the non-aggregated column in the GROUP BY list. During a recent test I witnessed a 50% gain in performance in Q10 by removing the unneeded columns from the GROUP BY clause.

I've attached a patch which implements this in PostgreSQL.

The patch may need a little more work in order to adjust the targetlist's tleSortGroupRefs to remove invalid ones and perhaps also remove the gaps.

I'm posting this now so that I can gauge the community interest in this.

Is it something that we'd like to have in PostgreSQL?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Error with index on unlogged table
Следующее
От: Marko Tiikkaja
Дата:
Сообщение: Re: Removing Functionally Dependent GROUP BY Columns