[PATCH] Remove useless distinct clauses

Поиск

Список

Период

Сортировка

От	Pierre Ducroquet
Тема	[PATCH] Remove useless distinct clauses
Дата	31 июля 2020 г. 08:41:19
Msg-id	7426429.iaNirn0XA0@peanuts2 обсуждение исходный текст
Ответы	Re: [PATCH] Remove useless distinct clauses Re: [PATCH] Remove useless distinct clauses
Список	pgsql-hackers

Дерево обсуждения

Hi

In a recent audit, I noticed that application developers have a tendency to 
abuse the distinct clause. For instance they use an ORM and add a distinct at 
the top level just because they don't know the cost it has, or they don't know 
that using EXISTS is a better way to express their queries than doing JOINs 
(or worse, they can't do better).

They thus have this kind of queries (considering tbl1 has a PK of course):
SELECT DISTINCT * FROM tbl1;
SELECT DISTINCT * FROM tbl1 ORDER BY a;
SELECT DISTINCT tbl1.* FROM tbl1
    JOIN tbl2 ON tbl2.a = tbl1.id;

These can be transformed into:
SELECT * FROM tbl1 ORDER BY *;
SELECT * FROM tbl1 ORDER BY a;
SELECT tbl1.* FROM tbl1 SEMI-JOIN tbl2 ON tbl2.a = tbl1.id ORDER BY tbl1.*;

The attached patch does that.
I added extra safeties in several place just to be sure I don't touch 
something I can not handle, but I may have been very candid with the distinct 
to sort transformation.
The cost of this optimization is quite low : for queries that don't have any 
distinct, it's just one if. If there is a distinct, we iterate once through 
every target, then we fetch the PK and iterate through the DISTINCT clause 
fields. If it is possible to optimize, we then iterate through the JOINs.

Any comment on this would be more than welcome!

Regards

 Pierre

Вложения

0001-Add-a-new-remove_useless_distinct-function-in-planne.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[PATCH] Remove useless distinct clauses

Вложения