Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: David Rowley
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date:
Msg-id: CAApHDvqg+mQyxJnCizE=qJcBL90L=oFXTFyiwWWEaUnzG7Uc5Q@mail.gmail.com
In response to: Re: [PATCH] Erase the distinctClause if the result is unique by definition  (Andy Fan <zhihui.fan1213@gmail.com>)
List: pgsql-hackers
On Tue, 10 Mar 2020 at 21:19, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>> SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part.
>
>
> Actually I want to keep the distinct for this case now.  One reason is that only
> 1 row is returned, so erasing the distinct can't help much.   The more important
> reason is that Query->hasAggs is true for "select distinct  (select count(*) filter
> (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6)  from ft2 t2
> where t2.c2 % 6 = 0 order by 1;" (this SQL came from postgres_fdw.sql).

I think that sort of view is part of the problem here.  If you want to
invent some new way to detect uniqueness that does not count that case,
then we have more code with more possible places for bugs.

FWIW, query_is_distinct_for() does detect that case with:

/*
 * If we have no GROUP BY, but do have aggregates or HAVING, then the
 * result is at most one row so it's surely unique, for any operators.
 */
if (query->hasAggs || query->havingQual)
    return true;

which can be seen from the fact that the following finds the unique join on t2:

postgres=# explain verbose select * from t1 inner join (select
count(*) c from t1) t2 on t1.a=t2.c;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Hash Join  (cost=41.91..84.25 rows=13 width=12)
   Output: t1.a, (count(*))
   Inner Unique: true
   Hash Cond: (t1.a = (count(*)))
   ->  Seq Scan on public.t1  (cost=0.00..35.50 rows=2550 width=4)
         Output: t1.a
   ->  Hash  (cost=41.89..41.89 rows=1 width=8)
         Output: (count(*))
         ->  Aggregate  (cost=41.88..41.88 rows=1 width=8)
               Output: count(*)
               ->  Seq Scan on public.t1 t1_1  (cost=0.00..35.50 rows=2550 width=0)
                     Output: t1_1.a
(12 rows)

It would be very simple to add an empty List of UniqueKeys to the GROUP
BY's RelOptInfo to indicate that all expressions are unique.  That way,
any code that checks whether some of the RelOptInfo's unique keys are a
subset of the expressions it would like to know are unique will get a
match.

It does not really matter how much effort is saved in your example
above. The UniqueKey infrastructure won't know how much effort
properly adding all the uniquekeys will save. It should just add all
the keys it can and let whatever code cares about them reap the
benefits.


