Re: Todo: Teach planner to evaluate multiple windows in the optimal order

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Дата
Msg-id CAApHDvr7rSCVXzGfVa1L9pLpkKj6-s8NynK8o+98X9sKjejnQQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Ответы Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Re: Todo: Teach planner to evaluate multiple windows in the optimal order  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Список pgsql-hackers
On Wed, 11 Jan 2023 at 19:21, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
> HashAgg has better cost than Unique even with incremental sort (tried
> with other case
>
> where we have more columns pushed down but still hashAgg wins).

I don't think you can claim that one so easily.  The two should have
quite different scaling characteristics which will be more evident
with a larger number of input rows. Also, Hash Aggregate makes use of
work_mem * hash_mem_multiplier, whereas sort uses work_mem.  Consider
a hash_mem_multiplier less than 1.0.

> > I'm not really all that sure the above query shape makes much sense in
> > the real world. Would anyone ever want to use DISTINCT on some results
> > containing WindowFuncs?
>
> This could still have been good to have if there were no negative impact
>
> and some benefit in few cases but as mentioned before, if hashagg removes
>
> any sort (which happened due to push down), all gains will be lost
>
> and we will be probably worse off than before.

We could consider adjusting the create_distinct_paths() so that it
uses some newly invented and less strict pathkey comparison where the
order of the pathkeys does not matter. It would just care if the
pathkeys were present and return a list of pathkeys not contained so
that an incremental sort could be done only on the returned list and a
Unique on an empty returned list.  Something like that might be able
to apply in more cases, for example:

select distinct b,a from ab where a < 10;

the distinct pathkeys would be b,a but if there's an index on (a),
then we might have a path with pathkeys containing "a".

You can see when we manually swap the order of the DISTINCT clause
that we get a more optimal plan (even if they're not costed quite as
accurately as we might have liked)

create table ab(a int, b int);
create index on ab(a);
set enable_hashagg=0;
set enable_seqscan=0;
insert into ab select x,y from generate_series(1,100)x, generate_Series(1,100)y;
analyze ab;

# explain select distinct b,a from ab where a < 10;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Unique  (cost=72.20..78.95 rows=611 width=8)
   ->  Sort  (cost=72.20..74.45 rows=900 width=8)
         Sort Key: b, a
         ->  Index Scan using ab_a_idx on ab  (cost=0.29..28.04
rows=900 width=8)
               Index Cond: (a < 10)
(5 rows)

# explain select distinct a,b from ab where a < 10; -- manually swap
DISTINCT column order.
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Unique  (cost=0.71..60.05 rows=611 width=8)
   ->  Incremental Sort  (cost=0.71..55.55 rows=900 width=8)
         Sort Key: a, b
         Presorted Key: a
         ->  Index Scan using ab_a_idx on ab  (cost=0.29..28.04
rows=900 width=8)
               Index Cond: (a < 10)
(6 rows)

We might also want to also consider if Pathkey.pk_strategy and
pk_nulls_first need to be compared too.  That makes the check a bit
more expensive as Pathkeys are canonical and if those fields vary then
we need to perform more than just a comparison by the memory address
of the pathkey. This very much seems like a separate effort than the
WindowClause sort reduction work. I think it gives us everything we've
talked about extra we might want out of reducing WindowClause sorts
and more.

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
Следующее
От: Melanie Plageman
Дата:
Сообщение: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)