Discussion: POC: GROUP BY optimization

POC: GROUP BY optimization

From
Teodor Sigaev
Date:
Hi!

As far as I know, the columns in a GROUP BY can be reordered without loss of
correctness, and reordering them can give a performance benefit. The suggested
patch implements that in several ways:

1) Reorder GROUP BY columns by the number of unique values, in descending
order. A simple example shows a twofold performance difference (see
opt_group_by_demostrator.sql; on my notebook the first two queries demonstrate
that, and a sketch follows this list). The idea is: if the first column is
unique, the sort comparator never has to compare the following columns.

2) Reorder GROUP BY columns to match existing pathkeys that come from an index
scan or an ORDER BY clause. This avoids adding a Sort node - I suppose that's a
clear win.

3) The patch suggests using an index for the GROUP BY clause, but unlike the
ORDER BY or merge-join case it doesn't pay attention to the actual order of the
columns, because of 2).
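
To make 1) concrete, here is a minimal sketch in the spirit of
opt_group_by_demostrator.sql (not the attached script itself - the table and
column names here are invented for illustration):

    -- force a sort-based GroupAggregate so the column order matters
    SET enable_hashagg = off;

    CREATE TABLE demo AS
      SELECT i AS uniq, i % 10 AS dup
        FROM generate_series(1, 1000000) s(i);
    ANALYZE demo;

    -- unique column first: the comparator never has to look at "dup"
    EXPLAIN ANALYZE SELECT count(*) FROM demo GROUP BY uniq, dup;

    -- low-cardinality column first: almost every comparison falls through
    -- to the second column
    EXPLAIN ANALYZE SELECT count(*) FROM demo GROUP BY dup, uniq;

With the patch, both queries should end up sorting by (uniq, dup).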

The patch doesn't change any cost estimation computation, although 1) could
take advantage of that.

Some issues/problems/notes:

1) I didn't look at GROUPING SETS at all. As far as I can see, the current
coding prevents any optimization around them, and I suppose that should be
addressed in a separate patch.

2) I'm not completely satisfied with how the number of groups per column is
counted; it looks ugly without refactoring estimate_num_groups(). At the
moment it may be called multiple times for each column. Maybe this number
should be added to the PathKey structure?

3) EquivalenceClass->ec_sortref. If the planner first encounters a column in
the WHERE clause, it doesn't fill the target reference field because it is not
known yet. The patch matches PathKeys (group_pathkeys) against
SortGroupClauses (groupClause) and fails in this case. So
get_eclass_for_sort_expr() now updates the ec_sortref field if it's not
initialized yet.

4) The algorithm that reorders the columns is O(N^2), where N is the number of
columns, but I hope that isn't a problem because the number of GROUP BY columns
is usually not very large.




-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
Cosmetic change: remove the find_sort_group_clause_by_sortref() function added
in the v5 patch, because it duplicates the existing get_sortgroupref_clause().

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
Hi Teodor,

Thanks for the patch. This (missing) optimization has popped up repeatedly
recently, and I was planning to look into it for PG12. So now I don't have to,
because you've done all the hard work ;-)

I've done only a very basic review so far, but ISTM the general approach of
considering additional indexes in create_index_paths is correct. That being
said, I do have a couple of comments:


1) add_path() ensures that we only keep the cheapest sorted path for each set
of pathkeys. This patch somewhat defeats that because it considers additional
pathkeys (essentially any permutation of the group keys) as interesting. So we
may end up with more paths in the list.

I wonder if we should make the add_path() logic smarter to recognize 
when two paths have different pathkeys but were generated to match the 
same grouping, to reduce the number of paths added by this optimization. 
Currently we do that for each pathkeys list independently, but we're 
considering many more pathkeys orderings ...

That being said, maybe this concern is moot - to generate a large number
of new paths, there would have to be a lot of indexes matching the group 
by clause (as a prefix), and the group by clause would need to be fairly 
large (to have many permutations). That seems like a fairly rare case. I 
do know a couple of people who do create large numbers of indexes on 
tables, but those are usually single-column indexes.


2) sort reordering based on ndistinct estimates

This part of the patch (reordering the columns in an explicit sort node) 
seems to be largely independent of the "consider other indexes" part. So 
I do suggest separating those two, and discussing them independently.

But thinking about this optimization, I'm worried it relies on a couple of
important assumptions. Until now those decisions could be made by the person
writing the SQL query, but this optimization makes that impossible. So we
really need to get this right.

For example, it seems to disregard that different data types have 
different comparison costs. For example comparing bytea will be far more 
expensive compared to int4, so it may be much more efficient to compare 
int4 columns first, even if there are far fewer distinct values in them.

This costing is probably related to abbreviated keys too, which is a 
huge improvement for text and other varlena-based types.

Also, simply sorting the columns by their ndistinct estimate is somewhat 
naive, because it assumes the columns are independent. Imagine for 
example a table with three columns:

    CREATE TABLE t (a INT, b INT, c INT);

    INSERT INTO t SELECT mod(i,100), mod(i,100)/2, 49*random()
      FROM generate_series(1,1000000) s(i);

Clearly, "a" has the most distinct values (100), while both "b" and "c" 
have 50 distinct values. But "b" does not actually split any of the "a" 
groups, while "c" does:

     select count(*) from (select 1 from t group by a,b) foo;
      count
     -------
        100
     (1 row)

     select count(*) from (select 1 from t group by a,c) foo;
      count
     -------
       5000
     (1 row)

So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use 
"(a,c,b)" than "(a,b,c)" but there is no way to decide this merely using 
per-column ndistinct values. Luckily, we now have ndistinct multi-column 
coefficients which could be used to decide this I believe (but I haven't 
tried).
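
For the record, defining the extended statistics for the example above would
look something like this (I haven't checked what the patch does with it; the
statistics object name is arbitrary):

    CREATE STATISTICS t_ndist (ndistinct) ON a, b, c FROM t;
    ANALYZE t;

    -- the multi-column estimates are then visible in pg_statistic_ext
    SELECT stxname, stxndistinct FROM pg_statistic_ext;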

The real issue however is that this decision can't be made entirely 
locally. Consider for example this:

     explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
                                       QUERY PLAN
     ------------------------------------------------------------------------------
      Sort  (cost=156010.16..156260.16 rows=100000 width=20)
        Sort Key: c, b, a
        ->  GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
              Group Key: a, c, b
              ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
                    Sort Key: a, c, b
                    ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
     (7 rows)

while without the patch, this would happen:

     explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
                                    QUERY PLAN
     ------------------------------------------------------------------------
      GroupAggregate  (cost=132154.34..145654.34 rows=100000 width=20)
        Group Key: c, b, a
        ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
              Sort Key: c, b, a
              ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
     (5 rows)

Which is clearly cheaper (at least according to the optimizer) than 
doing two separate sorts. So the optimization actually does the wrong 
thing here, and it needs to somehow consider the other ordering 
requirements (in this case ORDER BY) - either by generating multiple 
paths with different orderings or some heuristics.

I'm also wondering how freely we can change the group by result 
ordering. Assume you disable hash aggregate and parallel query - 
currently we are guaranteed to use group aggregate that produces exactly 
the ordering as specified in GROUP BY, but this patch removes that 
"guarantee" and we can produce arbitrary permutation of the ordering. 
But as I argued in other threads, such implicit guarantees are really 
fragile, can silently break for arbitrary reasons (say, parallel query 
will do just that) and can be easily fixed by adding a proper ORDER BY. 
So I don't think this is an issue.

The other random thought is how this will/could interact with the incremental
sort patch. I know Alexander wanted to significantly limit where we actually
consider the incremental sort, but I'm not sure if this was one of those places
or not (it sure looks like a place where we would greatly benefit from it).

So to summarize this initial review - I do suggest splitting the patch into
two parts. One that does the index magic, and one for this reordering
optimization. The first part (additional indexes) seems fairly safe and likely
to become committable soon. The other part (ndistinct reordering) IMHO requires
more thought regarding costing and interaction with other query parts.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
> Thanks for the patch. This (missing) optimization popped-up repeatedly recently, 
> and I was planning to look into it for PG12. So now I don't have to, because 
> you've done all the hard work ;-)
You are welcome. Actually one of our customers faced the problem with GROUP BY
column order, and exactly with reordering without any indexes - what you call
part 2). I noticed the index optimization later. But following the order of
your suggested split, I divided the patch into an index part and a non-index
part, and the second part depends on the first. They touch the same part of the
code, so they cannot be fully independent.

> 1) add_path() ensures that we only keep the one cheapest path sorted path for 
> each pathkeys. This patch somewhat defeats that because it considers additional 
> pathkeys (essentially any permutation of group keys) as interesting. So we may 
> end up with more paths in the list.
It seems an index scan could give a benefit here only if:
   1) it's an index-only scan
   2) it's a full index scan (as opposed to index-only) but the physical order
      of the heap is close to the logical index order (i.e. the table is clustered)

In other cases the cost of disk seeks will be very high. But at this stage of
planning we don't know these facts yet, so we can't make a good decision here
and should trust the add_path() logic.

 > I wonder if we should make the add_path() logic smarter to recognize when two
 > paths have different pathkeys but were generated to match the same grouping,
 > to reduce the number of paths added by this optimization. Currently we do that
 > for each pathkeys list independently, but we're considering many more pathkeys
 > orderings ...

Hm. I tend to say no.
select .. from t1 group by a, b
union
select .. from t2 group by a, b

t1 and t2 could have different sets of indexes and different distributions, so
locally it could be cheaper to use one index (for example, one defined as (b, a)
while the other is (a,b,c,d) - the second is larger), but globally another
choice wins (for example, when the second table doesn't have a (b,a) index).

> 2) sort reordering based on ndistinct estimates

> But thinking about this optimization, I'm worried it relies on a couple of 
> important assumptions. For now those decisions could have be made by the person 
> writing the SQL query, but this optimization makes that impossible. So we really 
> need to get this right.
Hm, SQL by design should not be used that way, but of course it is :(

> For example, it seems to disregard that different data types have different 
> comparison costs. For example comparing bytea will be far more expensive 
> compared to int4, so it may be much more efficient to compare int4 columns 
> first, even if there are far fewer distinct values in them.
As far as I can see, cost_sort() doesn't pay attention to these details, and
improving that should be a separate patch.

> Also, simply sorting the columns by their ndistinct estimate is somewhat naive, 
> because it assumes the columns are independent. Imagine for example a table with 
> three columns:
> So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to use 
> "(a,c,b)" than "(a,b,c)" but there is no way to decide this merely using 
> per-column ndistinct values. Luckily, we now have ndistinct multi-column 
> coefficients which could be used to decide this I believe (but I haven't tried).
Agree, but I don't know how to use it here. Except, maybe:
1) first column - the column with the largest estimated number of groups
2) second column - the one for which the pair (first, second) has the largest
estimated number of groups
3) third column - the one for which the triple (first, second, third) has the
largest ... (illustrated below)

But it seems, ISTM, that even this algorithm could be implemented in a cheaper
manner.
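
To illustrate with your table t from above, the greedy choice could be computed
by hand like this (the planner would of course have to rely on estimates rather
than actual counts):

     -- step 1: the single column producing the most groups
     select count(*) from (select 1 from t group by a) s;   -- 100
     select count(*) from (select 1 from t group by b) s;   --  50
     select count(*) from (select 1 from t group by c) s;   --  50

     -- step 2: extend the chosen prefix (a) with each remaining column
     select count(*) from (select 1 from t group by a, b) s;  --  100
     select count(*) from (select 1 from t group by a, c) s;  -- 5000

     -- (a, c) wins, so the greedy order is (a, c, b)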


> The real issue however is that this decision can't be made entirely locally. 
> Consider for example this:
> 
>      explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
> 
> Which is clearly cheaper (at least according to the optimizer) than doing two 
> separate sorts. So the optimization actually does the wrong thing here, and it 
> needs to somehow consider the other ordering requirements (in this case ORDER 
> BY) - either by generating multiple paths with different orderings or some 
> heuristics.
Hm, thank you. I consider this a bug in my implementation - the basic idea was
that we try to match an already existing or needed order, and only if we fail,
or have an unmatched tail of the pathkey list, do we try to find the cheapest
column order. Fixed in v7 (0002-opt_group_by_index_and_order-v7.patch), though
maybe in a naive way: if we don't have path pathkeys, first try to reorder the
columns according to the ORDER BY clause. A test for your case is also added.


> I'm also wondering how freely we can change the group by result ordering. Assume 
> you disable hash aggregate and parallel query - currently we are guaranteed to 
> use group aggregate that produces exactly the ordering as specified in GROUP BY, 
> but this patch removes that "guarantee" and we can produce arbitrary permutation 
> of the ordering. But as I argued in other threads, such implicit guarantees are 
> really fragile, can silently break for arbitrary reasons (say, parallel query 
> will do just that) and can be easily fixed by adding a proper ORDER BY. So I 
> don't think this is an issue.
Agree. SQL by design doesn't guarantee any particular order without an
explicit ORDER BY clause.

> The other random thought is how will/could this interact with the incremental 
> sorts patch. I know Alexander wanted to significantly limit where we actually 
> consider the incremental sort, but I'm not sure if this was one of those places 
> or not (is sure looks like a place where we would greatly benefit from it).
It seems they should not interact directly. My patch tries to find the
cheapest column order, Alexander's patch tries to make the sort itself cheaper
- they are different tasks. But I will check.

> So to summarize this initial review - I do suggest splitting the patch into two 
> parts. One that does the index magic, and one for this reordering optimization. 
> The first part (additional indexes) seems quite fairly safe, likely to get 
> committable soon. The other part (ndistinct reordering) IMHO requires more 
> thought regarding costing and interaction with other query parts.
Thank you for review!

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On 06/05/2018 07:56 PM, Teodor Sigaev wrote:
>> Thanks for the patch. This (missing) optimization popped-up repeatedly 
>> recently, and I was planning to look into it for PG12. So now I don't 
>> have to, because you've done all the hard work ;-)
> You are welcome. Actually one of out customers faced the problem with  
> GROUP BY column order and exactly with reordering without any indexes, 
> you mean it as problem 2). Index optimization was noticed by me later. 
> But based on your suggested patch's order I split the patch to index  
> and non-index part and second part depends of first one. They touch the 
> same part of code and they could not be independent
> 

The way I see it the patch does two different things:

a) consider additional indexes by relaxing the pathkeys check

b) if there are no indexes, i.e. when an explicit Sort is needed, 
consider reordering the columns by ndistinct

Not sure why those two parts couldn't be separated. I haven't tried 
splitting the patch, of course, so I may be missing something.

In the worst case, one part will depend on the other, which is OK. It 
still allows us to commit the first part and continue working on the 
other one, for example.

>> 1) add_path() ensures that we only keep the one cheapest path sorted 
>> path for each pathkeys. This patch somewhat defeats that because it 
>> considers additional pathkeys (essentially any permutation of group 
>> keys) as interesting. So we may end up with more paths in the list.
> Seems, index scan here could give benefits here only if:
>    1) it's a index only scan
>    2) it's a index full (opposite to only) scan but physical order of 
> heap is
>       close to logical index order (ie table is clustered)
> 
> In other cases costs of disk seeking will be very high. But on this 
> stage of planing we don't know that facts yet. So we couldn't make a 
> good decision here and should believe in add_path() logic.
>
Not sure what you mean? Surely we do costing of the paths at this stage, 
so we can decide which one is cheaper etc. The decision which paths to 
keep is done by add_path(), and it should stay like this, of course. I 
wasn't suggesting to move the logic elsewhere.

>  > I wonder if we should make the add_path() logic smarter to recognize 
> when two
>  > paths have different pathkeys but were generated to match the same 
> grouping,
>  > to reduce the number of paths added by this optimization. Currently 
> we do that > for each pathkeys list independently, but we're considering 
> many more pathkeys > orderings ...
> 
> Hm. I tend to say no.
> select .. from t1 group by a, b
> union
> select .. from t2 group by a, b
> 
> t1 and t2 could have different set of indexes and different 
> distribution, so locally it could be cheaper to use one index (for 
> example, one defined as (b, a) and second as (a,b,c,d) - second is 
> larger) but totally - another (example: second table doesn't have (b,a) 
> index)
> 

But add_path() treats each of the relations independently, why couldn't 
we pick a different index for each of the two relations?

>> 2) sort reordering based on ndistinct estimates
> 
>> But thinking about this optimization, I'm worried it relies on a 
>> couple of important assumptions. For now those decisions could have be 
>> made by the person writing the SQL query, but this optimization makes 
>> that impossible. So we really need to get this right.
> Hm, sql by design should not be used that way, but, of course, it's used :(
> 

Well, yes and no. I'm not worried about people relying on us to give 
them some ordering - they can (and should) add an ORDER BY clause to fix 
that. I'm more worried about the other stuff.

>> For example, it seems to disregard that different data types have 
>> different comparison costs. For example comparing bytea will be far 
>> more expensive compared to int4, so it may be much more efficient to 
>> compare int4 columns first, even if there are far fewer distinct 
>> values in them.
> as I can see cost_sort() doesn't pay attention to this details. And it 
> should be a separated patch to improve that.
> 

But sort also does not reorder columns.

Imagine you have a custom data type that is expensive for comparisons. 
You know that, so you place it at the end of GROUP BY clauses, to reduce 
the number of comparisons on that field. And then we come along and just 
reorder the columns, placing it first, because it happens to have a high 
ndistinct statistic. And there's no way to get the original behavior :-(

>> Also, simply sorting the columns by their ndistinct estimate is 
>> somewhat naive, because it assumes the columns are independent. 
>> Imagine for example a table with three columns:
>> So clearly, when evaluating GROUP BY a,b,c it'd be more efficient to 
>> use "(a,c,b)" than "(a,b,c)" but there is no way to decide this merely 
>> using per-column ndistinct values. Luckily, we now have ndistinct 
>> multi-column coefficients which could be used to decide this I believe 
>> (but I haven't tried).
> Agree, but I don't know how to use it here. Except, may be:
> 1) first column - the column with bigger estimated number of groups
> 2) second column - the pair of (first, second) with bigger estimated 
> number of groups
> 3) third column - the triple of (first, second, third) with bigger ...
> 
> But seems even with that algorithm, ISTM, it could be implemented in 
> cheaper manner.
> 

Maybe. I do have some ideas, although I haven't tried implementing it.

If there's no extended statistic on the columns, you can do the current 
thing (assuming independence etc.). There's not much we can do here.

If there's an extended statistic, you can do either a greedy search (get 
the next column with the highest ndistinct coefficient) or exhaustive 
search (computing the estimated number of comparisons).

Another challenge is that using only the ndistinct coefficient assumes 
uniform distribution of the values. But you can have a column with 1M 
distinct values, where a single value represents 99% of the rows. And 
another column with 100k distinct values, with actual uniform 
distribution. I'm pretty sure it'd be more efficient to place the 100k 
column first.
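
A made-up example of that kind of data, scaled down to 1M rows (so the distinct
counts are smaller than in the description above, and the table name is
arbitrary):

    CREATE TABLE t2 (a INT, b INT);

    -- "a" has ~10k distinct values, but 99% of the rows share a single one;
    -- "b" has only 1000 distinct values, uniformly distributed
    INSERT INTO t2
      SELECT CASE WHEN i % 100 = 0 THEN i ELSE 0 END, i % 1000
        FROM generate_series(1,1000000) s(i);

Despite the much higher ndistinct, sorting by "a" first does almost nothing to
split the input, so I'd expect "b" first to win here.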

> 
>> The real issue however is that this decision can't be made entirely 
>> locally. Consider for example this:
>>
>>      explain select a,b,c, count(*) from t group by a,b,c order by c,b,a;
>>
>> Which is clearly cheaper (at least according to the optimizer) than 
>> doing two separate sorts. So the optimization actually does the wrong 
>> thing here, and it needs to somehow consider the other ordering 
>> requirements (in this case ORDER BY) - either by generating multiple 
>> paths with different orderings or some heuristics.
> Hm, thank you. I consider it is a bug of my implementation - basic idea 
> was that we try to match already existing or needed order and only if we 
> fail or have unmatched tail of pathkey list than we will try to find 
> cheapest column order.
> Fixed in v7 (0002-opt_group_by_index_and_order-v7.patch), but may be by 
> naive way: if we don't have a path pathkey first try to reorder columns 
> accordingly to order by clause. Test for your is also added.
> 

OK. I'll give it a try.

> 
>> I'm also wondering how freely we can change the group by result 
>> ordering. Assume you disable hash aggregate and parallel query - 
>> currently we are guaranteed to use group aggregate that produces 
>> exactly the ordering as specified in GROUP BY, but this patch removes 
>> that "guarantee" and we can produce arbitrary permutation of the 
>> ordering. But as I argued in other threads, such implicit guarantees 
>> are really fragile, can silently break for arbitrary reasons (say, 
>> parallel query will do just that) and can be easily fixed by adding a 
>> proper ORDER BY. So I don't think this is an issue.
> Agree. SQL by design doesn't give a warranty of particular order without 
> explicit ORDER BY clause.
> 
>> The other random thought is how will/could this interact with the
>> incremental sorts patch. I know Alexander wanted to significantly 
>> limit where we actually consider the incremental sort, but I'm not 
>> sure if this was one of those places or not (is sure looks like a 
>> place where we would greatly benefit from it).
> Seems, they should not directly interact. Patch tries to find cheapest 
> column order, Alexander's patch tries to make sort cheaper - they are a 
> different tasks. But I will try.
> 

Thanks.

>> So to summarize this initial review - I do suggest splitting the patch 
>> into two parts. One that does the index magic, and one for this 
>> reordering optimization. The first part (additional indexes) seems 
>> quite fairly safe, likely to get committable soon. The other part 
>> (ndistinct reordering) IMHO requires more thought regarding costing 
>> and interaction with other query parts.
> Thank you for review!
> 

;-)


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
>> problem 2). Index optimization was noticed by me later. But based on your 
>> suggested patch's order I split the patch to index and non-index part and 
>> second part depends of first one. They touch the same part of code and they 
>> could not be independent
> The way I see it the patch does two different things:
Yes

> a) consider additional indexes by relaxing the pathkeys check
Yes, but not only that. The patch can also reorder the GROUP BY clause to match
existing pathkeys, which could come from an index scan (suggested by the patch
or not) or from some other way of producing ordered output.

> b) if there are no indexes, i.e. when an explicit Sort is needed, consider 
> reordering the columns by ndistinct
Yes. But again, this description is a bit short. First, it runs after the
first patch and might get some preordered leading pathkeys. Second, it tries to
match the ORDER BY clause order if there are no preordered leading pathkeys
from the first patch (this was introduced in v7). And third, if a tail of
pathkeys is left unmatched by the previous stages, it reorders that tail.

> In the worst case, one part will depend on the other, which is OK. It still 
> allows us to commit the first part and continue working on the other one, for 
> example.
Exactly, that's our case. Of course, it's possible to split the first patch in
two again: just the index suggestion (1.1) and the reordering by existing
pathkeys (1.2). Then 1.1 would be independent, but 2 would still have to be
applied after 1.2. And patch 1.1 on its own is rather useless.

> can decide which one is cheaper etc. The decision which paths to keep is done by 
> add_path(), and it should stay like this, of course. I wasn't suggesting to move 
> the logic elsewhere.
Cool, I have no intention of modifying it either.

> 
>>  > I wonder if we should make the add_path() logic smarter to recognize when two
>>  > paths have different pathkeys but were generated to match the same grouping,
>>  > to reduce the number of paths added by this optimization. Currently we do that
>>  > for each pathkeys list independently, but we're considering many more pathkeys
>>  > orderings ...
>>
>> Hm. I tend to say no.
>> select .. from t1 group by a, b
>> union
>> select .. from t2 group by a, b
>>
>> t1 and t2 could have different set of indexes and different distribution, so 
>> locally it could be cheaper to use one index (for example, one defined as (b, 
>> a) and second as (a,b,c,d) - second is larger) but totally - another (example: 
>> second table doesn't have (b,a) index)
>>
> 
> But add_path() treats each of the relations independently, why couldn't we pick 
> a different index for each of the two relations?

Having sorted output from both subselects could be cheaper than sorting one of
them, even if an index scan was cheaper for that one. But it seems to me that
is not the business of the optimization suggested here.


>>> For example, it seems to disregard that different data types have different 
>>> comparison costs. For example comparing bytea will be far more expensive 
>>> compared to int4, so it may be much more efficient to compare int4 columns 
>>> first, even if there are far fewer distinct values in them.
>> as I can see cost_sort() doesn't pay attention to this details. And it should 
>> be a separated patch to improve that.
>>
> 
> But sort also does not reorder columns.
Yes. But the estimates of the comparison function cost and of the number of
unique values in a column may not be very accurate, so the planner could make a
wrong choice. I have seen a 2x difference in a real-world application. Again,
improving the sort cost estimation is a separate task.

> 
> Imagine you have a custom data type that is expensive for comparisons. You know 
> that, so you place it at the end of GROUP BY clauses, to reduce the number of 
> comparisons on that field. And then we come along and just reorder the columns, 
> placing it first, because it happens to have a high ndistinct statistic. And 
> there's no way to get the original behavior :-(
Hm, that could happen, but I don't know how to improve things here. The
current cost_sort() returns the same cost for any column order.

Okay, here we know an estimate of nrows; we could somehow find the comparison
function's OID and look up its procost field. And then? We would also need to
know the tuple width here, and mostly repeat what cost_sort() does.

Another option is introducing an enable_groupby_reorder GUC variable, although
personally I don't like the idea of adding a new GUC.

>> Agree, but I don't know how to use it here. Except, may be:
>> 1) first column - the column with bigger estimated number of groups
>> 2) second column - the pair of (first, second) with bigger estimated number of 
>> groups
>> 3) third column - the triple of (first, second, third) with bigger ...
>>
>> But seems even with that algorithm, ISTM, it could be implemented in cheaper 
>> manner.
>>
> 
> Maybe. I do have some ideas, although I haven't tried implementing it.
Implemented - please have a look.

> 
> If there's no extended statistic on the columns, you can do the current thing 
> (assuming independence etc.). There's not much we can do here.
Agree

> 
> If there's an extended statistic, you can do either a greedy search (get the 
> next column with the highest ndistinct coefficient) or exhaustive search 
> (computing the estimated number of comparisons).
> 
> Another challenge is that using only the ndistinct coefficient assumes uniform 
> distribution of the values. But you can have a column with 1M distinct values, 
> where a single value represents 99% of the rows. And another column with 100k 
> distinct values, with actual uniform distribution. I'm pretty sure it'd be more 
> efficient to place the 100k column first.

Interesting. I will think about it, thank you.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On 06/06/2018 08:04 PM, Teodor Sigaev wrote:
> 
>>> problem 2). Index optimization was noticed by me later. But based on
>>> your suggested patch's order I split the patch to index and non-index
>>> part and second part depends of first one. They touch the same part
>>> of code and they could not be independent
>> The way I see it the patch does two different things:
> Yes
> 
>> a) consider additional indexes by relaxing the pathkeys check
> Yes, but not only. Patch could reorder GROUP BY clause to match existing
> pathkeys which could come from index scan (suggested by patch or not) or
> some another way to get ordered output.
> 
>> b) if there are no indexes, i.e. when an explicit Sort is needed,
>> consider reordering the columns by ndistinct
> Yes. But again, this description is a bit short. First it works after
> first patch and might get some preordered leading pathkeys. Second, it
> tries to match ORDER BY clause order if there is no preordered leading
> pathkeys from first patch (it was introduced in v7). And third, if there
> is a tail of unmatched pathkeys on previous stages then it will reorder
> that tail.
> 

OK, I haven't looked at v7 yet, but if I understand correctly it tries
to maintain the ordering as much as possible? Does that actually help? I
mean, the incremental sort patch allows the sorting to happen by pieces,
but here we still need to sort all the data, right?

Can you give an example demonstrating the benefit?

>> In the worst case, one part will depend on the other, which is OK. It
>> still allows us to commit the first part and continue working on the
>> other one, for example.
> Exactly it's our case. Of course, it's possible to split first patch for
> two again: just suggestion of index (1.1) and reordering by existing
> pathkeys (1.2). Then 1.1 will be independent but 2 still should be
> applied after 1.2. But patch 1.1 is rather useless.
> 
>> can decide which one is cheaper etc. The decision which paths to keep
>> is done by add_path(), and it should stay like this, of course. I
>> wasn't suggesting to move the logic elsewhere.
> Cool, I haven't intention to modify it too.
> 
>>
>>>  > I wonder if we should make the add_path() logic smarter to recognize when two
>>>  > paths have different pathkeys but were generated to match the same grouping,
>>>  > to reduce the number of paths added by this optimization. Currently we do that
>>>  > for each pathkeys list independently, but we're considering many more pathkeys
>>>  > orderings ...
>>>
>>> Hm. I tend to say no.
>>> select .. from t1 group by a, b
>>> union
>>> select .. from t2 group by a, b
>>>
>>> t1 and t2 could have different set of indexes and different
>>> distribution, so locally it could be cheaper to use one index (for
>>> example, one defined as (b, a) and second as (a,b,c,d) - second is
>>> larger) but totally - another (example: second table doesn't have
>>> (b,a) index)
>>>
>>
>> But add_path() treats each of the relations independently, why
>> couldn't we pick a different index for each of the two relations?
> 
> Having of sorted output of both subselect could be cheaper that sorting
> one of them even if index scan was cheaper. But it seems to me that is
> not deal of suggested here optimization.
> 

OK, I think I understand what you're trying to say. We may have a query
like this:

SELECT a, SUM(x) FROM
(
    SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b
    UNION ALL
    SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b
) foo GROUP BY a;

and indexes on (a,b) and (b,a) for both relations. The "deduplication"
by pathkeys I suggested would mean we might keep only index (a,b) on t1
and (b,a) on t2, which means the grouping by "a" can't leverage index
scans on both relations. But if we keep paths for both indexes on each
relation, we can.

That's a good point, yes.

> 
>>>> For example, it seems to disregard that different data types have
>>>> different comparison costs. For example comparing bytea will be far
>>>> more expensive compared to int4, so it may be much more efficient to
>>>> compare int4 columns first, even if there are far fewer distinct
>>>> values in them.
>>> as I can see cost_sort() doesn't pay attention to this details. And
>>> it should be a separated patch to improve that.
>>>
>>
>> But sort also does not reorder columns.
> Yes. But estimation of cost of comparing function/number of unique 
> values in column could be not very accurate and so planner could make
> a wrong choice.

Isn't "estimation of cost of comparing function/number of unique values
in column could be not very accurate and so planner could make a wrong
choice" is more an argument against relying on it when doing these
optimizations?

FWIW it's one of the arguments Tom made in the incremental sort patch,
which relies on it too when computing cost of the incremental sort. I'm
sure it's going to be an obstacle there too.

> I saw 2 times difference in real-world application. Again, improving
> sort cost estimation is a separate task.
Sure. But we also need to ask the other question, i.e. how many people
would be negatively affected by the optimization. And I admit I don't
know the answer to that, the next example is entirely made up.

>>
>> Imagine you have a custom data type that is expensive for comparisons.
>> You know that, so you place it at the end of GROUP BY clauses, to
>> reduce the number of comparisons on that field. And then we come along
>> and just reorder the columns, placing it first, because it happens to
>> have a high ndistinct statistic. And there's no way to get the
>> original behavior :-(
> Hm. that it could be, but I don't know how to improve here.  Current
> cost_sort() will return the same cost for any columns order.
> 
> Okay, here we know an estimation of nrow, we could somehow find a
> comparing function oid and find a its procost field. And then? we also
> need to know width of tuple here and mostly repeat the cost_sort.
> 
> Another option is an introducing enable_groupby_reorder GUC variable
> although personally I don't like an idea to add new GUC variable.
> 

I don't like the idea of yet another GUC either, as it's a rather blunt
instrument. Which is why I'm suggesting to split the patch into two
parts. Then we can apply the index stuff (which seems relatively
straightforward) and keep working on this second part.

FWIW I think it would be useful to have "development GUC" that would
allow us to enable/disable these options during development, because
that makes experiments much easier. But then remove them before commit.

>>> Agree, but I don't know how to use it here. Except, may be:
>>> 1) first column - the column with bigger estimated number of groups
>>> 2) second column - the pair of (first, second) with bigger estimated
>>> number of groups
>>> 3) third column - the triple of (first, second, third) with bigger ...
>>>
>>> But seems even with that algorithm, ISTM, it could be implemented in
>>> cheaper manner.
>>>
>>
>> Maybe. I do have some ideas, although I haven't tried implementing it.
> Implemented, pls, have a look.
> 
>>
>> If there's no extended statistic on the columns, you can do the
>> current thing (assuming independence etc.). There's not much we can do
>> here.
> Agree
> 
>>
>> If there's an extended statistic, you can do either a greedy search
>> (get the next column with the highest ndistinct coefficient) or
>> exhaustive search (computing the estimated number of comparisons).
>>
>> Another challenge is that using only the ndistinct coefficient assumes
>> uniform distribution of the values. But you can have a column with 1M
>> distinct values, where a single value represents 99% of the rows. And
>> another column with 100k distinct values, with actual uniform
>> distribution. I'm pretty sure it'd be more efficient to place the 100k
>> column first.
> 
> Interesting.  Will think, thank you
> 

You're welcome.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Claudio Freire
Date:
On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> >>>> For example, it seems to disregard that different data types have
> >>>> different comparison costs. For example comparing bytea will be far
> >>>> more expensive compared to int4, so it may be much more efficient to
> >>>> compare int4 columns first, even if there are far fewer distinct
> >>>> values in them.
> >>> as I can see cost_sort() doesn't pay attention to this details. And
> >>> it should be a separated patch to improve that.
> >>>
> >>
> >> But sort also does not reorder columns.
> > Yes. But estimation of cost of comparing function/number of unique
> > values in column could be not very accurate and so planner could make
> > a wrong choice.

...
>
> >> Imagine you have a custom data type that is expensive for comparisons.
> >> You know that, so you place it at the end of GROUP BY clauses, to
> >> reduce the number of comparisons on that field. And then we come along
> >> and just reorder the columns, placing it first, because it happens to
> >> have a high ndistinct statistic. And there's no way to get the
> >> original behavior :-(
> > Hm. that it could be, but I don't know how to improve here.  Current
> > cost_sort() will return the same cost for any columns order.
> >
> > Okay, here we know an estimation of nrow, we could somehow find a
> > comparing function oid and find a its procost field. And then? we also
> > need to know width of tuple here and mostly repeat the cost_sort.
> >
> > Another option is an introducing enable_groupby_reorder GUC variable
> > although personally I don't like an idea to add new GUC variable.
> >
>
> I don't like the idea of yet another GUC either, as it's rather blunt
> instrument. Which is why I'm suggesting to split the patch into two
> parts. Then we can apply the index stuff (which seems relatively
> straightforward) and keep working on this second part.
>
> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.

Correct me if I'm wrong, but doesn't this patch concern itself with
precisely sort performance?

As such, estimating sort performance correctly in the various plan
variants being considered seems to be a very central aspect of it.

This means it's pretty much time to consider the effect of operator
cost *and* distinct values in the cost computation.

Assuming cost_sort does its thing, it's just a matter of building the
desired variants and picking the one with the smallest cost_sort. One
can use heuristics to build a subset of all possible column orderings
with a guiding principle that tries to maximize the likelihood of
finding a better order than the one in the query (like sorting by
ndistinct), but I'd suggest:

- Always considering the sort order verbatim as given in the SQL (ie:
what the user requests)
- Relying on cost_sort to distinguish among them, and pick a winner, and
- When in doubt (if cost_sort doesn't produce a clear winner), use the
order given by the user

Comparison cost can be approximated probabilistically as:

cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))

Where cost_op_n is the cost of the comparison function for column N,
and ndistinct(col_1_to_n) is an estimation of the number of distinct
values for columns prior to N in the sort order.

You can approximate ndistinct as the product of ndistinct of previous
columns, or use extended statistics when available.
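
For instance, on the table "t" from upthread, assuming a single constant
comparison cost C for all three columns, the two orderings discussed earlier
come out roughly as:

    -- ndistinct(a) = 100, ndistinct(a,b) = 100, ndistinct(a,c) = 5000
    -- (a,b,c): C * (1 + 1/100 + 1/100)  = 1.0200 * C
    -- (a,c,b): C * (1 + 1/100 + 1/5000) = 1.0102 * C
    SELECT 1 + 1/100.0 + 1/100.0  AS rel_cost_abc,
           1 + 1/100.0 + 1/5000.0 AS rel_cost_acb;

so (a,c,b) is slightly cheaper per tuple comparison, consistent with the
earlier observation about that GROUP BY.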

I think that should give a good enough approximation of
ndistinct-enriched sort costs that considers operator cost
appropriately. That operator cost is basically an average and can be
used as a constant, so it's just a matter of passing the right
comparison_cost to cost_sort.

Even if ndistinct estimates are off, the fact that operator cost is
involved should give the user a good enough tool if the planner picks the
wrong path - it's only a matter of boosting the operator cost to let the
planner know that sorting that way is expensive.

Prioritization of the user-provided order can be as simple as giving
that comparison_cost a small handicap.


Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On 06/06/2018 11:22 PM, Claudio Freire wrote:
> On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>
>>>>>> For example, it seems to disregard that different data types have
>>>>>> different comparison costs. For example comparing bytea will be far
>>>>>> more expensive compared to int4, so it may be much more efficient to
>>>>>> compare int4 columns first, even if there are far fewer distinct
>>>>>> values in them.
>>>>> as I can see cost_sort() doesn't pay attention to this details. And
>>>>> it should be a separated patch to improve that.
>>>>>
>>>>
>>>> But sort also does not reorder columns.
>>> Yes. But estimation of cost of comparing function/number of unique
>>> values in column could be not very accurate and so planner could make
>>> a wrong choice.
> 
> ...
>>
>>>> Imagine you have a custom data type that is expensive for comparisons.
>>>> You know that, so you place it at the end of GROUP BY clauses, to
>>>> reduce the number of comparisons on that field. And then we come along
>>>> and just reorder the columns, placing it first, because it happens to
>>>> have a high ndistinct statistic. And there's no way to get the
>>>> original behavior :-(
>>> Hm. that it could be, but I don't know how to improve here.  Current
>>> cost_sort() will return the same cost for any columns order.
>>>
>>> Okay, here we know an estimation of nrow, we could somehow find a
>>> comparing function oid and find a its procost field. And then? we also
>>> need to know width of tuple here and mostly repeat the cost_sort.
>>>
>>> Another option is an introducing enable_groupby_reorder GUC variable
>>> although personally I don't like an idea to add new GUC variable.
>>>
>>
>> I don't like the idea of yet another GUC either, as it's rather blunt
>> instrument. Which is why I'm suggesting to split the patch into two
>> parts. Then we can apply the index stuff (which seems relatively
>> straightforward) and keep working on this second part.
>>
>> FWIW I think it would be useful to have "development GUC" that would
>> allow us to enable/disable these options during development, because
>> that makes experiments much easier. But then remove them before commit.
> 
> Correct me if I'm wrong, but doesn't this patch concern itself with 
> precisely sort performance?
> 

Yes, the second part (reordering pathkeys by ndistinct) does.

> As such, estimating sort performance correctly in the various plan
> variants being considered seems to be a very central aspect of it.
> 
> This means it's pretty much time to consider the effect of operator
> cost *and* distinct values in the cost computation.
> 

Yes, until now that was not really needed because the optimizer does not
consider reordering of the pathkeys - it simply grabs either GROUP BY or
ORDER BY pathkeys and runs with that.

So the costing was fairly trivial, we simply do something like

    comparison_cost = 2.0 * cpu_operator_cost;

    sort_cost = comparison_cost * tuples * LOG2(tuples);

which essentially ignores that there might be multiple columns, or that
the columns may have sort operator with different costs.
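
For example, for the 1M-row table from upthread, with the default
cpu_operator_cost = 0.0025, that formula gives the same number no matter how
the columns are ordered or what their types are:

    -- 2.0 * cpu_operator_cost * tuples * log2(tuples)
    SELECT 2.0 * 0.0025 * 1000000 * log(2, 1000000) AS sort_comparison_cost;
    -- ~99658 for (a,b,c), (a,c,b), or any other permutation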

The question is how reliable the heuristics can be. The current patch uses
just plain ndistinct, which seems rather unreliable, and I don't have a clear
idea how to improve that - we may have MCVs for the columns and perhaps some
extended statistics, but I'm not sure how far we can run with that.

Essentially, what we need is to estimate the number of comparisons for each
column, to compute a better comparison_cost.

>
> Assuming cost_sort does its thing, it's just a matter of building the
> desired variants and picking the one with the smallest cost_sort. One
> can use heuristics to build a subset of all possible column orderings
> with a guiding principle that tries to maximize the likelihook of
> finding a better order than the one in the query (like sorting by
> ndistinct), but I'd suggest:
> 
> - Always considering the sort order verbatim as given in the SQL (ie:
> what the user requests)
> - Relying on cost_sort to distinguish among them, and pick a winner, and
> - When in doubt (if cost_sort doesn't produce a clear winner), use the
> order given by the user
> 

I don't see why not to generate all possible orderings (keeping only the
cheapest path for each pathkey variant) and let the optimizer sort it out. If
the user specified an explicit ORDER BY, we'll slap an extra Sort on top at
the end, increasing the cost.

> Comparison cost can be approximated probabilistically as:
> 
> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> 
> Where cost_op_n is the cost of the comparison function for column N,
> and ndistinct(col_1_to_n) is an estimation of the number of distinct
> values for columns prior to N in the sort order.
> 
> You can approximate ndistinct as the product of ndistinct of previous
> columns, or use extended statistics when available.
> 

Sure. The trouble is this also assumes uniform distribution, which is
not always the case.

> I think that should give a good enough approximation of 
> ndistinct-enriched sort costs that considers operator cost 
> appropriately. That operator cost is basically an average and can be 
> used as a constant, so it's just a matter of passing the right 
> comparison_cost to cost_sort.
> 
> Even if ndistinct estimates are off, the fact that operator cost is 
> involved should be a good enough tool for the user should the
> planner pick the wrong path - it's only a matter of boosting operator
> cost to let the planner know that sorting that way is expensive.
> 

There are far too many "should"s in these two paragraphs.

> Priorization of the user-provided order can be as simple as giving
> that comparison_cost a small handicap.

I see no point in doing that, and I don't recall a single place in the
planner where we do that. If the user specified ORDER BY, we'll slap an
explicit Sort on top when needed (which acts as the handicap, but in a
clear way). Otherwise we don't do such things - it'd be just plain
confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
types, ndistinct etc. but unexpectedly different costs). Also, what
would be a good value for the handicap?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Claudio Freire
Date:
On Wed, Jun 6, 2018 at 6:57 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On 06/06/2018 11:22 PM, Claudio Freire wrote:
> > On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
> > As such, estimating sort performance correctly in the various plan
> > variants being considered seems to be a very central aspect of it.
> >
> > This means it's pretty much time to consider the effect of operator
> > cost *and* distinct values in the cost computation.
> >
>
> Yes, until now that was not really needed because the optimizer does not
> consider reordering of the pathkeys - it simply grabs either GROUP BY or
> ORDER BY pathkeys and runs with that.
>
> So the costing was fairly trivial, we simply do something like
>
>     comparison_cost = 2.0 * cpu_operator_cost;
>
>     sort_cost = comparison_cost * tuples * LOG2(tuples);
>
> which essentially ignores that there might be multiple columns, or that
> the columns may have sort operator with different costs.
>
> The question is how reliable the heuristics can be. The current patch
> uses just plain ndistinct, but that seems rather unreliable but I don't
> have a clear idea how to improve that - we may have MCV for the columns
> and perhaps some extended statistics, but I'm not sure how far we can
> run with that.
>
> Essentially what we need to estimate the number of comparisons for each
> column, to compute better comparison_cost.

This could be approached by adjusting statistics by any relevant
restrictions applicable to the columns being grouped, as is done for
selectivity estimations.

I'm not sure how far that would get us, though. It would be rather
easy to lose sight of those restrictions if there are complex
operations involved.

> > Assuming cost_sort does its thing, it's just a matter of building the
> > desired variants and picking the one with the smallest cost_sort. One
> > can use heuristics to build a subset of all possible column orderings
> > with a guiding principle that tries to maximize the likelihook of
> > finding a better order than the one in the query (like sorting by
> > ndistinct), but I'd suggest:
> >
> > - Always considering the sort order verbatim as given in the SQL (ie:
> > what the user requests)
> > - Relying on cost_sort to distinguish among them, and pick a winner, and
> > - When in doubt (if cost_sort doesn't produce a clear winner), use the
> > order given by the user
> >
>
> I don't see why not to generate all possible orderings (keeping only the
> cheapest path for each pathkey variant) and let the optimizer to sort it
> out.

I'm assuming costing the full N! possible orderings would be
prohibitively expensive.

> If the user specified an explicit ORDER BY, we'll slap an extra
> Sort by at the end, increasing the cost.

I think you misunderstood me. I'm saying the exact opposite.

When I mention handicap, I mean to give the explicitly requested group
by order a *reduced* cost, to give it an advantage over the
heuristics.

This is to try to attack the problem you mentioned, where users already
accounting for operator costs in their queries would get overridden by
the planner, perhaps to the detriment of overall execution time.

In essence:

- If the user requested that order, we assume it is "somewhat
subjectively better" (by giving it a slightly reduced cost).
- If there is an explicit order by clause that differs from a
considered path, the required sort node will already penalize it
appropriately, nothing needs to be done in relation to sort costs.
- All other things being equal, cost_sort will make the decision. If a
plan beats the user-provided order in spite of the handicap, it will
be used. So it's still optimizing clear cases.

>
> > Comparison cost can be approximated probabilistically as:
> >
> > cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> >
> > Where cost_op_n is the cost of the comparison function for column N,
> > and ndistinct(col_1_to_n) is an estimation of the number of distinct
> > values for columns prior to N in the sort order.
> >
> > You can approximate ndistinct as the product of ndistinct of previous
> > columns, or use extended statistics when available.
> >
>
> Sure. The trouble is this also assumes uniform distribution, which is
> not always the case.

Well, (1.0 / ndistinct) = p(left == right).

If you can compute a better p(left == right) with an MCV, you can get
a better estimate. If you have an MCV. But that'd be a bit expensive,
and I'm not sure it's all that relevant.

To what degree does improving this produce better plans? As long as
average expected cost is relatively well estimated (as in, one sort
order relative to another sort order), it won't affect the decision.

But if needed, the MCV can be used for this.

So, in essence, you need to account for:

- restrictions on that column that constrain the domain
- distribution skew (MCV, nulls)
- ndistinct

To compute p(left == right).

Maybe something as simple as the following?

p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)

> > I think that should give a good enough approximation of
> > ndistinct-enriched sort costs that considers operator cost
> > appropriately. That operator cost is basically an average and can be
> > used as a constant, so it's just a matter of passing the right
> > comparison_cost to cost_sort.
> >
> > Even if ndistinct estimates are off, the fact that operator cost is
> > involved should be a good enough tool for the user should the
> > planner pick the wrong path - it's only a matter of boosting operator
> > cost to let the planner know that sorting that way is expensive.
> >
>
> There are far to many "should" in these two paragraphs.

Aren't all planner decisions riddled with "should"s?

Anyway, my point is that, since the user can control operator cost,
and operator cost is involved in the computation, and the ordering
given by the user is explicitly tested for and given a slight
advantage, it should (will?) be easy for the user to overcome any poor
planner decisions by giving the planner the relevant information.

> > Priorization of the user-provided order can be as simple as giving
> > that comparison_cost a small handicap.
>
> I see no point in doing that, and I don't recall a single place in the
> planner where we do that. If the user specified ORDER BY, we'll slap an
> explicit Sort on top when needed (which acts as the handicap, but in a
> clear way). Otherwise we don't do such things - it'd be just plain
> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
> types, ndistinct etc. but unexpectedly different costs). Also, what
> would be a good value for the handicap?

As I said, I'm not talking about explicit order by, but about the case
where the user knows which group by order is the most efficient
somehow.

I'm proposing that the planner shouldn't override the user without clear
evidence that the other plan is actually better - better above the noise
level. That's what I mean by handicap.


Re: POC: GROUP BY optimization

From
Claudio Freire
Date:
On Wed, Jun 6, 2018 at 7:18 PM Claudio Freire <klaussfreire@gmail.com> wrote:
> > > Comparison cost can be approximated probabilistically as:
> > >
> > > cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> > >
> > > Where cost_op_n is the cost of the comparison function for column N,
> > > and ndistinct(col_1_to_n) is an estimation of the number of distinct
> > > values for columns prior to N in the sort order.
> > >
> > > You can approximate ndistinct as the product of ndistinct of previous
> > > columns, or use extended statistics when available.
> > >
> >
> > Sure. The trouble is this also assumes uniform distribution, which is
> > not always the case.
>
> Well, (1.0 / ndistinct) = p(left == right).
>
> If you can compute a better p(left == right) with an MCV, you can get
> a better estimate. If you have an MCV. But that'd be a bit expensive,
> and I'm not sure it's all that relevant.
>
> To what degree does improving this produce better plans? As long as
> average expected cost is relatively well estimated (as in, one sort
> order relative to another sort order), it won't affect the decision.
>
> But if needed, the MCV can be used for this.
>
> So, in essence, you need to account for:
>
> - restrictions on that column that constrain the domain
> - distribution skew (MCV, nulls)
> - ndistinct
>
> To compute p(left == right).
>
> Maybe something as simple as the following?
>
> p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
> p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)

Well, that came out with a slew of math errors, but the idea is clear:
compute p(left == right) given the available statistics and
constrained by any applicable restrictions.
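
To make the proposal concrete, here is a minimal standalone sketch of the idea
(plain Python, not planner code; the statistics, function names and numbers are
invented, and the formula is a slight cleanup of the one quoted above rather
than anything from the patch):

def p_equal(mcv_freqs, null_frac, ndistinct):
    # Probability that two randomly picked values of the column compare equal:
    # either both fall into the same MCV entry (or are both NULL), or both land
    # on the same one of the remaining non-MCV values, assumed uniform.
    p_special = sum(f * f for f in mcv_freqs) + null_frac * null_frac
    p_rest = 1.0 - sum(mcv_freqs) - null_frac
    n_rest = max(ndistinct - len(mcv_freqs), 1)
    return p_special + (p_rest * p_rest) / n_rest

def comparison_cost(columns):
    # Expected per-pair comparison cost for a given column order: column N is
    # only compared when all previous columns compared equal, so its operator
    # cost is weighted by the product of the preceding p_equal values.
    cost, p_prefix_equal = 0.0, 1.0
    for col in columns:
        cost += col["op_cost"] * p_prefix_equal
        p_prefix_equal *= p_equal(col["mcv"], col["null_frac"], col["ndistinct"])
    return cost

# Hypothetical per-column statistics.
a = {"op_cost": 1.0, "mcv": [0.3, 0.2], "null_frac": 0.0, "ndistinct": 1000}
b = {"op_cost": 1.0, "mcv": [0.5, 0.4], "null_frac": 0.1, "ndistinct": 3}

print(comparison_cost([a, b]))   # sorting by (a, b) -- cheaper
print(comparison_cost([b, a]))   # sorting by (b, a) -- more comparisons of "a"

The only point of the sketch is that skew (MCV), NULLs and restrictions all
enter through a single per-column p(left == right), which then feeds the same
cost_comp sum discussed earlier.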


Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On 06/07/2018 12:18 AM, Claudio Freire wrote:
> On Wed, Jun 6, 2018 at 6:57 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On 06/06/2018 11:22 PM, Claudio Freire wrote:
>>> On Wed, Jun 6, 2018 at 5:43 PM Tomas Vondra
>>> As such, estimating sort performance correctly in the various plan
>>> variants being considered seems to be a very central aspect of it.
>>>
>>> This means it's pretty much time to consider the effect of operator
>>> cost *and* distinct values in the cost computation.
>>>
>>
>> Yes, until now that was not really needed because the optimizer does not
>> consider reordering of the pathkeys - it simply grabs either GROUP BY or
>> ORDER BY pathkeys and runs with that.
>>
>> So the costing was fairly trivial, we simply do something like
>>
>>     comparison_cost = 2.0 * cpu_operator_cost;
>>
>>     sort_cost = comparison_cost * tuples * LOG2(tuples);
>>
>> which essentially ignores that there might be multiple columns, or that
>> the columns may have sort operator with different costs.
>>
>> The question is how reliable the heuristics can be. The current patch
>> uses just plain ndistinct, but that seems rather unreliable but I don't
>> have a clear idea how to improve that - we may have MCV for the columns
>> and perhaps some extended statistics, but I'm not sure how far we can
>> run with that.
>>
>> Essentially what we need is to estimate the number of comparisons for
>> each column, to compute a better comparison_cost.
> 
> This could be approached by adjusting statistics by any relevant 
> restrictions applicable to the columns being grouped, as is done for 
> selectivity estimations.
> 

I don't follow how this is related to restrictions. Can you elaborate?

> I'm not sure how far that would get us, though. It would be rather
> easy to lose sight of those restrictions if there are complex
> operations involved.
> 
>>> Assuming cost_sort does its thing, it's just a matter of building the
>>> desired variants and picking the one with the smallest cost_sort. One
>>> can use heuristics to build a subset of all possible column orderings
>>> with a guiding principle that tries to maximize the likelihood of
>>> finding a better order than the one in the query (like sorting by
>>> ndistinct), but I'd suggest:
>>>
>>> - Always considering the sort order verbatim as given in the SQL (ie:
>>> what the user requests)
>>> - Relying on cost_sort to distinguish among them, and pick a winner, and
>>> - When in doubt (if cost_sort doesn't produce a clear winner), use the
>>> order given by the user
>>>
>>
>> I don't see why not to generate all possible orderings (keeping only the
>> cheapest path for each pathkey variant) and let the optimizer sort it
>> out.
> 
> I'm assuming costing the full N! possible orderings would be
> prohibitively expensive.
> 

That's possible, yes - particularly for large N values. Perhaps there's
a simpler algorithm to compute a sufficient approximation. In a greedy
way, starting from some likely-good combination of columns, or so. I
haven't thought about this part very much ...

>> If the user specified an explicit ORDER BY, we'll slap an extra
>> Sort at the end, increasing the cost.
> 
> I think you misunderstood me. I'm saying the exact opposite.
> 
> When I mention handicap, I mean to give the explicitly requested group
> by order a *reduced* cost, to give it an advantage over the
> heuristics.
> 

I did understand you. I'm saying that if the ordering does not match the
one requested by the user (in ORDER BY), we end up adding an explicit
Sort node on top of it, which increases the cost of all the other paths,
acting as a disadvantage.

But now I realize this only works for when there is an ORDER BY clause,
not for GROUP BY (in which case we don't add any Sort node).

> This is to try to attack the problem you mentioned where users already
> accounting for operator costs in their queries would get overridden by
> the planner, perhaps in detriment of overall execution time.
> 
> In essence:
> 
> - If the user requested that order, we assume it "somewhat
> subjectively better" (by giving it a slightly reduced cost).
> - If there is an explicit order by clause that differs from a
> considered path, the required sort node will already penalize it
> appropriately, nothing needs to be done in relation to sort costs.
> - All other things being equal, cost_sort will make the decision. If a
> plan beats the user-provided order in spite of the handicap, it will
> be used. So it's still optimizing clear cases.
> 

OK. I still don't believe this is a good approach, because we don't know
how good our estimate of the costs is, so I have no idea how large the
handicap needs to be.

>>
>>> Comparison cost can be approximated probabilistically as:
>>>
>>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
>>>
>>> Where cost_op_n is the cost of the comparison function for column N,
>>> and ndistinct(col_1_to_n) is an estimation of the number of distinct
>>> values for columns prior to N in the sort order.
>>>
>>> You can approximate ndistinct as the product of ndistinct of previous
>>> columns, or use extended statistics when available.
>>>
>>
>> Sure. The trouble is this also assumes uniform distribution, which is
>> not always the case.
> 
> Well, (1.0 / ndistinct) = p(left == right).
> 
> If you can compute a better p(left == right) with an MCV, you can get
> a better estimate. If you have an MCV. But that'd be a bit expensive,
> and I'm not sure it's all that relevant.
> 
> To what degree does improving this produce better plans? As long as
> average expected cost is relatively well estimated (as in, one sort
> order relative to another sort order), it won't affect the decision.
> 

I'd bet I can construct corner-case examples with vastly different
behavior depending on column data distributions, so it's not entirely
insignificant. How important is it in practice I don't know.

> But if needed, the MCV can be used for this.
> 
> So, in essence, you need to account for:
> 
> - restrictions on that column that constrain the domain
> - distribution skew (MCV, nulls)
> - ndistinct
> 
> To compute p(left == right).
> 
> Maybe something as simple as the following?
> 
> p_special = sum(mcv_i * mcv_i) + null_frac * null_frac
> p_other = (1 - p_special) * (1 - p_special) / ndistinct(restr)
> 
>>> I think that should give a good enough approximation of
>>> ndistinct-enriched sort costs that considers operator cost
>>> appropriately. That operator cost is basically an average and can be
>>> used as a constant, so it's just a matter of passing the right
>>> comparison_cost to cost_sort.
>>>
>>> Even if ndistinct estimates are off, the fact that operator cost is
>>> involved should be a good enough tool for the user should the
>>> planner pick the wrong path - it's only a matter of boosting operator
>>> cost to let the planner know that sorting that way is expensive.
>>>
>>
>> There are far too many "should" in these two paragraphs.
> 
> Aren't all planner decisions riddled with "should"s?
> 
> Anyway, my point is that, since the user can control operator cost,
> and operator cost is involved in the computation, and the ordering
> given by the user is explicitly tested for and given a slight
> advantage, it should (will?) be easy for the user to overcome any poor
> planner decisions by giving the planner the relevant information.
> 

I really don't want to force users to tweak operator cost in order to
tune queries. It's about an order of magnitude less convenient than a
GUC, for various reasons.

>>> Priorization of the user-provided order can be as simple as giving
>>> that comparison_cost a small handicap.
>>
>> I see no point in doing that, and I don't recall a single place in the
>> planner where we do that. If the user specified ORDER BY, we'll slap an
>> explicit Sort on top when needed (which acts as the handicap, but in a
>> clear way). Otherwise we don't do such things - it'd be just plain
>> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
>> types, ndistinct etc. but unexpectedly different costs). Also, what
>> would be a good value for the handicap?
> 
> As I said, I'm not talking about explicit order by, but about the case
> where the user knows which group by order is the most efficient
> somehow.
> 
> I'm proposing the planner shouldn't override the user without clear
> evidence that the other plan is actually better, above the noise
> better. That's what I mean by handicap.
> 

OK, gotcha. But I'm really unsure how large the handicap should be.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Claudio Freire
Date:
On Wed, Jun 6, 2018 at 8:06 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> >>> Comparison cost can be approximated probabilistically as:
> >>>
> >>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> >>>
> >>> Where cost_op_n is the cost of the comparison function for column N,
> >>> and ndistinct(col_1_to_n) is an estimation of the number of distinct
> >>> values for columns prior to N in the sort order.
> >>>
> >>> You can approximate ndistinct as the product of ndistinct of previous
> >>> columns, or use extended statistics when available.
> >>>
> >>
> >> Sure. The trouble is this also assumes uniform distribution, which is
> >> not always the case.
> >
> > Well, (1.0 / ndistinct) = p(left == right).
> >
> > If you can compute a better p(left == right) with an MCV, you can get
> > a better estimate. If you have an MCV. But that'd be a bit expensive,
> > and I'm not sure it's all that relevant.
> >
> > To what degree does improving this produce better plans? As long as
> > average expected cost is relatively well estimated (as in, one sort
> > order relative to another sort order), it won't affect the decision.
> >
>
> I'd bet I can construct corner-case examples with vastly different
> behavior depending on column data distributions, so it's not entirely
> insignificant. How important is it in practice I don't know.

I guess that can only be answered by building that solution and
testing it against the corner cases.


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:

> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
> 
> Can you give an example demonstrating the benefit?


> 
> SELECT a, SUM(x) FROM
> (
>      SELECT a, b, COUNT(c) AS x FROM t1 GROUP BY a, b
>      UNION ALL
>      SELECT a, b, COUNT(c) AS x FROM t2 GROUP BY a, b
> ) foo GROUP BY a;
> 
> and indexes on (a,b) and (b,a) for both relations. The "deduplication"
> by pathkeys I suggested would mean we might keep only index (a,b) on t1
> and (b,a) on t2, which means the grouping by "a" can't leverage index
> scans on both relations. But if we keep paths for both indexes on each
> relation, we can.
yes, that's one option

> Isn't "estimation of cost of comparing function/number of unique values
> in column could be not very accurate and so planner could make a wrong
> choice" is more an argument against relying on it when doing these
> optimizations?
> 
> FWIW it's one of the arguments Tom made in the incremental sort patch,
> which relies on it too when computing cost of the incremental sort. I'm
> sure it's going to be an obstacle there too. >
>> I saw 2 times difference in real-world application. Again, improving
>> sort cost estimation is a separate task.
> Sure. But we also need to ask the other question, i.e. how many people
> would be negatively affected by the optimization. And I admit I don't
> know the answer to that, the next example is entirely made up.
Hm, it seems the best way here is improving the cost_sort estimation. Will try, but 
I think that is a separate patch


> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.

Will do

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
> So the costing was fairly trivial, we simply do something like
> 
>      comparison_cost = 2.0 * cpu_operator_cost;
> 
>      sort_cost = comparison_cost * tuples * LOG2(tuples);
> 
> which essentially ignores that there might be multiple columns, or that
> the columns may have sort operator with different costs.
Agree. And distribution of keys.
> 
> The question is how reliable the heuristics can be. The current patch
> uses just plain ndistinct, but that seems rather unreliable but I don't
> have a clear idea how to improve that - we may have MCV for the columns
> and perhaps some extended statistics, but I'm not sure how far we can
> run with that.
v8 already uses another algorithm.

> 
> Essentially what we need is to estimate the number of comparisons for each
> column, to compute a better comparison_cost.
Exactly

>> Priorization of the user-provided order can be as simple as giving
>> that comparison_cost a small handicap.
> 
> I see no point in doing that, and I don't recall a single place in the
> planner where we do that. If the user specified ORDER BY, we'll slap an
> explicit Sort on top when needed (which acts as the handicap, but in a
> clear way). Otherwise we don't do such things - it'd be just plain
> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
> types, ndistinct etc. but unexpectedly different costs). Also, what
> would be a good value for the handicap?

Again agree. If we have a fixed order of columns (ORDER BY) then we should not try 
to reorder it. The current patch follows that, if I didn't make a mistake.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
>> I don't see why not to generate all possible orderings (keeping only the
>> cheapest path for each pathkey variant) and let the optimizer sort it
>> out.
> 
> I'm assuming costing the full N! possible orderings would be
> prohibitively expensive.

That's true, but as a first step we need to improve cost_sort, and only then 
try to find the best pathkeys order in an optimal way.

> - If the user requested that order, we assume it "somewhat
> subjectively better" (by giving it a slightly reduced cost).
I don't think so. It's not the SQL way - the DB should determine the optimal way to 
execute a query.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On 06/07/2018 03:41 PM, Teodor Sigaev wrote:
 >>
>> ... snip ...
 >>
>>> Priorization of the user-provided order can be as simple as giving
>>> that comparison_cost a small handicap.
>>
>> I see no point in doing that, and I don't recall a single place in the
>> planner where we do that. If the user specified ORDER BY, we'll slap an
>> explicit Sort on top when needed (which acts as the handicap, but in a
>> clear way). Otherwise we don't do such things - it'd be just plain
>> confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
>> types, ndistinct etc. but unexpectedly different costs). Also, what
>> would be a good value for the handicap?
> 
> Again agree. If we have a fixed order of columns (ORDER BY) then we should 
> not try to reorder it. The current patch follows that, if I didn't make a mistake.
> 

This part seems to be more a misunderstanding between me and Claudio. I 
believe Claudio was referring to the column order in a GROUP BY, not 
ORDER BY. In which case we don't add any Sort, of course.

I'm still opposed to adding arbitrary handicap to prioritize the order 
specified by user, for the reasons I explained before. We should aim to 
make the heuristics/costing reliable enough to make this unnecessary.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
>> Again agree. If we have a fixed order of columns (ORDER BY) then we should not 
>> try to reorder it. The current patch follows that, if I didn't make a mistake.
>>
> 
> This part seems to be more a misunderstanding between me and Claudio. I believe 
> Claudio was referring to the column order in a GROUP BY, not ORDER BY. In which 
> case we don't add any Sort, of course.
I hope so

> 
> I'm still opposed to adding arbitrary handicap to prioritize the order specified 
> by user, for the reasons I explained before. We should aim to make the 
> heuristics/costing reliable enough to make this unnecessary.
+1

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
>> Yes. But again, this description is a bit short. First it works after
>> first patch and might get some preordered leading pathkeys. Second, it
>> tries to match ORDER BY clause order if there is no preordered leading
>> pathkeys from first patch (it was introduced in v7). And third, if there
>> is a tail of unmatched pathkeys on previous stages then it will reorder
>> that tail.
>>
> 
> OK, I haven't looked at v7 yet, but if I understand correctly it tries
> to maintain the ordering as much as possible? Does that actually help? I
> mean, the incremental sort patch allows the sorting to happen by pieces,
> but here we still need to sort all the data, right?
> 
> Can you give an example demonstrating the benefit?

See tst.sql. Queries are marked with opt (optimization is on) and noopt.

Query 1: select count(*) from btg group by v, r;
Query 2: select count(*) from btg group by n, v, r order by n;

For both queries it's possible to reorder the v and r columns; the n column has a 
single distinct value.

On my laptop:
Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms
       2opt vs 2noopt: 5800.307 ms vs 7486.967 ms

So, what we see:
1) For query 1 the optimization gives 2 times better performance, for query 2 only 
30%. If column 'n' were unique then the times for queries 1 and 2 should be the 
same. We could add a check for preordered pathkeys in 
get_cheapest_group_keys_order(), and if estimate_num_groups(reordered pathkeys) 
is close to 1 then skip reordering the tail of the pathkeys.

2) The planning cost is the same for all queries. So cost_sort() doesn't even take 
the number of columns into account.

> FWIW I think it would be useful to have "development GUC" that would
> allow us to enable/disable these options during development, because
> that makes experiments much easier. But then remove them before commit.
Added in v9: debug_enable_group_by_match_order_by and 
debug_enable_cheapest_group_by. I also checked compatibility with the incremental 
sort patch, and everything works except a small merge conflict which could be 
resolved right before committing.

Next, I had a look at cost_incremental_sort() provided by the incremental sort patch 
and, it's a pity, it doesn't solve our problem with the impact of the per-column 
comparison function cost and the number of times it is called.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:

On 06/07/2018 06:22 PM, Teodor Sigaev wrote:
>>> Yes. But again, this description is a bit short. First it works after
>>> first patch and might get some preordered leading pathkeys. Second, it
>>> tries to match ORDER BY clause order if there is no preordered leading
>>> pathkeys from first patch (it was introduced in v7). And third, if there
>>> is a tail of unmatched pathkeys on previous stages then it will reorder
>>> that tail.
>>>
>>
>> OK, I haven't looked at v7 yet, but if I understand correctly it tries
>> to maintain the ordering as much as possible? Does that actually help? I
>> mean, the incremental sort patch allows the sorting to happen by pieces,
>> but here we still need to sort all the data, right?
>>
>> Can you give an example demonstrating the benefit?
> 
> See tst.sql. queries are marked with opt (optimization is on) and noopt.
> 
> Query 1: select count(*) from btg group by v, r;
> Query 2: select count(*) from btg group by n, v, r order by n;
> 
> For both queries it's possible to reorder v and r column, n column has
> the single distinct value.
> 
> On my laptop:
> Query 1opt vs 1noopt: 3177.500 ms vs 6604.493 ms
>       2opt vs 2noopt: 5800.307 ms vs 7486.967 ms
> 
> So, what we see:
> 1) for query 1 optimization gives 2 times better performance, for query
> 2 only 30%. if column 'n' will be unique then time for query 1 and 2
> should be the same. We could add check for preordered pathkeys in
> get_cheapest_group_keys_order() and if estimate_num_groups(reordered
> pathkeys) is close to 1 then we could do not reordering of tail of
> pathkeys.
> 

OK, those are nice improvements, although the examples are somewhat
extreme (only 1 or 2 distinct values). So yes, in some cases this
reordering clearly makes a big difference, but I still think we need to
improve the heuristics to minimize regressions.

I see v9 is now calling estimate_num_groups(), so it already benefits
from extended statistics. Nice! I think the algorithm is more or less
the greedy option I proposed in one of the earlier messages - I don't
know if it's sufficient or not, but the only other idea I have is
essentially an exhaustive search through all possible permutations.

So that's a nice improvement, although I think we should also consider
non-uniform distributions (using the per-column MCV lists).

> 2) Planing cost is the same for all queries. So, cost_sort() doesn't
> take into account even number of columns.
> 

Yeah. As I wrote before, I think we'll need to come up with a model for
comparison_cost depending on the column order, which should make costs
for those paths different.

>> FWIW I think it would be useful to have "development GUC" that would
>> allow us to enable/disable these options during development, because
>> that makes experiments much easier. But then remove them before commit.
> Added, v9, debug_enable_group_by_match_order_by and
> debug_enable_cheapest_group_by. I also checked compatibility with
> incremental sort patch, and all works except small merge conflict which
> could be resolved right before committing.
> 

OK. I think we should also have a debug GUC for the first patch (i.e.
one that would allow reordering group_pathkeys to match the input path).

Regarding the two GUCs introduced in v9, I'm not sure I 100% understand
what they are doing. Let me try explaining what I think they do:

1) debug_cheapest_group_by - Reorder GROUP BY clauses using ndistinct
stats for columns, placing columns with more distinct values first (so
that we don't need to compare the following columns as often).

2) debug_group_by_match_order_by - Try to reorder the GROUP BY clauses
to match the ORDER BY, so that the group aggregate produces results in
the desired output (and we don't need an explicit Sort).

FWIW I doubt about the debug_group_by_match_order_by optimization. The
trouble is that it's only based on comparing the pathkeys lists, and
entirely ignores that the two result sets may operate on very different
numbers of rows.

Consider for example "SELECT * FROM t GROUP BY a,b,c,d ORDER BY c,d"
where table "t" has 1B rows, but there are only ~10k rows in the result
(which is fairly common in fact tables in star-schema DBs). IIUC the
optimization will ensure "c,d" is placed first in the GROUP BY, and then
we reorder "a,b" using ndistinct. But this may be much less efficient
than simply reordering (a,b,c,d) per ndistinct and then adding explicit
Sort on top of that (because on 10k rows that's going to be cheap).

So I think this somewhat contradicts the order-by-ndistinct optimization
and may easily undo its benefits.

The other "issue" I have with get_cheapest_group_keys_order() is how it
interacts with group_keys_reorder_by_pathkeys(). I mean, we first call

  n_preordered = group_keys_reorder_by_pathkeys(path->pathkeys, ...);

so the reordering tries to maintain ordering of the input path. Then if
(n_preordered == 0) we do this:

  group_keys_reorder_by_pathkeys(root->sort_pathkeys, ...);

Doesn't that mean that if we happen to have input path with partially
matching ordering (so still requiring explicit Sort before grouping) we
may end up ignoring the ORDER BY entirely? Instead of using Sort that
would match the ORDER BY? I might have misunderstood this, though.

I'm not sure why the first group_keys_reorder_by_pathkeys() call does
not simply return 0 when the input path ordering is not sufficient for
the grouping? Is there some reasoning behind that (e.g. that sorting
partially sorted is faster)?

Overall I think this patch introduces four different optimizations that
are indeed very interesting:

1) consider additional sorted paths for grouping

2) reorder GROUP BY clauses per ndistinct to minimize comparisons

3) when adding Sort for grouping, maintain partial input ordering

4) when adding Sort for grouping, try producing the right output order
   (if the ORDER BY was specified)

But at this point it's kinda mashed together in ad-hoc manner, using
very simple heuristics / assumptions. We'll need to figure out how to
combine those optimizations.

BTW I get compiler warnings that n_preordered_groups may be used
uninitialized in add_paths_to_grouping_rel, because it's only set in an
if/else branch, and then used further down.

> Next, I had a look on cost_incremental_sort() provided by
> incremental sort patch and, it's a pity, it doesn't solve our problem
> with the impact of the cost of per-column comparison function and
> number of its calls.

It currently doesn't, but that might be more an issue in the incremental
sort patch and we may need to fix that. Clearly the costing in general
in that patch needs more work, and I do recall Tom pointing out that the
current heuristics (estimating number of sort groups using ndistincts)
seems rather weak.

It may not be exactly the same problem, but it seems kinda similar to
what this patch does. I have a hunch that those patches will end up
inventing something fairly similar.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On 06/09/2018 08:09 PM, Tomas Vondra wrote:
> 
> /snip/
> 
> 4) when adding Sort for grouping, try producing the right output order
>    (if the ORDER BY was specified)
> 

BTW I've just realized we already do something similar in master. If you
run a query like this:

  SELECT a, b, count(*) FROM t GROUP BY b, a ORDER BY a;

we will actually plan it like this:

          QUERY PLAN
  ---------------------------
   GroupAggregate
     Group Key: a, b
     ->  Sort
           Sort Key: a, b
           ->  Seq Scan on t
  (5 rows)

I.e. we already do reorder the group clauses to match ORDER BY, to only
require a single sort. This happens in preprocess_groupclause(), which
also explains the reasoning behind that.

I wonder if some of the new code reordering group pathkeys could/should
be moved here (not sure, maybe it's too early for those decisions). In
any case, it might be appropriate to update some of the comments before
preprocess_groupclause() which claim we don't do certain things added by
the proposed patches.

This probably also somewhat refutes my claim that the order of grouping
keys is currently fully determined by users (and so they may pick the
most efficient order), while the reorder-by-ndistinct patch would make
that impossible. Apparently when there's ORDER BY, we already mess with
the order of group clauses - there are ways to get around it (subquery
with OFFSET 0) but it's much less clear.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
> I see v9 is now calling estimate_num_groups(), so it already benefits
> from extended statistics. Nice! I think the algorithm is more or less
> the greedy option I proposed in one of the earlier messages - I don't
> know if it's sufficient or not, but the only other idea I have is
> essentially an exhaustive search through all possible permutations.

Trying all possible permutations could be slow with an extremely large GROUP BY 
clause, so we would need some stop limit - like join_collapse_limit. I don't like 
this idea.

> So that's a nice improvement, although I think we should also consider
> non-uniform distributions (using the per-column MCV lists).
Could you clarify how to do that?

> 
>> 2) Planing cost is the same for all queries. So, cost_sort() doesn't
>> take into account even number of columns.
>>
> 
> Yeah. As I wrote before, I think we'll need to come up with a model for
> comparison_cost depending on the column order, which should make costs
> for those paths different.
Yeah. But I still think it should be a separate patch, which would improve cost 
estimation and plan choice in other parts of the optimizer too.

> OK. I think we should also have a debug GUC for the first patch (i.e.
> one that would allow reordering group_pathkeys to match the input path).
Added to 0002-opt_group_by_index_and_order-v10.patch: 
debug_group_by_reorder_by_pathkeys.

> Regarding the two GUCs introduced in v9, I'm not sure I 100% understand
> what they are doing. Let me try explaining what I think they do:
> 
> 1) debug_cheapest_group_by - Reorder GROUP BY clauses to using ndistinct
> stats for columns, placing columns with more distinct values first (so
> that we don't need to compare the following columns as often).
yes

> 
> 2) debug_group_by_match_order_by - Try to reorder the GROUP BY clauses
> to match the ORDER BY, so that the group aggregate produces results in
> the desired output (and we don't need an explicit Sort).
yes

> FWIW I doubt about the debug_group_by_match_order_by optimization. The
> trouble is that it's only based on comparing the pathkeys lists, and
> entirely ignores that the two result sets may operate on very different
> numbers of rows.
> 
> Consider for example "SELECT * FROM t GROUP BY a,b,c,d ORDER BY c,d"
> where table "t" has 1B rows, but there are only ~10k rows in the result
> (which is fairly common in fact tables in star-schema DBs). IIUC the
> optimization will ensure "c,d" is placed first in the GROUP BY, and then
> we reorder "a,b" using ndistinct. But this may be much less efficient
> than simply reordering (a,b,c,d) per ndistinct and then adding explicit
> Sort on top of that (because on 10k rows that's going to be cheap).
> 
As you pointed out in the next email, this optimization is already implemented in 
preprocess_groupclause(). And this issue can only be resolved correctly after 
implementing a correct cost_sort() - one which pays attention to comparison cost, 
the number of distinct values in columns, and how often the second and subsequent 
columns actually need to be compared.

> 
> The other "issue" I have with get_cheapest_group_keys_order() is how it
> interacts with group_keys_reorder_by_pathkeys(). I mean, we first call
> 
>    n_preordered = group_keys_reorder_by_pathkeys(path->pathkeys, ...);
> 
> so the reordering tries to maintain ordering of the input path. Then if
> (n_preordered == 0) we do this:
> 
>    group_keys_reorder_by_pathkeys(root->sort_pathkeys, ...);
> 
> Doesn't that mean that if we happen to have input path with partially
> matching ordering (so still requiring explicit Sort before grouping) we
> may end up ignoring the ORDER BY entirely? Instead of using Sort that
> would match the ORDER BY? I might have misunderstood this, though.
Hm. I believe sorting the input of the GROUP BY clause is always more expensive - 
just because the output of the GROUP BY clause should have fewer rows than its 
input, which means a cheaper sort. So here it uses ORDER BY only if we don't 
have group pathkey(s) matched by the path pathkeys.

> 
> I'm not sure why the first group_keys_reorder_by_pathkeys() call does
> not simply return 0 when the input path ordering is not sufficient for
> the grouping? Is there some reasoning behind that (e.g. that sorting
> partially sorted is faster)?
Hm, what do you mean by sufficient? group_keys_reorder_by_pathkeys() always tries 
to order GROUP BY columns by preexisting pathkeys. But of course, incremental 
sort will add more benefits here. I'd like to hope that the incremental sort patch 
is close to being committed.

> 
> Overall I think this patch introduces four different optimizations that
> are indeed very interesting:
> 
> 1) consider additional sorted paths for grouping
Agree
> 
> 2) reorder GROUP BY clauses per ndistinct to minimize comparisons
Agree (with the help of estimate_num_groups(); the patch tries to maximize the 
number of groups at each step)

> 3) when adding Sort for grouping, maintain partial input ordering
Agree - and the incremental sort patch will be helpful here.

> 4) when adding Sort for grouping, try producing the right output order
>     (if the ORDER BY was specified)
Yes.

> 
> But at this point it's kinda mashed together in ad-hoc manner, using
> very simple heuristics / assumptions. We'll need to figure out how to
> combine those optimizations.
The heuristics used here are:
1) if there is some existing order - let's use it;
2) if there is a required order - let's match that order to prevent an extra sort node.

The incremental sort patch will improve cases where there is only a partial match of the order.

> 
> BTW I get compiler warnings that n_preordered_groups may be used
> uninitialized in add_paths_to_grouping_rel, because it's only set in an
> if/else branch, and then used further down.
Fixed, but I believe the compiler is not smart enough here.



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
> I.e. we already do reorder the group clauses to match ORDER BY, to only
> require a single sort. This happens in preprocess_groupclause(), which
> also explains the reasoning behind that.
Huh. I missed that. That means the group_keys_reorder_by_pathkeys() call inside 
get_cheapest_group_keys_order() duplicates the work of preprocess_groupclause().
So it's possible to replace it with simply counting the number of matching 
pathkeys at the beginning of the lists. Or to remove most of preprocess_groupclause() 
- but it seems to me we should use the first option: preprocess_groupclause() is 
simpler, since it operates only on SortGroupClause rather than on pathkeys.

BTW, the incremental sort patch provides pathkeys_common(), exactly what we need.

> I wonder if some of the new code reordering group pathkeys could/should
> be moved here (not sure, maybe it's too early for those decisions). In
> any case, it might be appropriate to update some of the comments before
> preprocess_groupclause() which claim we don't do certain things added by
> the proposed patches.

preprocess_groupclause() is too early a step to use something like 
group_keys_reorder_by_pathkeys(), because the pathkeys are not known at that point yet.


> This probably also somewhat refutes my claim that the order of grouping
> keys is currently fully determined by users (and so they may pick the
> most efficient order), while the reorder-by-ndistinct patch would make
> that impossible. Apparently when there's ORDER BY, we already mess with
> the order of group clauses - there are ways to get around it (subquery
> with OFFSET 0) but it's much less clear.

I like a moment when objections go away :)

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:

On 06/13/2018 06:56 PM, Teodor Sigaev wrote:
>> I.e. we already do reorder the group clauses to match ORDER BY, to only
>> require a single sort. This happens in preprocess_groupclause(), which
>> also explains the reasoning behind that.
> Huh. I missed that. That means group_keys_reorder_by_pathkeys() call
> inside get_cheapest_group_keys_order() duplicates work of
> preprocess_groupclause().
>  And so it's possible to replace it to simple counting number of the
> same pathkeys in beginning of lists. Or remove most part of
> preprocess_groupclause() - but it seems to me we should use first
> option, preprocess_groupclause() is simpler, it doesn't operate with
> pathkeys only with  SortGroupClause which is simpler.
> 
> BTW, incremental sort path provides pathkeys_common(), exactly what we
> need.
> 
>> I wonder if some of the new code reordering group pathkeys could/should
>> be moved here (not sure, maybe it's too early for those decisions). In
>> any case, it might be appropriate to update some of the comments before
>> preprocess_groupclause() which claim we don't do certain things added by
>> the proposed patches.
> 
> preprocess_groupclause() is too early step to use something like
> group_keys_reorder_by_pathkeys() because pathkeys is not known here yet.
> 
> 
>> This probably also somewhat refutes my claim that the order of
>> grouping keys is currently fully determined by users (and so they
>> may pick the most efficient order), while the reorder-by-ndistinct
>> patch would make that impossible. Apparently when there's ORDER BY,
>> we already mess with the order of group clauses - there are ways to
>> get around it (subquery with OFFSET 0) but it's much less clear.
> 
> I like a moment when objections go away :)
> 

I'm afraid that's a misunderstanding - my objections did not really go
away. I'm just saying my claim that we're not messing with order of
grouping keys is not entirely accurate, because we do that in one
particular case.

I still think we need to be careful when introducing new optimizations
in this area - reordering the grouping keys by ndistinct, ORDER BY or
whatever. In particular I don't think we should commit these patches
that may quite easily cause regressions, and then hope some hypothetical
future patch fixes the costing. Those objections still stand.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
> I'm afraid that's a misunderstanding - my objections did not really go
> away. I'm just saying my claim that we're not messing with order of
> grouping keys is not entirely accurate, because we do that in one
> particular case.
> 
> I still think we need to be careful when introducing new optimizations
> in this area - reordering the grouping keys by ndistinct, ORDER BY or
> whatever. In particular I don't think we should commit these patches
> that may quite easily cause regressions, and then hope some hypothetical
> future patch fixes the costing. Those objections still stand.

Please have a look at https://commitfest.postgresql.org/18/1706/
I tried to attack the cost_sort() issues there, and I hope that on that basis we can 
solve the problems with the 0002 patch and improve the incremental sort patch.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:

On 06/28/2018 06:52 PM, Teodor Sigaev wrote:
>> I'm afraid that's a misunderstanding - my objections did not really go
>> away. I'm just saying my claim that we're not messing with order of
>> grouping keys is not entirely accurate, because we do that in one
>> particular case.
>>
>> I still think we need to be careful when introducing new optimizations
>> in this area - reordering the grouping keys by ndistinct, ORDER BY or
>> whatever. In particular I don't think we should commit these patches
>> that may quite easily cause regressions, and then hope some hypothetical
>> future patch fixes the costing. Those objections still stand.
> 
> Pls, have a look at https://commitfest.postgresql.org/18/1706/
> I tried to attack the cost_sort() issues and hope on that basis we can 
> solve problems with 0002 patch and improve incremental sort patch.
> 

OK, will do. Thanks for working on this!

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Teodor Sigaev
Date:
>> I tried to attack the cost_sort() issues and hope on that basis we can solve 
>> problems with 0002 patch and improve incremental sort patch.
>>
> 
> OK, will do. Thanks for working on this!

I hope we now have a better cost_sort(). The obvious way is to try all 
combinations of pathkeys in get_cheapest_group_keys_order() and choose the cheapest 
one by cost_sort(). But that requires N! operations, could potentially be very 
expensive with a large number of pathkeys, and doesn't solve the issue with 
user-knows-what-he-does pathkeys. We could suggest an order of pathkeys as the 
patch does now, and if the cost_sort() estimate is less than 80% (arbitrarily 
chosen) of the cost of the user-suggested pathkeys then use ours, else use the user's pathkeys.
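
A trivial sketch of that rule, just to make the proposal explicit (illustrative
Python; choose_order and the 0.8 threshold are placeholders, not patch code):

def choose_order(user_cost, suggested_cost, threshold=0.80):
    # Keep the user's GROUP BY order unless our suggested order is clearly
    # (more than 20%) cheaper according to cost_sort()-style estimates.
    return "suggested" if suggested_cost < threshold * user_cost else "user"

print(choose_order(user_cost=1000.0, suggested_cost=850.0))   # -> user
print(choose_order(user_cost=1000.0, suggested_cost=700.0))   # -> suggested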


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On 06/29/2018 04:51 PM, Teodor Sigaev wrote:
> 
>>> I tried to attack the cost_sort() issues and hope on that basis we 
>>> can solve problems with 0002 patch and improve incremental sort patch.
>>>
>>
>> OK, will do. Thanks for working on this!
> 
> I hope, now we have a better cost_sort(). The obvious way is a try all 
> combination of pathkeys in get_cheapest_group_keys_order() and choose 
> cheapest one by cost_sort().

> But it requires N! operations and potentially could be very
> expensive in case of large number of pathkeys and doesn't solve the
> issue with user-knows-what-he-does pathkeys.

Not sure. There are N! combinations, but this seems like a good 
candidate for backtracking [1]. You don't have to enumerate and evaluate 
all N! combinations, just construct one and then abandon whole classes 
of combinations as soon as they get more expensive than the currently 
best one. That's thanks to the additive nature of the comparison costing, 
because appending a column to the sort key can only make it more 
expensive. My guess is this will make this a non-issue.

[1] https://en.wikipedia.org/wiki/Backtracking
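
For illustration, here is a minimal sketch of that backtracking idea (standalone
Python with a made-up additive cost function standing in for cost_sort(); none of
the names come from the patch):

def prefix_cost(order, ndistinct, op_cost, tuples):
    # Toy additive cost: each appended column adds comparisons weighted by how
    # often the previous columns tie (roughly 1 / ndistinct of the prefix).
    cost, groups = 0.0, 1.0
    for col in order:
        cost += tuples * op_cost[col] / groups
        groups = min(groups * ndistinct[col], tuples)
    return cost

def best_order(columns, ndistinct, op_cost, tuples):
    best_cost, best_cols = float("inf"), None

    def recurse(order, remaining, cost_so_far):
        nonlocal best_cost, best_cols
        if cost_so_far >= best_cost:      # prune: appending columns only adds cost
            return
        if not remaining:
            best_cost, best_cols = cost_so_far, order
            return
        for col in sorted(remaining):
            new_order = order + [col]
            recurse(new_order, remaining - {col},
                    prefix_cost(new_order, ndistinct, op_cost, tuples))

    recurse([], frozenset(columns), 0.0)
    return best_cost, best_cols

nd = {"a": 2, "b": 1000, "c": 10}
oc = {"a": 1.0, "b": 1.0, "c": 1.0}
print(best_order(["a", "b", "c"], nd, oc, tuples=1_000_000))

How much of the N! space the pruning actually cuts off in practice of course
depends on how quickly a good ordering is found first.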

>
> We could suggest an order of pathkeys as patch suggests now and if 
> cost_sort() estimates cost is less than 80% (arbitrary chosen) cost
> of user-suggested pathkeys then it use our else user pathkeys.
> 

I really despise such arbitrary thresholds. I'd much rather use a more 
reliable heuristic by default, even if it gets it wrong in some cases 
(which it will, but that's natural).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: GROUP BY optimization

From
Gavin Flower
Date:
On 30/06/18 03:03, Tomas Vondra wrote:
> On 06/29/2018 04:51 PM, Teodor Sigaev wrote:
>>
>>>> I tried to attack the cost_sort() issues and hope on that basis we 
>>>> can solve problems with 0002 patch and improve incremental sort patch.
>>>>
>>>
>>> OK, will do. Thanks for working on this!
>>
>> I hope, now we have a better cost_sort(). The obvious way is a try 
>> all combination of pathkeys in get_cheapest_group_keys_order() and 
>> choose cheapest one by cost_sort().
>
>> But it requires N! operations and potentially could be very
>> expensive in case of large number of pathkeys and doesn't solve the
>> issue with user-knows-what-he-does pathkeys.
>
> Not sure. There are N! combinations, but this seems like a good 
> candidate for backtracking [1]. You don't have to enumerate and 
> evaluate all N! combinations, just construct one and then abandon 
> whole classes of combinations as soon as they get more expensive than 
> the currently best one. That's thanks to additive nature of the 
> comparison costing, because appending a column to the sort key can 
> only make it more expensive. My guess is this will make this a non-issue.
>
> [1] https://en.wikipedia.org/wiki/Backtracking
>
>>
>> We could suggest an order of pathkeys as patch suggests now and if 
>> cost_sort() estimates cost is less than 80% (arbitrary chosen) cost
>> of user-suggested pathkeys then it use our else user pathkeys.
>>
>
> I really despise such arbitrary thresholds. I'd much rather use a more 
> reliable heuristics by default, even if it gets it wrong in some cases 
> (which it will, but that's natural).
>
> regards
>
Additionally put an upper limit threshold on the number of combinations 
to check, fairly large by default?

If the first threshold is exceeded, we could consider checking a few more 
orderings selected at random from those not yet checked, to avoid any bias caused 
by stopping a systematic search. This might prove important when N! is 
fairly large.
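
A rough sketch of that combination (illustrative Python; the limits are arbitrary
and the generator name is made up):

import random
from itertools import islice, permutations

def candidate_orders(columns, systematic_limit=1000, random_samples=20):
    # Enumerate the first systematic_limit orderings, then sample a few random
    # permutations from the rest so very large N! spaces still get some coverage.
    systematic = list(islice(permutations(columns), systematic_limit))
    yield from systematic
    seen = set(systematic)
    for _ in range(random_samples):
        perm = tuple(random.sample(columns, len(columns)))
        if perm not in seen:
            seen.add(perm)
            yield perm

cols = list("abcdefgh")                       # 8! = 40320 possible orderings
print(sum(1 for _ in candidate_orders(cols))) # roughly 1000 + 20 candidates costed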


Cheers,
Gavin



Re: POC: GROUP BY optimization

From
Michael Paquier
Date:
On Sat, Jun 30, 2018 at 09:19:13AM +1200, Gavin Flower wrote:
> Additionally put an upper limit threshold on the number of combinations to
> check, fairly large by default?
>
> If first threshold is exceeded, could consider checking out a few more
> selected at random from paths not yet checked, to avoid any bias caused by
> stopping a systematic search.  This might prove important when N! is fairly
> large.

Please note that the latest patch available does not apply, so this has
been moved to next CF 2018-11, waiting for input from its author.
--
Michael

Attachments

Re: POC: GROUP BY optimization

From
Dmitry Dolgov
Date:
> On Tue, Oct 2, 2018 at 4:16 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Jun 30, 2018 at 09:19:13AM +1200, Gavin Flower wrote:
> > Additionally put an upper limit threshold on the number of combinations to
> > check, fairly large by default?
> >
> > If first threshold is exceeded, could consider checking out a few more
> > selected at random from paths not yet checked, to avoid any bias caused by
> > stopping a systematic search.  This might prove important when N! is fairly
> > large.
>
> Please note that the latest patch available does not apply, so this has
> been moved to next CF 2018-11, waiting for input from its author.

Unfortunately, the patch still has some conflicts; could you please rebase it?


Re: POC: GROUP BY optimization

From
Andres Freund
Date:
On 2018-11-29 17:56:53 +0100, Dmitry Dolgov wrote:
> > On Tue, Oct 2, 2018 at 4:16 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Sat, Jun 30, 2018 at 09:19:13AM +1200, Gavin Flower wrote:
> > > Additionally put an upper limit threshold on the number of combinations to
> > > check, fairly large by default?
> > >
> > > If first threshold is exceeded, could consider checking out a few more
> > > selected at random from paths not yet checked, to avoid any bias caused by
> > > stopping a systematic search.  This might prove important when N! is fairly
> > > large.
> >
> > Please note that the latest patch available does not apply, so this has
> > been moved to next CF 2018-11, waiting for input from its author.
> 
> Unfortunately, patch still has some conflicts, could you please rebase it?

As nothing has happened since, I'm marking this as returned with
feedback.

Greetings,

Andres Freund


Re: POC: GROUP BY optimization

From
Dmitry Dolgov
Date:
> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> As nothing has happened since, I'm marking this as returned with
> feedback.

This patch was on my radar for some time in the past and we've seen use cases
where it could be pretty useful (probably even without the incremental sort
patch). I would like to make some progress here and see if it's possible to
continue its development. I've attached the rebased version with some small
changes, e.g. I've created a separate patch with the group by reordering tests to
make it easy to see what changes were introduced, and after some experiments
removed the part that seems to duplicate "group by" reordering to follow "order
by". It also looks like it's possible to make these patches independent by having
a base patch with the isolated group_keys_reorder_by_pathkeys (they're
connected via n_preordered), but I haven't done this yet.

I went through the thread to summarize the objections that were mentioned so
far. Most of them are related to the third patch in the series, where
reordering based on "ndistinct" is implemented, and are about cost_sort (all
the possible problems that could happen without proper cost estimation due to
non-uniform distribution, different comparison costs and so on) and figuring
out how to limit the number of possible combinations of pathkeys to compare. I
haven't looked at the proposed backtracking approach, but taking into account
that the suggested patch for cost_sort [1] is RWF, I wonder what would be the
best strategy to proceed?

[1]: https://commitfest.postgresql.org/21/1706/

Attachments

Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On Thu, Apr 04, 2019 at 05:11:09PM +0200, Dmitry Dolgov wrote:
>> On Thu, Jan 31, 2019 at 12:24 PM Andres Freund <andres@anarazel.de> wrote:
>>
>> As nothing has happened since, I'm marking this as returned with
>> feedback.
>
>This patch was on my radar for some time in the past and we've seen use cases
>where it could be pretty useful (probably even without the incremental sort
>patch). I would like to make some progress here and see if it's possible to
>continue it's development. I've attached the rebased version with a small
>changes, e.g. I've created a separate patch with group by reordering tests to
>make it easy to see what changes were introduced, and after some experiments
>removed part that seems to duplicate "group by" reordering to follow "order
>by". Also looks like it's possible to make these patches independent by having
>a base patch with the isolated group_keys_reorder_by_pathkeys (they're
>connected via n_preordered), but I haven't done this yet.
>
>I went through the thread to summarize the objections, that were mentioned so
>far. Most of them are related to the third patch in the series, where
>reordering based on "ndistincs" is implemented, and are about cost_sort (all
>the possible problems that could happen without proper cost estimation due to
>non uniform distribution, different comparison costs and so on) and figuring
>out how to limit number of possible combinations of pathkeys to compare. I
>haven't looked at the proposed backtracking approach, but taking into account
>that suggested patch for cost_sort [1] is RWF, I wonder what would be the best
>strategy to proceed?
>
>[1]: https://commitfest.postgresql.org/21/1706/

Dunno. It seems the progress on the sort-related patches was rather limited
in the PG12 cycle in general :-( There's the Incremental Sort patch, GROUP
BY optimization and then the cost_sort patch.

Not sure about the best strategy, though. One obvious option is to rely on
cost_sort patch to do all the improvements needed for the other patches,
but that assumes that patch moves reasonably fast.

So I personally would suggest to treat those patches as independent until
the very last moment, develop the costing improvements needed by each
of them, and then decide which of them are committable / in what order.

At the end of the PG11 cycle I offered my help with testing / reviewing
those patches, if there is progress. That still holds - if there are new
patch versions I'll look at them.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: POC: GROUP BY optimization

From
Dmitry Dolgov
Date:
> On Tue, Apr 9, 2019 at 5:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> So I personally would suggest to treat those patches as independent until
> the very last moment, develop the costing improvements needed by each
> of them, and then decide which of them are committable / in what order.

I had the same idea, but judging from the questions raised in this thread,
it's quite hard to go with reordering based only on the frequency of values. I
hoped that the cost_sort improvement patch would be simple enough to
incorporate here, but of course it wasn't. Under the assumption that the
amount of work required for sorting depends only on the number of distinct
groups and how costly it is to compare values of the data type,
I've ended up extracting the get_width_multiplier and get_func_cost parts from
the cost_sort patch and including them in 0003-Reorder-by-values-distribution.
This allows taking into account situations where we compare e.g. long strings
or a custom data type with a high procost for comparison (but I've used these
values directly, without any adjusting coefficients yet).

> On Wed, Jun 13, 2018 at 6:41 PM Teodor Sigaev <teodor@sigaev.ru> wrote:
>
> > So that's a nice improvement, although I think we should also consider
> > non-uniform distributions (using the per-column MCV lists).
>
> Could you clarify how to do that?

Since I'm not familiar with this topic, I would like to ask the same question:
how to do that, and what are the advantages?

> On Sat, Jun 16, 2018 at 5:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I still think we need to be careful when introducing new optimizations
> in this area - reordering the grouping keys by ndistinct, ORDER BY or
> whatever. In particular I don't think we should commit these patches
> that may quite easily cause regressions, and then hope some hypothetical
> future patch fixes the costing.

I'm a bit concerned about this part of the discussion. There is an idea running
through the whole thread about avoiding the situation where a user knows which
order is better and we generate a different one by mistake. From what I see right
now, even if all the objections were addressed, there is a chance that some
coefficients will not be good enough (e.g. the width multiplier is based on an
average width, or it can happen that all the compared strings differ right at the
beginning) and the chosen order will not be optimal. Does that mean that in any
case the implementation of such an optimization should provide a way to
override it?

Attachments

Re: POC: GROUP BY optimization

From
Tomas Vondra
Date:
On Fri, May 03, 2019 at 10:28:21PM +0200, Dmitry Dolgov wrote:
>> On Tue, Apr 9, 2019 at 5:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> So I personally would suggest to treat those patches as independent until
>> the very last moment, develop the costing improvements needed by each
>> of them, and then decide which of them are committable / in what order.
>
>I had the same idea, but judging from the questions, raised in this thread,
>it's quite hard to go with reordering based only on frequency of values. I
>hoped that the cost_sort improvement patch would be simple enough to
>incorporate it here, but of course it wasn't. Having an assumption, that the
>amount of work, required for performing sorting, depends only on the number of
>distinct groups and how costly it is to compare a values of this data type,
>I've ended up extracting get_width_multiplier and get_func_cost parts from
>cost_sort patch and including them into 0003-Reorder-by-values-distribution.
>This allows to take into account situations when we compare e.g. long strings
>or a custom data type with high procost for comparison (but I've used this
>values directly without any adjusting coefficients yet).
>
>> On Wed, Jun 13, 2018 at 6:41 PM Teodor Sigaev <teodor@sigaev.ru> wrote:
>>
>> > So that's a nice improvement, although I think we should also consider
>> > non-uniform distributions (using the per-column MCV lists).
>>
>> Could you clarify how to do that?
>
>Since I'm not familiar with this topic, I would like to ask the same question,
>how to do that and what are the advantages?
>

I don't recall the exact details of the discussion, since most of it
happened almost a year ago, but the main concern about the original
costing approach is that it very much assumes uniform distribution.

For example if you have N tuples with M groups (which are not computed
using estimate_num_groups IIRC, and we can hardly do much better), then
the patch assumed each group is ~N/M rows and used that for computing
the number of comparisons for a particular sequence of ORDER BY clauses.

But that may result in pretty significant errors in cases with a couple
of very large groups. But hey - we can get some of that information from
MCV lists, so maybe we can compute a safer, less-optimistic estimate.

I mean, we can't compute a "perfect" estimate of how many comparisons
we'll have to do, because we only have per-column MCV lists and maybe
some multi-column MCV lists (but definitely not on all combinations of
columns in the ORDER BY clause).

But what I think we could do is using largest possible group instead of
the average one. So for example when estimating number of comparisons
for columns (c1,..,cN), you could look at MCV lists for these columns
and compute

    m(i) = Max(largest group in MCV list for i-th column)

and then use Min(m(1), ..., m(k)) when estimating the number of
comparisons.

Of course, this is likely to be a worst-case estimate, and it's not
quite obvious that comparing worst-case estimates is necessarily safer
than comparing the regular ones. So I'm not sure about it.

It might be better to just compute the 'average group' in a different
way, not by arithmetic mean, but say as geometric mean from each MCV
list. Not sure.
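
To make the arithmetic concrete, here is a standalone sketch of the
"worst-case group" idea (this is not planner code; the frequencies in
main() are made up):

    /*
     * For each column, take the largest MCV frequency and turn it into a
     * group size; the smallest of those per-column maxima is a conservative
     * estimate of the largest group for the whole column combination.
     */
    #include <stdio.h>

    static double
    worst_case_group_size(const double *max_mcv_freq, int ncolumns, double ntuples)
    {
        double      result = ntuples;   /* no MCV info => assume one huge group */
        int         i;

        for (i = 0; i < ncolumns; i++)
        {
            double      group = max_mcv_freq[i] * ntuples;  /* m(i) */

            if (group < result)
                result = group;         /* Min(m(1), ..., m(k)) */
        }

        return result;
    }

    int
    main(void)
    {
        /* made-up largest MCV frequencies for three columns */
        double      freqs[3] = {0.30, 0.04, 0.15};

        printf("largest group: ~%.0f rows\n",
               worst_case_group_size(freqs, 3, 10000000.0));   /* ~400000 */
        return 0;
    }

Taking the minimum across columns is what makes this an upper bound for the
combination - a group defined by all k columns can't be larger than the
largest group of its most selective column (assuming the MCV lists actually
capture the most common values).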

I guess this needs some research and experimentation - constructing
cases that are likely to cause problems (non-uniform distributions,
columns with a small number of exceptionally large groups, ...) and then
showing how the costing deals with those.

FWIW I've mentioned MCV lists in the incrementalsort thread too, in
which case it was much clearer how to use them to improve the startup
cost estimate - it's enough to look at the first group in per-column MCV
lists (first in the ORDER BY sense, not by frequency).

>> On Sat, Jun 16, 2018 at 5:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> I still think we need to be careful when introducing new optimizations
>> in this area - reordering the grouping keys by ndistinct, ORDER BY or
>> whatever. In particular I don't think we should commit these patches
>> that may quite easily cause regressions, and then hope some hypothetical
>> future patch fixes the costing.
>
>I'm a bit concerned about this part of the discussion. There is an idea through
>the whole thread about avoiding the situation, when a user knows which order is
>better and we generate different one by mistake. From what I see right now even
>if all the objections would be addressed, there is a chance that some
>coefficients will be not good enough (e.g. width multiplier is based on an
>average width, or it can suddenly happen that all the compared string have some
>difference at the very beginning) and the chosen order will be not optimal.
>Does it mean that in any case the implementation of such optimization should
>provide a way to override it?

I don't think we have to provide an override no matter what. Otherwise
we'd have to have an override for everything, because no optimizer
decision is perfect - it's all built on estimates, after all.

But I assume we agree that optimizations based on estimates that are
known to be unreliable are bound to be unreliable too. And optimizations
that regularly misfire for common data sets may easily do more harm than
good (and maybe should not be called optimizations at all).

For example, the first patch relied on ndistinct estimates quite a bit
and used them to compute multi-column estimates, which we already know
is rather poor for GROUP BY estimates. The v9 uses estimate_num_groups()
which works better because it leverages extended stats, when available.
That's good, but we need to see how the rest of the costing works.

So I think it very much depends on how reliable the costing can be made,
and how bad consequences a poor choice can have.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: POC: GROUP BY optimization

От
Dmitry Dolgov
Дата:
> On Fri, May 3, 2019 at 11:55 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I don't recall the exact details of the discussion, since most of it
> happened almost a year ago, but the main concern about the original
> costing approach is that it very much assumes uniform distribution.
>
> For example if you have N tuples with M groups (which are not computed
> using estimate_num_groups IIRC, and we can hardly do much better), then
> the patch assumed each groups is ~N/M rows and used that for computing
> the number of comparisons for a particular sequence of ORDER BY clauses.
>
> But that may result in pretty significant errors in causes with a couple
> of very large groups.

Yes, you're right, the current version of the patch assumes a uniform
distribution of values between these M groups. After some thinking I've got an
idea that maybe it's better not to try to find out what would be acceptable for
both uniform and non-uniform distributions, but simply not to reorder at all if
there are any significant deviations from what seems to be the "best case",
namely when values are distributed among groups relatively uniformly and there
are no doubts about how complicated it is to compare them.

Saying that, it's possible on top of the current implementation to check:

* dependency coefficient between columns (if I understand correctly, non-uniform
  distributions of values between the groups, a.k.a. a few large groups, should
  be visible from extended statistics as a significant dependency)

* the largest/smallest group in the MCV list doesn't differ too much from the
  expected "average" group

* average width and comparison procost are similar

If everything fits (which I hope would be most of the time) - apply the
reordering, otherwise use whatever order was specified (which leaves room for
manual reordering in relatively rare situations). Does it make sense?
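
If it helps, a minimal standalone sketch of such a gate; the function name and
all the threshold values are made up purely for illustration, they are not part
of the patch:

    #include <stdbool.h>

    /* All thresholds below are placeholders, chosen only for illustration. */
    static bool
    safe_to_reorder_group_keys(double dependency_degree,    /* from extended stats */
                               double largest_group_ratio,  /* largest MCV group / average group */
                               double width_ratio,          /* widest avg width / narrowest avg width */
                               double procost_ratio)        /* most / least expensive comparison */
    {
        if (dependency_degree > 0.5)        /* columns look strongly correlated */
            return false;
        if (largest_group_ratio > 10.0)     /* distribution is far from uniform */
            return false;
        if (width_ratio > 4.0 || procost_ratio > 4.0)
            return false;                   /* comparison costs differ too much */

        return true;                        /* close enough to the "best case" */
    }

If any of the checks fails, we would simply keep the order written in the query.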

> But what I think we could do is using largest possible group instead of
> the average one. So for example when estimating number of comparisons
> for columns (c1,..,cN), you could look at MCV lists for these columns
> and compute
>
>     m(i) = Max(largest group in MCV list for i-th column)
>
> and then use Min(m(1), ..., m(k)) when estimating the number of
> comparisons.

I see the idea, but I'm a bit confused about how to get the largest group from
an MCV list. I mean, there is a list of the most common values per column with
frequencies, and similar stuff for multi-column statistics, but how do I get
the size of those groups?



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On Fri, May 24, 2019 at 02:10:48PM +0200, Dmitry Dolgov wrote:
>> On Fri, May 3, 2019 at 11:55 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> I don't recall the exact details of the discussion, since most of it
>> happened almost a year ago, but the main concern about the original
>> costing approach is that it very much assumes uniform distribution.
>>
>> For example if you have N tuples with M groups (which are not computed
>> using estimate_num_groups IIRC, and we can hardly do much better), then
>> the patch assumed each groups is ~N/M rows and used that for computing
>> the number of comparisons for a particular sequence of ORDER BY clauses.
>>
>> But that may result in pretty significant errors in causes with a couple
>> of very large groups.
>
>Yes, you're right, the current version of the patch assumes uniform
>distribution of values between these M groups. After some thinking I've got an
>idea that maybe it's better to not try to find out what would be acceptable for
>both uniform and non uniform distributions, but just do not reorder at all if
>there are any significant deviations from what seems to be a "best case",
>namely when values distributed among groups relatively uniformly and there are
>no doubts about how complicated is to compare them.
>
>Saying that, it's possible on top of the current implementation to check:
>
>* dependency coefficient between columns (if I understand correctly, non
>  uniform distributions of values between the groups a.k.a few large groups
>  should be visible from an extended statistics as a significant dependency)
>
>* largest/smallest group in MCV doesn't differ too much from an expected
>  "average" group
>
>* average width and comparison procost are similar
>
>If everything fits (which I hope would be most of the time) - apply reorder,
>otherwise use whatever order was specified (which leaves the room for manual
>reordering for relatively rare situations). Does it makes sense?
>

I think those are interesting sources of information. I don't know if they
will be sufficient, but I think it's worth exploring.

One of the issues with this approach, however, will be selecting the
threshold values. That is, how do you decide whether a given
functional dependency is "too strong" or whether the largest/smallest MCV item
differs too much from the average one?

If you pick a particular threshold value, there'll be a sudden behavior
change at some point, and that's very annoying. For example, you might
pick 0.5 as a threshold for "strong" functional dependencies. There's
little reason why 0.4999 should be weak and 0.5001 should be "strong".

This may lead to sudden behavior changes after ANALYZE, for example.

So I think we need to find a way to make this part of the costing model,
which is how we deal with such cases elsewhere.

>> But what I think we could do is using largest possible group instead of
>> the average one. So for example when estimating number of comparisons
>> for columns (c1,..,cN), you could look at MCV lists for these columns
>> and compute
>>
>>     m(i) = Max(largest group in MCV list for i-th column)
>>
>> and then use Min(m(1), ..., m(k)) when estimating the number of
>> comparisons.
>
>I see the idea, but I'm a bit confused about how to get a largest group for a
>MCV list? I mean there is a list of most common values per column with
>frequencies, and similar stuff for multi columns statistics, but how to get a
>size of those groups?

You can just multiply the frequency by the number of rows in the table;
that gives you the size of the group. Or an estimate of it, because
there might be a filter condition involved.
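
For example (just putting numbers on it): with 10M rows in the table and a
largest MCV frequency of 0.04 for some column, the largest group for that
column would be estimated at roughly 400,000 rows, before accounting for any
filter.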


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
Hi,

I wonder if anyone has plans to try again with this optimization in v14
cycle? The patches no longer apply thanks to the incremental sort patch,
but I suppose fixing that should not be extremely hard.

The 2020-07 CF is still a couple weeks away, but it'd be good to know if
there are any plans to revive this. I'm willing to spend some time on
reviewing / testing this, etc.


I've only quickly skimmed the old thread, but IIRC there were two main
challenges in getting the optimization right:


1) deciding which orderings are interesting / worth additional work

I think we need to consider these orderings, in addition to the one
specified in GROUP BY:

1) as specified in ORDER BY (if different from 1)

2) one with cheapest sort on unsorted input (depending on number of
distinct values, cost of comparisons, etc.)

3) one with cheapest sort on partially sorted input (essentially what we
do with the incremental sort paths, but matching the pathkeys in a more
elaborate way)
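
For example (an illustration, not taken from the patch): with an index on (a)
and a query doing GROUP BY b, a, option 3 would prefer the (a, b) ordering,
because then only an incremental sort on b is needed on top of the index scan,
instead of a full sort of the whole input.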


2) costing the alternative orderings

I think we've already discussed various ways to leverage as much
available info as possible (extended stats, MCVs, ...) but I think the
patch only does some of it.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC: GROUP BY optimization

От
Dmitry Dolgov
Дата:
> On Fri, May 15, 2020 at 01:52:20AM +0200, Tomas Vondra wrote:
>
> I wonder if anyone has plans to try again with this optimization in v14
> cycle? The patches no longer apply thanks to the incremental sort patch,
> but I suppose fixing that should not be extremely hard.
>
> The 2020-07 CF is still a couple weeks away, but it'd be good to know if
> there are any plans to revive this. I'm willing to spend some time on
> reviewing / testing this, etc.

Yes, if you believe that this patch has potential, I would love to pick
it up again.

> I've only quickly skimmed the old thread, but IIRC there were two main
> challenges in getting the optimization right:
>
>
> 1) deciding which orderings are interesting / worth additional work
>
> I think we need to consider these orderings, in addition to the one
> specified in GROUP BY:
>
> 1) as specified in ORDER BY (if different from 1)

What is the idea behind considering this ordering?



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On Sat, May 16, 2020 at 02:24:31PM +0200, Dmitry Dolgov wrote:
>> On Fri, May 15, 2020 at 01:52:20AM +0200, Tomas Vondra wrote:
>>
>> I wonder if anyone has plans to try again with this optimization in v14
>> cycle? The patches no longer apply thanks to the incremental sort patch,
>> but I suppose fixing that should not be extremely hard.
>>
>> The 2020-07 CF is still a couple weeks away, but it'd be good to know if
>> there are any plans to revive this. I'm willing to spend some time on
>> reviewing / testing this, etc.
>
>Yes, if you believe that this patch has potential, I would love to pick
>it up again.
>

I think it's worth another try. I've been reminded of this limitation
when working on the incremental sort, which significantly increases the
number of orderings that we can sort efficiently. But we can't possibly
leverage that unless it matches the GROUP BY ordering.

The attached WIP patch somewhat addresses this by trying to reorder the
group_pathkeys to allow leveraging an existing ordering (even just a
partial one, with incremental sort).

>> I've only quickly skimmed the old thread, but IIRC there were two main
>> challenges in getting the optimization right:
>>
>>
>> 1) deciding which orderings are interesting / worth additional work
>>
>> I think we need to consider these orderings, in addition to the one
>> specified in GROUP BY:
>>
>> 1) as specified in ORDER BY (if different from 1)
>
>What is the idea behind considering this ordering?

I'll try to answer this in general, i.e. what I think this patch needs
to consider. Hopefully that'll answer why we should look at ORDER BY
pathkeys too ...

Reordering the pathkeys has both local and global effects, and we need
to consider all of that to make the optimization reliable, otherwise
we'll inevitably end up with trivial regressions.


The local effects are trivial - it's for example reordering the pathkeys
to make the explicit sort as cheap as possible. This thread already
discussed a number of things to base this on - ndistinct for columns,
cost of comparison function, ... In any case, this is something we can
decide locally, when building the grouping paths.


The global effects are much harder to tackle, because the decision can't
be made locally when building the grouping paths. It requires changes
both below and above the point where we build grouping paths.

An example of a decision we need to make before we even get to building
a grouping path is which index paths to build. Currently we only build
index paths with "useful" pathkeys, and without tweaking that we'll
never even see the index in add_paths_to_grouping_rel().

But there are also decisions that can be made only after we build the
grouping paths. For example, we may have both GROUP BY and ORDER BY, and
there is no "always correct" way to combine those. In some cases it may
be correct to use the same pathkeys, in other cases it's better to use
different ones (which will require an extra Sort, with additional cost).


So I don't think there will be a single "interesting" grouping pathkeys
(i.e. root->group_pathkeys), but a collection of pathkeys. And we'll
need to build grouping paths for all of those, and leave the planner to
eventually pick the one giving us the cheapest plan ...

A "brute-force" approach would be to generate all possible orderings of
group_pathkeys, but that's probably not going to fly - it might easily
cause an explosion in number of paths we track, etc. So we'll need to
pick/prioritize orderings that are more likely to give us efficient
plans, and ORDER BY seems like a good example because it means we won't
need an explicit sort.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: POC: GROUP BY optimization

От
Dmitry Dolgov
Дата:
> On Sat, May 16, 2020 at 04:56:09PM +0200, Tomas Vondra wrote:
>
> So I don't think there will be a single "interesting" grouping pathkeys
> (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll
> need to build grouping paths for all of those, and leave the planner to
> eventually pick the one giving us the cheapest plan ...
>
> A "brute-force" approach would be to generate all possible orderings of
> group_pathkeys, but that's probably not going to fly - it might easily
> cause an explosion in number of paths we track, etc. So we'll need to
> pick/prioritize orderings that are more likely to give us efficient
> plans, and ORDER BY seems like a good example because it means we won't
> need an explicit sort.

Yes, I see. I've already rebased the original version of the patch and am now
working on your suggestions. In the meantime, one question:

> But there are also decisions that can be made only after we build the
> grouping paths. For example, we may have both GROUP BY and ORDER BY, and
> there is no "always correct" way to combine those. In some cases it may
> be correct to use the same pathkeys, in other cases it's better to use
> different ones (which will require an extra Sort, with additional cost).

Do I understand correctly that your idea is that in some cases it's cheaper
to use a different order for GROUP BY than for ORDER BY, even with an extra
sort? The current patch implementation assumes that it's always better to
match the order of pathkeys for both GROUP BY and ORDER BY (which means that
the only degree of freedom there is to reorder the tail, which in turn makes
both the "unsorted input" and "partially sorted input" cases from your
original email essentially the same). Can you show an example where this
assumption is invalid?

> I've only quickly skimmed the old thread, but IIRC there were two main
> challenges in getting the optimization right:
>
> 1) deciding which orderings are interesting / worth additional work
>
> I think we need to consider these orderings, in addition to the one
> specified in GROUP BY:

Yes, it looks like the current patch implementation, together with
preprocess_groupclause, already implements this, although maybe not that
flexibly.

> 2) costing the alternative orderings
>
> I think we've already discussed various ways to leverage as much
> available info as possible (extended stats, MCVs, ...) but I think the
> patch only does some of it.

Right, there were a couple of ideas about what to do in case of a few groups
that are so big they can invalidate the current assumptions about costs.
E.g. do not apply the reordering if we detect such a situation, or use a
"conservative" approach and take the biggest group for the estimates.



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On Sat, Jun 20, 2020 at 04:30:10PM +0200, Dmitry Dolgov wrote:
>> On Sat, May 16, 2020 at 04:56:09PM +0200, Tomas Vondra wrote:
>>
>> So I don't think there will be a single "interesting" grouping pathkeys
>> (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll
>> need to build grouping paths for all of those, and leave the planner to
>> eventually pick the one giving us the cheapest plan ...
>>
>> A "brute-force" approach would be to generate all possible orderings of
>> group_pathkeys, but that's probably not going to fly - it might easily
>> cause an explosion in number of paths we track, etc. So we'll need to
>> pick/prioritize orderings that are more likely to give us efficient
>> plans, and ORDER BY seems like a good example because it means we won't
>> need an explicit sort.
>
>Yes, I see. I've already rebased the original version of patch and now
>working on your suggestions. In the meantime one question:
>
>> But there are also decisions that can be made only after we build the
>> grouping paths. For example, we may have both GROUP BY and ORDER BY, and
>> there is no "always correct" way to combine those. In some cases it may
>> be correct to use the same pathkeys, in other cases it's better to use
>> different ones (which will require an extra Sort, with additional cost).
>
>Do I understand correctly, your idea is that in some cases it's cheaper
>to use different order for GROUP BY than in ORDER BY even with an extra
>sorting? In the current patch implementation there is an assumption that
>says it's always better to match the order of pathkeys for both GROUP
>BY/ORDER BY (which means that the only degree of freedom there is to
>reorder the tail, which in turn makes both "unsorted input" and
>"partially sorted input" cases from your original email essentially the
>same). Can you show such an example when this assumption is invalid?
>

Yes, that is mostly the point - I don't think we can assume simply using
the ORDER BY pathkeys (if specified) for grouping is optimal. As a
trivial counter-example, consider this

CREATE TABLE t (
   a int,
   b int,
   c int);

INSERT INTO t SELECT 1000 * random(), 1000 * random(), i
   FROM generate_series(1,10000000) s(i);

CREATE INDEX ON t (a, b, c);

VACUUM ANALYZE t;


And now some simple queries (you may need to tweak the planning options
a bit, to get these plans - disable hashagg etc.).


test=# explain analyze select a, b, count(c) from t group by a, b;
                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.037..10509.505 rows=1001923 loops=1)
    Group Key: a, b
    ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual time=0.024..5189.908 rows=10000000 loops=1)
          Heap Fetches: 0
  Planning Time: 0.113 ms
  Execution Time: 10941.677 ms
(6 rows)


test=# explain analyze select a,b, count(c) from t group by a, b order by a, b;
                                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.033..10606.925 rows=1001923 loops=1)
    Group Key: a, b
    ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual time=0.020..5240.332 rows=10000000 loops=1)
          Heap Fetches: 0
  Planning Time: 0.100 ms
  Execution Time: 11043.358 ms
(6 rows)


test=# explain analyze select a,b, count(c) from t group by a, b order by b, a;
                                                           QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=1658556.19..1768558.12 rows=1000018 width=16) (actual time=14707.676..25533.053 rows=1001923 loops=1)
    Group Key: b, a
    ->  Sort  (cost=1658556.19..1683556.63 rows=10000175 width=12) (actual time=14707.659..20011.024 rows=10000000 loops=1)
          Sort Key: b, a
          Sort Method: external merge  Disk: 219200kB
          ->  Seq Scan on t  (cost=0.00..154056.75 rows=10000175 width=12) (actual time=0.028..4751.244 rows=10000000 loops=1)
  Planning Time: 0.103 ms
  Execution Time: 25989.412 ms
(8 rows)


test=# explain analyze select * from (select a,b, count(c) from t group by a, b offset 0) foo order by b,a;
                                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=515759.00..518259.04 rows=1000018 width=16) (actual time=11281.094..11783.920 rows=1001923 loops=1)
    Sort Key: t.b, t.a
    Sort Method: external merge  Disk: 34880kB
    ->  GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.064..10507.256 rows=1001923 loops=1)
          Group Key: t.a, t.b
          ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual time=0.052..5209.939 rows=10000000 loops=1)
                Heap Fetches: 0
  Planning Time: 0.111 ms
  Execution Time: 12218.370 ms
(9 rows)


To sum this up:

   grouping (a,b): 10941 ms
   grouping (a,b) + ordering (a,b): 11043 ms
   grouping (b,a) + ordering (b,a): 25989 ms
   grouping (a,b) + ordering (b,a): 12218 ms

So, it's fast as long as we can use the IOS, even if we have to do an
extra Sort at the end. This obviously relies on the grouping to reduce
the number of rows (in this case from 10M to 1M), which makes the extra
cost much cheaper.

I'm pretty sure I could construct similar examples e.g. with incremental
sort, and so on.

Does this explain why I think we can't make the assumption that it's
always better to match the pathkeys?

FWIW, I think the change of plan for the third query (where we specify
"GROUP BY a,b" but the planner decides to just change that) is rather
annoying, and it clearly makes things worse :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC: GROUP BY optimization

От
Robert Haas
Дата:
On Sat, May 16, 2020 at 10:56 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> The local effects are trivial - it's for example reordering the pathkeys
> to make the explicit sort as cheap as possible. This thread already
> discussed a number of things to base this on - ndistinct for columns,
> cost of comparison function, ... In any case, this is something we can
> decide locally, when building the grouping paths.
>
> The global effects are much harder to tackle, because the decision can't
> be made locally when building the grouping paths. It requires changes
> both below and above the point where we build grouping paths.
>
> An example of a decision we need to make before we even get to building
> a grouping path is which index paths to build. Currently we only build
> index paths with "useful" pathkeys, and without tweaking that we'll
> never even see the index in add_paths_to_grouping_rel().
>
> But there are also decisions that can be made only after we build the
> grouping paths. For example, we may have both GROUP BY and ORDER BY, and
> there is no "always correct" way to combine those. In some cases it may
> be correct to use the same pathkeys, in other cases it's better to use
> different ones (which will require an extra Sort, with additional cost).
>
> So I don't think there will be a single "interesting" grouping pathkeys
> (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll
> need to build grouping paths for all of those, and leave the planner to
> eventually pick the one giving us the cheapest plan ...

I agree with all of this and I think it's really good analysis. Part
of the reason why the planner isn't that sophisticated in this area is
that, for a long time, we didn't use paths at this level, and so it
was much harder to write any kind of viable patch to consider
alternatives. With Tom's planner path-ification word there should be a
lot more potential for optimization here, but, as you say, we need to
do that by leveraging the existing costing machinery, not just via
simple heuristics. It also strikes me that one of the problems in this
area is that the statistics we currently gather don't seem to be
entirely useful or reliable for aggregate planning. I wonder if there
are extensions to the extended statistics mechanism, or even just more
things we should gather during a routine ANALYZE, that would enable us
to estimate things better here. The most obvious thing is that
n_distinct is often wildly inaccurate, but it's deeper than that. For
instance, we need some way to estimate how many groups you're going to
get when you filter on a and then group by b that doesn't assume
uniform distribution. And we also need something that doesn't
just output a number of groups, but gives you some inkling of the
sizes of those groups: 1 giant group and a bunch of little ones isn't
the same as a bunch of equally sized groups. I don't know that these
are things we really have much chance of figuring out with the
currently-available information, though.

Sorry if this is hijacking the thread a bit; I don't mean to
discourage work on this specific patch. I'm just wondering if we need
to think a little bigger to see our way to a good solution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On Mon, Jun 22, 2020 at 11:50:49AM -0400, Robert Haas wrote:
>On Sat, May 16, 2020 at 10:56 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> The local effects are trivial - it's for example reordering the
>> pathkeys to make the explicit sort as cheap as possible. This thread
>> already discussed a number of things to base this on - ndistinct for
>> columns, cost of comparison function, ... In any case, this is
>> something we can decide locally, when building the grouping paths.
>>
>> The global effects are much harder to tackle, because the decision
>> can't be made locally when building the grouping paths. It requires
>> changes both below and above the point where we build grouping paths.
>>
>> An example of a decision we need to make before we even get to
>> building a grouping path is which index paths to build. Currently we
>> only build index paths with "useful" pathkeys, and without tweaking
>> that we'll never even see the index in add_paths_to_grouping_rel().
>>
>> But there are also decisions that can be made only after we build the
>> grouping paths. For example, we may have both GROUP BY and ORDER BY,
>> and there is no "always correct" way to combine those. In some cases
>> it may be correct to use the same pathkeys, in other cases it's
>> better to use different ones (which will require an extra Sort, with
>> additional cost).
>>
>> So I don't think there will be a single "interesting" grouping
>> pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys.
>> And we'll need to build grouping paths for all of those, and leave
>> the planner to eventually pick the one giving us the cheapest plan
>> ...
>
> I agree with all of this and I think it's really good analysis. Part
> of the reason why the planner isn't that sophisticated in this area is
> that, for a long time, we didn't use paths at this level, and so it
> was much harder to write any kind of viable patch to consider
> alternatives.  With Tom's planner path-ification word there should be
> a lot more potential for optimization here, but, as you say, we need
> to do that by leveraging the existing costing machinery, not just via
> simple heuristics.

Agreed.

> It also strikes me that one of the problems in this area is that the
> statistics we currently gather don't seem to be entirely useful or
> reliable for aggregate planning.

I don't think that's an immediate issue. I don't think anyone started
implementing these optimizations and decided not to do that because of
lack of statistics.

The reason why we don't have those optimizations is more that we decided
not to even consider those optimizations, possibly because of planner
limitations. Of course, the optimizations may require additional stats,
but that's only step #2.

> I wonder if there are extensions to the extended statistics mechanism,
> or even just more things we should gather during a routine ANALYZE,
> that would enable us to estimate things better here. The most obvious
> thing is that n_distinct is often wildly inaccurate, but it's deeper
> than that. For instance, we need some way to estimate how many groups
> you're going to get when you filter on a and then group by b that
> doesn't assume uniform distribution. And we need also need something
> that doesn't just output a number of groups, but gives you some inkling
> of the sizes of those groups: 1 giant group and a bunch of little ones
> isn't the same as a bunch of equally sized groups. I don't know that
> these are things we really have much chance of figuring out with the
> currently-available information, though.
>

Not sure. I think the extended stats we have now (ndistinct coefficients,
multi-column MCV lists) should be a good basis for these decisions, even
with skewed data. Of course, we may need to add something else, but I'm
not sure what that would be.

The main limitation of extended stats is that they work at the per-table level.
Once you do a join, it's all over - we don't have that capability :-(

> Sorry if this is hijacking the thread a bit; I don't mean to
> discourage work on this specific patch. I'm just wondering if we need
> to think a little bigger to see our way to a good solution.
>

I think it's fine to continue with this optimization. Either we'll be able
to get it done with the existing stats, or we'll find out what other stats we
need for sufficiently good heuristics.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC: GROUP BY optimization

От
Pavel Borisov
Дата:

I wonder if anyone has plans to try again with this optimization in v14
cycle? The patches no longer apply thanks to the incremental sort patch,
but I suppose fixing that should not be extremely hard.

I find this feature interesting and I'd like to join. 
PFA the updated version of the patch, made fully compatible with the changes in v13; I suppose it is ready to be reviewed for v14.

Main things changed:
1. Patch is made compatible with new lists representation ( 1cff1b95ab6d )
2. Patch is made compatible with the incremental sort ( d2d8a229bc58 and ba3e76cc571e )
3. Tests are changed to represent changes in keys numbering and naming convention in the plan ( 55a1954da16 and 6ef77cf46e8 )

As always your ideas are very much welcome!

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com
Вложения

Re: POC: GROUP BY optimization

От
Dmitry Dolgov
Дата:
> On Mon, Oct 26, 2020 at 12:44:58PM +0400, Pavel Borisov wrote:
> >
> >
> > I wonder if anyone has plans to try again with this optimization in v14
> > cycle? The patches no longer apply thanks to the incremental sort patch,
> > but I suppose fixing that should not be extremely hard.
> >
> > I found this feature interesting and I'd like to join.
> PFA the updated version of the patch made fully compatible with changes in
> v13 and I suppose it is ready to be reviewed for v14.
>
> Main things changed:
> 1. Patch is made compatible with new lists representation ( 1cff1b95ab6d )
> 2. Patch is made compatible with the incremental sort ( d2d8a229bc58 and
> ba3e76cc571e )
> 3. Tests are changed to represent changes in keys numbering and
> naming convention in the plan ( 55a1954da16 and 6ef77cf46e8 )
>
> As always your ideas are very much welcome!

Hi,

Thanks for your interest! FYI there is a new thread about this topic [1]
with the next version of the patch and more commentaries (I've created
it for visibility purposes, but probably it also created some confusion,
sorry for that).

[1]: https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com



Re: POC: GROUP BY optimization

От
Pavel Borisov
Дата:
Thanks for your interest! FYI there is a new thread about this topic [1]
with the next version of the patch and more commentaries (I've created
it for visibility purposes, but probably it also created some confusion,
sorry for that).

Thanks!

I took a very quick look at your updates and noticed that they are intended to be simple, and that some parts of the code were removed because they have little test coverage. I'd propose the opposite: increase the test coverage, so we can enjoy more precise cost calculation and probably partial grouping.

Or maybe it's worth benchmarking both patches and then re-deciding which we want more: the more complicated or the simpler version.

Good to know that this feature is not stuck anymore and we have more than one proposal.
Thanks!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: POC: GROUP BY optimization

От
Dmitry Dolgov
Дата:
> On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote:
> > Thanks for your interest! FYI there is a new thread about this topic [1]
> > with the next version of the patch and more commentaries (I've created
> > it for visibility purposes, but probably it also created some confusion,
> > sorry for that).
> >
> > Thanks!
>
> I made a very quick look at your updates and noticed that it is intended to
> be simple and some parts of the code are removed as they have little test
> coverage. I'd propose vice versa to increase test coverage to enjoy more
> precise cost calculation and probably partial grouping.
>
> Or maybe it's worth to benchmark both patches and then re-decide what we
> want more to have a more complicated or a simpler version.
>
> Good to know that this feature is not stuck anymore and we have more than
> one proposal.
> Thanks!

Just to clarify, the patch that I've posted in the other thread mentioned
above is not an alternative proposal, but a development of the same
patch I had posted in this thread. As mentioned in [1], the reduced
functionality is an attempt to reduce the scope, and as soon as the base
functionality looks good enough it will be added back.

[1]: https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote:
>> On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote:
>> > Thanks for your interest! FYI there is a new thread about this topic [1]
>> > with the next version of the patch and more commentaries (I've created
>> > it for visibility purposes, but probably it also created some confusion,
>> > sorry for that).
>> >
>> > Thanks!
>>
>> I made a very quick look at your updates and noticed that it is intended to
>> be simple and some parts of the code are removed as they have little test
>> coverage. I'd propose vice versa to increase test coverage to enjoy more
>> precise cost calculation and probably partial grouping.
>>
>> Or maybe it's worth to benchmark both patches and then re-decide what we
>> want more to have a more complicated or a simpler version.
>>
>> Good to know that this feature is not stuck anymore and we have more than
>> one proposal.
>> Thanks!
>
>Just to clarify, the patch that I've posted in another thread mentioned
>above is not an alternative proposal, but a development of the same
>patch I had posted in this thread. As mentioned in [1], reduce of
>functionality is an attempt to reduce the scope, and as soon as the base
>functionality looks good enough it will be returned back.
>

I find it hard to follow two similar threads trying to do the same (or
very similar) things in different ways. Is there any chance to join
forces and produce a single patch series merging the changes? With the
"basic" functionality at the beginning, then patches with the more
complex stuff. That's the usual way, I think.

As I said in my response on the other thread [1], I think constructing
additional paths with alternative orderings of pathkeys is the right
approach. Otherwise we can't really deal with optimizations above the
place where we consider this optimization.

That's essentially what I was trying to explain in my May 16 response [2],
where I actually said this:

    So I don't think there will be a single "interesting" grouping
    pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys.
    And we'll need to build grouping paths for all of those, and leave
    the planner to eventually pick the one giving us the cheapest plan.

I wouldn't go as far as saying the approach in this patch (i.e. picking
one particular ordering) is doomed, but it's going to be very hard to
make it work reliably. Even if we get the costing *at this node* right,
who knows how it'll affect costing of the nodes above us?

So if I can suggest something, I'd merge the two patches, adopting the
path-based approach. With the very basic functionality/costing in the
first patch, and the more advanced stuff in additional patches.

Does that make sense?


regards


[1] https://www.postgresql.org/message-id/20200901210743.lutgvnfzntvhuban%40development

[2] https://www.postgresql.org/message-id/20200516145609.vm7nrqy7frj4ha6r%40development

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: POC: GROUP BY optimization

От
Dmitry Dolgov
Дата:
> On Tue, Oct 27, 2020 at 09:19:51PM +0100, Tomas Vondra wrote:
> On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote:
> > > On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote:
> > > > Thanks for your interest! FYI there is a new thread about this topic [1]
> > > > with the next version of the patch and more commentaries (I've created
> > > > it for visibility purposes, but probably it also created some confusion,
> > > > sorry for that).
> > > >
> > > > Thanks!
> > >
> > > I made a very quick look at your updates and noticed that it is intended to
> > > be simple and some parts of the code are removed as they have little test
> > > coverage. I'd propose vice versa to increase test coverage to enjoy more
> > > precise cost calculation and probably partial grouping.
> > >
> > > Or maybe it's worth to benchmark both patches and then re-decide what we
> > > want more to have a more complicated or a simpler version.
> > >
> > > Good to know that this feature is not stuck anymore and we have more than
> > > one proposal.
> > > Thanks!
> >
> > Just to clarify, the patch that I've posted in another thread mentioned
> > above is not an alternative proposal, but a development of the same
> > patch I had posted in this thread. As mentioned in [1], reduce of
> > functionality is an attempt to reduce the scope, and as soon as the base
> > functionality looks good enough it will be returned back.
> >
>
> I find it hard to follow two similar threads trying to do the same (or
> very similar) things in different ways. Is there any chance to join
> forces and produce a single patch series merging the changes? With the
> "basic" functionality at the beginning, then patches with the more
> complex stuff. That's the usual way, I think.
>
> As I said in my response on the other thread [1], I think constructing
> additional paths with alternative orderings of pathkeys is the right
> approach. Otherwise we can't really deal with optimizations above the
> place where we consider this optimization.
>
> That's essentially what I was trying in explain May 16 response [2]
> when I actually said this:
>
>    So I don't think there will be a single "interesting" grouping
>    pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys.
>    And we'll need to build grouping paths for all of those, and leave
>    the planner to eventually pick the one giving us the cheapest plan.
>
> I wouldn't go as far as saying the approach in this patch (i.e. picking
> one particular ordering) is doomed, but it's going to be very hard to
> make it work reliably. Even if we get the costing *at this node* right,
> who knows how it'll affect costing of the nodes above us?
>
> So if I can suggest something, I'd merge the two patches, adopting the
> path-based approach. With the very basic functionality/costing in the
> first patch, and the more advanced stuff in additional patches.
>
> Does that make sense?

Yes, and from what I understand that's already what has happened in the
newer thread [1]. To avoid any confusion, there are no "two patches", at
least from my side; what I've posted in [1] is the continuation of
this work, but with the path-based approach adopted and a bit less
functionality (essentially I've dropped everything that was not covered
by tests in the original patch).

In case I'm missing something and Pavel's proposal is significantly
different from the original patch (if I understand correctly, at the
moment the latest patch posted here is a rebase adjusting the old
patch to work with the latest changes in master, right?), then indeed
they could be merged, but please in the newer thread [1].

[1]: https://www.postgresql.org/message-id/flat/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com



Re: POC: GROUP BY optimization

От
Pavel Borisov
Дата:
In case if I'm missing something and Pavel's proposal is significantly
different from the original patch (if I understand correctly, at the
moment the latest patch posted here is a rebase and adjusting the old
patch to work with the latest changes in master, right?), then indeed
they could be merged, but please in the newer thread [1].

Sure, the only difference between my patch and Teodor's previous code is compatibility with v13, though it is not a small one, and I appreciate the changes in path processing. The only thing that caught my attention is that some useful changes which I've mentioned before are discarded now. But as long as they are planned to be put back in later, it is completely fine. I agree to discuss the thing in either thread, though I don't quite understand the reason for the switch.

Still, I don't see a problem.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: POC: GROUP BY optimization

От
Ibrar Ahmed
Дата:


On Wed, Mar 10, 2021 at 4:05 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,

I take a look at the patch today. Attached is the v12 and a separate
patch with some comment tweaks and review comments.


1) I see the patch results in some plan changes in postgres_fdw. I
assume it's somehow related to the sort costing changes, but I find it a
bit suspicious. Why should the plan change from this:

  Foreign Scan
    Remote SQL: ... ORDER BY ... OFFSET 100 LIMIT 10;

to

  Limit
    Foreign Scan
      Remote SQL: ... ORDER BY ...

But even if this is due to changes in ORDER BY costing, postponing the
limit seems like a net loss - the sort happens before the limit, and by
postponing the limit until after the foreign scan we need to transfer 100 more
rows. So what's causing this?

The difference in plan cost seems fairly substantial (old: 196.52, new:
241.94). But maybe that's OK and expected.


2) I wonder how much more expensive (in terms of CPU) the new sort
costing code is. It's iterative, and it calls functions to calculate the
number of groups etc. I wonder what the impact is on simple queries?


3) It's not clear to me why we need "fake var" protection? Why would the
estimate_num_groups() fail for that?


4) In one of the places a comment says DEFAULT_NUM_DISTINCT is not used
because we're dealing with multiple columns. That makes sense, but maybe
we should use DEFAULT_NUM_DISTINCT at least to limit the range. That is,
with N columns we should restrict the nGroups estimate by:

    min = DEFAULT_NUM_DISTINCT
    max = Min(pow(DEFAULT_NUM_DISTINCT, ncolumns), tuples)

Also, it's not clear what's the logic behind the formula:

    nGroups = ceil(2.0 + sqrt(tuples) *
       list_length(pathkeyExprs) / list_length(pathkeys));

A comment explaining that would be helpful.
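
For illustration, a standalone sketch of the suggested clamping (this is not
the patch's code; DEFAULT_NUM_DISTINCT is the planner's usual fallback of 200):

    #include <math.h>

    #define DEFAULT_NUM_DISTINCT 200

    /* Clamp a multi-column group estimate to the range suggested above. */
    static double
    clamp_group_estimate(double nGroups, int ncolumns, double tuples)
    {
        double      min_groups = DEFAULT_NUM_DISTINCT;
        double      max_groups = fmin(pow(DEFAULT_NUM_DISTINCT, ncolumns), tuples);

        if (max_groups < min_groups)    /* tiny tables: keep the range sane */
            max_groups = min_groups;

        if (nGroups < min_groups)
            nGroups = min_groups;
        if (nGroups > max_groups)
            nGroups = max_groups;

        return nGroups;
    }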


5) The compute_cpu_sort_cost comment includes two references (in a quite
mixed-up way), and then there's another reference in the code. I suggest
we list all of them in the function comment.


6) There's a bunch of magic values (various multipliers etc.). It's not
clear which values are meant to be equal, etc. Let's introduce nicer
constants with reasonable names.


7) The comment at get_cheapest_group_keys_order is a bit misleading,
because it talks about "uniqueness" - but that's not what we're looking
at here (I mean, is the column unique or not). We're looking at number
of distinct values in the column, which is a different thing. Also, it'd
be good to roughly explain what get_cheapest_group_keys_order does - how
it calculates the order (permutations up to 4 values, etc.).


8) The comment at compute_cpu_sort_cost should also explain basics of
the algorithm. I tried doing something like that, but I'm not sure I got
all the details right and it probably needs further improvements.


9) The main concern I have is still about the changes in planner.c, and
I think I've already shared it before. The code takes the grouping
pathkeys, and just reshuffles them to what it believes is a better order
(cheaper, ...). That is, there's one input pathkey list, and it gets
transformed into another pathkey list. The problem is that even if this
optimizes the sort, it may work against some optimization in the upper
part of the plan.

Imagine a query that does something like this:

   SELECT a, b, count(*) FROM (
      SELECT DISTINCT a, b, c, d FROM x
   ) GROUP BY a, b;

Now, from the costing perspective, it may be cheaper to do the inner
grouping by "c, d, a, b" for example. But that works directly against
the second grouping, which would benefit from having the input sorted by
"a, b". How exactly this behaves depends on the number of distinct
values in the columns, how expensive the comparisons are for each
column, and so on. But you get the idea.

I admit I haven't managed to construct a reasonable query that'd be
significantly harmed by this, but I haven't spent a lot of time on it.

I'm pretty sure I used this trick in the past, when doing aggregations
on large data sets, where it was much better to "pipe" the data through
multiple GroupAggregate nodes.

For this reason I think the code generating paths should work a bit like
get_useful_pathkeys_for_relation and generate_useful_gather_paths.

* group_keys_reorder_by_pathkeys should not produce just one list of
  pathkeys, but multiple interesting options. At the moment it should
  produce at least the input grouping pathkeys (as specified in the
  query), and the "optimal" pathkeys (by lowest cost).

* the places calling group_keys_reorder_by_pathkeys should loop on the
  result, and generate separate path for each option.

I'd guess in the future we'll "peek forward" in the plan and see if
there are other interesting pathkeys (that's the expectation for
get_useful_pathkeys_for_relation).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

The patch does not apply successfully; can you please take a look at that?

http://cfbot.cputube.org/patch_33_1651.log

=== applying patch ./0001-v12-20210309.patch

patching file src/include/utils/selfuncs.h
Hunk #1 FAILED at 198.
1 out of 1 hunk FAILED -- saving rejects to file src/include/utils/selfuncs.h.rej


Based on Tomas Vondra's comments, and since the patch does not apply, I am changing the status to "Waiting On Author".

--

Ibrar Ahmed

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
Hi,

here is an updated version of this patch, with some significant changes.

The main change is that instead of calling get_cheapest_group_keys_order
directly, the planner now calls get_useful_group_keys_orderings and gets
multiple "interesting" pathkey orderings instead of just a single one.
The callers then loop over these orderings and construct paths for all
of them. This idea is similar to get_useful_pathkeys_for_relation()
added by incremental sort.

FWIW this addresses point (9) from my last review - I started with it,
because it was the main thing affecting the overall architecture. The
remaining bits are more "local".

I haven't investigated how expensive those changes are (in terms of
planning overhead), but the number of extra orderings is fairly low, and
I'd expect most of the paths to be eliminated fairly quickly.

I've also added / improved a number of comments etc. but I'm sure more
cleanup is needed.


The other comments from the review still apply - I'm particularly
concerned about the (1) point, i.e. plan changes in postgres_fdw. Those
seem to be rather strange (LIMIT not being pushed down in queries
without any grouping). I'd bet this is due to changes in sort costing
and does not seem very desirable.



regards


[1]
https://www.postgresql.org/message-id/22c44f98-bfa8-8630-62b5-5155e11eb284%40enterprisedb.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On 7/20/21 3:12 PM, Tomas Vondra wrote:
> ...
>
> The other comments from the review still apply - I'm particularly
> concerned about the (1) point, i.e. plan changes in postgres_fdw. Those
> seem to be rather strange (LIMIT not being pushed down in queries
> without any grouping). I'd bet this is due to changes in sort costing
> and does not seem very desirable.
> 

I did look at this a bit closer, and yes - this very much seems like a
costing issue in the patch. The first query in postgres_fdw that changes
switches from a query with LIMIT pushed to remote

                              QUERY PLAN
------------------------------------------------------------------------
 Foreign Scan  (cost=196.29..196.52 rows=10 width=14)
   Output: t1.c1, t2.c1, t1.c3
   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1
               INNER JOIN "S 1"."T 1" r2 ON ... LIMIT ... OFFSET ...
(4 rows)

to the LIMIT executed locally

                              QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=241.72..241.94 rows=10 width=14)
   Output: t1.c1, t2.c1, t1.c3
   ->  Foreign Scan  (cost=239.47..261.97 rows=1000 width=14)
         Output: t1.c1, t2.c1, t1.c3
         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
         Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1"
         r1 INNER JOIN ...
(6 rows)

The FDW code runs explain on both queries - with and without LIMIT
pushed to remote side, to get estimates, and without the patch it gets this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=106.97..109.47 rows=1000 width=14)
   Sort Key: r1.c3, r1."C 1"
   ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
         Hash Cond: (r1."C 1" = r2."C 1")
         ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
               ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000
width=4)
(7 rows)

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=96.29..96.32 rows=10 width=14)
   ->  Sort  (cost=96.04..98.54 rows=1000 width=14)
         Sort Key: r1.c3, r1."C 1"
         ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
               Hash Cond: (r1."C 1" = r2."C 1")
               ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000
width=10)
               ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
                     ->  Seq Scan on "T 1" r2  (cost=0.00..21.00
rows=1000 width=4)
(8 rows)


while with the patch it gets this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=139.47..141.97 rows=1000 width=14)
   Sort Key: r1.c3, r1."C 1"
   ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
         Hash Cond: (r1."C 1" = r2."C 1")
         ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
               ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000
width=4)
(7 rows)

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=145.75..145.77 rows=10 width=14)
   ->  Sort  (cost=145.50..148.00 rows=1000 width=14)
         Sort Key: r1.c3, r1."C 1"
         ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
               Hash Cond: (r1."C 1" = r2."C 1")
               ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000
width=10)
               ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
                     ->  Seq Scan on "T 1" r2  (cost=0.00..21.00
rows=1000 width=4)
(8 rows)


Notice that the costs get "inverted"

                       master            patched
   ----------------------------------------------
    no limit   106.97..109.47     139.47..141.97
    limit       96.29.. 96.32     145.75..145.77

so the plan with the remote LIMIT looks a bit more expensive - just enough
that it seems cheaper to transfer more rows and execute the LIMIT locally.

IMO this looks rather suspicious (or wrong), and compute_cpu_sort_cost
may need some fixes. I wonder why it differs from the old code so much ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
Hi,

after a bit more time spent on this, I found that the issue is due to
this chunk of code in compute_cpu_sort_cost:

    if (heapSort)
    {
        if (tuplesPerPrevGroup < output_tuples)
            /* comparing only inside output_tuples */
            correctedNGroups =
                ceil(2.0 * output_tuples /
                     (tuplesPerPrevGroup / nGroups));
        else
            /* two groups - in output and out */
            correctedNGroups = 2.0;
    }
    else
        correctedNGroups = nGroups;

    if (correctedNGroups <= 1.0)
        correctedNGroups = 2.0;
    else
        correctedNGroups = ceil(correctedNGroups);
    per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);


There are a couple of issues, mostly due to differences in handling of cases
with a different heapSort flag. For a full sort (no LIMIT clause) we have
heapSort=false, and hence the execution simply jumps to

    correctedNGroups = nGroups;

while for LIMIT we do heapSort=true, in which case we also start with

    tuplesPerPrevGroup = ntuples;

That is almost certainly greater than output_tuples (=limit+offset), so
the first if condition in calculating correctedNGroups can't possibly be
true, and we simply end with

    correctedNGroups = 2.0;

in the first loop iteration. Which seems pretty bogus - why would there be
just two groups? When processing the first expression, it's as if there was
one big "prev group" with all the tuples, so why not just use nGroups
as it is?

This seems to confuse the costing quite a bit - enough to produce the
"inversed" costs with/without LIMIT, and picking the other plan.

I've simplified the costing a bit, and the attached version actually
undoes all the "suspicious" plan changes in postgres_fdw. It changes one
new plan, but that seems somewhat reasonable, as it pushes the sort to the
remote side.

But after looking at the costing function, I have a bunch of additional
comments and questions:


1) I looked at the resources mentioned as the sources of the formulas, but
I've been unable to really match the algorithm to them. The
quicksort paper is particularly "dense", the notation seems to be very
different, and none of the theorems seem like an obvious fit. Would be
good to make the relationship clearer in comments etc.

For the Sedgewick course it's even worse - it's way too extensive to
just point at it and say "We're using ideas from this!" because no one
is going to know which chapter/section to look at. We need to be a bit
more specific about the reference.


2) I'm a bit puzzled by the "heapsort" chunk, actually. How come we need
it now, when we didn't need that before? In a way, the difference in
behavior between heapsort and non-heapsort is what triggered the plan
changes ...

FWIW It's quite possible I tweaked the costing incorrectly, but it ends
up choosing the right plans purely by accident.


3) I'm getting a bit skeptical about the various magic coefficients that
are meant to model higher costs with non-uniform distribution. But
consider that we do this, for example:

   tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);

but then in the next loop we call estimate_num_groups_incremental and
pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this
may have various strange consequences - we'll calculate the nGroups
based on the inflated value, and we'll calculate tuplesPerPrevGroup from
that again - that seems susceptible to "amplification".

We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher
nGroups estimate in the next loop - but not linearly. And then we'll
calculate the inflated tuplesPerPrevGroup and estimated nGroups ...
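
To make the compounding concrete, here is a tiny standalone toy program
(not PostgreSQL code; the fixed per-key split factor of 10 is entirely
made up and merely stands in for whatever estimate_num_groups_incremental
would return) that just iterates the inflation step quoted above, with
and without the 1.5 factor:

    #include <stdio.h>
    #include <math.h>

    int
    main(void)
    {
        double  plain = 1000000.0;      /* group size without the 1.5 factor */
        double  inflated = 1000000.0;   /* group size with the 1.5 factor */
        int     key;

        for (key = 1; key <= 4; key++)
        {
            /* made-up stand-in for the incremental ndistinct estimate */
            double  nGroups = 10.0;

            plain = ceil(plain / nGroups);
            inflated = ceil(1.5 * inflated / nGroups);

            printf("key %d: plain = %.0f, inflated = %.0f (x%.2f)\n",
                   key, plain, inflated, inflated / plain);
        }

        return 0;
    }

After k keys the inflated trajectory carries an extra factor of roughly
1.5^k, and that is before the non-linear feedback through
estimate_num_groups_incremental even enters the picture.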

That seems pretty dubious, with hard to explain behavior, IMO.

If we want to keep applying these coefficients, we need to do that in a
way that does not affect the subsequent loop like this - we might tweak
the per_tuple_cost formula, for example, not tuplesPerPrevGroup.


4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
estimate_num_groups_incremental. In principle yes, if we use "group
size" from the previous step, then the returned value is the number of
new groups after adding the "new" pathkey.

But even if we ignore the issues with amplification mentioned in (3),
there's an issue with non-linear behavior in estimate_num_groups,
because at some point it's calculating

    D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))

where N - total rows, p - sample size, n - number of distinct values.
And if we have (N1,n1) and (N2,n2) then the ratio of the calculated
estimates (which is pretty much what calculating the group size does)

    D(N2,n2,p2) / D(N1,n1,p1)

which will differ depending on p1 and p2. And if we're tweaking the
tuplesPerPrevGroup all the time, that's really annoying, as it may make
the groups smaller or larger in ways that are hard to predict, and I
wonder if it might go against the idea of penalizing tuplesPerPrevGroup
to some extent.
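
To illustrate how non-linear this is, here is a tiny standalone program
(all numbers made up) that evaluates the formula above for a fixed (N, n)
and several values of p:

    #include <stdio.h>
    #include <math.h>

    /* D(N, n, p) = n * (1 - ((N - p) / N)^(N / n)), as given above */
    static double
    D(double N, double n, double p)
    {
        return n * (1.0 - pow((N - p) / N, N / n));
    }

    int
    main(void)
    {
        double  N = 100000.0;   /* total rows (made up) */
        double  n = 50000.0;    /* distinct values (made up) */
        double  p;

        for (p = 100000.0; p >= 12500.0; p /= 2.0)
            printf("p = %6.0f   D = %8.1f   D/p = %.4f\n",
                   p, D(N, n, p), D(N, n, p) / p);

        return 0;
    }

Halving p clearly does not halve D, so the ratio of two such estimates
depends quite a bit on which "sample sizes" get plugged in.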

We could simply use the input "tuples" value here, and then divide the
current and previous estimate to calculate the number of new groups.



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: POC: GROUP BY optimization

От
"Andrey V. Lepikhov"
Дата:
I keep working on this patch. Here are intermediate results.

On 7/22/21 3:58 AM, Tomas Vondra wrote:
> in the first loop. Which seems pretty bogus - why would there be just
> two groups? When processing the first expression, it's as if there was
> one big "prev group" with all the tuples, so why not to just use nGroups
> as it is?

I think the heapsort code seems very strange. Look at the fallback case:
it is based on the output_tuples value. Maybe we should use the nGroups
value here, but derived from the number of output_tuples?

 > 1) I looked at the resources mentioned as sources the formulas came
 > from, but I've been unable to really match the algorithm to them. The
 > quicksort paper is particularly "dense", the notation seems to be very
 > different, and none of the theorems seem like an obvious fit. Would be
 > good to make the relationship clearer in comments etc.

Fixed (See attachment).

> 3) I'm getting a bit skeptical about the various magic coefficients that
> are meant to model higher costs with non-uniform distribution. But
> consider that we do this, for example:
> 
>     tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
> 
> but then in the next loop we call estimate_num_groups_incremental and
> pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this
> may have various strange consequences - we'll calculate the nGroups
> based on the inflated value, and we'll calculate tuplesPerPrevGroup from
> that again - that seems susceptible to "amplification".
> 
> We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher
> nGroups estimate in the next loop - but not linearly. An then we'll
> calculate the inflated tuplesPerPrevGroup and estimated nGroup ...

The weighting coefficient '1.5' expresses our desire to minimize the number
of comparison operations on each subsequent attribute of the pathkeys list.
By increasing this coefficient we increase the chance that the planner will
order the pathkeys by decreasing uniqueness.
I think it's OK.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 7/22/21 3:58 AM, Tomas Vondra wrote:
> 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
> estimate_num_groups_incremental. In principle yes, if we use "group
> size" from the previous step, then the returned value is the number of
> new groups after adding the "new" pathkey.
> But even if we ignore the issues with amplification mentioned in (3),
> there's an issue with non-linear behavior in estimate_num_groups,
> because at some point it's calculating
> 
>     D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))
> 
> where N - total rows, p - sample size, n - number of distinct values.
> And if we have (N1,n1) and (N2,n2) then the ratio of calculated
> estimated (which is pretty much what calculating group size does)
> 
>     D(N2,n2,p2) / D(N1,n1,p1)
> 
> which will differ depending on p1 and p2. And if we're tweaking the
> tuplesPerPrevGroup all the time, that's really annoying, as it may make
> the groups smaller or larger, which is unpredictable and annoying, and I
> wonder if it might go against the idea of penalizing tuplesPerPrevGroup
> to some extent.
> We could simply use the input "tuples" value here, and then divide the
> current and previous estimate to calculate the number of new groups.
tuplesPerPrevGroup is only an upper bound for the estimation. I think
we should use the result of the previous estimation as a limit for the
next one during incremental estimation. Maybe we should simply clamp the
tuplesPerPrevGroup value to ensure it never grows from one key to the next?
See the attached patch on top of the previous fixes, and the toy sketch below.
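
Just to illustrate the intent, here is a toy standalone sketch (not the
actual patch code; the nGroups values and tuple counts are made up) of
what such a clamp would do:

    #include <stdio.h>
    #include <math.h>

    int
    main(void)
    {
        /* made-up per-key nGroups estimates; the second one is very low */
        double  nGroups[] = {500.0, 1.2, 40.0};
        double  unclamped = 100000.0;
        double  clamped = 100000.0;
        int     i;

        for (i = 0; i < 3; i++)
        {
            double  next = ceil(1.5 * clamped / nGroups[i]);

            unclamped = ceil(1.5 * unclamped / nGroups[i]);
            /* proposed guard: group size must never grow */
            clamped = fmin(clamped, next);

            printf("key %d: unclamped = %.0f, clamped = %.0f\n",
                   i + 1, unclamped, clamped);
        }

        return 0;
    }

With the low nGroups estimate for the second key, the unclamped value
grows from 300 to 375 (a group cannot really get larger by adding another
grouping key), while the clamped one stays at 300.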

-- 
regards,
Andrey Lepikhov
Postgres Professional
Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:

On 1/21/22 12:09, Andrey Lepikhov wrote:
> On 7/22/21 3:58 AM, Tomas Vondra wrote:
>> 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
>> estimate_num_groups_incremental. In principle yes, if we use "group
>> size" from the previous step, then the returned value is the number of
>> new groups after adding the "new" pathkey.
>> But even if we ignore the issues with amplification mentioned in (3),
>> there's an issue with non-linear behavior in estimate_num_groups,
>> because at some point it's calculating
>>
>>     D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))
>>
>> where N - total rows, p - sample size, n - number of distinct values.
>> And if we have (N1,n1) and (N2,n2) then the ratio of calculated
>> estimated (which is pretty much what calculating group size does)
>>
>>     D(N2,n2,p2) / D(N1,n1,p1)
>>
>> which will differ depending on p1 and p2. And if we're tweaking the
>> tuplesPerPrevGroup all the time, that's really annoying, as it may make
>> the groups smaller or larger, which is unpredictable and annoying, and I
>> wonder if it might go against the idea of penalizing tuplesPerPrevGroup
>> to some extent.
>> We could simply use the input "tuples" value here, and then divide the
>> current and previous estimate to calculate the number of new groups.
 >
> tuplesPerPrevGroup is only a top boundary for the estimation. I think, 
> we should use result of previous estimation as a limit for the next 
> during incremental estimation. Maybe we simply limit the 
> tuplesPerPrevGroup value to ensure of monotonic non-growth of this 
> value? - see in attachment patch to previous fixes.
> 

Yes, I think that's a reasonable defense - it makes no sense to exceed 
the group size in the preceding step, which could happen with a small 
number of groups or some unexpected ndistinct estimate.

The other thing we could do is reduce the coefficient gradually - so 
it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so 
on. But it seems somewhat arbitrary (I certainly don't have some sound 
theoretical justification ...).

I've merged most of the fixes you reported. I've skipped the path_save 
removal in planner.c, because that seems incorrect - if there are 
multiple pathkeys, we must start with the original path, not the 
modified one we built in the last iteration. Or am I missing something?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 22/1/2022 01:34, Tomas Vondra wrote:
> The other thing we could do is reduce the coefficient gradually - so 
> it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so 
> on. But it seems somewhat arbitrary (I certainly don't have some sound 
> theoretical justification ...).

I don't think there is a reason to increase complexity without any theory
behind it. A simple solution lets us notice problems much faster, if
they arise.

> ... I've skipped the path_save 
> removal in planner.c, because that seems incorrect - if there are 
> multiple pathkeys, we must start with the original path, not the 
> modified one we built in the last iteration. Or am I missing something
You are right, I misunderstood the idea of the path_save variable.

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: POC: GROUP BY optimization

От
"Andrey V. Lepikhov"
Дата:
> On 7/22/21 3:58 AM, Tomas Vondra wrote:
> I've simplified the costing a bit, and the attached version actually
> undoes all the "suspicious" plan changes in postgres_fdw. It changes one
> new plan, but that seems somewhat reasonable, as it pushes sort to the
> remote side.

I tried to justify the heap-sort part of the compute_cpu_sort_cost() routine 
and realized that we may have a mistake here.
After a week of effort, I haven't found any research papers on the dependence 
of bounded heap-sort time complexity on the number of duplicates.

So, I propose a self-made formula, based on simple logical constructions:

1. We should start from the initial formula: cost ~ N*LOG2(M), where M is 
output_tuples.
2. Note that the full form of this formula is:

cost ~ N*LOG2(min{N,M})

3. In the multi-column case, the number of comparisons for each subsequent 
column can be estimated by the same formula, but applied to the number of 
tuples per group:

comparisons ~ input * LOG2(min{input,M})

4. Note that for the case when M > input, we should adjust this formula 
a bit:

comparisons ~ max{input,M} * LOG2(min{input,M})

Remember that in our case M << tuples.
So, the general formula for a bounded heap sort can be written as:

cost ~ N * sum(max{n_i,M}/N * LOG2(min{n_i,M})), i=1,ncols

where n_1 == N and n_i is the number of tuples per group, estimated from 
the previous iteration.

The attachment contains an implementation of this approach.
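
To sanity-check the formula, here is a tiny standalone toy program (not
PostgreSQL code; the tuple counts and per-column split factors are made
up and merely stand in for estimates that would come from
estimate_num_groups) that simply evaluates the sum:

    #include <stdio.h>
    #include <math.h>

    #define LOG2(x) (log(x) / log(2.0))

    int
    main(void)
    {
        double  N = 1000000.0;  /* input tuples (made up) */
        double  M = 100.0;      /* output_tuples: LIMIT + OFFSET (made up) */
        /* made-up per-column split factors */
        double  groups[] = {100.0, 50.0, 10.0};
        int     ncols = sizeof(groups) / sizeof(groups[0]);
        double  n_i = N;        /* n_1 == N */
        double  per_tuple = 0.0;
        int     col;

        for (col = 0; col < ncols; col++)
        {
            per_tuple += fmax(n_i, M) / N * LOG2(fmin(n_i, M));
            n_i = ceil(n_i / groups[col]);  /* group size for the next column */
        }

        printf("comparisons per input tuple: %.4f (total ~ %.0f)\n",
               per_tuple, per_tuple * N);
        return 0;
    }

With these numbers the first column dominates the sum, which matches the
intent of the initial cost ~ N*LOG2(M) formula.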

-- 
regards,
Andrey Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
"Andrey V. Lepikhov"
Дата:
On 1/22/22 01:34, Tomas Vondra wrote:
> 

I rebased this patch (with minor fixes) onto the current master.

Also, the second patch is dedicated to the problem of "varno 0" (fake_var).
I think this case should produce the same estimates as the case of 
varno != 0 where no stats were found. So I propose to restrict the number 
of groups to the minimum of the number of incoming tuples and the 
DEFAULT_NUM_DISTINCT value.

-- 
regards,
Andrey Lepikhov
Postgres Professional
Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On 2/10/22 10:00, Andrey V. Lepikhov wrote:
> On 1/22/22 01:34, Tomas Vondra wrote:
>>
> 
> I rebased (with minor fixes) this patch onto current master.
> 
> Also, second patch dedicated to a problem of "varno 0" (fake_var).
> I think, this case should make the same estimations as in the case of
> varno != 0, but no any stats found. So I suppose to restrict number of
> groups with min of a number of incoming tuples and DEFAULT_NUM_DISTINCT
> value.
> 

Thanks for the rebase. The two proposed changes (tweaked costing and
simplified fake_var handling) seem fine to me. I think the last thing
that needs to be done is cleanup of the debug GUCs, which I added to
allow easier experimentation with the patch.

I probably won't remove the GUCs entirely, though. I plan to add a
single GUC that would enable/disable this optimization. I'm not a huge
fan of adding more and more GUCs, but in this case it's probably the
right thing to do given the complexity of estimating cost with
correlated columns etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: POC: GROUP BY optimization

От
"Andrey V. Lepikhov"
Дата:
On 3/15/22 13:26, Tomas Vondra wrote:
> Thanks for the rebase. The two proposed changes (tweaked costing and
> simplified fake_var handling) seem fine to me. I think the last thing
> that needs to be done is cleanup of the debug GUCs, which I added to
> allow easier experimentation with the patch.
Thanks, I'm waiting for the last step.
> 
> I probably won't remove the GUCs entirely, though. I plan to add a
> single GUC that would enable/disable this optimization. I'm not a huge
> fan of adding more and more GUCs, but in this case it's probably the
> right thing to do given the complexity of estimating cost with
> correlated columns etc.
Agreed. Because it is a kind of automation, we should allow the user to 
switch it off in case of problems or for manual tuning.

Also, I looked through this patch. It has some minor problems:
1. Multiple typos in the patch comment.
2. The term 'cardinality of a key' - maybe replace it with 'number of 
duplicates'?

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
Hi,

Here's a rebased/improved version of the patch, with smaller parts
addressing various issues. There are seven parts:

0001 - main part, just rebased

0002 - replace the debug GUC options with a single GUC to disable the
       optimization if needed

0003 - minor code cleanup, removal of unnecessary variable

0004 - various comment fixes (rewordings, typos, ...)

0005 - a minor code simplification, addressing FIXMEs from 0004

0006 - adds the new GUC to the docs

0007 - demonstrates plan changes with a disabled optimization

The first 6 parts should be squashed and committed at once; I only kept
them separate for clarity. The 0007 is merely a demonstration of the new
GUC and that it disables the optimization.

> Agree. Because it is a kind of automation we should allow user to switch
> it off in the case of problems or manual tuning.
> > Also, I looked through this patch. It has some minor problems:
> 1. Multiple typos in the patch comment.

I went through the comments and checked all of them for grammar mistakes
and typos using a word processor, so hopefully that should be OK. But
maybe there's still something wrong.

> 2. The term 'cardinality of a key' - may be replace with 'number of
> duplicates'?

No, cardinality means "number of distinct values", so "duplicates" would
be wrong. And I think "cardinality" is a well-established term, so I think
it's fine.

BTW I named the GUC enable_group_by_reordering, I wonder if it should be
named differently, e.g. enable_groupby_reordering? Opinions?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: POC: GROUP BY optimization

От
Zhihong Yu
Дата:


On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,

Here's a rebased/improved version of the patch, with smaller parts
addressing various issues. There are seven parts:

0001 - main part, just rebased

0002 - replace the debug GUC options with a single GUC to disable the
       optimization if needed

0003 - minor code cleanup, removal of unnecessary variable

0004 - various comment fixes (rewordings, typos, ...)

0005 - a minor code simplification, addressing FIXMEs from 0004

0006 - adds the new GUC to the docs

0007 - demonstrates plan changes with a disabled optimization

The first 6 parts should be squashed and committed at one, I only kept
them separate for clarity. The 0007 is merely a demonstration of the new
GUC and that it disables the optimization.

> Agree. Because it is a kind of automation we should allow user to switch
> it off in the case of problems or manual tuning.
> > Also, I looked through this patch. It has some minor problems:
> 1. Multiple typos in the patch comment.

I went through the comments and checked all of them for grammar mistakes
and typos using a word processor, so hopefully that should be OK. But
maybe there's still something wrong.

> 2. The term 'cardinality of a key' - may be replace with 'number of
> duplicates'?

No, cardinality means "number of distinct values", so "duplicates" would
be wrong. And I think "cardinality" is well established term, so I think
it's fine.

BTW I named the GUC enable_group_by_reordering, I wonder if it should be
named differently, e.g. enable_groupby_reordering? Opinions?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch:

multiple parametes need to be

  parametes -> parameters

leave more expensive comparions

  comparions -> comparisons

+       if (has_fake_var == false)

The above can be written as:

       if (!has_fake_var)

+           nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) / list_length(pathkeys));

Looks like the value of tuples doesn't change inside the loop.
You can precompute sqrt(tuples) outside the loop and store the value in a variable.

+       return -1;
+   else if (a->cost == b->cost)
+       return 0;
+   return 1;

the keyword 'else' is not needed.

+ * Returns newly allocated lists. If no reordering is possible (or needed),
+ * the lists are set to NIL.
+ */
+static bool
+get_cheapest_group_keys_order(PlannerInfo *root, double nrows,

It seems the comment for return value doesn't match the bool return type.

+   /* If this optimization is disabled, we're done. */
+   if (!debug_cheapest_group_by)

It seems enable_cheapest_group_by would be better name for the flag.

Cheers

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On 3/29/22 04:02, Zhihong Yu wrote:
> 
> 
> On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
> wrote:
> 
>     Hi,
> 
>     Here's a rebased/improved version of the patch, with smaller parts
>     addressing various issues. There are seven parts:
> 
>     0001 - main part, just rebased
> 
>     0002 - replace the debug GUC options with a single GUC to disable the
>            optimization if needed
> 
>     0003 - minor code cleanup, removal of unnecessary variable
> 
>     0004 - various comment fixes (rewordings, typos, ...)
> 
>     0005 - a minor code simplification, addressing FIXMEs from 0004
> 
>     0006 - adds the new GUC to the docs
> 
>     0007 - demonstrates plan changes with a disabled optimization
> 
>     The first 6 parts should be squashed and committed at one, I only kept
>     them separate for clarity. The 0007 is merely a demonstration of the new
>     GUC and that it disables the optimization.
> 
>     > Agree. Because it is a kind of automation we should allow user to
>     switch
>     > it off in the case of problems or manual tuning.
>     > > Also, I looked through this patch. It has some minor problems:
>     > 1. Multiple typos in the patch comment.
> 
>     I went through the comments and checked all of them for grammar mistakes
>     and typos using a word processor, so hopefully that should be OK. But
>     maybe there's still something wrong.
> 
>     > 2. The term 'cardinality of a key' - may be replace with 'number of
>     > duplicates'?
> 
>     No, cardinality means "number of distinct values", so "duplicates" would
>     be wrong. And I think "cardinality" is well established term, so I think
>     it's fine.
> 
>     BTW I named the GUC enable_group_by_reordering, I wonder if it should be
>     named differently, e.g. enable_groupby_reordering? Opinions?
> 
> 
>     regards
> 
>     -- 
>     Tomas Vondra
>     EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
>     The Enterprise PostgreSQL Company
> 
> Hi,
> 
> For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch:
> 
> multiple parametes need to be
> 
>   parametes -> parameters
> 
> leave more expensive comparions
> 
>   comparions -> comparisons
> 
> +       if (has_fake_var == false)
> 
> The above can be written as:
> 
>        if (!has_fake_var)
> 

All of this was already fixed in one of the subsequent "fixup" patches.
Attached is a patch merging 0001-0006, which is what I proposed to get
committed.

> +           nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) /
> list_length(pathkeys));
> 
> Looks like the value of tuples doesn't change inside the loop.
> You can precompute sqrt(tuples) outside the loop and store the value in
> a variable.
> 

IMHO it makes the formula harder to read, and the effect is not going to
be measurable - we're processing only a couple of elements. If the loop was
~100 iterations or more, maybe it'd have an impact.

> +       return -1;
> +   else if (a->cost == b->cost)
> +       return 0;
> +   return 1;
> 
> the keyword 'else' is not needed.
> 

True, but this is how comparators are usually implemented.

> + * Returns newly allocated lists. If no reordering is possible (or needed),
> + * the lists are set to NIL.
> + */
> +static bool
> +get_cheapest_group_keys_order(PlannerInfo *root, double nrows,
> 
> It seems the comment for return value doesn't match the bool return type.
> 

Yup, this is a valid point. I've fixed/reworded the comment a bit.

> +   /* If this optimization is disabled, we're done. */
> +   if (!debug_cheapest_group_by)
> 
> It seems enable_cheapest_group_by would be better name for the flag.
> 

This was renamed to enable_group_by_reordering in one of the fixup patches.

Attached is a patch merging 0001 and all the fixup patches, i.e. the
patch I plan to commit.

There was a minor test failure - the new GUC was not added to the sample
config file, so 003_check_guc.pl was failing.

I'm not including the 0007 part, because that's meant to demonstrate
plan changes with disabled optimization, which would confuse cfbot.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
Pushed, after going through the patch once more, running check-world
under valgrind, and updating the commit message.

Thanks to everyone who reviewed/tested this patch!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
David Rowley
Дата:
On Thu, 31 Mar 2022 at 12:19, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Pushed, after going through the patch once more, running check-world
> under valgrind, and updating the commit message.

I'm just in this general area of the code again today and wondered
about the header comment for the preprocess_groupclause() function.

It says:

 * In principle it might be interesting to consider other orderings of the
 * GROUP BY elements, which could match the sort ordering of other
 * possible plans (eg an indexscan) and thereby reduce cost.  We don't
 * bother with that, though.  Hashed grouping will frequently win anyway.

I'd say this commit makes that paragraph mostly obsolete.  It's only
true now in the sense that we don't try orders that suit some index
that would provide pre-sorted results for a GroupAggregate path.  The
comment leads me to believe that we don't do anything at all to find a
better order, and that's not true now.

David



Re: POC: GROUP BY optimization

От
David Rowley
Дата:
On Thu, 31 Mar 2022 at 12:19, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Pushed, after going through the patch once more, running check-world
> under valgrind, and updating the commit message.

I'm still working in this area and I noticed that db0d67db2 updated
some regression tests in partition_aggregate.out without any care as
to what the test was testing.

The comment above the test reads:

-- Without ORDER BY clause, to test Gather at top-most path

and you've changed the expected plan from being a parallel plan with a
Gather to being a serial plan. So it looks like the test might have
become useless.

I see that the original plan appears to come back with some
adjustments to parallel_setup_cost and parallel_tuple_cost. It seems a
bit strange to me that the changes with this patch would cause a
change of plan for this. There is only 1 GROUP BY column in the query
in question. There's no rearrangement to do with a single column GROUP
BY.

David



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:

On 7/15/22 07:18, David Rowley wrote:
> On Thu, 31 Mar 2022 at 12:19, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Pushed, after going through the patch once more, running check-world
>> under valgrind, and updating the commit message.
> 
> I'm still working in this area and I noticed that db0d67db2 updated
> some regression tests in partition_aggregate.out without any care as
> to what the test was testing.
> 
> The comment above the test reads:
> 
> -- Without ORDER BY clause, to test Gather at top-most path
> 
> and you've changed the expected plan from being a parallel plan with a
> Gather to being a serial plan. So it looks like the test might have
> become useless.
> 

I agree this is a mistake in db0d67db2 that makes the test useless.

> I see that the original plan appears to come back with some
> adjustments to parallel_setup_cost and parallel_tuple_cost. It seems a
> bit strange to me that the changes with this patch would cause a
> change of plan for this. There is only 1 GROUP BY column in the query
> in question. There's no rearrangement to do with a single column GROUP
> BY.
> 

It might seem a bit strange, but the patch tweaked the costing a bit, so
it's not entirely unexpected. I'd bet the plan cost changed just a teeny
bit, but enough to change the cheapest plan. The costing changed for all
group counts, including a single group.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
Michael Paquier
Дата:
On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
> I agree this is a mistake in db0d67db2 that makes the test useless.

Tomas, please note that this is an open item assigned to you.  Are you
planning to do something with these regression tests by beta3?
--
Michael

Вложения

Re: POC: GROUP BY optimization

От
David Rowley
Дата:
On Tue, 2 Aug 2022 at 22:21, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
> > I agree this is a mistake in db0d67db2 that makes the test useless.
>
> Tomas, please note that this is an open item assigned to you.  Are you
> planning to do something with these regression tests by beta3?

There's still something to do there for PG15, but 1349d2790 (which just
went into master) made those two tests run as a parallel query again.

David



Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On 8/2/22 13:14, David Rowley wrote:
> On Tue, 2 Aug 2022 at 22:21, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
>>> I agree this is a mistake in db0d67db2 that makes the test useless.
>>
>> Tomas, please note that this is an open item assigned to you.  Are you
>> planning to do something with these regression tests by beta3?
> 
> There's still something to do there for PG15, but 1349d2790 (just gone
> in to master) made those two tests run as a parallel query again.
> 

Hi,

I've been looking at this, and as I speculated before it's due to the
sort costing changing a little bit. On PG14, the costing of the plans
looks like this:

 Gather  (cost=1869.39..2823.15 rows=8 width=52)

and with disabled parallelism, it's like this:

 Append  (cost=998.04..3000.64 rows=10 width=52)

so there's a (fairly small) difference in favor of the parallel plan. But
on PG15 it's the other way around - the selected non-parallel one is
costed like this:

 Append  (cost=779.41..2490.61 rows=10 width=52)

and by setting parallel_setup_cost=0 you get this:

 Gather  (cost=700.34..1531.76 rows=8 width=52)

So with the setup cost included it's ~2531, and it loses to the simple
plan. This happens because the patch changed sort costing - the same
sort on PG14 and PG15 looks like this:

 PG14: ->  Sort  (cost=998.04..1028.04 rows=12000 width=13)

 PG15: ->  Sort  (cost=779.41..809.41 rows=12000 width=13)

As mentioned, the commit tweaked sort costing - before it was pretty
much just

  comparison_cost * tuples * LOG2(tuples)

but the patch needs to cost different pathkey orderings, and consider
that maybe we don't need to compare all the keys (which the original
costing kind of assumes). That's the whole point of this optimization.

The costing (compute_cpu_sort_cost) considers a couple other things,
like cost of the comparator function for the data type, width of the
values, groupings determined by preceding keys, and so on.

It might seem strange that a query with a single pathkey changes, but
that single pathkey is costed the same way, of course. In principle we
might have a special costing for this case, but I'd bet that would
result in pretty surprising inconsistencies when adding a sort key
(going from 1 to 2 keys).

So I don't think the current costing is wrong, but it certainly is more
complex. But the test does not test what it intended - I have a few ideas
how to make it work:

1) increase the number of rows in the table

2) increase cpu_operator_cost (for that one test?)

3) tweak the costing somehow, to increase the cost a bit

Both (1) and (2) work - I've tried doubling the number of rows or
setting the operator cost to 0.005, and that does the trick, but maybe a
smaller change would be enough.

I don't like (3) very much - changing the costing just to get the same
test behavior as on an older release does not seem very principled. Yes,
maybe it should be tweaked, but not because of a regression test.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: POC: GROUP BY optimization

От
David Rowley
Дата:
On Thu, 18 Aug 2022 at 02:46, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> So I don't think the current costing is wrong, but it certainly is more
> complex. But the test does not test what it intended - I have two ideas
> how to make it work:
>
> 1) increase the number of rows in the table
>
> 2) increase cpu_operator_cost (for that one test?)
>
> 3) tweak the costing somehow, to increase the cost a bit

Why not, 4) SET parallel_setup_cost = 0;  there are plenty of other
places we do just that so we get a parallel plan without having to
generate enough cost to drown out the parallel worker startup cost.

Here are a couple of patches to demo the idea.

David

Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:

On 8/18/22 03:32, David Rowley wrote:
> On Thu, 18 Aug 2022 at 02:46, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> So I don't think the current costing is wrong, but it certainly is more
>> complex. But the test does not test what it intended - I have two ideas
>> how to make it work:
>>
>> 1) increase the number of rows in the table
>>
>> 2) increase cpu_operator_cost (for that one test?)
>>
>> 3) tweak the costing somehow, to increase the cost a bit
> 
> Why not, 4) SET parallel_setup_cost = 0;  there are plenty of other
> places we do just that so we get a parallel plan without having to
> generate enough cost to drown out the parallel worker startup cost.
> 
> Here are a couple of patches to demo the idea.
> 

Yeah, that's an option too. I should have mentioned it along with the
cpu_operator_cost.

BTW would you mind taking a look at the costing? I think it's fine, but
it would be good if someone not involved in the patch takes a look.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
"Jonathan S. Katz"
Дата:
On 8/18/22 3:29 AM, Tomas Vondra wrote:
> 
> 
> On 8/18/22 03:32, David Rowley wrote:

>> Here are a couple of patches to demo the idea.
>>
> 
> Yeah, that's an option too. I should have mentioned it along with the
> cpu_operator_cost.
> 
> BTW would you mind taking a look at the costing? I think it's fine, but
> it would be good if someone not involved in the patch takes a look.

With RMT hat on, just a friendly reminder that this is still on the open 
items list[1] -- we have the Beta 4 release next week and it would be 
great if we can get this resolved prior to it.

Thanks!

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items
[2] 
https://www.postgresql.org/message-id/9d251aec-cea2-bc1a-5ed8-46ef0bcf6c69@postgresql.org

Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:

On 8/30/22 15:15, Jonathan S. Katz wrote:
> On 8/18/22 3:29 AM, Tomas Vondra wrote:
>>
>>
>> On 8/18/22 03:32, David Rowley wrote:
> 
>>> Here are a couple of patches to demo the idea.
>>>
>>
>> Yeah, that's an option too. I should have mentioned it along with the
>> cpu_operator_cost.
>>
>> BTW would you mind taking a look at the costing? I think it's fine, but
>> it would be good if someone not involved in the patch takes a look.
> 
> With RMT hat on, just a friendly reminder that this is still on the open
> items list[1] -- we have the Beta 4 release next week and it would be
> great if we can get this resolved prior to it.
> 

I know. I'll fix this by tweaking one of the cost GUCs mentioned by David.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
"Jonathan S. Katz"
Дата:
On 9/1/22 9:06 AM, Tomas Vondra wrote:
> 
> 
> On 8/30/22 15:15, Jonathan S. Katz wrote:
>> On 8/18/22 3:29 AM, Tomas Vondra wrote:
>>>
>>>
>>> On 8/18/22 03:32, David Rowley wrote:
>>
>>>> Here are a couple of patches to demo the idea.
>>>>
>>>
>>> Yeah, that's an option too. I should have mentioned it along with the
>>> cpu_operator_cost.
>>>
>>> BTW would you mind taking a look at the costing? I think it's fine, but
>>> it would be good if someone not involved in the patch takes a look.
>>
>> With RMT hat on, just a friendly reminder that this is still on the open
>> items list[1] -- we have the Beta 4 release next week and it would be
>> great if we can get this resolved prior to it.
>>
> 
> I know. I'll fix this by tweaking one of the cost gucs, mentioned by David.

Great -- thanks for the update!

Jonathan

Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On 9/1/22 16:05, Jonathan S. Katz wrote:
> On 9/1/22 9:06 AM, Tomas Vondra wrote:
>>
>>
>> On 8/30/22 15:15, Jonathan S. Katz wrote:
>>> On 8/18/22 3:29 AM, Tomas Vondra wrote:
>>>>
>>>>
>>>> On 8/18/22 03:32, David Rowley wrote:
>>>
>>>>> Here are a couple of patches to demo the idea.
>>>>>
>>>>
>>>> Yeah, that's an option too. I should have mentioned it along with the
>>>> cpu_operator_cost.
>>>>
>>>> BTW would you mind taking a look at the costing? I think it's fine, but
>>>> it would be good if someone not involved in the patch takes a look.
>>>
>>> With RMT hat on, just a friendly reminder that this is still on the open
>>> items list[1] -- we have the Beta 4 release next week and it would be
>>> great if we can get this resolved prior to it.
>>>
>>
>> I know. I'll fix this by tweaking one of the cost gucs, mentioned by
>> David.
> 
> Great -- thanks for the update!
> 

I've pushed the fix to 15+master. In the end I just used David's patches
that set parallel_setup_cost to 0.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
"Jonathan S. Katz"
Дата:
On 9/4/22 6:14 PM, Tomas Vondra wrote:

> I've pushed the fix to 15+master. In the end I just used David's patches
> that set parallel_setup_cost to 0.

Thanks! I have closed the open item.

Jonathan

Вложения

Re: POC: GROUP BY optimization

От
Michael Paquier
Дата:
On Mon, Sep 05, 2022 at 12:14:48AM +0200, Tomas Vondra wrote:
> I've pushed the fix to 15+master. In the end I just used David's patches
> that set parallel_setup_cost to 0.

Thanks, Tomas!
--
Michael

Вложения

Re: POC: GROUP BY optimization

От
David Rowley
Дата:
On Wed, 13 Jul 2022 at 15:37, David Rowley <dgrowleyml@gmail.com> wrote:
> I'm just in this general area of the code again today and wondered
> about the header comment for the preprocess_groupclause() function.
>
> It says:
>
>  * In principle it might be interesting to consider other orderings of the
>  * GROUP BY elements, which could match the sort ordering of other
>  * possible plans (eg an indexscan) and thereby reduce cost.  We don't
>  * bother with that, though.  Hashed grouping will frequently win anyway.
>
> I'd say this commit makes that paragraph mostly obsolete.  It's only
> true now in the sense that we don't try orders that suit some index
> that would provide pre-sorted results for a GroupAggregate path.  The
> comment leads me to believe that we don't do anything at all to find a
> better order, and that's not true now.

I've just pushed a fix for this.

David



Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 3/10/2022 21:56, Tom Lane wrote:
 > Revert "Optimize order of GROUP BY keys".
 >
 > This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
 > several follow-on fixes.
 > ...
 > Since we're hard up against the release deadline for v15, let's
 > revert these changes for now.  We can always try again later.

It may be time to restart the project. As a first step, I rebased the 
patch on the current master. It wasn't trivial because of some recent 
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rewrite the current patch according to 
the reasons given in the revert commit.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On 7/20/23 08:37, Andrey Lepikhov wrote:
> On 3/10/2022 21:56, Tom Lane wrote:
>> Revert "Optimize order of GROUP BY keys".
>>
>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>> several follow-on fixes.
>> ...
>> Since we're hard up against the release deadline for v15, let's
>> revert these changes for now.  We can always try again later.
> 
> It may be time to restart the project. As a first step, I rebased the
> patch on the current master. It wasn't trivial because of some latest
> optimizations (a29eab, 1349d27 and 8d83a5d).
> Now, Let's repeat the review and rewrite the current path according to
> the reasons uttered in the revert commit.

I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys".
>>>
>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>> several follow-on fixes.
>>> ...
>>> Since we're hard up against the release deadline for v15, let's
>>> revert these changes for now.  We can always try again later.
>>
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some latest
>> optimizations (a29eab, 1349d27 and 8d83a5d).
>> Now, Let's repeat the review and rewrite the current path according to
>> the reasons uttered in the revert commit.
> 
> I think the fundamental task is to make the costing more reliable, and
> the commit message 443df6e2db points out a couple challenges in this
> area. Not sure how feasible it is to address enough of them ...
> 
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive cases.
> 
> 2) estimating quicksort comparisons - This relies on ndistinct
> estimates, and I'm not sure how much more reliable we can make those.
> Probably not much :-( Not sure what to do about this, the only thing I
> can think of is to track "reliability" of the estimates and only do the
> reordering if we have high confidence in the estimates. That means we'll
> miss some optimization opportunities, but it should limit the risk.
For me personally, the most challenging issue is:
3. Imbalance, induced by the cost_sort() changes. It may increase or 
decrease the contribution of sorting to the total cost compared to other 
factors and change choice of sorted paths.

-- 
regards,
Andrey Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys".
>>>
>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>> several follow-on fixes.
>>> ...
>>> Since we're hard up against the release deadline for v15, let's
>>> revert these changes for now.  We can always try again later.
>>
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some latest
>> optimizations (a29eab, 1349d27 and 8d83a5d).
>> Now, Let's repeat the review and rewrite the current path according to
>> the reasons uttered in the revert commit.
> 
> I think the fundamental task is to make the costing more reliable, and
> the commit message 443df6e2db points out a couple challenges in this
> area. Not sure how feasible it is to address enough of them ...
> 
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive cases.
> 
> 2) estimating quicksort comparisons - This relies on ndistinct
> estimates, and I'm not sure how much more reliable we can make those.
> Probably not much :-( Not sure what to do about this, the only thing I
> can think of is to track "reliability" of the estimates and only do the
> reordering if we have high confidence in the estimates. That means we'll
> miss some optimization opportunities, but it should limit the risk.
I read up on the history of this thread.
As I see it, all the problems mentioned above can be avoided by dropping 
the new cost model entirely. We can sort GROUP BY columns according to 
their 'ndistinct' values.
I see the reason for introducing the cost model in [1]. The main 
supporting point there is that with this patch, people could no longer 
optimize the query by themselves by arranging the order of the columns in 
a more optimal way. But now we have at least a GUC to switch off the 
behaviour introduced here. Also, some extensions, like the well-known 
pg_hint_plan, can help with automation.
So, how about committing the undoubted part of the feature and working 
on the cost model in a new thread?

[1] 
https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com

-- 
regards,
Andrey Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:
On 7/24/23 04:10, Andrey Lepikhov wrote:
> On 20/7/2023 18:46, Tomas Vondra wrote:
>> On 7/20/23 08:37, Andrey Lepikhov wrote:
>>> On 3/10/2022 21:56, Tom Lane wrote:
>>>> Revert "Optimize order of GROUP BY keys".
>>>>
>>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>>> several follow-on fixes.
>>>> ...
>>>> Since we're hard up against the release deadline for v15, let's
>>>> revert these changes for now.  We can always try again later.
>>>
>>> It may be time to restart the project. As a first step, I rebased the
>>> patch on the current master. It wasn't trivial because of some latest
>>> optimizations (a29eab, 1349d27 and 8d83a5d).
>>> Now, Let's repeat the review and rewrite the current path according to
>>> the reasons uttered in the revert commit.
>>
>> I think the fundamental task is to make the costing more reliable, and
>> the commit message 443df6e2db points out a couple challenges in this
>> area. Not sure how feasible it is to address enough of them ...
>>
>> 1) procost = 1.0 - I guess we could make this more realistic by doing
>> some microbenchmarks and tuning the costs for the most expensive cases.
>>
>> 2) estimating quicksort comparisons - This relies on ndistinct
>> estimates, and I'm not sure how much more reliable we can make those.
>> Probably not much :-( Not sure what to do about this, the only thing I
>> can think of is to track "reliability" of the estimates and only do the
>> reordering if we have high confidence in the estimates. That means we'll
>> miss some optimization opportunities, but it should limit the risk.
> I read up on the history of this thread.
> As I see, all the problems mentioned above can be beaten by excluding
> the new cost model at all. We can sort GROUP BY columns according to the
> 'ndistinct' value.
> I see the reason for introducing the cost model in [1]. The main
> supporting point here is that with this patch, people couldn't optimize
> the query by themselves, organizing the order of the columns in a more
> optimal way. But now we have at least the GUC to switch off the
> behaviour introduced here. Also, some extensions, like the well-known
> pg_hint_plan, can help with automation.

I think the main concern is that if we reorder the group keys and get it
wrong, it's a regression. If that happens often (due to costing based on
poor stats), it's a problem. Yes, there's a GUC, but that's a rather
blunt instrument, unfortunately.

> So, how about committing of the undoubted part of the feature and
> working on the cost model in a new thread?
> 

But Tom's commit message says this:

    Worse, to arrive at estimates of the number of calls made to the
    lower-order-column comparison functions, the code needs to make
    estimates of the numbers of distinct values of multiple columns,
    which are necessarily even less trustworthy than per-column stats.

so I'm not sure this really counts as "undoubted".


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 24/7/2023 16:56, Tomas Vondra wrote:
> On 7/24/23 04:10, Andrey Lepikhov wrote:
>> On 20/7/2023 18:46, Tomas Vondra wrote:
>>> On 7/20/23 08:37, Andrey Lepikhov wrote:
>>>> On 3/10/2022 21:56, Tom Lane wrote:
>>>>> Revert "Optimize order of GROUP BY keys".
>>>>>
>>>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>>>> several follow-on fixes.
>>>>> ...
>>>>> Since we're hard up against the release deadline for v15, let's
>>>>> revert these changes for now.  We can always try again later.
>>>>
>>>> It may be time to restart the project. As a first step, I rebased the
>>>> patch on the current master. It wasn't trivial because of some latest
>>>> optimizations (a29eab, 1349d27 and 8d83a5d).
>>>> Now, Let's repeat the review and rewrite the current path according to
>>>> the reasons uttered in the revert commit.
>>>
>>> I think the fundamental task is to make the costing more reliable, and
>>> the commit message 443df6e2db points out a couple challenges in this
>>> area. Not sure how feasible it is to address enough of them ...
>>>
>>> 1) procost = 1.0 - I guess we could make this more realistic by doing
>>> some microbenchmarks and tuning the costs for the most expensive cases.
>>>
>>> 2) estimating quicksort comparisons - This relies on ndistinct
>>> estimates, and I'm not sure how much more reliable we can make those.
>>> Probably not much :-( Not sure what to do about this, the only thing I
>>> can think of is to track "reliability" of the estimates and only do the
>>> reordering if we have high confidence in the estimates. That means we'll
>>> miss some optimization opportunities, but it should limit the risk.
>> I read up on the history of this thread.
>> As I see, all the problems mentioned above can be beaten by excluding
>> the new cost model at all. We can sort GROUP BY columns according to the
>> 'ndistinct' value.
>> I see the reason for introducing the cost model in [1]. The main
>> supporting point here is that with this patch, people couldn't optimize
>> the query by themselves, organizing the order of the columns in a more
>> optimal way. But now we have at least the GUC to switch off the
>> behaviour introduced here. Also, some extensions, like the well-known
>> pg_hint_plan, can help with automation.
> 
> I think the main concern is that if we reorder the group keys and get it
> wrong, it's a regression. If that happens often (due to costing based on
> poor stats), it's a problem. Yes, there's a GUC, but that's a rather
> blunt instrument, unfortunately.
I see. My point here is that if the ndistinct of one column is much higher 
than the ndistinct of another, it is more probable that this relationship 
will hold in the grouping phase. We can get some regressions here, but 
they can be outweighed by the possibility below.
> 
>> So, how about committing of the undoubted part of the feature and
>> working on the cost model in a new thread?
>>
> 
> But Tom's commit message says this:
> 
>      Worse, to arrive at estimates of the number of calls made to the
>      lower-order-column comparison functions, the code needs to make
>      estimates of the numbers of distinct values of multiple columns,
>      which are necessarily even less trustworthy than per-column stats.
> 
> so I'm not sure this really counts as "undoubted".
Don't try to estimate multiple columns - just sort the columns according to 
their ndistinct values as a heuristic.
I think we should estimate the number of distinct values of multiple 
columns only if we have extended statistics on those columns. This could 
also extend the applicability of extended statistics.

The suggestion above is just an attempt to gather the low-hanging fruit 
right now. If it is not applicable, we will take the long road.

-- 
regards,
Andrey Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Tomas Vondra
Дата:

On 7/24/23 14:04, Andrey Lepikhov wrote:
> On 24/7/2023 16:56, Tomas Vondra wrote:
>> On 7/24/23 04:10, Andrey Lepikhov wrote:
>>> On 20/7/2023 18:46, Tomas Vondra wrote:
>>>> On 7/20/23 08:37, Andrey Lepikhov wrote:
>>>>> On 3/10/2022 21:56, Tom Lane wrote:
>>>>>> Revert "Optimize order of GROUP BY keys".
>>>>>>
>>>>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>>>>> several follow-on fixes.
>>>>>> ...
>>>>>> Since we're hard up against the release deadline for v15, let's
>>>>>> revert these changes for now.  We can always try again later.
>>>>>
>>>>> It may be time to restart the project. As a first step, I rebased the
>>>>> patch on the current master. It wasn't trivial because of some latest
>>>>> optimizations (a29eab, 1349d27 and 8d83a5d).
>>>>> Now, Let's repeat the review and rewrite the current path according to
>>>>> the reasons uttered in the revert commit.
>>>>
>>>> I think the fundamental task is to make the costing more reliable, and
>>>> the commit message 443df6e2db points out a couple challenges in this
>>>> area. Not sure how feasible it is to address enough of them ...
>>>>
>>>> 1) procost = 1.0 - I guess we could make this more realistic by doing
>>>> some microbenchmarks and tuning the costs for the most expensive cases.
>>>>
>>>> 2) estimating quicksort comparisons - This relies on ndistinct
>>>> estimates, and I'm not sure how much more reliable we can make those.
>>>> Probably not much :-( Not sure what to do about this, the only thing I
>>>> can think of is to track "reliability" of the estimates and only do the
>>>> reordering if we have high confidence in the estimates. That means
>>>> we'll
>>>> miss some optimization opportunities, but it should limit the risk.
>>> I read up on the history of this thread.
>>> As I see, all the problems mentioned above can be beaten by excluding
>>> the new cost model at all. We can sort GROUP BY columns according to the
>>> 'ndistinct' value.
>>> I see the reason for introducing the cost model in [1]. The main
>>> supporting point here is that with this patch, people couldn't optimize
>>> the query by themselves, organizing the order of the columns in a more
>>> optimal way. But now we have at least the GUC to switch off the
>>> behaviour introduced here. Also, some extensions, like the well-known
>>> pg_hint_plan, can help with automation.
>>
>> I think the main concern is that if we reorder the group keys and get it
>> wrong, it's a regression. If that happens often (due to costing based on
>> poor stats), it's a problem. Yes, there's a GUC, but that's a rather
>> blunt instrument, unfortunately.
> I see. My point here is if the ndistinct of one column is much more than
> the ndistinct of another, it is more probable that this correlation will
> be kept in the grouping phase. Here we can get some regression, which
> can be overweighed by the possibility below.

I think the word "probable" is exactly what the problem is. Because what 
if it isn't? Also, we don't actually need ndistinct for individual 
attributes, but for the groups of attributes. Imagine you do

GROUP BY a, b

so you need to estimate whether to do (a,b) or (b,a). You need to calculate

  ndistinct(a,b) / ndistinct(a)

and

  ndistinct(b,a) / ndistinct(b)

And similarly for more grouping keys. This is why we have ndistinct
extended statistics, mostly.
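
For example (a minimal sketch, assuming a hypothetical table t with 
grouping columns a and b), that is exactly what ndistinct extended 
statistics provide:

CREATE STATISTICS t_a_b_ndist (ndistinct) ON a, b FROM t;
ANALYZE t;

With these in place, estimate_num_groups() can use the ndistinct(a,b) 
estimate directly instead of multiplying the per-column estimates.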

>>
>>> So, how about committing of the undoubted part of the feature and
>>> working on the cost model in a new thread?
>>>
>>
>> But Tom's commit message says this:
>>
>>      Worse, to arrive at estimates of the number of calls made to the
>>      lower-order-column comparison functions, the code needs to make
>>      estimates of the numbers of distinct values of multiple columns,
>>      which are necessarily even less trustworthy than per-column stats.
>>
>> so I'm not sure this really counts as "undoubted".
> Don't try to estimate multiple  columns - just sort columns according to
> the value of ndistinct as a heuristic.

Surely you realize how easily such simple heuristics can fail, right?

> I think we should estimate the number of values of multiple columns only
> if we have extended statistics on these columns. And this can extend the
> applicability of extended statistics.
> 

I don't see why this should estimate ndistinct differently than every
other place. That'd certainly be surprising. I can be convinced, but
there needs to be sound justification why that's the right way to do
(i.e. why it's less likely to cause poor planning choices).

> The suggestion above is just an attempt to gather low-hanging fruit
> right now. If it is not applicable, we will go a long way.
> 
I'm not sure this qualifies as "low hanging fruit" - that generally
means simple changes with little risk. But you're using it for a rather
simplistic heuristic that can easily misfire (I think). At least that's
my impression, I can be convinced otherwise.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
Hi,
Here is the patch rebased on the current master. Also, I fixed some 
minor slips and one static analyzer warning.
This is just for adding it to the next commitfest and keeping the work 
on this patch moving.

One extra difference in the newly added postgres_fdw tests is caused by 
this patch - see the changes in the query plan in the attachment.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys".
>>>
>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>> several follow-on fixes.
>>> ...
>>> Since we're hard up against the release deadline for v15, let's
>>> revert these changes for now.  We can always try again later.
>>
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some latest
>> optimizations (a29eab, 1349d27 and 8d83a5d).
>> Now, Let's repeat the review and rewrite the current path according to
>> the reasons uttered in the revert commit.
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive cases.
Ok, some thoughts on this part of the task. As I see it, we don't have 
that many different operators: 26 with fixed width and 16 with variable 
width:

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
   JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen>0
ORDER BY o.oid;

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
   JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen<0
ORDER BY o.oid;

A benchmarking procedure for fixed-length types could look something 
like the one below:

CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE (
     nrows integer,
     exec_time float
)  AS $$
DECLARE
   data json;
   step integer;
BEGIN
   SET work_mem='1GB';

   FOR step IN 0..3 LOOP
     SELECT pow(100, step)::integer INTO nrows;
     DROP TABLE IF EXISTS test CASCADE;
     EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
                     FROM generate_series(1,%s) AS gs;', typ, nrows);

     EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
       SELECT * FROM test ORDER BY (x) DESC INTO data;
     SELECT data->0->'Execution Time' INTO exec_time;
     RETURN NEXT;
   END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

Executing SELECT * FROM pass_sort('integer'); shows quite linear growth 
of the execution time. So, using '2.0 * cpu_operator_cost' as the cost 
for the integer type (as a baseline), we can calculate costs for other 
operators. Variable-width types, I think, could require a more complex 
technique to account for the dependency on value length.

Does this way look worthwhile?
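
As a usage sketch (assuming the pass_sort() function above; this query 
is only an illustration, not part of any patch), relative cost factors 
could be derived by comparing each type against the integer baseline:

SELECT t.nrows, t.exec_time / i.exec_time AS relative_cost
FROM pass_sort('numeric') AS t
JOIN pass_sort('integer') AS i USING (nrows);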

-- 
regards,
Andrey Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Andrey Lepikhov
Дата:
On 20/7/2023 18:46, Tomas Vondra wrote:
> 2) estimating quicksort comparisons - This relies on ndistinct
> estimates, and I'm not sure how much more reliable we can make those.
> Probably not much :-( Not sure what to do about this, the only thing I
> can think of is to track "reliability" of the estimates and only do the
> reordering if we have high confidence in the estimates. That means we'll
> miss some optimization opportunities, but it should limit the risk.
Regarding this issue, I see two options:
1. Go through the grouping column list and find the most reliable one. 
If we have a unique column, or a column whose estimated number of 
distinct values is significantly higher than the ndistinct of the other 
grouping columns, we can place this column first in the grouping. 
That should guarantee the reliability of such a decision, shouldn't it?
2. If we have extended statistics on distinct values and these 
statistics cover some leading set of columns in the grouping list, we 
can optimize the order of those positions. That also looks reliable.

Any thoughts?

-- 
regards,
Andrey Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
New version of the patch. Fixed minor inconsistencies and rebased onto 
current master.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
Hi!

On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
>
> New version of the patch. Fixed minor inconsistencies and rebased onto
> current master.

Thank you (and the other authors) for working on this subject.  Indeed,
GROUP BY clauses are order-agnostic.  Reordering them into the most
suitable order could give significant query planning benefits.  I
went through the thread: I see significant work has already been done
on this patch, and the code is quite polished.

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.

2) I wonder what planning overhead this patch could introduce?  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially matching
every index?  This should give us an understanding of whether we need
a separate GUC to control this feature.

3) I see that get_useful_group_keys_orderings() makes 3 calls to
get_cheapest_group_keys_order() function.  Each time
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys.  However, cost estimation implies the system
catalog lookups (that is quite expensive).  I wonder if we could
change the algorithm.  Could we just sort the group-by keys by cost
once, save this ordering and then just re-use it.  So, every time we
need to reorder a group by, we can just pull the required keys to the
top and use saved ordering for the rest.  I also wonder if we could do
this once for add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls.  So, it probably should be
somewhere in create_ordinary_grouping_paths().

4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 21/12/2023 17:53, Alexander Korotkov wrote:
> On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> New version of the patch. Fixed minor inconsistencies and rebased onto
>> current master.
> Thank you (and other authors) for working on this subject.  Indeed to
> GROUP BY clauses are order-agnostic.  Reordering them in the most
> suitable order could give up significant query planning benefits.  I
> went through the thread: I see significant work has been already made
> on this patch, the code is quite polished.
Maybe, but the issues mentioned in [1] are still not resolved. That is 
the only reason why this thread hasn't been active.
> I'd like to make some notes.
> 1) As already mentioned, there is clearly a repetitive pattern for the
> code following after get_useful_group_keys_orderings() calls.  I think
> it would be good to extract it into a separate function.  Please, do
> this as a separate patch coming before the group-by patch. That would
> simplify the review.
Yeah, these parts of the code are a bit different. I will try to factor 
out a common routine.
> 2) I wonder what planning overhead this patch could introduce?  Could
> you try to measure the worst case?  What if we have a table with a lot
> of indexes and a long list of group-by clauses partially patching
> every index.  This should give us an understanding on whether we need
> a separate GUC to control this feature.
Ok
> 3) I see that get_useful_group_keys_orderings() makes 3 calls to
> get_cheapest_group_keys_order() function.  Each time
> get_cheapest_group_keys_order() performs the cost estimate and
> reorders the free keys.  However, cost estimation implies the system
> catalog lookups (that is quite expensive).  I wonder if we could
> change the algorithm.  Could we just sort the group-by keys by cost
> once, save this ordering and then just re-use it.  So, every time we
> need to reorder a group by, we can just pull the required keys to the
> top and use saved ordering for the rest.  I also wonder if we could do
> this once for add_paths_to_grouping_rel() and
> create_partial_grouping_paths() calls.  So, it probably should be
> somewhere in create_ordinary_grouping_paths().
Thanks for the idea!
> 4) I think we can do some optimizations when enable_incremental_sort
> == off.  Then in get_useful_group_keys_orderings() we should only deal
> with input_path fully matching the group-by clause, and try only full
> match of group-by output to the required order.
Oh, this was designed before incremental sort was invented. We will see 
what we can do here.

[1] 
https://www.postgresql.org/message-id/60610df1-c32f-ebdf-e58c-7a664431f452%40enterprisedb.com

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
On Tue, Dec 26, 2023 at 1:37 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 21/12/2023 17:53, Alexander Korotkov wrote:
> > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
> > <a.lepikhov@postgrespro.ru> wrote:
> >> New version of the patch. Fixed minor inconsistencies and rebased onto
> >> current master.
> > Thank you (and other authors) for working on this subject.  Indeed to
> > GROUP BY clauses are order-agnostic.  Reordering them in the most
> > suitable order could give up significant query planning benefits.  I
> > went through the thread: I see significant work has been already made
> > on this patch, the code is quite polished.
> Maybe, but issues, mentioned in [1], still not resolved. It is the only
> reason, why this thread hasn't been active.

Yes, this makes sense.  I have a couple of points from me on this subject.
1) The patch reorders GROUP BY items not only to make comparison
cheaper but also to match the ordering of input paths and to match the
ORDER BY clause.  Thus, even if we leave aside for now sorting GROUP
BY items by their cost, the patch will remain valuable.
2) An accurate estimate of the sorting cost is quite a difficult task.
What if we make a simple rule of thumb that sorting integers and
floats is cheaper than sorting numerics and strings with collation C,
in turn, that is cheaper than sorting collation-aware strings
(probably more groups)?  Within the group, we could keep the original
order of items.
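
For illustration, such a rule of thumb could look roughly like this (a 
hypothetical sketch, not code from any patch; the tier numbers are 
arbitrary):

/* bucket sort operators into rough cost tiers, cheapest first */
static int
sort_cost_tier(Oid typid, Oid collid)
{
    switch (typid)
    {
        case INT2OID:
        case INT4OID:
        case INT8OID:
        case FLOAT4OID:
        case FLOAT8OID:
            return 0;       /* plain binary comparison */
        case NUMERICOID:
            return 1;       /* more expensive, but collation-independent */
        case TEXTOID:
        case VARCHAROID:
            /* strings with the "C" collation sort like tier 1 */
            return (collid == C_COLLATION_OID) ? 1 : 2;
        default:
            return 1;
    }
}

Within the same tier, the original order of the GROUP BY items would be 
kept.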

------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

От
Tom Lane
Дата:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> 2) An accurate estimate of the sorting cost is quite a difficult task.

Indeed.

> What if we make a simple rule of thumb that sorting integers and
> floats is cheaper than sorting numerics and strings with collation C,
> in turn, that is cheaper than sorting collation-aware strings
> (probably more groups)?  Within the group, we could keep the original
> order of items.

I think it's a fool's errand to even try to separate different sort
column orderings by cost.  We simply do not have sufficiently accurate
cost information.  The previous patch in this thread got reverted because
of that (well, also some implementation issues, but mostly that), and
nothing has happened to make me think that another try will fare any
better.

            regards, tom lane



Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > 2) An accurate estimate of the sorting cost is quite a difficult task.
>
> Indeed.
>
> > What if we make a simple rule of thumb that sorting integers and
> > floats is cheaper than sorting numerics and strings with collation C,
> > in turn, that is cheaper than sorting collation-aware strings
> > (probably more groups)?  Within the group, we could keep the original
> > order of items.
>
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost.  We simply do not have sufficiently accurate
> cost information.  The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

If there is a choice of what to compare first: 8-byte integers or
collation-aware texts, possibly toasted, then the answer is pretty
evident to me.  For sure, there are cases when this choice is wrong.
But even if all the integers turn out to be the same, the penalty isn't
that big.

Besides sorting column orderings by cost, this patch also tries to
match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys.  Do
you think there is a chance for the second part if we leave the cost
part aside?

------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

От
Tom Lane
Дата:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost.

> Besides sorting column orderings by cost, this patch also tries to
> match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys.  Do
> you think there is a chance for the second part if we leave the cost
> part aside?

I think it's definitely reasonable to try to match up available
orderings, because that doesn't really require fine distinctions
of cost: either it matches or it doesn't.  Eliminating a sort step
entirely is clearly a win.  (Incremental sort complicates this though.
I doubt our cost model for incremental sorts is any good either, so
I am not eager to rely on that more heavily.)

            regards, tom lane



Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 27/12/2023 11:15, Alexander Korotkov wrote:
> On Wed, Dec 27, 2023 at 5:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alexander Korotkov <aekorotkov@gmail.com> writes:
>>> 2) An accurate estimate of the sorting cost is quite a difficult task.
>>
>> Indeed.
>>
>>> What if we make a simple rule of thumb that sorting integers and
>>> floats is cheaper than sorting numerics and strings with collation C,
>>> in turn, that is cheaper than sorting collation-aware strings
>>> (probably more groups)?  Within the group, we could keep the original
>>> order of items.
>>
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost.  We simply do not have sufficiently accurate
>> cost information.  The previous patch in this thread got reverted because
>> of that (well, also some implementation issues, but mostly that), and
>> nothing has happened to make me think that another try will fare any
>> better.
To be clear: in [1], I mentioned that we can perform micro-benchmarks 
and structure the costs of operators. At least for fixed-length 
operators, it is relatively easy. So, the main blocker here is an 
accurate prediction of ndistinct for different combinations of columns. 
Does it make sense to keep designing the feature in the direction of 
choosing between different sort column orderings only when we have 
extended statistics on the columns?

[1] 
https://www.postgresql.org/message-id/e3602ccb-e643-2e79-ed2c-1175a80533a1@postgrespro.ru

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Tom Lane
Дата:
Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:
> To be clear. In [1], I mentioned we can perform micro-benchmarks and 
> structure costs of operators. At least for fixed-length operators, it is 
> relatively easy.

I repeat what I said: this is a fool's errand.  You will not get
trustworthy results even for the cases you measured, let alone
all the rest.  I'd go as far as to say I would not believe your
microbenchmarks, because they would only apply for one platform,
compiler, backend build, phase of the moon, etc.

            regards, tom lane



Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 27/12/2023 12:07, Tom Lane wrote:
> Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:
>> To be clear. In [1], I mentioned we can perform micro-benchmarks and
>> structure costs of operators. At least for fixed-length operators, it is
>> relatively easy.
> 
> I repeat what I said: this is a fool's errand.  You will not get
> trustworthy results even for the cases you measured, let alone
> all the rest.  I'd go as far as to say I would not believe your
> microbenchmarks, because they would only apply for one platform,
> compiler, backend build, phase of the moon, etc.

Thanks for the explanation.
I removed all the cost-related code. It still needs to be finished; I 
will smooth the code further and rewrite the regression tests - many of 
them look silly without cost-dependent reorderings. There are also 
Alexander's remarks, which must be implemented, too.
But even as it is, it works well. Look:

Preliminaries:
CREATE TABLE t(x int, y int, z text, w int);
INSERT INTO t SELECT gs%100,gs%100, 'abc' || gs%10, gs
   FROM generate_series(1,10000) AS gs;
CREATE INDEX abc ON t(x,y);
ANALYZE t;
SET enable_hashagg = 'off';

This patch eliminates an unneeded Sort operation:
explain SELECT x,y FROM t GROUP BY (x,y);
explain SELECT x,y FROM t GROUP BY (y,x);

Engages incremental sort:
explain SELECT x,y FROM t GROUP BY (x,y,z,w);
explain SELECT x,y FROM t GROUP BY (z,y,w,x);
explain SELECT x,y FROM t GROUP BY (w,z,x,y);
explain SELECT x,y FROM t GROUP BY (w,x,z,y);

Works with subqueries:
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z) AS q1
GROUP BY (w,x,z,y);
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z LIMIT 100) AS q1
GROUP BY (w,x,z,y);

But arrangement with an ORDER BY clause doesn't work:

DROP INDEX abc;
explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);

I think the reason is that the sort_pathkeys and group_pathkeys are 
physically different structures, and we can't just compare pointers here.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> But arrangement with an ORDER BY clause doesn't work:
>
> DROP INDEX abc;
> explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);
>
> I think the reason is that the sort_pathkeys and group_pathkeys are
> physically different structures, and we can't just compare pointers here.

I haven't yet looked into the code.  But this looks strange to me.
Somehow, optimizer currently matches index pathkeys to ORDER BY
pathkeys.  If GROUP BY pathkeys could be matched to index pathkeys,
then it should be possible to match them to ORDER BY pathkeys too.


------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 28/12/2023 18:29, Alexander Korotkov wrote:
> On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> But arrangement with an ORDER BY clause doesn't work:
>>
>> DROP INDEX abc;
>> explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);
>>
>> I think the reason is that the sort_pathkeys and group_pathkeys are
>> physically different structures, and we can't just compare pointers here.
> 
> I haven't yet looked into the code.  But this looks strange to me.
> Somehow, optimizer currently matches index pathkeys to ORDER BY
> pathkeys.  If GROUP BY pathkeys could be matched to index pathkeys,
> then it should be possible to match them to ORDER BY pathkeys too.
Oh, I found the mistake: I am used to writing GROUP BY and ORDER BY on 
many columns with round brackets. In the case of the grouping list, it 
doesn't change anything. But ordering treats it as a WholeRowVar and 
breaks the group-by arrangement. Look:
explain (COSTS OFF) SELECT relname,reltuples FROM pg_class
GROUP BY relname,reltuples ORDER BY reltuples,relname;

  Group
    Group Key: reltuples, relname
    ->  Sort
          Sort Key: reltuples, relname
          ->  Seq Scan on pg_class
But:
explain (COSTS OFF) SELECT relname,reltuples FROM pg_class
GROUP BY relname,reltuples ORDER BY (reltuples,relname);

  Sort
    Sort Key: (ROW(reltuples, relname))
    ->  Group
          Group Key: relname, reltuples
          ->  Sort
                Sort Key: relname, reltuples
                ->  Seq Scan on pg_class

So, let's continue to work.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
Here is a new version of GROUP-BY optimization without sort model.

On 21/12/2023 17:53, Alexander Korotkov wrote:
> I'd like to make some notes.
> 
> 1) As already mentioned, there is clearly a repetitive pattern for the
> code following after get_useful_group_keys_orderings() calls.  I think
> it would be good to extract it into a separate function.  Please, do
> this as a separate patch coming before the group-by patch. That would
> simplify the review.
Done. See patch 0001-*. Unfortunately, extracting the whole loop isn't 
practical, because it bloats the interface of the routine.

> 2) I wonder what planning overhead this patch could introduce?  Could
> you try to measure the worst case?  What if we have a table with a lot
> of indexes and a long list of group-by clauses partially patching
> every index.  This should give us an understanding on whether we need
> a separate GUC to control this feature.
In the current implementation I don't anticipate any significant 
overhead. The GUC is needed here to allow users to keep their own 
ordering and to disable the feature in case of problems.

> 4) I think we can do some optimizations when enable_incremental_sort
> == off.  Then in get_useful_group_keys_orderings() we should only deal
> with input_path fully matching the group-by clause, and try only full
> match of group-by output to the required order.
Hm, does it really make sense in the current implementation?

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
vignesh C
Дата:
On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
>
> Here is a new version of GROUP-BY optimization without sort model.
>
> On 21/12/2023 17:53, Alexander Korotkov wrote:
> > I'd like to make some notes.
> >
> > 1) As already mentioned, there is clearly a repetitive pattern for the
> > code following after get_useful_group_keys_orderings() calls.  I think
> > it would be good to extract it into a separate function.  Please, do
> > this as a separate patch coming before the group-by patch. That would
> > simplify the review.
> Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't
> practical, because it blows out the interface of the routine.
>
> > 2) I wonder what planning overhead this patch could introduce?  Could
> > you try to measure the worst case?  What if we have a table with a lot
> > of indexes and a long list of group-by clauses partially patching
> > every index.  This should give us an understanding on whether we need
> > a separate GUC to control this feature.
> In current implementation I don't anticipate any significant overhead.
> GUC is needed here to allow users adhere their own ordering and to
> disable feature in the case of problems.
>
> > 4) I think we can do some optimizations when enable_incremental_sort
> > == off.  Then in get_useful_group_keys_orderings() we should only deal
> > with input_path fully matching the group-by clause, and try only full
> > match of group-by output to the required order.
> Hm, is it really make sense in current implementation?

CFBot shows the following errors at [1] with:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function
‘estimate_num_groups’:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning:
implicit declaration of function ‘estimate_num_groups_incremental’
[-Wimplicit-function-declaration]
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
[08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no
previous prototype for ‘estimate_num_groups_incremental’
[-Wmissing-prototypes]
[08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo
*root, List *groupExprs,
[08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error:
conflicting types for ‘estimate_num_groups_incremental’
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note:
previous implicit declaration of ‘estimate_num_groups_incremental’ was
here
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,

[1] - https://cirrus-ci.com/task/5074942069309440

Regards,
Vignesh



Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 9/1/2024 16:45, vignesh C wrote:
> On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
>>
>> Here is a new version of GROUP-BY optimization without sort model.
>>
>> On 21/12/2023 17:53, Alexander Korotkov wrote:
>>> I'd like to make some notes.
>>>
>>> 1) As already mentioned, there is clearly a repetitive pattern for the
>>> code following after get_useful_group_keys_orderings() calls.  I think
>>> it would be good to extract it into a separate function.  Please, do
>>> this as a separate patch coming before the group-by patch. That would
>>> simplify the review.
>> Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't
>> practical, because it blows out the interface of the routine.
>>
>>> 2) I wonder what planning overhead this patch could introduce?  Could
>>> you try to measure the worst case?  What if we have a table with a lot
>>> of indexes and a long list of group-by clauses partially patching
>>> every index.  This should give us an understanding on whether we need
>>> a separate GUC to control this feature.
>> In current implementation I don't anticipate any significant overhead.
>> GUC is needed here to allow users adhere their own ordering and to
>> disable feature in the case of problems.
>>
>>> 4) I think we can do some optimizations when enable_incremental_sort
>>> == off.  Then in get_useful_group_keys_orderings() we should only deal
>>> with input_path fully matching the group-by clause, and try only full
>>> match of group-by output to the required order.
>> Hm, is it really make sense in current implementation?
> 
> CFBot shows the following errors at [1] with:
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function
> ‘estimate_num_groups’:
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning:
> implicit declaration of function ‘estimate_num_groups_incremental’
> [-Wimplicit-function-declaration]
> [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
> [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level:
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no
> previous prototype for ‘estimate_num_groups_incremental’
> [-Wmissing-prototypes]
> [08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo
> *root, List *groupExprs,
> [08:33:28.813] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error:
> conflicting types for ‘estimate_num_groups_incremental’
> [08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note:
> previous implicit declaration of ‘estimate_num_groups_incremental’ was
> here
> [08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
Hmm, I don't see this old code in these patches. Resend 0002-* because 
of trailing spaces.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Pavel Borisov
Дата:
Hi, Andrei! 
> Hmm, I don't see this old code in these patches. Resend 0002-* because
> of trailing spaces.

AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new version of the whole patchset should be sent even if most patches are unchanged.

Regards,
Pavel Borisov

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
Hi!

On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> Hmm, I don't see this old code in these patches. Resend 0002-* because
>> of trailing spaces.
>
>
> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new
> version of the whole patchset should be sent even if most patches are unchanged.

Please, find the revised patchset with some refactoring and comments
improvement from me.  I'll continue looking into this.

------
Regards,
Alexander Korotkov

Вложения

Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 11/1/2024 18:30, Alexander Korotkov wrote:
> Hi!
> 
> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>>> Hmm, I don't see this old code in these patches. Resend 0002-* because
>>> of trailing spaces.
>>
>>
>> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new
>> version of the whole patchset should be sent even if most patches are unchanged.
 
> 
> Please, find the revised patchset with some refactoring and comments
> improvement from me.  I'll continue looking into this.
The patch looks better, thanks to your refactoring.
I propose additional comments and tests to make the code more 
understandable (see attachment).
I intended to look into this part of the code more, but the tests show a 
difference between PostgreSQL v.15 and v.16, which causes changes in the 
code of this feature.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 11/1/2024 18:30, Alexander Korotkov wrote:
> > On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> >>> Hmm, I don't see this old code in these patches. Resend 0002-* because
> >>> of trailing spaces.
> >>
> >>
> >> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new
> >> version of the whole patchset should be sent even if most patches are unchanged.
> >
> > Please, find the revised patchset with some refactoring and comments
> > improvement from me.  I'll continue looking into this.
> The patch looks better, thanks to your refactoring.
> I propose additional comments and tests to make the code more
> understandable (see attachment).
> I intended to look into this part of the code more, but the tests show a
> difference between PostgreSQL v.15 and v.16, which causes changes in the
> code of this feature.

Makes sense.  I've incorporated your changes into the patchset.

------
Regards,
Alexander Korotkov

Вложения

Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 13/1/2024 22:00, Alexander Korotkov wrote:
> On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> On 11/1/2024 18:30, Alexander Korotkov wrote:
>>> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>>>>> Hmm, I don't see this old code in these patches. Resend 0002-* because
>>>>> of trailing spaces.
>>>>
>>>>
>>>> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a new
>>>> version of the whole patchset should be sent even if most patches are unchanged.
 
>>>
>>> Please, find the revised patchset with some refactoring and comments
>>> improvement from me.  I'll continue looking into this.
>> The patch looks better, thanks to your refactoring.
>> I propose additional comments and tests to make the code more
>> understandable (see attachment).
>> I intended to look into this part of the code more, but the tests show a
>> difference between PostgreSQL v.15 and v.16, which causes changes in the
>> code of this feature.
> 
> Makes sense.  I've incorporated your changes into the patchset.
One more improvement. To underpin the code change:

-               return cur_ec;  /* Match! */
+           {
+               /*
+                * Match!
+                *
+                * Copy the sortref if it wasn't set yet. That may happen if
+                * the ec was constructed from WHERE clause, i.e. it doesn't
+                * have a target reference at all.
+                */
+               if (cur_ec->ec_sortref == 0 && sortref > 0)
+                   cur_ec->ec_sortref = sortref;
+               return cur_ec;
+           }

I propose a test (see attachment). It shows why we introduce this 
change: GROUP BY should handle not only pathkeys generated by explicit 
sort operators but also planner-made ones, as in this example, produced 
by a MergeJoin.
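
For illustration only (a sketch in the spirit of the attached test, not 
the test itself; it reuses the table t from the example upthread):

SET enable_hashjoin = off;
SET enable_hashagg = off;
EXPLAIN (COSTS OFF)
SELECT t1.x, t1.y, count(*)
FROM t t1 JOIN t t2 ON t1.x = t2.x AND t1.y = t2.y
GROUP BY t1.y, t1.x;

Here the equivalence classes for x and y are built from the join 
clauses, so they may have no sortref at first; with the change above, 
the ordering produced for the MergeJoin can still be matched by the 
GROUP BY keys.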

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
On Sun, Jan 14, 2024 at 2:14 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 13/1/2024 22:00, Alexander Korotkov wrote:
> > On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
> > <a.lepikhov@postgrespro.ru> wrote:
> >> On 11/1/2024 18:30, Alexander Korotkov wrote:
> >>> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> >>>>> Hmm, I don't see this old code in these patches. Resend 0002-* because
> >>>>> of trailing spaces.
> >>>>
> >>>>
> >>>> AFAIK, cfbot does not seek old versions of patchset parts in previous messages. So for it to run correctly, a
> >>>> new version of the whole patchset should be sent even if most patches are unchanged.
> >>>
> >>> Please, find the revised patchset with some refactoring and comments
> >>> improvement from me.  I'll continue looking into this.
> >> The patch looks better, thanks to your refactoring.
> >> I propose additional comments and tests to make the code more
> >> understandable (see attachment).
> >> I intended to look into this part of the code more, but the tests show a
> >> difference between PostgreSQL v.15 and v.16, which causes changes in the
> >> code of this feature.
> >
> > Makes sense.  I've incorporated your changes into the patchset.
> One more improvement. To underpin code change:
>
> -               return cur_ec;  /* Match! */
> +           {
> +               /*
> +                * Match!
> +                *
> +                * Copy the sortref if it wasn't set yet. That may happen if
> +                * the ec was constructed from WHERE clause, i.e. it doesn't
> +                * have a target reference at all.
> +                */
> +               if (cur_ec->ec_sortref == 0 && sortref > 0)
> +                   cur_ec->ec_sortref = sortref;
> +               return cur_ec;
> +           }
>
> I propose the test (see attachment). It shows why we introduce this
> change: GROUP-BY should juggle not only pathkeys generated by explicit
> sort operators but also planner-made, likewise in this example, by
> MergeJoin.

Thank you for providing the test case relevant for this code change.
The revised patch incorporating this change is attached.  Now the
patchset looks good to me.  I'm going to push it if there are no
objections.

------
Regards,
Alexander Korotkov

Вложения

Re: POC: GROUP BY optimization

От
Richard Guo
Дата:

On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Thank you for providing the test case relevant for this code change.
The revised patch incorporating this change is attached.  Now the
patchset looks good to me.  I'm going to push it if there are no
objections.

Seems I'm late for the party.  Can we hold for several more days?  I'd
like to have a review on this patch.

Thanks
Richard

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> Thank you for providing the test case relevant for this code change.
>> The revised patch incorporating this change is attached.  Now the
>> patchset looks good to me.  I'm going to push it if there are no
>> objections.
>
> Seems I'm late for the party.  Can we hold for several more days?  I'd
> like to have a review on this patch.

Sure, NP.  I'll hold it till your review.

------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 15/1/2024 13:42, Richard Guo wrote:
> 
> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com 
> <mailto:aekorotkov@gmail.com>> wrote:
> 
>     Thank you for providing the test case relevant for this code change.
>     The revised patch incorporating this change is attached.  Now the
>     patchset looks good to me.  I'm going to push it if there are no
>     objections.
> 
> 
> Seems I'm late for the party.  Can we hold for several more days?  I'd
> like to have a review on this patch.
Get on board! It looks like this feature needs as much review as 
possible (just like SJE).

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

От
Alena Rybakina
Дата:
On 15.01.2024 12:46, Andrei Lepikhov wrote:
> On 15/1/2024 13:42, Richard Guo wrote:
>>
>> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov 
>> <aekorotkov@gmail.com <mailto:aekorotkov@gmail.com>> wrote:
>>
>>     Thank you for providing the test case relevant for this code change.
>>     The revised patch incorporating this change is attached. Now the
>>     patchset looks good to me.  I'm going to push it if there are no
>>     objections.
>>
>>
>> Seems I'm late for the party.  Can we hold for several more days?  I'd
>> like to have a review on this patch.
> Get on board! It looks like this feature needs as much review as 
> possible (likewise SJE).
>
Hi! Thank you for your work on this issue! I believe that this will help 
the planner to build a more optimal query plan here and therefore speed 
up query execution.

I have reviewed the patches and noticed that we could add some code 
refactoring. I have attached a diff file (group_by.diff) to this email.

The changes involve spelling corrections, renaming variables and 
factoring out some common parts.


In addition, I have a few questions, since some points in the code 
remained unclear to me.

1.  I didn't understand why we have a question mark in the comment next 
to the enable_group_by_reordering variable in the 
src/backend/optimizer/path/pathkeys.c file. I assumed it was a typo 
and fixed it in the diff file.

2. Why do we set the variable here (path = path_save) in the 
add_paths_to_grouping_rel function if we overwrite it just below and 
could pass path_save as a parameter instead?

foreach(lc2, pathkey_orderings)
{
     PathKeyInfo *info = (PathKeyInfo *) lfirst(lc2);

     /* restore the path (we replace it in the loop) */
     path = path_save;

     path = make_ordered_path(root,
                              grouped_rel,
                              path,
                              cheapest_path,
                              info->pathkeys);
     if (path == NULL)
         continue;

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Вложения

Re: POC: GROUP BY optimization

От
Richard Guo
Дата:

On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> Thank you for providing the test case relevant for this code change.
>> The revised patch incorporating this change is attached.  Now the
>> patchset looks good to me.  I'm going to push it if there are no
>> objections.
>
> Seems I'm late for the party.  Can we hold for several more days?  I'd
> like to have a review on this patch.

Sure, NP.  I'll hold it till your review.

Thanks.  Appreciate that.

I have briefly reviewed this patch and here are some comments.

* When trying to match the ordering of GROUP BY to that of ORDER BY in
get_useful_group_keys_orderings, this patch checks against the length of
path->pathkeys.  This does not make sense.  I guess this is a typo and
the real intention is to check against root->sort_pathkeys.

--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
                                           root->num_groupby_pathkeys);

        if (n > 0 &&
-           (enable_incremental_sort || n == list_length(path->pathkeys)))
+           (enable_incremental_sort || n == list_length(root->sort_pathkeys)))

However, I think this is also incorrect.  When incremental sort is
disabled, it is reasonable to consider reordering the GROUP BY keys only
if the number of matching pathkeys is equal to the length of
root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.


* When the original ordering of GROUP BY keys matches the path's
pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
generate duplicate PathKeyInfos.  Consequently, this duplication would
lead to the creation and addition of identical paths multiple times.
This is not great.  Consider the query below:

create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;

explain select count(*) from t group by a, b;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
   Group Key: a, b
   ->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 width=8)
(3 rows)

The same path with group keys {a, b} is created and added twice.
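
One way to avoid this (just a sketch of the idea, not code from the 
patch) would be to skip an ordering that is already in the list before 
appending it, e.g. using compare_pathkeys():

/* hypothetical guard before adding a new ordering to the list */
foreach(lc, orderings)
{
    PathKeyInfo *info = (PathKeyInfo *) lfirst(lc);

    if (compare_pathkeys(info->pathkeys, new_pathkeys) == PATHKEYS_EQUAL)
        return;         /* this ordering is already known, don't add it */
}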


* Part of the work performed in this patch overlaps with that of
preprocess_groupclause.  They are both trying to adjust the ordering of
the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
perform this work only once.


* When reordering the GROUP BY keys, I think the following checks need
some improvements.

+       /*
+        * Since 1349d27 pathkey coming from underlying node can be in the
+        * root->group_pathkeys but not in the processed_groupClause. So, we
+        * should be careful here.
+        */
+       sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
+                                           *group_clauses);
+       if (!sgc)
+           /* The grouping clause does not cover this pathkey */
+           break;

I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
valid before calling get_sortgroupref_clause_noerr with it.  Also, how
can we guarantee that the returned GROUP BY item is sortable?  Should we
check that with OidIsValid(sgc->sortop)?
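
Something along these lines, perhaps (only a sketch of the suggested 
guards, not code from the patch):

/* hypothetical checks before relying on the pathkey's sortref */
if (pathkey->pk_eclass->ec_sortref == 0)
    break;                  /* no target-list reference to match against */

sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
                                    *group_clauses);
if (sgc == NULL || !OidIsValid(sgc->sortop))
    break;                  /* not covered by a sortable grouping clause */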

Thanks
Richard

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
Hi!

Thank you for your review.  The revised patchset is attached.

On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
>> > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> >>
>> >> Thank you for providing the test case relevant for this code change.
>> >> The revised patch incorporating this change is attached.  Now the
>> >> patchset looks good to me.  I'm going to push it if there are no
>> >> objections.
>> >
>> > Seems I'm late for the party.  Can we hold for several more days?  I'd
>> > like to have a review on this patch.
>>
>> Sure, NP.  I'll hold it till your review.
>
>
> Thanks.  Appreciate that.
>
> I have briefly reviewed this patch and here are some comments.
>
> * When trying to match the ordering of GROUP BY to that of ORDER BY in
> get_useful_group_keys_orderings, this patch checks against the length of
> path->pathkeys.  This does not make sense.  I guess this is a typo and
> the real intention is to check against root->sort_pathkeys.
>
> --- a/src/backend/optimizer/path/pathkeys.c
> +++ b/src/backend/optimizer/path/pathkeys.c
> @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
>                                            root->num_groupby_pathkeys);
>
>         if (n > 0 &&
> -           (enable_incremental_sort || n == list_length(path->pathkeys)))
> +           (enable_incremental_sort || n == list_length(root->sort_pathkeys)))
>
> However, I think this is also incorrect.  When incremental sort is
> disabled, it is reasonable to consider reordering the GROUP BY keys only
> if the number of matching pathkeys is equal to the length of
> root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.

Hmm... Why should this be list_length(root->group_pathkeys) while
we're targeting root->sort_pathkeys?  For now, I've changed that to
list_length(root->sort_pathkeys).

> * When the original ordering of GROUP BY keys matches the path's
> pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
> generate duplicate PathKeyInfos.  Consequently, this duplication would
> lead to the creation and addition of identical paths multiple times.
> This is not great.  Consider the query below:
>
> create table t (a int, b int);
> create index on t (a, b);
> set enable_hashagg to off;
>
> explain select count(*) from t group by a, b;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
>    Group Key: a, b
>    ->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 width=8)
> (3 rows)
>
> The same path with group keys {a, b} is created and added twice.

I tried to address that.

> * Part of the work performed in this patch overlaps with that of
> preprocess_groupclause.  They are both trying to adjust the ordering of
> the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
> perform this work only once.

Andrei, could you take a look.

> * When reordering the GROUP BY keys, I think the following checks need
> some improvements.
>
> +       /*
> +        * Since 1349d27 pathkey coming from underlying node can be in the
> +        * root->group_pathkeys but not in the processed_groupClause. So, we
> +        * should be careful here.
> +        */
> +       sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
> +                                           *group_clauses);
> +       if (!sgc)
> +           /* The grouping clause does not cover this pathkey */
> +           break;
>
> I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
> valid before calling get_sortgroupref_clause_noerr with it.  Also, how
> can we guarantee that the returned GROUP BY item is sortable?  Should we
> check that with OidIsValid(sgc->sortop)?

Added.

------
Regards,
Alexander Korotkov

Вложения

Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
On 16/1/2024 22:05, Alexander Korotkov wrote:
> On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote:
>> * When trying to match the ordering of GROUP BY to that of ORDER BY in
>> get_useful_group_keys_orderings, this patch checks against the length of
>> path->pathkeys.  This does not make sense.  I guess this is a typo and
>> the real intention is to check against root->sort_pathkeys.
Thanks! It is really my blunder - a fresh look helps.
>>
>> --- a/src/backend/optimizer/path/pathkeys.c
>> +++ b/src/backend/optimizer/path/pathkeys.c
>> @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
>>                                             root->num_groupby_pathkeys);
>>
>>          if (n > 0 &&
>> -           (enable_incremental_sort || n == list_length(path->pathkeys)))
>> +           (enable_incremental_sort || n == list_length(root->sort_pathkeys)))
>>
>> However, I think this is also incorrect.  When incremental sort is
>> disabled, it is reasonable to consider reordering the GROUP BY keys only
>> if the number of matching pathkeys is equal to the length of
>> root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.
> 
> Hmm... Why should this be list_length(root->group_pathkeys) while
> we're targeting root->sort_pathkeys.  I yet changed that to
> list_length(root->sort_pathkeys).
I think that in the first case, when we are trying to arrange group-by 
keys according to the underlying pathkeys with incremental sort = off, 
it makes sense to do so as long as all group-by keys are covered, 
regardless of whether the incoming path contains that many pathkeys or 
more. The code and test case are in step1.txt.
> 
>> * When the original ordering of GROUP BY keys matches the path's
>> pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
>> generate duplicate PathKeyInfos.  Consequently, this duplication would
>> lead to the creation and addition of identical paths multiple times.
>> This is not great.  Consider the query below:
>>
>> create table t (a int, b int);
>> create index on t (a, b);
>> set enable_hashagg to off;
>>
>> explain select count(*) from t group by a, b;
>>                                      QUERY PLAN
>> ----------------------------------------------------------------------------------
>>   GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
>>     Group Key: a, b
>>     ->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 width=8)
>> (3 rows)
>>
>> The same path with group keys {a, b} is created and added twice.
> 
> I tried to address that.
> 
>> * Part of the work performed in this patch overlaps with that of
>> preprocess_groupclause.  They are both trying to adjust the ordering of
>> the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
>> perform this work only once.
> 
> Andrei, could you take a look.
As I see it, the PathKeyInfo list often contains duplicated pathkeys, 
coming from either the sort_pathkeys or the path->pathkeys orderings. 
So I propose to check for duplicates each time (see step2.txt in the 
attachment).
> 
>> * When reordering the GROUP BY keys, I think the following checks need
>> some improvements.
>>
>> +       /*
>> +        * Since 1349d27 pathkey coming from underlying node can be in the
>> +        * root->group_pathkeys but not in the processed_groupClause. So, we
>> +        * should be careful here.
>> +        */
>> +       sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
>> +                                           *group_clauses);
>> +       if (!sgc)
>> +           /* The grouping clause does not cover this pathkey */
>> +           break;
>>
>> I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
>> valid before calling get_sortgroupref_clause_noerr with it.  Also, how
>> can we guarantee that the returned GROUP BY item is sortable?  Should we
>> check that with OidIsValid(sgc->sortop)?
> 
> Added.
Reviewed it, looks good.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Andrei Lepikhov
Дата:
Just the forgotten second attachment.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: POC: GROUP BY optimization

От
Alexander Korotkov
Дата:
Hi!

I've applied your changes with minor editing, thank you.

On Thu, Jan 18, 2024 at 11:49 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> >> * Part of the work performed in this patch overlaps with that of
> >> preprocess_groupclause.  They are both trying to adjust the ordering of
> >> the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
> >> perform this work only once.
> >
> > Andrei, could you take a look.
> As I see, the PathKeyInfo list often contains duplicated pathkeys,
> coming from either sort_pathkeys or path->pathkeys orderings. So, I can
> propose to check duplicates each time (see step2.txt in attachment).

Actually, I asked to recheck whether we can cut some part of
preprocess_groupclause() given that we're reordering the pathkeys
later.  It seems that we can remove everything except the handling of
grouping sets.  I've done this in the revised patchset.

I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov

Attachments

Re: POC: GROUP BY optimization

From
Nathan Bossart
Date:
A recent buildfarm failure [0] seems to indicate a name collision with the
"abc" index in the aggregates.sql test and the "abc" table in
namespace.sql.

[0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-01-24%2014%3A05%3A14

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: POC: GROUP BY optimization

From
Alexander Korotkov
Date:
On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> A recent buildfarm failure [0] seems to indicate a name collision with the
> "abc" index in the aggregates.sql test and the "abc" table in
> namespace.sql.
>
> [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-01-24%2014%3A05%3A14

Thank you for catching this.  Fixed.

------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

From
vignesh C
Date:
On Thu, 25 Jan 2024 at 01:15, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > A recent buildfarm failure [0] seems to indicate a name collision with the
> > "abc" index in the aggregates.sql test and the "abc" table in
> > namespace.sql.
> >
> > [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2024-01-24%2014%3A05%3A14
>
> Thank you for catching this.  Fixed.

Since the patch has been committed, I have marked this entry as committed.

Regards,
Vignesh



Re: POC: GROUP BY optimization

From
Robert Haas
Date:
On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost.  We simply do not have sufficiently accurate
> cost information.  The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

I'm late to the party, but I'd like to better understand what's being
argued here. If you're saying that, for some particular planner
problem, we should prefer a solution that doesn't need to know about
the relative cost of various sorts over one that does, I agree, for
exactly the reason that you state: our knowledge of sort costs won't
be reliable, and we will make mistakes. That's true in lots of
situations, not just related to sorts,
because estimation is a hard problem. Heuristics not based on cost are
going to be, in many cases, more accurate than heuristics based on
cost. They're also often cheaper, since they often let us reject
possible approaches very early, without all the bother of a cost
comparison.

But if you're saying that it's utterly impossible to know whether
sorting text will be cheaper or more expensive than sorting 4-byte
integers, and that if a particular problem can be solved only by
knowing which one is cheaper we should just give up, then I disagree.
In the absence of any other information, it must be right, at the very
least, to bank on varlena data types being more expensive to sort than
fixed-length data types. How much more expensive is hard to know,
because toasted blobs are going to be more expensive to sort than
short varlenas. But even before you reach the comparison function, a
pass-by-value datum has a significantly lower access cost than a
pass-by-reference datum. The fact that the pass-by-reference value
might be huge only compounds the problem.
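For instance, a quick and unscientific way to see the kind of gap I mean (a 
sketch; the table and column names are purely illustrative):

-- compare sorting a pass-by-value int4 column against a pass-by-reference
-- text column of the same cardinality
CREATE TABLE sort_demo AS
  SELECT i AS int_col, md5(i::text) || repeat('x', 100) AS text_col
  FROM generate_series(1, 1000000) AS i;
EXPLAIN ANALYZE SELECT * FROM sort_demo ORDER BY int_col;
EXPLAIN ANALYZE SELECT * FROM sort_demo ORDER BY text_col;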

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: POC: GROUP BY optimization

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost.  We simply do not have sufficiently accurate
>> cost information.  The previous patch in this thread got reverted because
>> of that (well, also some implementation issues, but mostly that), and
>> nothing has happened to make me think that another try will fare any
>> better.

> I'm late to the party, but I'd like to better understand what's being
> argued here.

What I am saying is that we don't have sufficiently accurate cost
information to support the sort of logic that got committed and
reverted before.  I did not mean to imply that it's not possible
to have such info, only that it is not present today.  IOW, what
I'm saying is that if you want to write code that tries to make
a cost-based preference of one sorting over another, you *first*
need to put in a bunch of legwork to create more accurate cost
numbers.  Trying to make such logic depend on the numbers we have
today is just going to result in garbage in, garbage out.

Sadly, that's not a small task:

* We'd need to put effort into assigning more realistic procost
values --- preferably across the board, not just comparison functions.
As long as all the comparison functions have procost 1.0, you're
just flying blind.

* As you mentioned, there'd need to be some accounting for the
likely size of varlena inputs, and especially whether they might
be toasted.

* cost_sort knows nothing of the low-level sort algorithm improvements
we've made in recent years, such as abbreviated keys.

That's a lot of work, and I think it has to be done before we try
to build infrastructure on top, not afterwards.

            regards, tom lane



Re: POC: GROUP BY optimization

From
Robert Haas
Date:
On Fri, Jan 26, 2024 at 10:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sadly, that's not a small task:
>
> * We'd need to put effort into assigning more realistic procost
> values --- preferably across the board, not just comparison functions.
> As long as all the comparison functions have procost 1.0, you're
> just flying blind.
>
> * As you mentioned, there'd need to be some accounting for the
> likely size of varlena inputs, and especially whether they might
> be toasted.
>
> * cost_sort knows nothing of the low-level sort algorithm improvements
> we've made in recent years, such as abbreviated keys.
>
> That's a lot of work, and I think it has to be done before we try
> to build infrastructure on top, not afterwards.

OK, that makes sense to me. Thanks.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: POC: GROUP BY optimization

From
Tom Lane
Date:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I'm going to push this if there are no objections.

One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

            regards, tom lane



Re: POC: GROUP BY optimization

From
Richard Guo
Date:

On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I'm going to push this if there are no objections.

One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as

* The table 'btg' is inserted with 10000 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great. Such as in

    explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

    CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.

Thanks
Richard

Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 2/2/2024 09:02, Tom Lane wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
>> I'm going to push this if there are no objections.
> 
> One of the test cases added by this commit has not been very
> stable in the buildfarm.  Latest example is here:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
> 
> and I've seen similar failures intermittently on other machines.
> 
> I'd suggest building this test atop a table that is more stable
> than pg_class.  You're just waving a red flag in front of a bull
> if you expect stable statistics from that during a regression run.
> Nor do I see any particular reason for pg_class to be especially
> suited to the test.
Yeah, it is my fault. Please see the attached patch fixing that.
-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachments

Re: POC: GROUP BY optimization

From
Richard Guo
Date:

On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as

* The table 'btg' is inserted with 10000 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great. Such as in

    explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

    CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.

Here is the draft patch that fixes the issues I complained about
upthread.

Thanks
Richard
Attachments

Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 2/2/2024 11:06, Richard Guo wrote:
> 
> On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com> wrote:
> 
>     On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
>         One of the test cases added by this commit has not been very
>         stable in the buildfarm.  Latest example is here:
> 
>         https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
> 
>         and I've seen similar failures intermittently on other machines.
> 
>         I'd suggest building this test atop a table that is more stable
>         than pg_class.  You're just waving a red flag in front of a bull
>         if you expect stable statistics from that during a regression run.
>         Nor do I see any particular reason for pg_class to be especially
>         suited to the test.
> 
> 
>     Yeah, it's not a good practice to use pg_class in this place.  While
>     looking through the test cases added by this commit, I noticed some
>     other minor issues that are not great.  Such as
> 
>     * The table 'btg' is inserted with 10000 tuples, which seems a bit
>     expensive for a test.  I don't think we need such a big table to test
>     what we want.
> 
>     * I don't see why we need to manipulate GUC max_parallel_workers and
>     max_parallel_workers_per_gather.
> 
>     * I think we'd better write the tests with the keywords being all upper
>     or all lower.  A mixed use of upper and lower is not great. Such as in
> 
>          explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
> 
>     * Some comments for the test queries are not easy to read.
> 
>     * For this statement
> 
>          CREATE INDEX idx_y_x_z ON btg(y,x,w);
> 
>     I think the index name would cause confusion.  It creates an index on
>     columns y, x and w, but the name indicates an index on y, x and z.
> 
>     I'd like to write a draft patch for the fixes.
> 
> 
> Here is the draft patch that fixes the issues I complained about in
> upthread.
I went through the patch. It looks like it doesn't break anything. Why do 
you prefer to use count(*) in EXPLAIN instead of a plain target list, like 
"SELECT x,y,..."?
Also, regarding the test mentioned by Tom:
1. I see that PG uses an IndexScan on (x,y), so column x will already be 
sorted before the MergeJoin. Why not use an Incremental Sort on (x,z,w) 
instead of a full Sort?
2. For the record, IMO this test shows the near-term potential of this 
feature: it is cheaper to use grouping order (w,z) instead of (z,w).

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

From
Richard Guo
Date:

On Fri, Feb 2, 2024 at 12:40 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 2/2/2024 11:06, Richard Guo wrote:
>
> On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com> wrote:
>
>     On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>         One of the test cases added by this commit has not been very
>         stable in the buildfarm.  Latest example is here:
>
>         https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
>
>         and I've seen similar failures intermittently on other machines.
>
>         I'd suggest building this test atop a table that is more stable
>         than pg_class.  You're just waving a red flag in front of a bull
>         if you expect stable statistics from that during a regression run.
>         Nor do I see any particular reason for pg_class to be especially
>         suited to the test.
>
>
>     Yeah, it's not a good practice to use pg_class in this place.  While
>     looking through the test cases added by this commit, I noticed some
>     other minor issues that are not great.  Such as
>
>     * The table 'btg' is inserted with 10000 tuples, which seems a bit
>     expensive for a test.  I don't think we need such a big table to test
>     what we want.
>
>     * I don't see why we need to manipulate GUC max_parallel_workers and
>     max_parallel_workers_per_gather.
>
>     * I think we'd better write the tests with the keywords being all upper
>     or all lower.  A mixed use of upper and lower is not great. Such as in
>
>          explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
>
>     * Some comments for the test queries are not easy to read.
>
>     * For this statement
>
>          CREATE INDEX idx_y_x_z ON btg(y,x,w);
>
>     I think the index name would cause confusion.  It creates an index on
>     columns y, x and w, but the name indicates an index on y, x and z.
>
>     I'd like to write a draft patch for the fixes.
>
>
> Here is the draft patch that fixes the issues I complained about in
> upthread.
 
I passed through the patch. Looks like it doesn't break anything. Why do
you prefer to use count(*) in EXPLAIN instead of plain targetlist, like
"SELECT x,y,..."?

Nothing special.  Just making the test cases as consistent as
possible.
 
Also, according to the test mentioned by Tom:
1. I see, PG uses IndexScan on (x,y). So, column x will be already
sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w)
instead of full sort?

I think that's because the planner chooses to use (z, w, x) to perform
the mergejoin.  I did not delve into the details, but I guess the cost
estimation decides this is cheaper.

Hi Alexander,

What do you think about the revisions for the test cases?

Thanks
Richard

Re: POC: GROUP BY optimization

From
Alexander Korotkov
Date:
Hi, Richard!

> What do you think about the revisions for the test cases?

I've rebased your patch upthread.  Did some minor beautifications.

> * The table 'btg' is inserted with 10000 tuples, which seems a bit
> expensive for a test.  I don't think we need such a big table to test
> what we want.

Your patch reduces the number of rows to 1000 tuples.  I found it
possible to further reduce it to 100 tuples.  That also allowed me to
save the plan in the test case introduced by e1b7fde418.

Please check if you're OK with the patch attached.

------
Regards,
Alexander Korotkov

Attachments

Re: POC: GROUP BY optimization

From
Maxim Orlov
Date:
Hi!

Another issue with the test introduced in 0452b461bc405: I think it may be unstable in some circumstances,
for example, if we try to use a different BLCKSZ. See, I've made a little change to the number of tuples to be inserted:

$ git diff
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index d6ed5d0eff..414078d4ec 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;

And bulk extension kicks in, so we get zeroed pages in the relation. The plan also changes,
switching from an index scan to a seq scan:
@@ -2734,7 +2734,7 @@
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 -- GROUP BY optimization by reorder columns by frequency
@@ -2760,62 +2760,57 @@

 -- Engage incremental sort
 explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
-                   QUERY PLAN
--------------------------------------------------
+          QUERY PLAN
+------------------------------
  Group
    Group Key: x, y, z, w
-   ->  Incremental Sort
+   ->  Sort
          Sort Key: x, y, z, w
-         Presorted Key: x, y
-         ->  Index Scan using btg_x_y_idx on btg
-(6 rows)
+         ->  Seq Scan on btg
+(5 rows)
... and so on.

So, my proposal is simple. I think we need not just "ANALYZE btg", but
"VACUUM ANALYZE btg", to get rid of zeroed pages in this particular
case. PFA corresponding patch.

--
Best regards,
Maxim Orlov.
Attachments

Re: POC: GROUP BY optimization

From
Richard Guo
Date:

On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Richard!

> What do you think about the revisions for the test cases?

I've rebased your patch upthread.  Did some minor beautifications.

> * The table 'btg' is inserted with 10000 tuples, which seems a bit
> expensive for a test.  I don't think we need such a big table to test
> what we want.

Your patch reduces the number of rows to 1000 tuples.  I found it
possible to further reduce it to 100 tuples.  That also allowed me to
save the plan in the test case introduced by e1b7fde418.

Please check if you're OK with the patch attached.

I looked through the v2 patch and have two comments.

* The test case under "Check we don't pick aggregate path key instead of
grouping path key" does not have EXPLAIN to show the plan.  So how can
we ensure we do not mistakenly select the aggregate pathkeys instead of
the grouping pathkeys?

* I don't think the test case introduced by e1b7fde418 is still needed,
because we already have one under "Utilize the ordering of merge join to
avoid a full Sort operation".  This kind of test case is just to ensure
that we are able to utilize the ordering of the subplans underneath.  So
it should be parallel to the test cases for utilizing the ordering of
index scan and subquery scan.

See the attached v3 patch.  I also made cosmetic tweaks to the comments,
and simplified a test case query.

Thanks
Richard
Attachments

Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 22/2/2024 09:09, Richard Guo wrote:
> 
> On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> 
>     Hi, Richard!
> 
>      > What do you think about the revisions for the test cases?
> 
>     I've rebased your patch upthread.  Did some minor beautifications.
> 
>      > * The table 'btg' is inserted with 10000 tuples, which seems a bit
>      > expensive for a test.  I don't think we need such a big table to test
>      > what we want.
> 
>     Your patch reduces the number of rows to 1000 tuples.  I found it
>     possible to further reduce it to 100 tuples.  That also allowed me to
>     save the plan in the test case introduced by e1b7fde418.
> 
>     Please check if you're OK with the patch attached.
> 
> 
> I looked through the v2 patch and have two comments.
> 
> * The test case under "Check we don't pick aggregate path key instead of
> grouping path key" does not have EXPLAIN to show the plan.  So how can
> we ensure we do not mistakenly select the aggregate pathkeys instead of
> the grouping pathkeys?
I confirm it works correctly. I am not sure about the stability of the 
number of zeros in the output on different platforms here:
        avg
--------------------
  4.0000000000000000
  5.0000000000000000
That is why I had used the format() function before. So, may we tweak 
the test and restrict the output?
> 
> * I don't think the test case introduced by e1b7fde418 is still needed,
> because we already have one under "Utilize the ordering of merge join to
> avoid a full Sort operation".  This kind of test case is just to ensure
> that we are able to utilize the ordering of the subplans underneath.  So
> it should be parallel to the test cases for utilize the ordering of
> index scan and subquery scan.
I confirm that this version also checks ec_sortref initialization in the 
case when ECs are constructed from the WHERE clause. Generally, I prefer 
one test per issue instead of one test for everything at once. But it works 
and I don't have any reason to dispute it.

Also, I'm unsure about removing the disabling of the 
max_parallel_workers_per_gather parameter. Have you verified that the 
current plan dominates the partial one? Do cost fluctuations across 
platforms not trigger a parallel plan?

What's more, I suggest addressing the complaint from [1] here. As I see it, 
the cost difference between the Sort and IncrementalSort strategies in that 
case is around 0.5. To make the test more stable, I propose changing it a 
bit and adding a limit:
SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
That makes the advantage of IncrementalSort more obvious, with a difference 
of around 10 cost points.

[1] 
https://www.postgresql.org/message-id/CACG=ezaYM1tr6Lmp8PRH1aeZq=rBKXEoTwgzMcLaD5MPhfW0Lg@mail.gmail.com

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

From
Richard Guo
Date:

On Thu, Feb 22, 2024 at 12:18 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 22/2/2024 09:09, Richard Guo wrote:
> I looked through the v2 patch and have two comments.
>
> * The test case under "Check we don't pick aggregate path key instead of
> grouping path key" does not have EXPLAIN to show the plan.  So how can
> we ensure we do not mistakenly select the aggregate pathkeys instead of
> the grouping pathkeys?
 
I confirm it works correctly. I am not sure about the stability of the
zeros number in the output on different platforms here:
        avg
--------------------
  4.0000000000000000
  5.0000000000000000
It was why I'd used the format() function before. So, may we elaborate
on the test and restrict the output?

The avg() function on an integer argument is commonly used in
aggregates.sql.  I don't think this is an issue.  See the first test
query in aggregates.sql.

SELECT avg(four) AS avg_1 FROM onek;
       avg_1
--------------------
 1.5000000000000000
(1 row)
 
> * I don't think the test case introduced by e1b7fde418 is still needed,
> because we already have one under "Utilize the ordering of merge join to
> avoid a full Sort operation".  This kind of test case is just to ensure
> that we are able to utilize the ordering of the subplans underneath.  So
> it should be parallel to the test cases for utilize the ordering of
> index scan and subquery scan.

Also, I'm unsure about removing the disabling of the
max_parallel_workers_per_gather parameter. Have you discovered the
domination of the current plan over the partial one? Do the cost
fluctuations across platforms not trigger a parallel plan?

The table used for testing contains only 100 tuples, which is the size
of only one page.  I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.
 
What's more, I suggest to address here the complaint from [1]. As I see,
cost difference between Sort and IncrementalSort strategies in that case
is around 0.5. To make the test more stable I propose to change it a bit
and add a limit:
SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes efficacy of IncrementalSort more obvious difference around 10
cost points.

I don't think that's necessary.  With Incremental Sort the final cost
is:

    GroupAggregate  (cost=1.66..19.00 rows=100 width=25)

while with full Sort it is:

    GroupAggregate  (cost=16.96..19.46 rows=100 width=25)

With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost.  Not to say that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.
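(Concretely, using the numbers above: 19.46 > 19.00 * 1.01 = 19.19, so the
full-Sort path isn't even fuzzily comparable on total cost.)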

Thanks
Richard

Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 22/2/2024 13:35, Richard Guo wrote:
> The avg() function on integer argument is commonly used in
> aggregates.sql.  I don't think this is an issue.  See the first test
> query in aggregates.sql.
Makes sense.
>      > it should be parallel to the test cases for utilize the ordering of
>      > index scan and subquery scan.
> 
>     Also, I'm unsure about removing the disabling of the
>     max_parallel_workers_per_gather parameter. Have you discovered the
>     domination of the current plan over the partial one? Do the cost
>     fluctuations across platforms not trigger a parallel plan?
> 
> 
> The table used for testing contains only 100 tuples, which is the size
> of only one page.  I don't believe it would trigger any parallel plans,
> unless we manually change min_parallel_table_scan_size.
I don't intend to argue the point, but just for information: I frequently 
reduce it to zero, allowing PostgreSQL to make the decision based on 
costs. That sometimes works much better, because one small table in a 
multi-way join can prevent an effective parallel plan.
> 
>     What's more, I suggest to address here the complaint from [1]. As I
>     see,
>     cost difference between Sort and IncrementalSort strategies in that
>     case
>     is around 0.5. To make the test more stable I propose to change it a
>     bit
>     and add a limit:
>     SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
>     It makes efficacy of IncrementalSort more obvious difference around 10
>     cost points.
> 
> 
> I don't think that's necessary.  With Incremental Sort the final cost
> is:
> 
>      GroupAggregate  (cost=1.66..19.00 rows=100 width=25)
> 
> while with full Sort it is:
> 
>      GroupAggregate  (cost=16.96..19.46 rows=100 width=25)
> 
> With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
> is cheaper on total cost.  Not to say that even if somehow we decide the
> two paths are fuzzily the same on total cost, the first path still
> dominates because its startup cost is much cheaper.
As before, I won't protest here - it needs some calculation of how much 
cost can be added by bulk extension of the relation's blocks. If Maxim 
answers that it's enough to resolve his issue, why not?

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

From
Tom Lane
Date:
I'm late to the party, and I apologize for not having paid attention
to this thread for months ... but I am wondering why 0452b461b got
committed.  It seems to add a substantial amount of complexity and
planner cycles in return for not much.  The reason why I'm dubious
about it can be found in this comment that the patch deleted:

- * In principle it might be interesting to consider other orderings of the
- * GROUP BY elements, which could match the sort ordering of other
- * possible plans (eg an indexscan) and thereby reduce cost.  We don't
- * bother with that, though.  Hashed grouping will frequently win anyway.

Indeed, if you remove the

SET enable_hashagg = off;
SET enable_seqscan = off;

lines added to aggregates.sql, you find that every single one of the
test cases added there changes plan, except for the one case that is
there to prove that the planner *doesn't* pick this type of plan.
That shows that the planner doesn't believe any of the plans developed
here are cheaper than other alternatives it would normally consider
(usually HashAgg).  So it sure seems like we're spending a lot of
effort on uninteresting plans.  Maybe this is an artifact of testing
against toy tables and a useful win could be shown on bigger tables,
but I don't see any indication in the thread that anyone actually
demonstrated that.

I might hold still for this patch anyway if the patch were simpler
and more obviously correct, but there are scary bits in it:

* The very first hunk causes get_eclass_for_sort_expr to have
side-effects on existing EquivalenceClass data structures.
What is the argument that that's not going to cause problems?
At the very least it introduces asymmetry, in that the first
caller wins the chance to change the EC's ec_sortref and later
callers won't be able to.

* I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which
I notice has already had one band-aid added to it since commit).
In particular, it seems to believe that the pathkeys and clauses
lists match one-for-one, but I seriously doubt that that invariant
remains guaranteed after the cleanup steps

    /* append the remaining group pathkeys (will be treated as not sorted) */
    *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
                                             *group_pathkeys);
    *group_clauses = list_concat_unique_ptr(new_group_clauses,
                                            *group_clauses);

For that to be reliable, the SortGroupClauses added to
new_group_clauses in the main loop have to be exactly those
that are associated with the same pathkeys in the old lists.
I doubt that that's necessarily true in the presence of redundant
grouping clauses.  (Maybe we can't get here with any redundant
grouping clauses, but still, we don't really guarantee uniqueness of
SortGroupClauses, much less that they are never copied which is what
you need if you want to believe that pointer equality is sufficient
for de-duping here.  PathKeys are explicitly made to be safe to compare
pointer-wise, but I know of no such guarantee for SortGroupClauses.)

* It seems like root->processed_groupClause no longer has much to do
with the grouping behavior, which is scary given how much code still
believes that it does.  I suspect there are bugs lurking there, and
am not comforted by the fact that the patch changed exactly nothing
in the pathnodes.h documentation of that field.  This comment looks
pretty silly now too:

            /* Preprocess regular GROUP BY clause, if any */
            root->processed_groupClause = list_copy(parse->groupClause);

What "preprocessing" is going on there?  This comment was adequate
before, when the code line invoked preprocess_groupclause which had
a bunch of commentary; but now it has to stand on its own and it's
not doing a great job of that.

* Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node
type name, and the comments provided for it are not going to educate
anybody.  What is the "association" between the pathkeys and clauses?
I guess the clauses are supposed to be SortGroupClauses, but not all
pathkeys match up to SortGroupClauses, so what then?  This is very
underspecified, and fuzzy thinking about data structures tends to lead
to bugs.

So I'm quite afraid that there are still bugs lurking here.
What's more, given that the plans this patch makes apparently
seldom win when you don't put a thumb on the scales, it might
take a very long time to isolate those bugs.  If the patch
produces a faulty plan (e.g. not sorted properly) we'd never
notice if that plan isn't chosen, and even if it is chosen
it probably wouldn't produce anything as obviously wrong as
a crash.

If this patch were producing better results I'd be more excited
about putting more work into it.  But on the basis of what I'm
seeing right now, I think maybe we ought to give up on it.

            regards, tom lane



Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 4/12/24 06:44, Tom Lane wrote:
> If this patch were producing better results I'd be more excited
> about putting more work into it.  But on the basis of what I'm
> seeing right now, I think maybe we ought to give up on it.
First, thanks for the deep review - sometimes only a commit gives us a 
chance to get such observations :))).
On a broader note, introducing automatic GROUP BY order choosing is a 
step towards training the optimiser to handle poorly tuned incoming 
queries. While it's true that this may initially impact performance, 
it's crucial to weigh the potential benefits. So, beforehand, we should 
agree: is it worth it?
If yes, I would say that I often see hashing not working for grouping: 
sometimes because of estimation errors, sometimes because the grouping 
already has sorted input, sometimes in analytical queries when the planner 
doesn't have enough memory for hashing. In analytical cases, the only way 
to speed up a query is sometimes to be smart with features like 
Incremental Sort and this one.
About the low efficiency: remember the previous version of the GROUP BY 
optimisation - we disagreed on operator costs and the cost model in 
general. In the current version, we went the opposite way - adding small 
features step by step. The current commit contains an integral part of 
the feature and is designed for safely testing the approach and adding 
more profitable parts later, like choosing the GROUP BY order according to 
distinct values or unique indexes on grouping columns.
I have gone through the code, steered by the issues you explained in 
detail. I see seven issues. Two of them definitely should be scrutinised 
right now, and I'm ready to do that.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 4/12/24 06:44, Tom Lane wrote:
> * The very first hunk causes get_eclass_for_sort_expr to have
> side-effects on existing EquivalenceClass data structures.
> What is the argument that that's not going to cause problems?
> At the very least it introduces asymmetry, in that the first
> caller wins the chance to change the EC's ec_sortref and later
> callers won't be able to.
Let me resolve the issues bit by bit.
Addressing your first note about 'asymmetry', I looked into this part of 
the code. Before 8d83a5d it could cause some artefacts, but since then a 
kind of asymmetry has been present in the GROUP BY clause list anyway.
As you can see in the attached patch, GROUP BY reordering works even 
when the asymmetry of 8d83a5d chooses different keys.

At the same time, I agree that implicitly setting anything in an 
exported function, which should just look up an EC, is questionable 
coding style. To avoid possible usage issues (in extensions, for 
example), I propose setting it afterwards, explicitly forcing this 
action via an input parameter - see my attempt in the attachment.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachments

Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 4/12/24 06:44, Tom Lane wrote:
> * I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which
> I notice has already had one band-aid added to it since commit).
> In particular, it seems to believe that the pathkeys and clauses
> lists match one-for-one, but I seriously doubt that that invariant
> remains guaranteed after the cleanup steps
> 
>      /* append the remaining group pathkeys (will be treated as not sorted) */
>      *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
>                                               *group_pathkeys);
>      *group_clauses = list_concat_unique_ptr(new_group_clauses,
>                                              *group_clauses);
> 
> For that to be reliable, the SortGroupClauses added to
> new_group_clauses in the main loop have to be exactly those
> that are associated with the same pathkeys in the old lists.
> I doubt that that's necessarily true in the presence of redundant
> grouping clauses.  (Maybe we can't get here with any redundant
> grouping clauses, but still, we don't really guarantee uniqueness of
> SortGroupClauses, much less that they are never copied which is what
> you need if you want to believe that pointer equality is sufficient
> for de-duping here.  PathKeys are explicitly made to be safe to compare
> pointer-wise, but I know of no such guarantee for SortGroupClauses.)
I spent a lot of time inventing situations with SortGroupClause 
duplicates. Unfortunately, it looks impossible so far. But because we 
really don't guarantee uniqueness, I changed the code to survive that 
case. Also, I added assertion checks to be sure we don't have logical 
mistakes here - see the attachment.
About the band-aid mentioned above - as I see it, 4169850 introduces the 
same trick in planner.c. So it looks like a consequence of the design of 
the current code.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachments

Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 4/12/24 06:44, Tom Lane wrote:
> * It seems like root->processed_groupClause no longer has much to do
> with the grouping behavior, which is scary given how much code still
> believes that it does.  I suspect there are bugs lurking there, and
1. Analysing the code, processed_groupClause is used in:
- adjust_foreign_grouping_path_cost - to decide whether we need to add the 
cost of a sort to the foreign grouping;
- just for replacing relids during the SJE process;
- create_groupingsets_plan(), preprocess_grouping_sets, 
remove_useless_groupby_columns - we don't apply this feature in the case 
of grouping sets;
- get_number_of_groups - the estimation shouldn't depend on the order of 
clauses;
- create_grouping_paths - to analyse the hashability of the grouping clause;
- make_group_input_target, make_partial_grouping_target - doesn't depend 
on the order of clauses;
- planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in 
the case of hash grouping.

So, the only case where we can impact current behavior lies in 
postgres_fdw. But there we don't really know which plan will be chosen 
during planning on the foreign node, so keeping the same behavior looks 
legal to me.
> am not comforted by the fact that the patch changed exactly nothing
> in the pathnodes.h documentation of that field.  This comment looks
> pretty silly now too:
> 
>              /* Preprocess regular GROUP BY clause, if any */
>              root->processed_groupClause = list_copy(parse->groupClause);
> 
> What "preprocessing" is going on there?  This comment was adequate
> before, when the code line invoked preprocess_groupclause which had
> a bunch of commentary; but now it has to stand on its own and it's
> not doing a great job of that.
Thanks for picking out this place. I fixed it.
More interesting here is that we move decisions on the order of GROUP BY 
clauses to the stage where we already have the underlying subtree and the 
incoming pathkeys. In principle, we could bring back the preprocessing 
routine and arrange GROUP BY against the ORDER BY clause as it was done 
before. But looking into the code, I found only one reason to do this: 
adjust_group_pathkeys_for_groupagg. Curious about how much profit we get 
there, I think we can explore it later, with no hurry. A good outcome 
here would be a test case that shows the importance of arranging 
GROUP BY and ORDER BY at an early stage.
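Something along these lines (just a speculative sketch against the 
regression-test table btg, not a ready test case) might exercise that path, 
since the ordered aggregate asks for its own pathkeys on top of the grouping:

SET enable_hashagg = off;
-- hypothetical query: the array_agg(... ORDER BY ...) gives
-- adjust_group_pathkeys_for_groupagg something to append to the group pathkeys
EXPLAIN (COSTS OFF)
SELECT x, array_agg(w ORDER BY w) FROM btg GROUP BY x, y;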

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachments

Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 4/12/24 06:44, Tom Lane wrote:
> * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node
> type name, and the comments provided for it are not going to educate
> anybody.  What is the "association" between the pathkeys and clauses?
> I guess the clauses are supposed to be SortGroupClauses, but not all
> pathkeys match up to SortGroupClauses, so what then?  This is very
> underspecified, and fuzzy thinking about data structures tends to lead
> to bugs.
I'm not the best at English documentation and naming style, so feel 
free to check my version.
> 
> So I'm quite afraid that there are still bugs lurking here.
> What's more, given that the plans this patch makes apparently
> seldom win when you don't put a thumb on the scales, it might
> take a very long time to isolate those bugs.  If the patch
> produces a faulty plan (e.g. not sorted properly) we'd never
> notice if that plan isn't chosen, and even if it is chosen
> it probably wouldn't produce anything as obviously wrong as
> a crash.
I added checks on the proper order of pathkeys and clauses.
  If you really care about that, we should spend additional time and 
rewrite the code to generate the order of clauses somewhere in the plan 
creation phase. For example, during the create_group_plan call, we could 
use the processed_groupClause list, cycle through subpath->pathkeys, set 
the order, and detect whether the pathkeys list corresponds to the 
GROUP BY or is enough to build a grouping plan.
Anyway, making this part of the code more resistant to blunders is another story.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachments

Re: POC: GROUP BY optimization

From
Alexander Korotkov
Date:
Hi, Andrei!

On Thu, Apr 18, 2024 at 11:54 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 4/12/24 06:44, Tom Lane wrote:
> > * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node
> > type name, and the comments provided for it are not going to educate
> > anybody.  What is the "association" between the pathkeys and clauses?
> > I guess the clauses are supposed to be SortGroupClauses, but not all
> > pathkeys match up to SortGroupClauses, so what then?  This is very
> > underspecified, and fuzzy thinking about data structures tends to lead
> > to bugs.
> I'm not the best in English documentation and naming style. So, feel
> free to check my version.
> >
> > So I'm quite afraid that there are still bugs lurking here.
> > What's more, given that the plans this patch makes apparently
> > seldom win when you don't put a thumb on the scales, it might
> > take a very long time to isolate those bugs.  If the patch
> > produces a faulty plan (e.g. not sorted properly) we'd never
> > notice if that plan isn't chosen, and even if it is chosen
> > it probably wouldn't produce anything as obviously wrong as
> > a crash.
> I added checkings on the proper order of pathkeys and clauses.
>   If you really care about that, we should spend additional time and
> rewrite the code to generate an order of clauses somewhere in the plan
> creation phase. For example, during the create_group_plan call, we could
> use the processed_groupClause list, cycle through subpath->pathkeys, set
> the order, and detect whether the pathkeys list corresponds to the
> group-by or is enough to build a grouping plan.
> Anyway, make this part of code more resistant to blunders is another story.

Thank you for the fixes you've proposed.  I didn't look much into
details yet, but I think the main concern Tom expressed in [1] is
whether the feature is reasonable at all.  I think at this stage the
most important thing is to come up with convincing examples showing
how huge performance benefits it could cause.  I will return to this
later today and will try to provide some convincing examples.

Links
1. https://www.postgresql.org/message-id/266850.1712879082%40sss.pgh.pa.us

------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

From
jian he
Date:
On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Thank you for the fixes you've proposed.  I didn't look much into
> details yet, but I think the main concern Tom expressed in [1] is
> whether the feature is reasonable at all.  I think at this stage the
> most important thing is to come up with convincing examples showing
> how huge performance benefits it could cause.  I will return to this
> later today and will try to provide some convincing examples.
>

hi.
I found a case where it improved performance.

+-- GROUP BY optimization by reorder columns
+CREATE TABLE btg AS SELECT
+ i % 100 AS x,
+ i % 100 AS y,
+ 'abc' || i % 10 AS z,
+ i AS w
+FROM generate_series(1,10000) AS i;
+CREATE INDEX abc ON btg(x,y);
+ANALYZE btg;
+
I changed
+FROM generate_series(1,10000) AS i;
to
+ FROM generate_series(1, 1e6) AS i;

Then I found that the performance of these two queries improved a lot.
A: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER
BY y, x \watch i=0.1 c=10
B: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, z, y ORDER
BY y, x, z, w \watch i=0.1 c=10

After setting (enable_seqscan, enable_hashagg) from on to off:
query A execution time went from 1533.013 ms to 533.430 ms
query B execution time went from 1996.817 ms to 497.020 ms



Re: POC: GROUP BY optimization

From
Andrei Lepikhov
Date:
On 4/12/24 06:44, Tom Lane wrote:
> If this patch were producing better results I'd be more excited
> about putting more work into it.  But on the basis of what I'm
> seeing right now, I think maybe we ought to give up on it.
Let me show current cases where users profit from this tiny 
improvement (see queries and execution results in query.sql):
1. 'Not optimised query text' - when GROUP BY ordering wasn't considered 
during database development.
2. 'Accidental pathkeys' - we don't see any explicit orderings, but, 
accidentally, the planner used a merge join that produced some ordering 
we can utilise (see the sketch below).
3. 'Uncertain scan path' - we have options regarding which index to use, 
and because of that we can't predict the optimal GROUP BY ordering 
before query planning starts.
4. 'HashAgg vs. GroupAgg' - sometimes the GroupAgg strategy outperforms 
HashAgg just because we don't need any ordering procedure at all.

And the last thing here - this code introduces the basics needed to add 
more sophisticated strategies, like ordering according to uniqueness or 
preferring to put constant-width columns first in grouping.
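To give a feel for case 2, here is a rough sketch (not the attached 
query.sql; it reuses the regression-test table btg and illustrative 
settings, with no claim about the exact plan chosen):

SET enable_hashagg = off;
-- if the planner picks a merge join on x, its output is already ordered by x,
-- so the GroupAggregate above it can reuse that ordering instead of sorting
EXPLAIN (COSTS OFF)
SELECT t1.x, count(*)
FROM btg t1 JOIN btg t2 ON t1.x = t2.x
GROUP BY t1.x;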

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachments

Re: POC: GROUP BY optimization

From
jian he
Date:
On Fri, Apr 19, 2024 at 6:44 PM jian he <jian.universality@gmail.com> wrote:
>
> On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Thank you for the fixes you've proposed.  I didn't look much into
> > details yet, but I think the main concern Tom expressed in [1] is
> > whether the feature is reasonable at all.  I think at this stage the
> > most important thing is to come up with convincing examples showing
> > how huge performance benefits it could cause.  I will return to this
> > later today and will try to provide some convincing examples.
> >

hi.
Previously, preprocess_groupclause would not process cases
where no ORDER BY clause is specified.
Commit 0452b461b will reorder the GROUP BY elements even though no
ORDER BY clause is specified, if there are associated indexes on them
(I hope I understand this correctly).


for example (when enable_hashagg is false):
explain(verbose) select count(*) FROM btg GROUP BY y,x;
in pg16 this will not be reordered; it stays as is: `GROUP BY y,x`

after commit 0452b461b, it will be reordered to `GROUP BY x,y`,
because there is an index `btree (x, y)` (only one) associated with it.
if you drop the index `btree (x, y)`, it will be `GROUP BY y,x` as in pg16.


This reordering of GROUP BY elements when no ORDER BY clause is specified
is useful for performance when work_mem is small.
I've attached some tests comparing master with REL_16_STABLE to
demonstrate that.
all the tests attached are under the condition:
work_mem='64kB', buildtype=release, max_parallel_workers_per_gather=0.


one example:
CREATE TABLE btg5 AS
SELECT i::numeric % 10 AS x, i % 10 AS y, 'abc' || i % 10 AS z, i % 100000 AS w
FROM generate_series(1, 1e6) AS i;
CREATE INDEX btg5_x_y_idx ON btg5(x, y);

explain(analyze) SELECT count(*) FROM btg5 GROUP BY z, y, w, x;
in pg17,  the execution time is: 746.574 ms
in pg16,  the execution time is: 1693.483 ms

if I reorder it manually as:
`explain(analyze) SELECT count(*) FROM btg5 GROUP BY x, y, w, z;`
then in pg16, the execution time is 630.394 ms

Attachments

Re: POC: GROUP BY optimization

From
Alexander Korotkov
Date:
Hi!

On Thu, Apr 18, 2024 at 1:57 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Thank you for the fixes you've proposed.  I didn't look much into
> details yet, but I think the main concern Tom expressed in [1] is
> whether the feature is reasonable at all.  I think at this stage the
> most important thing is to come up with convincing examples showing
> how huge performance benefits it could cause.  I will return to this
> later today and will try to provide some convincing examples.

I took a fresh look at 0452b461b, and have the following thoughts:
1) Previously, preprocess_groupclause() tries to reorder GROUP BY
clauses to match the required ORDER BY order.  It only reorders if
GROUP BY pathkeys are the prefix of ORDER BY pathkeys or vice versa.
So, both of them need to be satisfied by one ordering.  0452b461b also
tries to match GROUP BY clauses to ORDER BY clauses, but takes into
account an incremental sort.  Actually, instead of removing
preprocess_groupclause(), we could just teach it to take incremental
sort into account.
2) The get_useful_group_keys_orderings() function takes into account 3
orderings of pathkeys and clauses: original order as written in GROUP
BY, matching ORDER BY clauses as much as possible, and matching the
input path as much as possible.  Given that even before 0452b461b,
preprocess_groupclause() could change the original order of GROUP BY
clauses, so we don't need to distinguish the first two.  We could just
consider output of new preprocess_groupclause() taking into account an
incremental sort and the ordering matching the input path as much as
possible.  This seems like significant simplification.
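To illustrate the first point with a rough sketch (reusing the 
regression-test table btg; just an example of the kind of case meant, not a 
claim about the exact plan chosen):

SET enable_hashagg = off;
-- grouping by (y, x) instead of (x, y) leaves the output presorted on y,
-- so the final ORDER BY y, count(*) only needs an incremental sort
EXPLAIN (COSTS OFF)
SELECT y, x, count(*) FROM btg GROUP BY x, y ORDER BY y, count(*);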

Let me also describe my thoughts about the justification of the
feature itself.  As Tom pointed out upthread, Sort + Grouping is generally
unlikely faster than Hash Aggregate.  The main point of this feature
is being able to match the order of GROUP BY clauses to the order of
the input path.  That input path could be Index Scan or Subquery.
Let's concentrate on Index Scan.  Index Scan can give us the required
order, so we can do grouping without Sort or with significantly
cheaper Incremental Sort.  That could be much faster than Hash
Aggregate.  But if we scan the full table (or its significant
fraction), Index Scan is typically much more expensive than Sequential
Scan because of random page access.  However, there are cases when
Index Scan could be cheaper.
1) If the heap row is wide and the index contains all the required
columns, Index Only Scan can be cheaper than Sequential Scan because
of lesser volume.
2) If the query predicate is selective enough and matches the index,
Index Scan might be significantly cheaper.  One may say that if the
query predicate is selective enough then there are not that many
matching rows, so aggregating them either way isn't a big deal anyway.
However, given that data volumes are growing tremendously, it's not
hard to imagine that the index selected a small fraction of table
rows, but they are still enough to be costly for aggregating.
Therefore, I think there are significant cases where matching GROUP BY
clauses to the order of the input path could give a substantial
improvement over Hash Aggregate.
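As a sketch of the first case (a hypothetical table, not part of the 
regression tests; the actual plan will of course depend on costs):

-- wide heap rows, but the index covers everything the query needs, so an
-- Index Only Scan feeding a GroupAggregate may beat Seq Scan + HashAggregate
CREATE TABLE wide_t AS
  SELECT i % 1000 AS grp, i AS val, repeat('x', 500) AS payload
  FROM generate_series(1, 1000000) AS i;
CREATE INDEX wide_t_grp_val_idx ON wide_t (grp, val);
VACUUM ANALYZE wide_t;
EXPLAIN SELECT grp, sum(val) FROM wide_t GROUP BY grp;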

Besides the particular use cases provided by Jian He, I hope the
above gives some rationale.

------
Regards,
Alexander Korotkov



Re: POC: GROUP BY optimization

From
jian he
Date:
hi.
I found an interesting case.

CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
the above part will use:
  ->  Incremental Sort
         Sort Key: x, $, $, $
         Presorted Key: x
         ->  Index Scan using t1_x_y_idx on t1

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;

these will use:
  ->  Incremental Sort
         Sort Key: x, y, $, $
         Presorted Key: x, y
         ->  Index Scan using t1_x_y_idx on t1

I guess this is fine, but not optimal?



Re: POC: GROUP BY optimization

From
jian he
Date:
hi
one more question (maybe a dumb one....)

drop table if exists t1;
CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;


EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY z,x,y,w
order by w;


                      QUERY PLAN
------------------------------------------------------
 GroupAggregate
   Output: count(*), w, z, x, y
   Group Key: t1.w, t1.x, t1.y, t1.z
   ->  Sort
         Output: w, z, x, y
         Sort Key: t1.w, t1.x, t1.y, t1.z
         ->  Index Scan using t1_x_y_idx on public.t1
               Output: w, z, x, y
(8 rows)


if the sort is
` Sort Key: t1.w, t1.x, t1.y, t1.z`

then shouldn't the output be:

Output: w, x, y, z

?