Re: POC: GROUP BY optimization

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: POC: GROUP BY optimization
Дата
Msg-id 7a80a207-5e8e-b80c-476c-2d290b0526e9@enterprisedb.com
обсуждение исходный текст
Ответ на Re: POC: GROUP BY optimization  (Zhihong Yu <zyu@yugabyte.com>)
Ответы Re: POC: GROUP BY optimization
Список pgsql-hackers
On 3/29/22 04:02, Zhihong Yu wrote:
> 
> 
> On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
> wrote:
> 
>     Hi,
> 
>     Here's a rebased/improved version of the patch, with smaller parts
>     addressing various issues. There are seven parts:
> 
>     0001 - main part, just rebased
> 
>     0002 - replace the debug GUC options with a single GUC to disable the
>            optimization if needed
> 
>     0003 - minor code cleanup, removal of unnecessary variable
> 
>     0004 - various comment fixes (rewordings, typos, ...)
> 
>     0005 - a minor code simplification, addressing FIXMEs from 0004
> 
>     0006 - adds the new GUC to the docs
> 
>     0007 - demonstrates plan changes with a disabled optimization
> 
>     The first 6 parts should be squashed and committed at one, I only kept
>     them separate for clarity. The 0007 is merely a demonstration of the new
>     GUC and that it disables the optimization.
> 
>     > Agree. Because it is a kind of automation we should allow user to
>     switch
>     > it off in the case of problems or manual tuning.
>     > > Also, I looked through this patch. It has some minor problems:
>     > 1. Multiple typos in the patch comment.
> 
>     I went through the comments and checked all of them for grammar mistakes
>     and typos using a word processor, so hopefully that should be OK. But
>     maybe there's still something wrong.
> 
>     > 2. The term 'cardinality of a key' - may be replace with 'number of
>     > duplicates'?
> 
>     No, cardinality means "number of distinct values", so "duplicates" would
>     be wrong. And I think "cardinality" is well established term, so I think
>     it's fine.
> 
>     BTW I named the GUC enable_group_by_reordering, I wonder if it should be
>     named differently, e.g. enable_groupby_reordering? Opinions?
> 
> 
>     regards
> 
>     -- 
>     Tomas Vondra
>     EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
>     The Enterprise PostgreSQL Company
> 
> Hi,
> 
> For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch:
> 
> multiple parametes need to be
> 
>   parametes -> parameters
> 
> leave more expensive comparions
> 
>   comparions -> comparisons
> 
> +       if (has_fake_var == false)
> 
> The above can be written as:
> 
>        if (!has_fake_var)
> 

All of this was already fixed in one of the subsequent "fixup" patches.
Attached is a patch merging 0001-0006, which is what I proposed to get
committed.

> +           nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) /
> list_length(pathkeys));
> 
> Looks like the value of tuples doesn't change inside the loop.
> You can precompute sqrt(tuples) outside the loop and store the value in
> a variable.
> 

IMHO it makes the formula harder to read, and the effect is not going to
be measurable - we're processing only a couple elements. If the loop was
~100 iterations or more, maybe it'd have impact.

> +       return -1;
> +   else if (a->cost == b->cost)
> +       return 0;
> +   return 1;
> 
> the keyword 'else' is not needed.
> 

True, but this is how comparators are usually implemented.

> + * Returns newly allocated lists. If no reordering is possible (or needed),
> + * the lists are set to NIL.
> + */
> +static bool
> +get_cheapest_group_keys_order(PlannerInfo *root, double nrows,
> 
> It seems the comment for return value doesn't match the bool return type.
> 

Yup, this is a valid point. I've fixed/reworded the comment a bit.

> +   /* If this optimization is disabled, we're done. */
> +   if (!debug_cheapest_group_by)
> 
> It seems enable_cheapest_group_by would be better name for the flag.
> 

This was renamed to enable_order_by_reordering in one of the fixup patches.

Attached is a patch merging 0001 and all the fixup patches, i.e. the
patch I plan to commit.

There was a minor test failure - the new GUC was not added to the sample
config file, so 003_check_guc.pl was failing.

I'm not including the 0007 part, because that's meant to demonstrate
plan changes with disabled optimization, which would confuse cfbot.

regards

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints