Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser'
От | Heikki Linnakangas |
---|---|
Тема | Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser' |
Дата | |
Msg-id | 61d7cb95-9194-42cf-8225-351703de9cb8@iki.fi обсуждение исходный текст |
Ответы |
Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser'
|
Список | pgsql-hackers |
On 24/09/2024 08:08, Andrei Lepikhov wrote: > On 19/9/2024 09:55, Andrei Lepikhov wrote: >> This wrong prediction makes things much worse if the query has more >> upper query blocks. >> His question was: Why not consider the grouping column unique in the >> upper query block? It could improve estimations. >> After a thorough investigation, I discovered that in commit >> 4767bc8ff2 most of the work was already done for DISTINCT clauses. So, >> why not do the same for grouping? A sketch of the patch is attached. >> As I see it, grouping in this sense works quite similarly to DISTINCT, >> and we have no reason to ignore it. After applying the patch, you can >> see that prediction has been improved: >> >> Hash Right Join (cost=5.62..162.56 rows=50 width=36) >> > A regression test is added into new version. > The code looks tiny, simple and non-invasive - it will be easy to commit > or reject. So I add it to next commitfest. Looks good at a quick glance. > @@ -5843,11 +5852,11 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault) > } > > /* > - * If there is a unique index or DISTINCT clause for the variable, assume > - * it is unique no matter what pg_statistic says; the statistics could be > - * out of date, or we might have found a partial unique index that proves > - * the var is unique for this query. However, we'd better still believe > - * the null-fraction statistic. > + * If there is a unique index, DISTINCT or GROUP-BY clause for the variable, > + * assume it is unique no matter what pg_statistic says; the statistics > + * could be out of date, or we might have found a partial unique index that > + * proves the var is unique for this query. However, we'd better still > + * believe the null-fraction statistic. > */ > if (vardata->isunique) > stadistinct = -1.0 * (1.0 - stanullfrac); I wonder about the "we'd better still believe the null-fraction statistic" part. It makes sense for a unique index, but a DISTINCT or GROUP BY collapses all the NULLs to a single row. So I think there's some more work to be done here. -- Heikki Linnakangas Neon (https://neon.tech)
В списке pgsql-hackers по дате отправления: