Re: improving GROUP BY estimation

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: improving GROUP BY estimation
Дата
Msg-id CAPpHfdtEiXqj52kuGMEnw6EShrzj-0tFYwwxPqzdFvxYfCr+tQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: improving GROUP BY estimation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: improving GROUP BY estimation
Список pgsql-hackers
Hi, Tomas!

I've assigned to review this patch.

I've checked version estimate-num-groups-v2.txt by Mark Dilger.
It applies to head cleanly, passes corrected regression tests.

About correlated/uncorrelated cases. I think now optimizer mostly assumes all distributions to be independent.
I think we should follow this assumption in this case also until we have fundamentally better option (like your multivariate statistics).

@@ -3438,9 +3438,9 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
  reldistinct = clamp;

  /*
- * Multiply by restriction selectivity.
+ * Estimate number of distinct values expected in given number of rows.
  */
- reldistinct *= rel->rows / rel->tuples;
+ reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows));

  /*
  * Update estimate of total distinct groups.

I think we need way more explanation in comments here (possibly with external links). For now, it looks like formula which appears from nowhere.

Also, note that rel->tuples gone away from formula parameters. So, we actually don't care about how may tuples are in the table. This is because this formula assumes that same tuple could be selected multiple times. For low numbers this can lead to some errors. Consider selecting 2 from 3 distinct tuples. If you can't select same tuple twice then you always selecting 2 distinct tuples. But according to formula you will select 5/3 in average. I think this error in small numbers is negligible, especially because getting rid of this error would require way more computations. But it worth mentioning in comments though.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [NOVICE] WHERE clause not used when index is used
Следующее
От: Aleksander Alekseev
Дата:
Сообщение: Re: pgbench small bug fix