Re: Showing applied extended statistics in explain

Поиск

Список

Период

Сортировка

От	Tatsuro Yamada
Тема	Re: Showing applied extended statistics in explain
Дата	18 января 2022 г. 14:31:30
Msg-id	CAOKkKFtc45uNFoWYOCo4St19ayxrh-_+4TnZtwxGZz6-3k_GSA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Showing applied extended statistics in explain (Dmitry Dolgov <9erthalion6@gmail.com>)
Ответы	Re: Showing applied extended statistics in explain (Jacob Champion <jchampion@timescale.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi Tomas!

>> What exactly use case do you have in mind?
>Well, my goal was to help users investigating the plan/estimates,
>because once you create multiple "candidate" statistics objects it may
>get quite tricky to determine which of them were applied, in what order,
>etc. It's not all that straightforward, given the various heuristics we
>use to pick the "best" statistics, apply dependencies last, etc. And I
>don't quite want to teach the users those rules, because I consider them
>mostly implementation details that wee may want to tweak in the future.

You are right. This feature will be very useful for plan tuning with
extended statistics. Therefore, I would like to help develop it. :-D

>5) It includes just statistics name + clauses, but maybe we should
>include additional info (e.g estimate for that combination of clauses).

I thought the above sentence was about what level to aim for in the initial

version. Ideally, we would like to include additional information. However,

it is clear that the more things we have, the longer it will take to develop.
So, I think it is better to commit the initial version at a minimal level to

provide it to users quickly.

As long as an Extended stats name is displayed in EXPLAIN result, it is

possible to figure out which extended statistics are used. That information

alone is valuable to the user.

> 4) The info is collected always, but I guess we should do that only when
>in explain mode. Not sure how expensive it is.

In the case of pg_plan_advsr that I created, I used ProcessUtility_hook to
detect EXPLAIN command. It searches all queries to find T_ExplainStmt, and
set a flag. I guess we can't use this method in Postgres core, right?

If we have to collect the extra info for all query executions, we need to
reduce overhead. I mean collecting the minimum necessary.
To do that, I think it would be better to display less Extended stats info
in EXPLAIN results. For example, displaying only extended stats names is fine,
I guess. (I haven't understood your patch yet, so If I say wrong, sorry)

>6) The clauses in the grouping query are transformed to AND list, which
>is wrong. This is easy to fix, I was lazy to do that in a PoC patch.

6) is related 5).
If we agree to remove showing quals of extended stats in EXPLAIN result,
We can solve this problem by removing the code. Is it okay?

>2) The deparsing is modeled (i.e. copied) from how we deal with index
>quals, but it's having issues with nested OR clauses, because there are
>nested RestrictInfo nodes and the deparsing does not expect that.
>
>3) It does not work for functional dependencies, because we effectively
>"merge" all functional dependencies and apply the entries. Not sure how
>to display this, but I think it should show the individual dependencies
>actually applied.
>
>7) It does not show statistics for individual expressions. I suppose
>examine_variable could add it to the rel somehow, and maybe we could do
>that with index expressions too?

I'm not sure about 2) 3) and 7) above, so I'd like to see some concrete examples

of queries. I would like to include it in the test pattern for regression testing.

To be fixed:

* StatisticExtInfo should have a _copy etc. node functionality now,
right? I've hit "unrecognized node type" with a prepared statement
because it's missing.

* Is there any particular reason behind choosing only some scan nodes to
display extended stats? E.g. BitmapHeapScan is missing.

* (New) In the case of Group by, we should put Extended stats info under the
Hash/Group Aggregate node (not under Scan node).

* (New) We have to create a regression test including the above test patterns.

Attached patch:

It is a rebased version on the head of the master because there were many Hunks

when I applied the previous patch on master.

I'll create regression tests firstly.

Regards,

Tatsuro Yamada

Вложения

0001-show-stats-in-explain-rebased.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Takashi Menjo
Дата: 18 января 2022 г., 13:58:35
Сообщение: Re: Map WAL segment files on PMEM as WAL buffers

Следующее

От: Alvaro Herrera
Дата: 18 января 2022 г., 15:22:55
Сообщение: Re: a misbehavior of partition row movement (?)

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Showing applied extended statistics in explain

Вложения

Предыдущее

Следующее