Re: Showing applied extended statistics in explain

Поиск
Список
Период
Сортировка
От Tatsuro Yamada
Тема Re: Showing applied extended statistics in explain
Дата
Msg-id CAOKkKFtc45uNFoWYOCo4St19ayxrh-_+4TnZtwxGZz6-3k_GSA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Showing applied extended statistics in explain  (Dmitry Dolgov <9erthalion6@gmail.com>)
Ответы Re: Showing applied extended statistics in explain  (Jacob Champion <jchampion@timescale.com>)
Список pgsql-hackers
Hi Tomas!

>> What exactly use case do you have in mind?
>Well, my goal was to help users investigating the plan/estimates,
>because once you create multiple "candidate" statistics objects it may
>get quite tricky to determine which of them were applied, in what order,
>etc. It's not all that straightforward, given the various heuristics we
>use to pick the "best" statistics, apply dependencies last, etc. And I
>don't quite want to teach the users those rules, because I consider them
>mostly implementation details that wee may want to tweak in the future.

You are right. This feature will be very useful for plan tuning with
extended statistics. Therefore, I would like to help develop it. :-D


>5) It includes just statistics name + clauses, but maybe we should
>include additional info (e.g estimate for that combination of clauses).

I thought the above sentence was about what level to aim for in the initial 
version. Ideally, we would like to include additional information. However, 
it is clear that the more things we have, the longer it will take to develop.
So, I think it is better to commit the initial version at a minimal level to 
provide it to users quickly. 

As long as an Extended stats name is displayed in EXPLAIN result, it is 
possible to figure out which extended statistics are used. That information 
alone is valuable to the user.


> 4) The info is collected always, but I guess we should do that only when
>in explain mode. Not sure how expensive it is.

In the case of pg_plan_advsr that I created, I used ProcessUtility_hook to
detect EXPLAIN command. It searches all queries to find T_ExplainStmt, and
set a flag. I guess we can't use this method in Postgres core, right?

If we have to collect the extra info for all query executions, we need to
reduce overhead. I mean collecting the minimum necessary.
To do that, I think it would be better to display less Extended stats info
in EXPLAIN results. For example, displaying only extended stats names is fine,
I guess. (I haven't understood your patch yet, so If I say wrong, sorry)


>6) The clauses in the grouping query are transformed to AND list, which
>is wrong. This is easy to fix, I was lazy to do that in a PoC patch.

6) is related 5).
If we agree to remove showing quals of extended stats in EXPLAIN result,
We can solve this problem by removing the code. Is it okay?


>2) The deparsing is modeled (i.e. copied) from how we deal with index
>quals, but it's having issues with nested OR clauses, because there are
>nested RestrictInfo nodes and the deparsing does not expect that.
>
>3) It does not work for functional dependencies, because we effectively
>"merge" all functional dependencies and apply the entries. Not sure how
>to display this, but I think it should show the individual dependencies
>actually applied.
>
>7) It does not show statistics for individual expressions. I suppose
>examine_variable could add it to the rel somehow, and maybe we could do
>that with index expressions too?

I'm not sure about 2) 3) and 7) above, so I'd like to see some concrete examples 
of queries. I would like to include it in the test pattern for regression testing.


To be fixed:

* StatisticExtInfo should have a _copy etc. node functionality now,
 right? I've hit "unrecognized node type" with a prepared statement
 because it's missing.

* Is there any particular reason behind choosing only some scan nodes to
 display extended stats? E.g. BitmapHeapScan is missing.

* (New) In the case of Group by, we should put Extended stats info under the
 Hash/Group Aggregate node (not under Scan node).

* (New) We have to create a regression test including the above test patterns.


Attached patch:

It is a rebased version on the head of the master because there were many Hunks 
when I applied the previous patch on master.

I'll create regression tests firstly.

Regards,
Tatsuro Yamada


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Takashi Menjo
Дата:
Сообщение: Re: Map WAL segment files on PMEM as WAL buffers
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: a misbehavior of partition row movement (?)