Обсуждение: [HACKERS] auto_explain : log queries with wrong estimation

Поиск
Список
Период
Сортировка

[HACKERS] auto_explain : log queries with wrong estimation

От
Adrien Nayrat
Дата:
Hi hackers,


I try to made a patch to auto_explain in order to log queries with wrong estimation.

I compare planned row id : queryDesc->planstate->plan->plan_rows

Vs ntuples : queryDesc->planstate->instrument->ntuples;

If I understand, instrumentation is used only with explain. So my patch works
only with explain (and segfault without).


Is there a simple way to get ntuples?

Attached a naive patch.

Thanks :)

--
Adrien NAYRAT

http://dalibo.com - http://dalibo.org

Вложения

Re: [HACKERS] auto_explain : log queries with wrong estimation

От
Maksim Milyutin
Дата:
On 24.08.2017 14:56, Adrien Nayrat wrote:

Hi hackers,

Hi,

I try to made a patch to auto_explain in order to log queries with wrong estimation.

I compare planned row id : queryDesc->planstate->plan->plan_rows

Vs ntuples : queryDesc->planstate->instrument->ntuples;

AFAICS you want to introduce two additional per-node variables:
 - auto_explain_log_estimate_ratio that denotes minimum ratio (>= 1) between real value and planned one. I would add 'min' prefix before 'ratio'.
 - auto_explain_log_estimate_min_rows - minimum absolute difference between those two values. IMHO this name is somewhat poor, the suffix 'min_diff_rows' looks better.
If real expressions (ratio and diff) exceed these threshold values both, you log this situation. I'm right?

If I understand, instrumentation is used only with explain. So my patch works
only with explain (and segfault without).

Instrumentation is initialized only with analyze (log_analyze is true)[1]

Is there a simple way to get ntuples?

It's interesting question. In one's time I didn't find any way to get the amount of tuples emitted from a node.


1. contrib/auto_explain/auto_explain.c:221
-- 
Regards,
Maksim Milyutin

Re: [HACKERS] auto_explain : log queries with wrong estimation

От
Adrien Nayrat
Дата:
On 08/24/2017 03:08 PM, Maksim Milyutin wrote:
[...]
>
> AFAICS you want to introduce two additional per-node variables:
>  - auto_explain_log_estimate_ratio that denotes minimum ratio (>= 1) between
> real value and planned one. I would add 'min' prefix before 'ratio'.
>  - auto_explain_log_estimate_min_rows - minimum absolute difference between
> those two values. IMHO this name is somewhat poor, the suffix 'min_diff_rows'
> looks better.
> If real expressions (ratio and diff) exceed these threshold values both, you log
> this situation. I'm right?

Yes, you're totaly right! I wonder if "ratio" is fine or if I should use "factor"?

[...]
>
> Instrumentation is initialized only with analyze (log_analyze is true)[1]

Good, I didn't notice instrumentation can be enabled in auto_explain's hook. I
added these lines and it works :

if (auto_explain_log_estimate_ratio || auto_explain_log_estimate_min_rows)
{    queryDesc->instrument_options |= INSTRUMENT_ROWS;
}

But I need to undestand how instrumentation works.

Thanks for your answer. I will continue my work, actually my patch is not
functionnal.


--
Adrien NAYRAT