Re: Making auto_explain more useful / convenient

From: Vladimir Churyukin
Subject: Re: Making auto_explain more useful / convenient
Date:
Msg-id: CAFSGpE2Vo5iB1Fiauazy_t4Em8ToBeWJRLqqWdk0wv7FOv86fg@mail.gmail.com
In response to: Re: Making auto_explain more useful / convenient (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sat, Nov 11, 2023 at 7:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Vladimir Churyukin <vladimir@churyukin.com> writes:
> Why not have an option to return EXPLAIN results as a NoticeResponse
> instead? That would make its usage more convenient.

That seems quite useless to me, and likely actually counterproductive.
If you are manually investigating query performance, you can just use
EXPLAIN directly.  The point of auto_explain, ISTM, is to capture info
about queries issued by automated applications.  So something like the
above could only work if you taught every one of your applications to
capture the NOTICE output, separate it from random other NOTICE
output, and then (probably) log it somewhere central for later
inspection.  That's a lot of code to write, and at the end you'd
only have effectively duplicated existing tooling such as pgbadger.
Also, what happens in applications you forgot to convert?


Sergey Kornilov just gave the right answer for this one above in the thread.
Unfortunately, there are a lot of scenarios where you can't use pgbadger or any other log analysis tool, or where doing so is inconvenient.
There are a bunch of cloud-hosted forks of Postgres, for example, and not all of them give you access to this functionality.
On AWS, for example, you first need to download all the logs, which complicates things significantly.
The goal here is not to investigate the performance of a single query, but to continuously monitor many (or all) queries, so that you can detect
plan degradations right away.
 
> Another thing is tangentially related...
> I think it may be good to have a number of options to generate
> significantly shorter output similar to EXPLAIN. EXPLAIN is great, but
> sometimes people need more concise and specific information, for example
> total number of buffers and reads by certain query (this is pretty common),
> whether or not we had certain nodes in the plan (seq scan, scan of certain
> index(es)), how bad was cardinality misprediction on certain nodes, etc.

Maybe, but again I'm a bit skeptical.  IME you frequently don't know
what you're looking for until you've seen the bigger picture.  Zeroing
in on details like this could be pretty misleading.


If you don't know what you're looking for, then it's not very useful, I agree.
But in many cases you do know. There are certain generic "signs of trouble" that you can detect from
the amount of data the query processor scans, from the cache hit rate of certain queries, from the presence of seq scans or scans of certain indexes,
from large differences between predicted and actual row counts, and from other things that may be relevant specifically to your app and queries.
We're already doing similar analysis on our side (a multi-terabyte db cluster with hundreds of millions to billions of queries running daily).
But it's not efficient enough, for two reasons:
1. the problem I mentioned above: access to logs is limited in cloud environments;
2. EXPLAIN output can be huge, and its sheer size causes performance issues; compact output is much preferable for mass processing
(it's even more important if this output goes to notice messages rather than to the logs, which is why I said the second idea is tangentially related). A sketch of the kind of compact summary I have in mind is below.
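
To make this concrete, here is a rough sketch of the kind of compact summary I mean, as you can compute it today by post-processing EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) output. The function name and the particular metrics are just an illustration, not a proposed interface; note that ANALYZE actually executes the query:

    CREATE OR REPLACE FUNCTION plan_summary(query text)
    RETURNS TABLE (total_shared_reads bigint,
                   has_seq_scan boolean,
                   worst_rows_ratio numeric)
    LANGUAGE plpgsql AS $$
    DECLARE
        plan jsonb;
    BEGIN
        EXECUTE 'EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ' || query INTO plan;
        RETURN QUERY
        WITH RECURSIVE nodes(node) AS (
            SELECT plan -> 0 -> 'Plan'      -- root plan node
            UNION ALL
            SELECT child                    -- recurse into child plan nodes
            FROM nodes,
                 LATERAL jsonb_array_elements(node -> 'Plans') AS child
            WHERE node ? 'Plans'
        )
        SELECT sum((node ->> 'Shared Read Blocks')::bigint)::bigint,
               bool_or(node ->> 'Node Type' = 'Seq Scan'),
               -- worst underestimation of row counts across all nodes
               max(greatest((node ->> 'Actual Rows')::numeric, 1)
                   / greatest((node ->> 'Plan Rows')::numeric, 1))
        FROM nodes;
    END
    $$;

    -- e.g. SELECT * FROM plan_summary('SELECT * FROM pg_class');

Doing the equivalent inside auto_explain itself, with output an order of magnitude smaller than a full plan, is the kind of thing I'm after.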

Since it seems the notice output is already possible, half of the problem is solved.
I'll try to come up with possible options for more compact output then, unless you think it's completely futile.
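
For the archives, a minimal sketch of that existing route (this assumes auto_explain.log_level, which as far as I know has been available since PostgreSQL 12):

    LOAD 'auto_explain';                      -- per-session load; typically requires superuser
    SET auto_explain.log_min_duration = 0;    -- explain every statement
    SET auto_explain.log_analyze = on;        -- include actual rows/timing
    SET auto_explain.log_level = notice;      -- emit the plan as a NOTICE
    SET client_min_messages = notice;         -- make sure the client receives it
    SELECT count(*) FROM pg_class;            -- the plan arrives as a NoticeResponse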

thank you,
-Vladimir Churyukin


