RE: [Proposal] Add accumulated statistics for wait event

Поиск
Список
Период
Сортировка
От Phil Florent
Тема RE: [Proposal] Add accumulated statistics for wait event
Дата
Msg-id DB6PR0301MB22789286025B8F222E3CE770BAEA0@DB6PR0301MB2278.eurprd03.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [Proposal] Add accumulated statistics for wait event  (Michael Paquier <michael@paquier.xyz>)
Ответы RE: [Proposal] Add accumulated statistics for wait event
Список pgsql-hackers
Hi,

It's the same logic with any polling system. An integration calculation using monte-carlo method with only a few points won't be accurate enough and can even be completely wrong etc. 
Polling is OK to troubleshoot a problem on the fly but 2 points are not enough. A few seconds are needed to obtain good enough data, e.g 5-10 seconds of polling with a 0.1=>0.01s interval between 2 queries of the activity.
Polling a few seconds while the user is waiting is normally enough to say if a significant part of the waits are on the database. It's very important to know that. With 1 hour of accumulated statistics, a DBA will always see something to fix. But if the user waits 10 seconds on a particular screen and 1 second is spent on the database it often won't directly help.  
Polling gives great information with postgreSQL 10 but it was already useful to catch top queries etc. in older versions.
I always check if activity is adequately reported by my tool using known cases. I want to be sure it will report adequately things in real-world troubleshooting sessions. Sometimes there are bugs in my tool, once there was an issue with postgres (pgstat_report_activty() was not called by workers in parallel index creation)

Best regards
Phil

De : Michael Paquier <michael@paquier.xyz>
Envoyé : jeudi 4 octobre 2018 12:58
À : Phil Florent
Cc : Yotsunaga, Naoki; Tomas Vondra; pgsql-hackers@lists.postgresql.org
Objet : Re: [Proposal] Add accumulated statistics for wait event
 
On Thu, Oct 04, 2018 at 09:32:37AM +0000, Phil Florent wrote:
> I am a DB beginner, so please tell me. It says that you can find
> events that are bottlenecks in sampling, but as you saw above, you can
> not find events shorter than the sampling interval, right?

Yes, which is why it would be as simple as making the interval shorter,
still not too short so as it bloats the amount of information fetched
which needs to be stored and afterwards (perhaps) treated for analysis.
This gets rather close to signal processing.  A simple image is for
example, assuming that event A happens 100 times in an interval of 1s,
and event B only once in the same interval of 1s, then if the snapshot
interval is only 1s, then in the worst case A would be treated an equal
of B, which would be wrong.
--
Michael

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Laurenz Albe
Дата:
Сообщение: Re: pg_ls_tmpdir()
Следующее
От: Tom Lane
Дата:
Сообщение: Re: SerializeParamList vs machines with strict alignment