Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Subject: Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
Date:
Msg-id: CAG95seUJwZOrg2grS6emW_C=8jNqCwnfRZRJN0emZjqVo9wcEg@mail.gmail.com
In reply to: Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Oct 2, 2014 at 11:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Not just from a oracle DBA POV ;). Generally.

Sure.

>> Saying that, principally they mean an
>> Oracle Wait Interface analogue. The Basic idea is to have counters or
>> sensors all around database kernel to measure what a particular
>> backend is currently waiting for and how long/how often it waits.
>
> Yes, I can see that. I'm not sure whether lwlocks are the primary point
> I'd start with though. In many cases you'll wait on so called
> 'heavyweight' locks too...


I am trying to kill two birds with one stone: do some preparatory work
on the main, larger topic, and at the same time deliver something
convenient for LWLock diagnostics. Maybe I am wrong, but it seems to
me that it is a much easier task to advocate for the more desired
feature: we already have diagnostic tools for heavyweight locks, and
they are better than what we have for lwlocks.


>
>> Suppose we have a PostgreSQL instance under heavy write workload, but
>> we do not know any details. We could pull from time to time
>> pg_stat_lwlock function which would say pid n1 currently in
>> WALWriteLock and pid n2 in WALInsertLock. That means we should think
>> about write ahead log tuning. Or pid n1 is in some clog-related
>> LWLock, which means we need move clog to ramdisk. This is a stupid
>> example, but it shows how useful LWLock tracing could be for DBAs.
>> Even better idea is to collect daily LWLock distribution, find most
>> frequent of them etc.
>
> I think it's more complicated than that - but I also think it'd be a
> great help for DBAs and us postgres hackers.


Sure, it is more complicated; the example is simplistic, just to illustrate the point.


>> An idea of this patch is to trace LWLocks with the lowest possible
>> performance impact. We put integer lwLockID into procarray, then
>> acquiring the LWLock we put its id to procarray and now we could pull
>> procarray using a function to see if particular pid holds LWLock.
>
> But a backend can hold more than one lwlock at the same time? I don't
> think that's something we can ignore.


Yes, this is one of the next steps. I have not yet figured out how to
do it less painfully than LWLOCK_STATS does.
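One way to track more than one held lwlock per backend would be a small
fixed-size slot array in the procarray entry instead of the single
lwLockID field from the proof-of-concept patch. A minimal sketch, with
purely hypothetical names (MAX_HELD_LWLOCKS, BackendLWLockSlot, etc. are
not from the actual patch):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-backend bookkeeping: the IDs of all LWLocks a
 * backend currently holds, in acquisition order, instead of only the
 * most recently acquired one. */
#define MAX_HELD_LWLOCKS 16

typedef struct BackendLWLockSlot
{
    int held[MAX_HELD_LWLOCKS];  /* LWLock ids currently held */
    int nheld;                   /* number of valid entries */
} BackendLWLockSlot;

/* Record an acquisition; returns 0 if the array is already full. */
int
slot_record_acquire(BackendLWLockSlot *slot, int lwlockid)
{
    if (slot->nheld >= MAX_HELD_LWLOCKS)
        return 0;
    slot->held[slot->nheld++] = lwlockid;
    return 1;
}

/* Forget a released lock; order within the array is not preserved. */
void
slot_record_release(BackendLWLockSlot *slot, int lwlockid)
{
    int i;

    for (i = 0; i < slot->nheld; i++)
    {
        if (slot->held[i] == lwlockid)
        {
            /* move the last entry into the freed position */
            slot->held[i] = slot->held[--slot->nheld];
            return;
        }
    }
}
```

A reporting function could then walk these slots under the usual
procarray protections and emit one row per (pid, held lwlock) pair.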


> I personally am doubtful that it makes much sense to move this into an
> extension. It'll likely be tightly enough interlinked to backend code
> that I don't see the point. But I'd not be surprised if others feel
> differently.


That is why I asked this question, and also because I have no idea
where exactly to put these functions inside the backend if not into an
extension. But there are probably more important tasks in this work
than moving the functions inside; I could do that later if it becomes
necessary.

>
> I generally don't think you'll get interesting data without a fair bit
> of additional work.

Sure

> The first problem that comes to my mind about collecting enough data is
> that we have a very large number of lwlocks (fixed_number + 2 *
> shared_buffers). One 'trivial' way of implementing this is to have a per
> backend array collecting the information, and then a shared one
> accumulating data from it over time. But I'm afraid that's not going to
> fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...
>
> Perhaps we should somehow encode this different for individual lwlock
> tranches? It's far less problematic to collect all this information for
> all but the buffer lwlocks...

That is a good point. There are actually two things to keep in mind:
i) user interface, ii) implementation

i) Personally, as a DBA, I do not see much sense in an unaggregated
list of pid, lwlockid, wait_time or something like that.

Much better to have aggregation by pid and lwlockid, for instance:
- pid
- lwlockid
- lwlockname
- total_count (or the number of exclusive/shared acquisitions that had
to wait, as you suggest; since we have a lot of lwlocks, I am doubtful
how important the information about non-waiting lwlocks really is)
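The aggregation by (pid, lwlockid) described above could be sketched as
a simple counter accumulation over raw wait samples. All names here
(LWLockWaitCounter, count_wait) are illustrative, not from the patch:

```c
#include <assert.h>

/* One row of the proposed aggregated output: per pid, per lwlock,
 * how many acquisitions had to wait. */
typedef struct LWLockWaitCounter
{
    int  pid;
    int  lwlockid;
    long total_count;   /* acquisitions that had to wait */
} LWLockWaitCounter;

/* Bump the counter for (pid, lwlockid), creating a new row if needed.
 * Returns the updated number of distinct rows in use. */
int
count_wait(LWLockWaitCounter *rows, int nrows, int maxrows,
           int pid, int lwlockid)
{
    int i;

    for (i = 0; i < nrows; i++)
    {
        if (rows[i].pid == pid && rows[i].lwlockid == lwlockid)
        {
            rows[i].total_count++;
            return nrows;
        }
    }
    if (nrows < maxrows)
    {
        rows[nrows].pid = pid;
        rows[nrows].lwlockid = lwlockid;
        rows[nrows].total_count = 1;
        nrows++;
    }
    return nrows;
}
```

The lwlockname column would then be resolved once at display time from
the lwlockid, rather than stored per sample.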

ii) Am I correct that you suggest going through MainLWLockTranche and
retrieving all available lwlock information into some structure like
the lwLockCell structure I used in my patch? Something like a hash
lwlockid->usagecount?
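The lwlockid->usagecount hash mentioned above might look roughly like
the following open-addressing sketch. In a real patch this would
presumably use PostgreSQL's dynahash instead; the names and the hash
multiplier here are purely illustrative:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lwlockid -> usagecount table with open addressing.
 * A bucket whose lwlockid is 0 is considered empty, so lwlock ids
 * must be positive. */
#define USAGE_TAB_SIZE 64   /* must be a power of two */

typedef struct UsageEntry
{
    int  lwlockid;      /* key; 0 means the bucket is empty */
    long usagecount;
} UsageEntry;

/* Increment the usage count for lwlockid (> 0); returns the new
 * count, or -1 if the table is full. */
long
usage_bump(UsageEntry *tab, int lwlockid)
{
    int i;
    unsigned idx = ((unsigned) lwlockid * 2654435761u)
                   & (USAGE_TAB_SIZE - 1);

    for (i = 0; i < USAGE_TAB_SIZE; i++)
    {
        UsageEntry *e = &tab[(idx + i) & (USAGE_TAB_SIZE - 1)];

        if (e->lwlockid == lwlockid)
            return ++e->usagecount;
        if (e->lwlockid == 0)
        {
            e->lwlockid = lwlockid;
            e->usagecount = 1;
            return 1;
        }
    }
    return -1;              /* table full */
}
```

Sizing the table per tranche, as Andres suggests, would keep it small
for everything except the buffer lwlocks.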


Regards,
Ilya

>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



In the pgsql-hackers list, by send date:

Previous
From: Heikki Linnakangas
Date:
Message: Re: pgbench throttling latency limit
Next
From: Ilya Kosmodemiansky
Date:
Message: Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)