Discussion: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)


Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Date:
Hi,

I have a patch which is actually not commitfest-ready yet, but it is
always better to start discussing a proof of concept with some patch in
hand instead of just an idea.

Since I am a DBA rather than a C programmer, I would appreciate any
suggestions and criticism about the patch and code quality to make
things better.


What this is all about.

From an Oracle DBA's point of view, we currently lack performance
diagnostics tools. By that, they principally mean an Oracle Wait
Interface analogue. The basic idea is to have counters or sensors all
around the database kernel measuring what a particular backend is
currently waiting for and how long/how often it waits. Obviously,
implementing such a complex system is not an easy task. However,
decomposing the task and implementing small diagnostic tools has proved
to be a good approach: things like pg_stat_bgwriter, pg_stat_statements
or pg_stat_archiver make life significantly easier. Implementing such a
histogram for LWLock tracing was my goal.


Why LWLock tracing is important.

Suppose we have a PostgreSQL instance under a heavy write workload, but
we do not know any details. We could poll a pg_stat_lwlock function from
time to time, and it might tell us that pid n1 is currently in
WALWriteLock and pid n2 in WALInsertLock. That means we should think
about write-ahead log tuning. Or pid n1 is in some clog-related LWLock,
which means we need to move the clog to a ramdisk. This is a simplistic
example, but it shows how useful LWLock tracing could be for DBAs. An
even better idea is to collect a daily LWLock distribution, find the
most frequent locks, and so on.


Problems.

As far as I know, there are two major problems with implementing LWLock
tracing in Postgres: performance and stability of the server.

The patch https://commitfest.postgresql.org/action/patch_view?id=885
(discussion starts here I hope -
http://www.postgresql.org/message-id/4FE8CA2C.3030809@uptime.jp)
demonstrates the performance problems; LWLOCK_STATS, LOCK_DEBUG and the
DTrace-like approach are slow, unsafe for production use, and a bit
clumsy for a DBA to use.


An Idea.

The idea of this patch is to trace LWLocks with the lowest possible
performance impact. We put an integer lwLockId into the procarray: when
acquiring an LWLock we store its id there, and we can then poll the
procarray with a function to see whether a particular pid holds an
LWLock. Not perfect, but if we see that somebody frequently holds a
particular LWLock, we can investigate the matter more precisely with
another tool. Something along these lines is implemented in the
attached patch:
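
For illustration only, here is a rough sketch of the kind of change
involved (this is not the attached patch; the field name follows the
description above and the exact placement is approximate):

    /* In struct PGPROC (storage/proc.h) -- one extra int per backend: */
    int         lwLockId;       /* id of the LWLock just acquired, or -1 */

    /* In LWLockAcquire() (storage/lmgr/lwlock.c), once the lock is ours: */
    MyProc->lwLockId = T_ID(lock);

    /* In LWLockRelease(): */
    MyProc->lwLockId = -1;

The pg_stat_lwlock function then only has to read pid and lwLockId from
each PGPROC entry.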

Issuing pgbench -c 50 -t 1000 -j 50,

we get something like this:

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:43.848765+02 |       57 | 4257
(1 row)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:45.892428+02 |       67 | 4269
 2014-10-01 15:11:45.892428+02 |       67 | 4258
 2014-10-01 15:11:45.892428+02 |       57 | 4270
 2014-10-01 15:11:45.892428+02 |       67 | 4245
 2014-10-01 15:11:45.892428+02 |       67 | 4271
 2014-10-01 15:11:45.892428+02 |       57 | 4256
 2014-10-01 15:11:45.892428+02 |       54 | 4241
(7 rows)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:47.211024+02 |       58 | 4262
 2014-10-01 15:11:47.211024+02 |       69 | 4243
 2014-10-01 15:11:47.211024+02 |       69 | 4246
(3 rows)

postgres=# select now(),* from pg_stat_lwlock ;
 now | lwlockid | pid
-----+----------+-----
(0 rows)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
-------------------------------+----------+------
 2014-10-01 15:11:49.897357+02 |       55 | 4240
 2014-10-01 15:11:49.897357+02 |       61 | 4264
 2014-10-01 15:11:49.897357+02 |       55 | 4258
 2014-10-01 15:11:49.897357+02 |       61 | 4260
 2014-10-01 15:11:49.897357+02 |       61 | 4283
 2014-10-01 15:11:49.897357+02 |       62 | 4242

We could collect this into some view or table on a periodic basis.

Questions.

1. I've decided to put pg_stat_lwlock into an extension, pg_stat_lwlock
(simply for test purposes). Is that OK, or would it be better to implement
it somewhere inside pg_catalog or in another extension (for example
pg_stat_statements)?

2. Currently lwLockId is in the main procarray (this is for PoC purposes
only). I know why the procarray was split in two and why the main one
should be kept as small as possible. Anyway, which design approach is
better: to keep it inside the main one (this is an important feature,
lwLockId is small, and I use proc->lwLockId = T_ID(l) to get the id) or
to put it into another procarray (create a new one or use an existing one)?

3. What is the best way to retrieve the name of an LWLock instead of
only its ID (for usability reasons, WALWriteLock looks more useful
than just 8)? Getting it from the tranche as T_NAME(lock) at the same
time as the lwLockId does not seem like a nice idea, and putting the
name into the procarray just to pull it out later spoils the whole idea.
Any advice?

4. When going through the procarray to get (pid, lwlockid) pairs I do
LWLockAcquire(ProcArrayLock, LW_SHARED); I am not quite sure that is a
good idea, because of its performance impact. Probably such a
histogram-style thing does not need strong consistency, and
ProcArrayLock is not needed?
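
For reference, the read side I have in mind is roughly the sketch below;
whether the ProcArrayLock acquisition stays is exactly the question
above (report_row() is a made-up placeholder for emitting one tuple, and
iterating over all PGPROC slots is just the simplest way to write it):

    LWLockAcquire(ProcArrayLock, LW_SHARED);    /* possibly unnecessary */

    for (i = 0; i < ProcGlobal->allProcCount; i++)
    {
        volatile PGPROC *proc = &ProcGlobal->allProcs[i];

        if (proc->pid != 0 && proc->lwLockId >= 0)
            report_row(proc->pid, proc->lwLockId);  /* one (pid, lwlockid) pair */
    }

    LWLockRelease(ProcArrayLock);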

So, any thoughts? Should I proceed with implementing this feature?


Best regards,
Ilya


PS I am thankful to Simon Riggs and Bruce Momjian for discussing this
idea with me at PGCon, and especially to Simon, who gave me a brief but
quite useful tutorial on how the LWLock-related code in Postgres works.
I am also thankful to Heikki Linnakangas and Magnus Hagander for
answering some of my stupid questions about procarray internals.

--
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com

Attachments

Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Craig Ringer
Date:
On 10/02/2014 12:19 AM, Ilya Kosmodemiansky wrote:

> From an Oracle DBA's point of view, we currently lack performance
> diagnostics tools.

Agreed. Sometimes very frustratingly so.

> An even better idea is to collect a daily LWLock distribution, find the
> most frequent locks, and so on.

While we could add this within PostgreSQL, I wonder if it's worth
looking at whether it can be done non-intrusively with operating system
facilities like perf.

I've had really good results using perf to trace and graph xlog
generation and other metrics in the past.

> The patch https://commitfest.postgresql.org/action/patch_view?id=885
> (discussion starts here I hope -
> http://www.postgresql.org/message-id/4FE8CA2C.3030809@uptime.jp)
> demonstrates the performance problems; LWLOCK_STATS, LOCK_DEBUG and the
> DTrace-like approach are slow, unsafe for production use, and a bit
> clumsy for a DBA to use.

It's not at all clear to me that a DTrace-like (or perf-based, rather)
approach is unsafe, slow, or unsuitable for production use.

Resolving lock IDs to names might be an issue, though.

With appropriate wrapper tools I think we could have quite a useful
library of perf-based diagnostics and tracing tools for PostgreSQL.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Andres Freund
Date:
On 2014-10-01 18:19:05 +0200, Ilya Kosmodemiansky wrote:
> I have a patch which is actually not commitfest-ready yet, but it is
> always better to start discussing a proof of concept with some patch in
> hand instead of just an idea.

That's a good way to start work on a topic like this.

> From an Oracle DBA's point of view, we currently lack performance
> diagnostics tools.

Not just from an Oracle DBA POV ;). Generally.

So I'm happy to see some focus on this!

> By that, they principally mean an
> Oracle Wait Interface analogue. The basic idea is to have counters or
> sensors all around the database kernel measuring what a particular
> backend is currently waiting for and how long/how often it waits.

Yes, I can see that. I'm not sure whether lwlocks are the primary point
I'd start with though. In many cases you'll wait on so called
'heavyweight' locks too...

> Suppose we have a PostgreSQL instance under a heavy write workload, but
> we do not know any details. We could poll a pg_stat_lwlock function from
> time to time, and it might tell us that pid n1 is currently in
> WALWriteLock and pid n2 in WALInsertLock. That means we should think
> about write-ahead log tuning. Or pid n1 is in some clog-related LWLock,
> which means we need to move the clog to a ramdisk. This is a simplistic
> example, but it shows how useful LWLock tracing could be for DBAs. An
> even better idea is to collect a daily LWLock distribution, find the
> most frequent locks, and so on.

I think it's more complicated than that - but I also think it'd be a
great help for DBAs and us postgres hackers.

> The idea of this patch is to trace LWLocks with the lowest possible
> performance impact. We put an integer lwLockId into the procarray: when
> acquiring an LWLock we store its id there, and we can then poll the
> procarray with a function to see whether a particular pid holds an LWLock.

But a backend can hold more than one lwlock at the same time? I don't
think that's something we can ignore.

> Not
> perfect, but if we see that somebody frequently holds a particular
> LWLock, we can investigate the matter more precisely with another
> tool. Something along these lines is implemented in the attached
> patch:
> 
> Issuing pgbench -c 50 -t 1000 -j 50,
> 
> we get something like this:
> 
> postgres=# select now(),* from pg_stat_lwlock ;
>               now              | lwlockid | pid
> -------------------------------+----------+------
>  2014-10-01 15:11:43.848765+02 |       57 | 4257
> (1 row)

Hm. So you just collect the lwlockid and the pid? That doesn't sound
particularly interesting to me. In my opinion, you'd need at least:
* pid
* number of exclusive/shared acquisitions
* number of exclusive/shared acquisitions that had to wait
* total wait time of exclusive/shared acquisitions
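
For concreteness, a per-backend, per-lock record along those lines might
look like this (a sketch; the type and field names are mine, not from
any existing patch):

    typedef struct LWLockUsageStats
    {
        uint64      sh_acquire_count;   /* shared acquisitions */
        uint64      ex_acquire_count;   /* exclusive acquisitions */
        uint64      sh_block_count;     /* shared acquisitions that had to wait */
        uint64      ex_block_count;     /* exclusive acquisitions that had to wait */
        uint64      sh_wait_us;         /* total shared wait time, microseconds */
        uint64      ex_wait_us;         /* total exclusive wait time, microseconds */
    } LWLockUsageStats;

That is 48 bytes per lock, and with 2 * shared_buffers buffer locks alone
a full per-backend array of these adds up quickly - which is the problem
discussed below.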

> Questions.
> 
> 1. I've decided to put pg_stat_lwlock into an extension, pg_stat_lwlock
> (simply for test purposes). Is that OK, or would it be better to implement
> it somewhere inside pg_catalog or in another extension (for example
> pg_stat_statements)?

I personally am doubtful that it makes much sense to move this into an
extension. It'll likely be tightly enough interlinked to backend code
that I don't see the point. But I'd not be surprised if others feel
differently.

I generally don't think you'll get interesting data without a fair bit
of additional work.

The first problem that comes to my mind about collecting enough data is
that we have a very large number of lwlocks (fixed_number + 2 *
shared_buffers). One 'trivial' way of implementing this is to have a per
backend array collecting the information, and then a shared one
accumulating data from it over time. But I'm afraid that's not going to
fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...

Perhaps we should somehow encode this differently for individual lwlock
tranches? It's far less problematic to collect all this information for
all but the buffer lwlocks...

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Date:
On Thu, Oct 2, 2014 at 11:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Not just from an Oracle DBA POV ;). Generally.

sure

>> By that, they principally mean an
>> Oracle Wait Interface analogue. The basic idea is to have counters or
>> sensors all around the database kernel measuring what a particular
>> backend is currently waiting for and how long/how often it waits.
>
> Yes, I can see that. I'm not sure whether lwlocks are the primary point
> I'd start with though. In many cases you'll wait on so called
> 'heavyweight' locks too...


I am trying to kill two birds with one stone: do some preparatory work
on the main, larger topic and deliver some convenience for LWLock
diagnostics. Maybe I'm wrong, but it seems to me this is the easier
feature to advocate: we already have some diagnostic tools for
heavyweight locks, and they are better than what we have for lwlocks.


>
>> Suppose we have a PostgreSQL instance under a heavy write workload, but
>> we do not know any details. We could poll a pg_stat_lwlock function from
>> time to time, and it might tell us that pid n1 is currently in
>> WALWriteLock and pid n2 in WALInsertLock. That means we should think
>> about write-ahead log tuning. Or pid n1 is in some clog-related LWLock,
>> which means we need to move the clog to a ramdisk. This is a simplistic
>> example, but it shows how useful LWLock tracing could be for DBAs. An
>> even better idea is to collect a daily LWLock distribution, find the
>> most frequent locks, and so on.
>
> I think it's more complicated than that - but I also think it'd be a
> great help for DBAs and us postgres hackers.


Sure, it is more complicated; the example is simplistic, just to show the point.


>> The idea of this patch is to trace LWLocks with the lowest possible
>> performance impact. We put an integer lwLockId into the procarray: when
>> acquiring an LWLock we store its id there, and we can then poll the
>> procarray with a function to see whether a particular pid holds an LWLock.
>
> But a backend can hold more than one lwlock at the same time? I don't
> think that's something we can ignore.


Yes, this is one of the next steps. I have not figured out yet how to do
it less painfully than LWLOCK_STATS does.


> I personally am doubtful that it makes much sense to move this into an
> extension. It'll likely be tightly enough interlinked to backend code
> that I don't see the point. But I'd not be surprised if others feel
> differently.


That's why I asked this question, and also because I have no idea where
exactly to put these functions inside the backend if not in an extension.
But there are probably more important tasks in this work than moving the
function into the backend; I could do that later if it turns out to be
necessary.

>
> I generally don't think you'll get interesting data without a fair bit
> of additional work.

Sure

> The first problem that comes to my mind about collecting enough data is
> that we have a very large number of lwlocks (fixed_number + 2 *
> shared_buffers). One 'trivial' way of implementing this is to have a per
> backend array collecting the information, and then a shared one
> accumulating data from it over time. But I'm afraid that's not going to
> fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...
>
> Perhaps we should somehow encode this differently for individual lwlock
> tranches? It's far less problematic to collect all this information for
> all but the buffer lwlocks...

That is a good point. There are actually two things to keep in mind:
i) user interface, ii) implementation.

i) Personally, as a DBA, I do not see much sense in an unaggregated list
of pid, lwlockid, wait_time or something like that.

It is much better to aggregate by pid and lwlockid, for instance:
- pid
- lwlockid
- lwlockname
- total_count (or the number of exclusive/shared acquisitions that had to
wait, as you suggest; since we have a lot of lwlocks, I am doubtful
about how important the information about non-waiting lwlocks is)

ii) Am I correct that you suggest going through MainLWLockTranche and
retrieving all available lwlock information into some structure like the
lwLockCell structure I've used in my patch? Something like a hash of
lwlockid -> usage count?


Regards,
Ilya

>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Date:
On Thu, Oct 2, 2014 at 5:25 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> It's not at all clear to me that a DTrace-like (or perf-based, rather)
> approach is unsafe, slow, or unsuitable for production use.

> With appropriate wrapper tools I think we could have quite a useful
> library of perf-based diagnostics and tracing tools for PostgreSQL.

It is not actually very slow; the overhead is quite reasonable given
that we want such comprehensive performance diagnostics. As for
stability, I have had a couple of issues where Postgres crashed with
DTrace enabled and did not crash without it. Most of them were on
FreeBSD, which is still in use by many people, and were actually caused
by FreeBSD's DTrace, but for me that is quite enough to have doubts
about keeping a DTrace-aware build in production.


OK, OK - maybe things have changed in the last couple of years or will
change soon. Still, dtrace/perf is good enough for those who are
familiar with it, but you need a really convenient wrapper to make an
Oracle/DB2 DBA happy with such an approach.


> Resolving lock IDs to names might be an issue, though.

I am afraid it is.

>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Stephen Frost
Date:
* Craig Ringer (craig@2ndquadrant.com) wrote:
> > The patch https://commitfest.postgresql.org/action/patch_view?id=885
> > (discussion starts here I hope -
> > http://www.postgresql.org/message-id/4FE8CA2C.3030809@uptime.jp)
> > demonstrates the performance problems; LWLOCK_STATS, LOCK_DEBUG and the
> > DTrace-like approach are slow, unsafe for production use, and a bit
> > clumsy for a DBA to use.
>
> It's not at all clear to me that a DTrace-like (or perf-based, rather)
> approach is unsafe, slow, or unsuitable for production use.

I've certainly had it take production systems down (perf, specifically),
so I'd definitely consider it "unsafe".  I wouldn't say it's unusable,
but it's certainly not what we should have as the end-goal for PG.
Thanks,
    Stephen

Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> > 1. I've decided to put pg_stat_lwlock into an extension, pg_stat_lwlock
> > (simply for test purposes). Is that OK, or would it be better to implement
> > it somewhere inside pg_catalog or in another extension (for example
> > pg_stat_statements)?
>
> I personally am doubtful that it makes much sense to move this into an
> extension. It'll likely be tightly enough interlinked to backend code
> that I don't see the point. But I'd not be surprised if others feel
> differently.

I agree that this doesn't make sense as an extension.

> I generally don't think you'll get interesting data without a fair bit
> of additional work.

I'm not sure about this..

> The first problem that comes to my mind about collecting enough data is
> that we have a very large number of lwlocks (fixed_number + 2 *
> shared_buffers). One 'trivial' way of implementing this is to have a per
> backend array collecting the information, and then a shared one
> accumulating data from it over time. But I'm afraid that's not going to
> fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...

I was just going to suggest exactly this - a per-backend array which then
gets pushed into a shared area periodically.  Taking up 50MB per backend
is quite a bit though. :/

> Perhaps we should somehow encode this differently for individual lwlock
> tranches? It's far less problematic to collect all this information for
> all but the buffer lwlocks...

Yeah, that seems like it would at least be a good approach to begin
with.
Thanks,
    Stephen

Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Bruce Momjian
Date:
On Thu, Oct  2, 2014 at 11:50:14AM +0200, Andres Freund wrote:
> The first problem that comes to my mind about collecting enough data is
> that we have a very large number of lwlocks (fixed_number + 2 *
> shared_buffers). One 'trivial' way of implementing this is to have a per
> backend array collecting the information, and then a shared one
> accumulating data from it over time. But I'm afraid that's not going to
> fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...
> 
> Perhaps we should somehow encode this differently for individual lwlock
> tranches? It's far less problematic to collect all this information for
> all but the buffer lwlocks...

First, I think this could be a major Postgres feature, and I am excited
someone is working on this.

As far as gathering data, I don't think we are going to do any better in
terms of performance/simplicity/reliability than to have a single PGPROC
entry to record when we enter/exit a lock, and having a secondary
process scan the PGPROC array periodically.

What that gives us is almost zero overhead on backends, high
reliability, and the ability of the scan daemon to give higher weights
to locks that are held longer.  Basically, if you just stored the locks
you held and released, you either have to add timing overhead to the
backends, or you have no timing information collected.  By scanning
active locks, a short-lived lock might not be seen at all, while a
longer-lived lock might be seen by multiple scans.  What that gives us
is a weighting of the lock time with almost zero overhead.   If we want
finer-grained lock statistics, we just increase the number of scans per
second.
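
As a sketch, each scan would do something like the following
(illustrative only; it assumes the per-PGPROC lwLockId field from the
proposed patch, and lock_ticks[] is a made-up accumulator owned by the
scanning process):

    /*
     * One sampling pass: every backend currently seen in lock L adds one
     * tick for L.  Locks held longer are seen by more passes, so the
     * tick counts approximate time-weighted lock occupancy without any
     * timing calls in the backends themselves.
     */
    static void
    sample_lwlocks_once(uint64 *lock_ticks)
    {
        int     i;

        for (i = 0; i < ProcGlobal->allProcCount; i++)
        {
            volatile PGPROC *proc = &ProcGlobal->allProcs[i];

            if (proc->pid != 0 && proc->lwLockId >= 0)
                lock_ticks[proc->lwLockId]++;
        }
    }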

I am assuming almost no one cares about the number of locks, but rather
they care about cumulative lock durations.

I am having trouble seeing any other option that has such a good
cost/benefit profile.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Robert Haas
Date:
On Fri, Oct 3, 2014 at 11:33 AM, Bruce Momjian <bruce@momjian.us> wrote:
> I am assuming almost no one cares about the number of locks, but rather
> they care about cumulative lock durations.
>
> I am having trouble seeing any other option that has such a good
> cost/benefit profile.

I do think that the instrumentation data gathered by LWLOCK_STATS is
useful - very useful.

But it does have significant overhead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Date:
On Fri, Oct 3, 2014 at 5:33 PM, Bruce Momjian <bruce@momjian.us> wrote:
> As far as gathering data, I don't think we are going to do any better in
> terms of performance/simplicity/reliability than to have a single PGPROC
> entry to record when we enter/exit a lock, and having a secondary
> process scan the PGPROC array periodically.

That was the point.

>
> What that gives us is almost zero overhead on backends, high
> reliability, and the ability of the scan daemon to give higher weights
> to locks that are held longer.  Basically, if you just stored the locks
> you held and released, you either have to add timing overhead to the
> backends, or you have no timing information collected.  By scanning
> active locks, a short-lived lock might not be seen at all, while a
> longer-lived lock might be seen by multiple scans.  What that gives us
> is a weighting of the lock time with almost zero overhead.   If we want
> finer-grained lock statistics, we just increase the number of scans per
> second.

So I could add a function which would accumulate the data in some
view/table (with weights etc.). How should it be called? From a specific
process? From some existing maintenance process such as autovacuum?
Should I implement a GUC, for example lwlock_pull_rate: 0 for off, 1 to
10 for 1 to 10 samples per second?

>
> I am assuming almost no one cares about the number of locks, but rather
> they care about cumulative lock durations.

Oracle and DB2 measure both cumulative durations and counts.

>
> I am having trouble seeing any other option that has such a good
> cost/benefit profile.

At least the cost. The Oracle documentation clearly states that it is all
about diagnostic convenience; the performance impact is significant.

>
> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + Everyone has their own god. +



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Date:
On Fri, Oct 3, 2014 at 5:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I do think that the instrumentation data gathered by LWLOCK_STATS is
> useful - very useful.

Sure, quite useful.

But how about this comment:
   /*
    * The LWLock stats will be updated within a critical section, which
    * requires allocating new hash entries. Allocations within a critical
    * section are normally not allowed because running out of memory would
    * lead to a PANIC, but LWLOCK_STATS is debugging code that's not normally
    * turned on in production, so that's an acceptable risk. The hash entries
    * are small, so the risk of running out of memory is minimal in practice.
    */


> But it does have significant overhead.

I would say that, for production use, it is a bit more than just overhead.




-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Bruce Momjian
Date:
On Fri, Oct  3, 2014 at 05:53:59PM +0200, Ilya Kosmodemiansky wrote:
> > What that gives us is almost zero overhead on backends, high
> > reliability, and the ability of the scan daemon to give higher weights
> > to locks that are held longer.  Basically, if you just stored the locks
> > you held and released, you either have to add timing overhead to the
> > backends, or you have no timing information collected.  By scanning
> > active locks, a short-lived lock might not be seen at all, while a
> > longer-lived lock might be seen by multiple scans.  What that gives us
> > is a weighting of the lock time with almost zero overhead.   If we want
> > finer-grained lock statistics, we just increase the number of scans per
> > second.
> 
> So I could add a function which would accumulate the data in some
> view/table (with weights etc.). How should it be called? From a specific
> process? From some existing maintenance process such as autovacuum?
> Should I implement a GUC, for example lwlock_pull_rate: 0 for off, 1 to
> 10 for 1 to 10 samples per second?

Yes, that's the right approach.  You would implement it as a background
worker process, and a GUC as you described.  I assume it would populate
a view like we already do for the pg_stat_ views, and the counters could
be reset somehow.  I would pattern it after how we handle the pg_stat_
views.
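
A rough sketch of that wiring, using the existing background worker and
custom GUC APIs (the GUC name, the function names and the empty sampling
step are made up for illustration):

    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    static int  lwlock_sample_rate = 0;     /* samples per second, 0 = off */

    void        _PG_init(void);
    static void lwlock_sampler_main(Datum arg);

    void
    _PG_init(void)
    {
        BackgroundWorker worker;

        DefineCustomIntVariable("pg_stat_lwlock.sample_rate",
                                "LWLock samples per second (0 disables sampling).",
                                NULL, &lwlock_sample_rate,
                                0, 0, 10,
                                PGC_SIGHUP, 0, NULL, NULL, NULL);

        memset(&worker, 0, sizeof(worker));
        snprintf(worker.bgw_name, BGW_MAXLEN, "lwlock sampler");
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = 10;
        worker.bgw_main = lwlock_sampler_main;
        RegisterBackgroundWorker(&worker);
    }

    static void
    lwlock_sampler_main(Datum arg)
    {
        BackgroundWorkerUnblockSignals();

        for (;;)
        {
            if (lwlock_sample_rate > 0)
            {
                /* scan the PGPROC array here and accumulate the samples */
            }
            pg_usleep(lwlock_sample_rate > 0 ?
                      1000000L / lwlock_sample_rate : 1000000L);
        }
    }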

> > I am assuming almost no one cares about the number of locks, but rather
> they care about cumulative lock durations.
> 
> Oracle and DB2 measure both cumulative durations and counts.

Well, the big question is whether counts are really useful.  You did a
good job of explaining that when you find heavy clog or xlog lock usage
you would adjust your server.  What I am unclear about is why you would
adjust your server based on lock _counts_ and not cummulative lock
duration.  I don't think we want the overhead of accumulating
information that isn't useful.

> > I am having trouble seeing any other option that has such a good
> > cost/benefit profile.
> 
> At least cost. In Oracle documentation clearly stated, that it is all
> about diagnostic convenience, performance impact is significant.

Oh, we don't want to go there then, and I think this approach is a big
win.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Andres Freund
Date:
On 2014-10-03 11:33:18 -0400, Bruce Momjian wrote:
> On Thu, Oct  2, 2014 at 11:50:14AM +0200, Andres Freund wrote:
> > The first problem that comes to my mind about collecting enough data is
> > that we have a very large number of lwlocks (fixed_number + 2 *
> > shared_buffers). One 'trivial' way of implementing this is to have a per
> > backend array collecting the information, and then a shared one
> > accumulating data from it over time. But I'm afraid that's not going to
> > fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...
> > 
> > Perhaps we should somehow encode this differently for individual lwlock
> > tranches? It's far less problematic to collect all this information for
> > all but the buffer lwlocks...
> 
> First, I think this could be a major Postgres feature, and I am excited
> someone is working on this.
> 
> As far as gathering data, I don't think we are going to do any better in
> terms of performance/simplicity/reliability than to have a single PGPROC
> entry to record when we enter/exit a lock, and having a secondary
> process scan the PGPROC array periodically.

I don't think that'll give meaningful results given the very short times
most lwlocks are held. And it'll give not very interesting results for
multiple lwlocks held at the same time - most of the time the 'earlier'
held ones are more interesting than the last acquired one...

> I am assuming almost no one cares about the number of locks, but rather
> they care about cumulative lock durations.

I actually don't think that's true. Every lock acquisition implies a
number of atomic locks. Those are expensive. And if you see individual
locks acquired a high number of times in multiple processes that's
something important. It causes significant bus traffic between sockets,
while not necessarily visible in the lock held times.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Andres Freund
Date:
On 2014-10-03 11:51:46 -0400, Robert Haas wrote:
> On Fri, Oct 3, 2014 at 11:33 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > I am assuming almost no one cares about the number of locks, but rather
> > they care about cumulative lock durations.
> >
> > I am having trouble seeing any other option that has such a good
> > cost/benefit profile.
> 
> I do think that the instrumentation data gathered by LWLOCK_STATS is
> useful - very useful.
> 
> But it does have significant overhead.

Have you ever analyzed where that overhead is with the current code?

I do wonder if having a per backend array in shmem indexed by the lockid
(inside its tranche) wouldn't be quite doable for the smaller tranches.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Bruce Momjian
Date:
On Fri, Oct  3, 2014 at 11:15:13PM +0200, Andres Freund wrote:
> > As far as gathering data, I don't think we are going to do any better in
> > terms of performance/simplicity/reliability than to have a single PGPROC
> > entry to record when we enter/exit a lock, and having a secondary
> > process scan the PGPROC array periodically.
> 
> I don't think that'll give meaningful results given the very short times
> most lwlocks are held. And it'll give not very interesting results for

I figured you could get more data the more often you sampled.

> multiple lwlocks held at the same time - most of the time the 'earlier'
> held ones are more interesting than the last acquired one...

Well, I thought lock _waiting_ would be the most interesting measure,
not locks held.  Doesn't pg_locks show locks held?

> > I am assuming almost no one cares about the number of locks, but rather
> > they care about cumulative lock durations.
> 
> I actually don't think that's true. Every lock acquisition implies a
> number of atomic locks. Those are expensive. And if you see individual
> locks acquired a high number of times in multiple processes that's
> something important. It causes significant bus traffic between sockets,
> while not necessarily visible in the lock held times.

True, but I don't think users are going to get much value from those
numbers, and they are hard to get.  Server developers might want to know
lock counts, but in those cases performance might not be as important.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Bruce Momjian
Date:
On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
> > I actually don't think that's true. Every lock acquisition implies a
> > number of atomic locks. Those are expensive. And if you see individual
> > locks acquired a high number of times in multiple processes that's
> > something important. It causes significant bus traffic between sockets,
> > while not necessarily visible in the lock held times.
> 
> True, but I don't think users are going to get much value from those
> numbers, and they are hard to get.  Server developers might want to know
> lock counts, but in those cases performance might not be as important.

In summary, I think there are three measurements we can take on locks:

1.  lock wait, from request to acquisition
2.  lock duration, from acquisition to release
3.  lock count

I think #1 is the most useful, and can be tracked by scanning a single
PGPROC lock entry per session (as already outlined), because you can't
wait on more than one lock at a time.

#2 would probably require multiple PGPROC lock entries, though I am
unclear how often a session holds multiple light-weight locks
concurrently.  #3 might require global counters in memory.

#1 seems the most useful from a user perspective, and we can perhaps
experiment with #2 and #3 once that is done.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Robert Haas
Date:
On Tue, Oct 7, 2014 at 8:03 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
>> > I actually don't think that's true. Every lock acquisition implies a
>> > number of atomic locks. Those are expensive. And if you see individual
>> > locks acquired a high number of times in multiple processes that's
>> > something important. It causes significant bus traffic between sockets,
>> > while not necessarily visible in the lock held times.
>>
>> True, but I don't think users are going to get much value from those
>> numbers, and they are hard to get.  Server developers might want to know
>> lock counts, but in those cases performance might not be as important.
>
> In summary, I think there are three measurements we can take on locks:
>
> 1.  lock wait, from request to acquisition
> 2.  lock duration, from acquisition to release
> 3.  lock count
>
> I think #1 is the most useful, and can be tracked by scanning a single
> PGPROC lock entry per session (as already outlined), because you can't
> wait on more than one lock at a time.
>
> #2 would probably require multiple PGPROC lock entries, though I am
> unclear how often a session holds multiple light-weight locks
> concurrently.  #3 might require global counters in memory.
>
> #1 seems the most useful from a user perspective, and we can perhaps
> experiment with #2 and #3 once that is done.

I agree with some of your thoughts on this, Bruce, but there are some
points I'm not so sure about.

I have a feeling that any system that involves repeatedly scanning the
procarray will either have painful performance impact (if it's
frequent) or catch only a statistically insignificant fraction of lock
acquisitions (if it's infrequent).  The reason I think there may be a
performance impact is because quite a number of heavily-trafficked
shared memory structures are bottlenecked on memory latency, so it's
easy to imagine that having an additional process periodically reading
them would increase cache-line bouncing and hurt performance.  We will
probably need some experimentation to find the best idea.

I think the easiest way to measure lwlock contention would be to put
some counters in the lwlock itself.  My guess, based on a lot of
fiddling with LWLOCK_STATS over the years, is that there's no way to
count lock acquisitions and releases without harming performance
significantly - no matter where we put the counters, it's just going
to be too expensive.  However, I believe that incrementing a counter -
even in the lwlock itself - might not be too expensive if we only do
it when (1) a process goes to sleep or (2) spindelays occur.  Those
operations are expensive enough that I think the cost of an extra
shared memory access won't be too significant.
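
To illustrate (a sketch, not a patch; the two new fields and their exact
placement are invented), the counters would live in the LWLock itself and
be touched only on paths that are already expensive:

    typedef struct LWLock
    {
        slock_t     mutex;              /* existing: protects the state below */
        /* ... existing fields: releaseOK, exclusive, shared, tranche, waiters ... */
        uint32      block_count;        /* new: acquisitions that had to sleep */
        uint32      spindelay_count;    /* new: spin delays while taking the mutex */
    } LWLock;

    /* where LWLockAcquire() takes the mutex -- SpinLockAcquire() reports how
     * many delay loops it needed, similar to what LWLOCK_STATS already uses: */
    lock->spindelay_count += SpinLockAcquire(&lock->mutex);

    /* where LWLockAcquire() decides it must sleep on the semaphore (the
     * mutex is still held at that point, so a plain increment is safe): */
    lock->block_count++;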

As a further point, when I study the LWLOCK_STATS output, that stuff
is typically what I'm looking for anyway.  The first few times I ran
with that enabled, I was kind of interested by the total lock counts
... but that quickly got uninteresting.  The blocking and spindelays
show you where the problems are, so that's the interesting part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I think the easiest way to measure lwlock contention would be to put
> some counters in the lwlock itself.  My guess, based on a lot of
> fiddling with LWLOCK_STATS over the years, is that there's no way to
> count lock acquisitions and releases without harming performance
> significantly - no matter where we put the counters, it's just going
> to be too expensive.  However, I believe that incrementing a counter -
> even in the lwlock itself - might not be too expensive if we only do
> it when (1) a process goes to sleep or (2) spindelays occur.  Those
> operations are expensive enough that I think the cost of an extra
> shared memory access won't be too significant.

FWIW, that approach sounds sane to me as well.  I concur with Robert's
fear that adding cycles to the no-contention case will cost so much
as to make the feature unusable in production, or even for realistic
testing; which would mean it's pretty much useless.
        regards, tom lane



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Andres Freund
Date:
On 2014-10-07 10:04:38 -0400, Robert Haas wrote:
> On Tue, Oct 7, 2014 at 8:03 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
> >> > I actually don't think that's true. Every lock acquisition implies a
> >> > number of atomic locks. Those are expensive. And if you see individual
> >> > locks acquired a high number of times in multiple processes that's
> >> > something important. It causes significant bus traffic between sockets,
> >> > while not necessarily visible in the lock held times.
> >>
> >> True, but I don't think users are going to get much value from those
> >> numbers, and they are hard to get.  Server developers might want to know
> >> lock counts, but in those cases performance might not be as important.
> >
> > In summary, I think there are three measurements we can take on locks:
> >
> > 1.  lock wait, from request to acquisition
> > 2.  lock duration, from acquisition to release
> > 3.  lock count
> >
> > I think #1 is the most useful, and can be tracked by scanning a single
> > PGPROC lock entry per session (as already outlined), because you can't
> > wait on more than one lock at a time.
> >
> > #2 would probably require multiple PGPROC lock entries, though I am
> > unclear how often a session holds multiple light-weight locks
> > concurrently.  #3 might require global counters in memory.
> >
> > #1 seems the most useful from a user perspective, and we can perhaps
> > experiment with #2 and #3 once that is done.
> 
> I agree with some of your thoughts on this, Bruce, but there are some
> points I'm not so sure about.
> 
> I have a feeling that any system that involves repeatedly scanning the
> procarray will either have painful performance impact (if it's
> frequent) or catch only a statistically insignificant fraction of lock
> acquisitions (if it's infrequent).

Agreed.

> I think the easiest way to measure lwlock contention would be to put
> some counters in the lwlock itself.  My guess, based on a lot of
> fiddling with LWLOCK_STATS over the years, is that there's no way to
> count lock acquisitions and releases without harming performance
> significantly - no matter where we put the counters, it's just going
> to be too expensive.  However, I believe that incrementing a counter -
> even in the lwlock itself - might not be too expensive if we only do
> it when (1) a process goes to sleep or (2) spindelays occur.

Increasing the size will be painful on its own :(.

Have you tried/considered putting the counters into a per-backend array
somewhere in shared memory? That way they don't blow up the size of
frequently ping-ponged cachelines. Then you can summarize those values
whenever querying the results.
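
Sketch of that layout (sizes and names invented; buffer locks excluded).
Each backend only ever writes its own row, so the update path mostly
touches cachelines nobody else is writing, and the reader just sums over
the rows when the view is queried:

    /* shared memory: one row per backend, one slot per tracked lwlock */
    static uint64 (*lwlock_block_counts)[NUM_TRACKED_LWLOCKS];

    /* writer, in the backend that had to block: */
    lwlock_block_counts[MyBackendId - 1][lockid]++;

    /* reader, when the stats are queried: */
    static uint64
    sum_block_count(int lockid)
    {
        uint64      total = 0;
        int         b;

        for (b = 0; b < MaxBackends; b++)
            total += lwlock_block_counts[b][lockid];
        return total;
    }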

> As a further point, when I study the LWLOCK_STATS output, that stuff
> is typically what I'm looking for anyway.  The first few times I ran
> with that enabled, I was kind of interested by the total lock counts
> ... but that quickly got uninteresting.  The blocking and spindelays
> show you where the problems are, so that's the interesting part.

I don't really agree with this. Especially with shared locks (even more
so if/when the LW_SHARED stuff gets in), there's simply no relevant
blocking and spindelay.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Heikki Linnakangas
Date:
On 10/07/2014 05:04 PM, Robert Haas wrote:
> On Tue, Oct 7, 2014 at 8:03 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
>>>> I actually don't think that's true. Every lock acquisition implies a
>>>> number of atomic locks. Those are expensive. And if you see individual
>>>> locks acquired a high number of times in multiple processes that's
>>>> something important. It causes significant bus traffic between sockets,
>>>> while not necessarily visible in the lock held times.
>>>
>>> True, but I don't think users are going to get much value from those
>>> numbers, and they are hard to get.  Server developers might want to know
>>> lock counts, but in those cases performance might not be as important.
>>
>> In summary, I think there are three measurements we can take on locks:
>>
>> 1.  lock wait, from request to acquisition
>> 2.  lock duration, from acquisition to release
>> 3.  lock count
>>
>> I think #1 is the most useful, and can be tracked by scanning a single
>> PGPROC lock entry per session (as already outlined), because you can't
>> wait on more than one lock at a time.
>>
>> #2 would probably require multiple PGPROC lock entries, though I am
>> unclear how often a session holds multiple light-weight locks
>> concurrently.  #3 might require global counters in memory.
>>
>> #1 seems the most useful from a user perspective, and we can perhaps
>> experiment with #2 and #3 once that is done.
>
> I agree with some of your thoughts on this, Bruce, but there are some
> points I'm not so sure about.
>
> I have a feeling that any system that involves repeatedly scanning the
> procarray will either have painful performance impact (if it's
> frequent) or catch only a statistically insignificant fraction of lock
> acquisitions (if it's infrequent).  The reason I think there may be a
> performance impact is because quite a number of heavily-trafficked
> shared memory structures are bottlenecked on memory latency, so it's
> easy to imagine that having an additional process periodically reading
> them would increase cache-line bouncing and hurt performance.  We will
> probably need some experimentation to find the best idea.
>
> I think the easiest way to measure lwlock contention would be to put
> some counters in the lwlock itself.  My guess, based on a lot of
> fiddling with LWLOCK_STATS over the years, is that there's no way to
> count lock acquisitions and releases without harming performance
> significantly - no matter where we put the counters, it's just going
> to be too expensive.  However, I believe that incrementing a counter -
> even in the lwlock itself - might not be too expensive if we only do
> it when (1) a process goes to sleep or (2) spindelays occur.  Those
> operations are expensive enough that I think the cost of an extra
> shared memory access won't be too significant.

FWIW, I liked Ilya's design. Before going to sleep, store the lock ID in 
shared memory. When you wake up, clear it. That should be cheap enough 
to have it always enabled. And it can easily be extended to other 
"waits", e.g. when you're waiting for input from client.

I don't think counting the number of lock acquisitions is that 
interesting. It doesn't give you any information on how long the waits 
were, for example. I think the question the user or DBA is trying to 
answer is "Why is this query taking so long, even though the CPU is 
sitting idle?". A sampling approach works well for that.

For comparison, the "perf" tool works great for figuring out where the 
CPU time is spent in a process. It works by sampling. This is similar, 
but for wall-clock time, and hopefully we can produce more 
user-friendly output.

- Heikki




Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Andres Freund
Date:
On 2014-10-07 17:22:18 +0300, Heikki Linnakangas wrote:
> FWIW, I liked Ilya's design. Before going to sleep, store the lock ID in
> shared memory. When you wake up, clear it. That should be cheap enough to
> have it always enabled. And it can easily be extended to other "waits", e.g.
> when you're waiting for input from client.

I think there are a few locks where that's interesting. But in my
experience many slowdowns aren't caused by actual waits but by
cacheline contention. And for that the number of acquisitions is much
more relevant than the waiting. The primary example of this is probably
the procarray lock.

> I don't think counting the number of lock acquisitions is that interesting.
> It doesn't give you any information on how long the waits were, for
> example.

Sure, that's a separate thing that we should be able to answer.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Robert Haas
Date:
On Tue, Oct 7, 2014 at 10:12 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Have you tried/considered putting the counters into a per-backend array
> somewhere in shared memory? That way they don't blow up the size of
> frequently ping-ponged cachelines. Then you can summarize those values
> whenever querying the results.

The problem with that is that you need O(N*M) memory instead of O(N),
where N is the number of lwlocks and M is the number of backends.
That gets painful in a hurry.  We just got rid of something like that
with your patch to get rid of all the backend-local buffer pin arrays;
I'm not keen to add another such thing right back.  It might be viable
if we excluded the buffer locks, but that also detracts from the
value.  Plus, no matter how you slice it, you're now touching cache
lines completely unrelated to the ones you need for the foreground
work.  That's got a distributed overhead that's hard to measure, but
it is certainly going to knock other stuff out of the CPU caches to
some degree.

>> As a further point, when I study the LWLOCK_STATS output, that stuff
>> is typically what I'm looking for anyway.  The first few times I ran
>> with that enabled, I was kind of interested by the total lock counts
>> ... but that quickly got uninteresting.  The blocking and spindelays
>> show you where the problems are, so that's the interesting part.
>
> I don't really agree with this. Especially with shared locks (even more
> so if/when the LW_SHARED stuff gets in), there's simply no relevant
> blocking and spindelay.

If your patch to implement lwlocks using atomics goes in, then we may
have to reassess what instrumentation is actually useful here.  I can
only comment on the usefulness of various bits of instrumentation I
have used in the past on the code bases we had that time, or my
patches thereupon.  Nobody here can reasonably be expected to know
whether the same stuff will still be useful after possible future
patches that are not even in a reviewable state at present have been
committed.

Having said that, if there's no blocking or spindelay any more, to me
that doesn't mean we should look for some other measure of contention
instead.  It just means that the whole area is a solved problem, we
don't need to measure contention any more because there isn't any, and
we can move on to other issues once we finish partying.  But I'm mildly
skeptical that the outcome will be as good as all that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Andres Freund
Date:
On 2014-10-07 10:30:54 -0400, Robert Haas wrote:
> On Tue, Oct 7, 2014 at 10:12 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Have you tried/considered putting the counters into a per-backend array
> > somewhere in shared memory? That way they don't blow up the size of
> > frequently ping-ponged cachelines. Then you can summarize those values
> > whenever querying the results.
> 
> The problem with that is that you need O(N*M) memory instead of O(N),
> where N is the number of lwlocks and M is the number of backends.

Right.

> That gets painful in a hurry.  We just got rid of something like that
> with your patch to get rid of all the backend-local buffer pin arrays;
> I'm not keen to add another such thing right back.

I think it might be ok if we'd exclude buffer locks and made it depend
on a GUC.

> It might be viable
> if we excluded the buffer locks, but that also detracts from the
> value.  Plus, no matter how you slice it, you're now touching cache
> lines completely unrelated to the ones you need for the foreground
> work.  That's got a distributed overhead that's hard to measure, but
> it is certainly going to knock other stuff out of the CPU caches to
> some degree.

Yea, it's hard to guess ;(

> >> As a further point, when I study the LWLOCK_STATS output, that stuff
> >> is typically what I'm looking for anyway.  The first few times I ran
> >> with that enabled, I was kind of interested by the total lock counts
> >> ... but that quickly got uninteresting.  The blocking and spindelays
> >> show you where the problems are, so that's the interesting part.
> >
> > I don't really agree with this. Especially with shared locks (even more
> > so if/when the LW_SHARED stuff gets in), there's simply no relevant
> > blocking and spindelay.
> 
> If your patch to implement lwlocks using atomics goes in, then we may
> have to reassess what instrumentation is actually useful here.  I can
> only comment on the usefulness of various bits of instrumentation I
> have used in the past on the code bases we had that time, or my
> patches thereupon.  Nobody here can reasonably be expected to know
> whether the same stuff will still be useful after possible future
> patches that are not even in a reviewable state at present have been
> committed.

It's not like it'd be significantly different today - in a read mostly
workload that's bottlenecked on ProcArrayLock you'll not see many
waits. There you'd have to count the total number of spinlock cycles to
measure anything interesting.

> Having said that, if there's no blocking or spindelay any more, to me
> that doesn't mean we should look for some other measure of contention
> instead.  It just means that the whole area is a solved problem, we
> don't need to measure contention any more because there isn't any, and
> we can move on to other issues once we finish partying.  But I'm mildly
> skeptical that the outcome will be as good as all that.

It's not. Just because we're not waiting in a spinlock loop doesn't mean
there can't be contention... It's just moved one level down, into the cpu.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Date:
On Tue, Oct 7, 2014 at 4:12 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I think the easiest way to measure lwlock contention would be to put
>> some counters in the lwlock itself.  My guess, based on a lot of
>> fiddling with LWLOCK_STATS over the years, is that there's no way to
>> count lock acquisitions and releases without harming performance
>> significantly - no matter where we put the counters, it's just going
>> to be too expensive.  However, I believe that incrementing a counter -
>> even in the lwlock itself - might not be too expensive if we only do
>> it when (1) a process goes to sleep or (2) spindelays occur.
>
> Increasing the size will be painful on its own :(.

I am afraid that in this case we should think about minimizing the
overhead rather than avoiding it entirely: for such a DBA-friendly
feature it is worth it.

Let me step back a bit, since the discussion has gone into details while
the overall design idea is still unclear.

What do we actually need: the fact that an lwlock was acquired? lock
counts? time spent waiting on a lock? overall lock duration?

The usual way to explain how any such performance tool works is the
traffic example (any Oracle/DB2 wait-interface-aware DBA knows it):

You have a commute from home to the office and it takes an hour. You try
to optimize it and find that although you take a highway with no speed
limit, you usually get stuck in traffic turning off the highway towards
your office and spend about 10-30 minutes there. The alternative is
another route with two speed-limit zones and one traffic light; in total
you lose 2 and 5 minutes in the speed-limit zones and 2 minutes at the
red light - overall better than 30 minutes in a jam, and even better
than 10 minutes in a jam. That is what this is all about: to find the
bottleneck we need to know that a process holds a certain lock, that it
was held for a certain time, or that there are a lot of shorter locks.

I think sampling even 1-2 times per second and building a sort of
histogram is good enough at the moment, because it shows (although not
very precisely) that a process holds a certain lock, that it was held
for a certain time, or that there are a lot of shorter locks. After
that it is possible to implement something more precise. (As far as I
know, Greg Smith is working on some sort of wait events, but it seems to
me there is a lot of work to do to implement an exact analogue of OWI.)

-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

От
Robert Haas
Дата:
On Tue, Oct 7, 2014 at 10:36 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> That gets painful in a hurry.  We just got rid of something like that
>> with your patch to get rid of all the backend-local buffer pin arrays;
>> I'm not keen to add another such thing right back.
>
> I think it might be ok if we'd exclude buffer locks and made it depend
> on a GUC.

Maybe.  I'm skeptical about whether what you'll get at that point is
really useful.

> It's not like it'd be significantly different today - in a read mostly
> workload that's bottlenecked on ProcArrayLock you'll not see many
> waits. There you'd have to count the total number of spinlock cycles to
> measure anything interesting.

Hmm, really?  I've never had to do that to find bottlenecks.

>> Having said that, if there's no blocking or spindelay any more, to me
>> that doesn't mean we should look for some other measure of contention
>> instead.  It just means that the whole area is a solved problem, we
>> don't need to measure contention any more because there isn't any, and
> >> we can move on to other issues once we finish partying.  But I'm mildly
> >> skeptical that the outcome will be as good as all that.
>
> It's not. Just because we're not waiting in a spinlock loop doesn't mean
> there can't be contention... It's just moved one level down, into the cpu.

I guess that's true, but how much of the contention at that level is
really important to expose to DBAs?  We can put anything that is of
developer interest only into LWLOCK_STATS.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

От
Ilya Kosmodemiansky
Дата:
On Tue, Oct 7, 2014 at 4:30 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-10-07 17:22:18 +0300, Heikki Linnakangas wrote:
>> FWIW, I liked Ilya's design. Before going to sleep, store the lock ID in
>> shared memory. When you wake up, clear it. That should be cheap enough to
>> have it always enabled. And it can easily be extended to other "waits", e.g.
>> when you're waiting for input from client.
>
> I think there are a few locks where that's interesting. But in my
> experience many slowdowns aren't caused by actual waits but by
> cacheline contention. And for that the number of acquisitions is much
> more relevant than the waiting. The primary example of this is probably
> the procarray lock.

I would say that seeing a particular lwlockid 50 times in 100 samples,
or 50 times in a row, or only 2 times gives a DBA good and
representative information. At least it is better than nothing.

>
>> I don't think counting the number of lock acquisitions is that interesting.
>> It doesn't give you any information on how long the waits were, for
>> example.
>
> Sure, that's a separate thing that we should be able to answer.

The point is that a lot of short waits can sometimes be as bad as one
long wait. That is why it is important, but I think proper sampling
gives a good estimate of this.

>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Andres Freund
Date:
On 2014-10-07 10:45:24 -0400, Robert Haas wrote:
> > It's not like it'd be significantly different today - in a read mostly
> > workload that's bottlenecked on ProcArrayLock you'll not see many
> > waits. There you'd have to count the total number of spinlock cycles to
> > measure anything interesting.
> 
> Hmm, really?  I've never had to do that to find bottlenecks.

How did you diagnose procarray contention in a readonly workload
otherwise, without using perf?

> >> Having said that, if there's no blocking or spindelay any more, to me
> >> that doesn't mean we should look for some other measure of contention
> >> instead.  It just means that the whole area is a solved problem, we
> >> don't need to measure contention any more because there isn't any, and
> >> we can move on to other issues once we finish partying.  But mildly
> >> skeptical that the outcome will be as good as all that.
> >
> > It's not. Just because we're not waiting in a spinlock loop doesn't mean
> > there can't be contention... It's just moved one level down, into the cpu.
> 
> I guess that's true, but how much of the contention at that level is
> really important to expose to DBAs?

I think so. Right now it's hard for them to see whether the rate of
transactions / the isolation mode is a significant problem or not.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Ilya Kosmodemiansky
Date:
On Tue, Oct 7, 2014 at 4:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It's not like it'd be significantly different today - in a read mostly
>> workload that's bottlenecked on ProcArrayLock you'll not see many
>> waits. There you'd have to count the total number of spinlock cycles to
>> measure anything interesting.
>
> Hmm, really?  I've never had to do that to find bottlenecks.

Not sure. Long waiting times represent the same thing as well, or at
least well enough; I think only the acquisition count is important.



>>> Having said that, if there's no blocking or spindelay any more, to me
>>> that doesn't mean we should look for some other measure of contention
>>> instead.  It just means that the whole area is a solved problem, we
>>> don't need to measure contention any more because there isn't any, and
>>> we can move on to other issues once we finish partying.  But I'm mildly
>>> skeptical that the outcome will be as good as all that.
>>
>> It's not. Just because we're not waiting in a spinlock loop doesn't mean
>> there can't be contention... It's just moved one level down, into the cpu.
>
> I guess that's true, but how much of the contention at that level is
> really important to expose to DBAs?  We can put anything that is of
> developer interest only into LWLOCK_STATS.

For a DBA, all of this means that we are waiting on a lock.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com



Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Robert Haas
Date:
On Tue, Oct 7, 2014 at 10:51 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-10-07 10:45:24 -0400, Robert Haas wrote:
>> > It's not like it'd be significantly different today - in a read mostly
>> > workload that's bottlenecked on ProcArrayLock you'll not see many
>> > waits. There you'd have to count the total number of spinlocks cycles to
>> > measure anything interesting.
>>
>> Hmm, really?  I've never had to do that to find bottlenecks.
>
> How did you diagnose procarray contention in a readonly workload
> otherwise, without using perf?

spindelays.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company