Thread: Add LWLock blocker(s) information


Add LWLock blocker(s) information

From: "Drouvot, Bertrand"
Date:

Hi hackers,


I've attached a patch to add blocker(s) information for LW Locks.
The motivation is to be able to get blocker information (if any) when backends are waiting on LW Locks.

Motivation:

We have seen cases with heavy contention on some LW Locks (a large number of backends waiting on the same LW Lock).

Adding blocker information would make such investigations easier; it could help answer questions like:

  • How many PIDs are holding the LWLock (could be more than one in the LW_SHARED case)?
  • Is the blocking PID changing?
  • Is the number of blocking PIDs changing?
  • What is the blocking PID doing?
  • Is the blocking PID itself waiting?
  • In which mode is the blocked PID requesting the lock?
  • In which mode is the blocking PID holding the lock?

Technical context and proposal: there are two parts to this patch:

  • Add the instrumentation:
    • the patch adds into the LWLock struct:

                    last_holding_pid: last pid owner of the lock
                    last_mode: last holding mode of the last pid owner of the lock
                    nholders: number of holders (could be >1 in case of LW_SHARED)

    • the patch adds into the PGPROC struct:

                    lwLastHoldingPid: last holder of the LW lock the PID is waiting for
                    lwHolderMode: LW lock mode of last holder of the LW lock the PID is waiting for
                    lwNbHolders: number of holders of the LW lock the PID is waiting for

            plus the code necessary to keep this new information up to date (a rough sketch of these additions follows this list).

  • Provide a way to display the information: the patch also adds a function pg_lwlock_blocking_pid to display this new information.
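
A rough sketch of what these additions could look like (the existing LWLock fields are shown as in current sources with LOCK_DEBUG-only fields omitted; the types of the new fields are indicative only, the attached patch being authoritative):

    /* Sketch only -- not an excerpt from the attached patch. */
    typedef struct LWLock
    {
        uint16           tranche;           /* tranche ID */
        pg_atomic_uint32 state;             /* state of exclusive/nonexclusive lockers */
        proclist_head    waiters;           /* list of waiting PGPROCs */
        /* proposed additions: */
        int              last_holding_pid;  /* last pid owner of the lock */
        LWLockMode       last_mode;         /* holding mode of that last owner */
        pg_atomic_uint32 nholders;          /* current number of holders */
    } LWLock;

    struct PGPROC
    {
        /* ... existing fields omitted ... */
        /* proposed additions, describing the LW lock this backend waits for: */
        int              lwLastHoldingPid;  /* last holder's pid */
        LWLockMode       lwHolderMode;      /* that holder's lock mode */
        uint32           lwNbHolders;       /* number of current holders */
    };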

Example output:

# select * from pg_lwlock_blocking_pid(10259);
 requested_mode | last_holder_pid | last_holder_mode | nb_holders
----------------+-----------------+------------------+------------
 LW_EXCLUSIVE   |           10232 | LW_EXCLUSIVE     |          1
(1 row)
 
 # select query,pid,state,wait_event,wait_event_type,pg_lwlock_blocking_pid(pid),pg_blocking_pids(pid) from pg_stat_activity where state='active' and pid != pg_backend_pid();
             query              |  pid  | state  |  wait_event   | wait_event_type |          pg_lwlock_blocking_pid           | pg_blocking_pids
--------------------------------+-------+--------+---------------+-----------------+-------------------------------------------+------------------
 insert into bdtlwa values (1); | 10232 | active |               |                 | (,,,)                                     | {}
 insert into bdtlwb values (1); | 10254 | active | WALInsert     | LWLock          | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
 create table bdtwt (a int);    | 10256 | active | WALInsert     | LWLock          | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
 insert into bdtlwa values (2); | 10259 | active | BufferContent | LWLock          | (LW_EXCLUSIVE,10232,LW_EXCLUSIVE,1)       | {}
 drop table bdtlwd;             | 10261 | active | WALInsert     | LWLock          | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
(5 rows)


So, should a PID be blocked on a LWLock, we could see:

  • the mode in which it is requesting the lock
  • the last PID holding the lock
  • the mode in which that last PID holds the lock
  • the number of PID(s) holding the lock

Remarks:

I have done a few benchmarks so far and did not observe any notable performance degradation (I can share more details if needed).

I made some quick attempts to collect an exhaustive list of blockers (in the LW_SHARED case), but I think that would be challenging because:

  • There are about 40,000 calls to LWLockInitialize, and all my attempts to initialize a list there produced "FATAL: out of shared memory" or similar.
  • One way to avoid storing a list in the LWLock itself could be to use a proclist (with a proclist_head in LWLock and a proclist_node in PGPROC); this is how the current "waiters" list is implemented (see the sketch just after this list). But this would not work for blockers, because one PGPROC can hold multiple LW locks, so it could mean having on the order of 40K proclist_node entries per PGPROC.
  • I also have concerns about the possible performance impact of maintaining such a huge list in this context.
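
For reference, the existing "waiters" machinery works because a backend waits on at most one LWLock at a time, so a single list node per PGPROC is enough; holders would need one node per concurrently held lock, which does not fit this scheme. Roughly (simplified from lwlock.h and proc.h):

    /* Each LWLock carries the head of its wait list... */
    typedef struct LWLock
    {
        uint16           tranche;   /* tranche ID */
        pg_atomic_uint32 state;     /* lock state word */
        proclist_head    waiters;   /* list of waiting PGPROCs */
    } LWLock;

    /* ...and each PGPROC carries exactly one link node, which works for
     * waiting (one lock at a time) but not for holding (many locks at once). */
    struct PGPROC
    {
        /* ... */
        bool          lwWaiting;    /* true if waiting for an LW lock */
        uint8         lwWaitMode;   /* lwlock mode being waited for */
        proclist_node lwWaitLink;   /* position in the LW lock wait list */
        /* ... */
    };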

Those are the reasons why this patch does not provide an exhaustive list of blockers (in the LW_SHARED case). Still, the information it does deliver could already be useful for getting insights during LWLock contention scenarios.

I will add this patch to the next commitfest. I look forward to your feedback about the idea and/or implementation.

Regards, Bertrand

Attachments

Re: Add LWLock blocker(s) information

From: "Drouvot, Bertrand"
Date:

Hi hackers,

Attaching a new version of the patch with a tiny change to make it pass the regression tests (opr_sanity was failing due to the new function that is part of the patch).

Regards,

Bertrand

Attachments

Re: Add LWLock blocker(s) information

From: David Zhang
Date:
Hi,

This is a very interesting topic. I applied the 2nd patch to the master branch and performed a quick test. I can observe the information below:

postgres=# select * from pg_lwlock_blocking_pid(26925);
 requested_mode | last_holder_pid | last_holder_mode | nb_holders 
----------------+-----------------+------------------+------------
 LW_EXCLUSIVE   |           26844 | LW_EXCLUSIVE     |          1
(1 row)

postgres=# select query,pid,state,wait_event,wait_event_type,pg_lwlock_blocking_pid(pid),pg_blocking_pids(pid) from pg_stat_activity where state='active' and pid != pg_backend_pid();
                            query                             |  pid  | state  | wait_event | wait_event_type |       pg_lwlock_blocking_pid        | pg_blocking_pids
--------------------------------------------------------------+-------+--------+------------+-----------------+-------------------------------------+------------------
 INSERT INTO orders SELECT FROM generate_series(1, 10000000); | 26925 | active | WALWrite   | LWLock          | (LW_EXCLUSIVE,26844,LW_EXCLUSIVE,1) | {}
(1 row)

At some points I had to keep repeating the query in order to capture the "lock info". I think this is probably part of the design, but I was wondering: if a query is involved in a deadlock and a developer is expected to take a look using the methods above, will the process be killed before the developer gets the chance to execute one of these queries?

If some statistics information could be added, it might help developers get an overall idea of the lock status; and if developers could specify some filters (such as the number of times a query entered a deadlock, or the queries that held a lock for more than some number of ms), it might help troubleshoot "lock" issues even better. Moreover, if this feature could be an independent extension, similar to "pg_buffercache", that would be great.
 
Best regards,

David

Re: Add LWLock blocker(s) information

From: Robert Haas
Date:
On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> the patch adds into the LWLock struct:
>
>                     last_holding_pid: last pid owner of the lock
>                     last_mode: last holding mode of the last pid owner of the lock
>                     nholders: number of holders (could be >1 in case of LW_SHARED)

There's been significant work done over the years to get the size of
an LWLock down; I'm not very enthusiastic about making it bigger
again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
which embeds one of the LWLocks associated with a BufferDesc into the
structure to reduce the number of cache lines associated with common
buffer operations. I'm not sure whether this patch would increase the
space usage of a BufferDesc to more than one cache line again, but at
the very least it would make it a lot tighter, since it looks like it
adds 12 bytes to the size of each one.
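
For context, the layout that commit is protecting looks roughly like this (abridged from src/include/storage/buf_internals.h; field list approximate):

    /* The buffer content lock is embedded directly in the descriptor... */
    typedef struct BufferDesc
    {
        BufferTag        tag;               /* ID of page contained in buffer */
        int              buf_id;            /* buffer's index number (from 0) */
        pg_atomic_uint32 state;             /* tag validity, usage count, refcount, flags */
        int              wait_backend_pid;  /* backend PID of pin-count waiter */
        int              freeNext;          /* link in freelist chain */
        LWLock           content_lock;      /* protects access to buffer contents */
    } BufferDesc;

    /* ...and descriptors are padded so that each one fits in a single cache
     * line on 64-bit platforms, so every byte added to LWLock eats into
     * (or overflows) that padding. */
    #define BUFFERDESC_PAD_TO_SIZE  (SIZEOF_VOID_P == 8 ? 64 : 1)

    typedef union BufferDescPadded
    {
        BufferDesc  bufferdesc;
        char        pad[BUFFERDESC_PAD_TO_SIZE];
    } BufferDescPadded;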

It's also a little hard to believe that this doesn't hurt performance
on workloads with a lot of LWLock contention, although maybe not; it
doesn't seem crazy expensive, just possibly enough to matter.

I thought a little bit about what this might buy as compared with just
sampling wait events. That by itself is enough to tell you which
LWLocks are heavily contended. It doesn't tell you what they are
contending against, so this would be superior in that regard. However,
I wonder how much of a problem that actually is. Typically, LWLocks
aren't being taken for long periods, so all the things that are
accessing the lock spend some time waiting (which you will see via
wait events in pg_stat_activity) and some time holding the lock
(making you see other things in pg_stat_activity). It's possible to
have cases where this isn't true; e.g. a relatively small number of
backends committing transactions could be slowing down a much larger
number of backends taking snapshots, and you'd mostly only see the
latter waiting for ProcArrayLock. However, those kinds of cases don't
seem super-common or super-difficult to figure out.

What kinds of scenarios motivate you to propose this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add LWLock blocker(s) information

From: Andres Freund
Date:
Hi,

On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
> On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> > the patch adds into the LWLock struct:
> >
> >                     last_holding_pid: last pid owner of the lock
> >                     last_mode: last holding mode of the last pid owner of the lock
> >                     nholders: number of holders (could be >1 in case of LW_SHARED)
> 
> There's been significant work done over the years to get the size of
> an LWLock down; I'm not very enthusiastic about making it bigger
> again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
> which embeds one of the LWLocks associated with a BufferDesc into the
> structure to reduce the number of cache lines associated with common
> buffer operations. I'm not sure whether this patch would increase the
> space usage of a BufferDesc to more than one cache line again, but at
> the very least it would make it a lot tighter, since it looks like it
> adds 12 bytes to the size of each one.

+many. If anything I would like to make them *smaller*. We should strive
to make locking more and more granular, and that requires the space
overhead to be small.  I'm unhappy enough about the tranche being in
there, and requiring padding etc.

I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
on size or efficiency.


> It's also a little hard to believe that this doesn't hurt performance
> on workloads with a lot of LWLock contention, although maybe not; it
> doesn't seem crazy expensive, just possibly enough to matter.

Yea.


> I thought a little bit about what this might buy as compared with just
> sampling wait events. That by itself is enough to tell you which
> LWLocks are heavily contended. It doesn't tell you what they are
> contending against, so this would be superior in that regard. However,
> I wonder how much of a problem that actually is. Typically, LWLocks
> aren't being taken for long periods, so all the things that are
> accessing the lock spend some time waiting (which you will see via
> wait events in pg_stat_activity) and some time holding the lock
> (making you see other things in pg_stat_activity). It's possible to
> have cases where this isn't true; e.g. a relatively small number of
> backends committing transactions could be slowing down a much larger
> number of backends taking snapshots, and you'd mostly only see the
> latter waiting for ProcArrayLock. However, those kinds of cases don't
> seem super-common or super-difficult to figure out.

Most of the cases where this kind of information really is interesting
seem to benefit a lot from having stack information available.  That
obviously has overhead, so we don't want the cost all the
time.  The script at
https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de
can give you results like e.g.
https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg

Greetings,

Andres Freund



Re: Add LWLock blocker(s) information

From: Peter Geoghegan
Date:
On Mon, Aug 10, 2020 at 5:41 PM Andres Freund <andres@anarazel.de> wrote:
> Most of the cases where this kind of information really is interesting
> seem to benefit a lot from having stack information available.  That
> obviously has overhead, so we don't want the cost all the
> time.  The script at
> https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de
> can give you results like e.g.
> https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg

It seems to have bitrot. Do you have a more recent version of the script?


-- 
Peter Geoghegan



Re: Add LWLock blocker(s) information

From: Andres Freund
Date:
Hi,

On 2020-08-12 16:47:13 -0700, Peter Geoghegan wrote:
> On Mon, Aug 10, 2020 at 5:41 PM Andres Freund <andres@anarazel.de> wrote:
> > Most of the cases where this kind of information really is interesting
> > seem to benefit a lot from having stack information available.  That
> > obviously has overhead, so we don't want the cost all the
> > time.  The script at
> > https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de
> > can give you results like e.g.
> > https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg
> 
> It seems to have bitrot. Do you have a more recent version of the script?

Attached. Needed one python3 fix, and to be adapted so it works with
futex-based semaphores. Seems to work for both sysv and posix semaphores
now, based on a very short test.

sudo python3 ./pgsemwait.py -x /home/andres/build/postgres/dev-optimize/vpath/src/backend/postgres -f 3|~/src/flamegraph/flamegraph.pl

Will add a note to the other thread.

Greetings,

Andres Freund

Attachments

Re: Add LWLock blocker(s) information

From: Peter Geoghegan
Date:
On Wed, Aug 12, 2020 at 5:39 PM Andres Freund <andres@anarazel.de> wrote:
> Attached. Needed one python3 fix, and to be adapted so it works with
> futex-based semaphores. Seems to work for both sysv and posix semaphores
> now, based on a very short test.

Great, thanks!

-- 
Peter Geoghegan



Re: Add LWLock blocker(s) information

From: Heikki Linnakangas
Date:
On 11/08/2020 03:41, Andres Freund wrote:
> On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
>> On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>>> the patch adds into the LWLock struct:
>>>
>>>                      last_holding_pid: last pid owner of the lock
>>>                      last_mode: last holding mode of the last pid owner of the lock
>>>                      nholders: number of holders (could be >1 in case of LW_SHARED)
>>
>> There's been significant work done over the years to get the size of
>> an LWLock down; I'm not very enthusiastic about making it bigger
>> again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
>> which embeds one of the LWLocks associated with a BufferDesc into the
>> structure to reduce the number of cache lines associated with common
>> buffer operations. I'm not sure whether this patch would increase the
>> space usage of a BufferDesc to more than one cache line again, but at
>> the very least it would make it a lot tighter, since it looks like it
>> adds 12 bytes to the size of each one.
> 
> +many. If anything I would like to make them *smaller*. We should strive
> to make locking more and more granular, and that requires the space
> overhead to be small.  I'm unhappy enough about the tranche being in
> there, and requiring padding etc.
> 
> I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
> on size or efficiency.

That seems to be the consensus, so I'm marking this as Returned with
Feedback in the commitfest.

- Heikki



Re: Add LWLock blocker(s) information

From: Craig Ringer
Date:
On Wed, Nov 18, 2020 at 5:25 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 11/08/2020 03:41, Andres Freund wrote:
> On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
>> On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>>> the patch adds into the LWLock struct:
>>>
>>>                      last_holding_pid: last pid owner of the lock
>>>                      last_mode: last holding mode of the last pid owner of the lock
>>>                      nholders: number of holders (could be >1 in case of LW_SHARED)
>>
>> There's been significant work done over the years to get the size of
>> an LWLock down; I'm not very enthusiastic about making it bigger
>> again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
>> which embeds one of the LWLocks associated with a BufferDesc into the
>> structure to reduce the number of cache lines associated with common
>> buffer operations. I'm not sure whether this patch would increase the
>> space usage of a BufferDesc to more than one cache line again, but at
>> the very least it would make it a lot tighter, since it looks like it
>> adds 12 bytes to the size of each one.
>
> +many. If anything I would like to make them *smaller*. We should strive
> to make locking more and more granular, and that requires the space
> overhead to be small.  I'm unhappy enough about the tranche being in
> there, and requiring padding etc.
>
> I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
> on size or efficiency.

That seems to be the consensus, so I'm marking this as Returned with
Feedback in the commitfest.

For what it's worth, I think that things like this are where we can really benefit from external tracing and observation tools.

Instead of tracking the information persistently in the LWLock struct, we can emit TRACE_POSTGRESQL_LWLOCK_BLOCKED_ON(...) in a context where we have the information available to us, then forget all about it. We don't spend anything unless someone's collecting the info.

If someone wants to track LWLock blocking relationships during debugging and performance work, they can use systemtap, dtrace, bpftrace, or a similar tool to observe the LWLock tracepoints and generate stats on LWLock blocking frequencies/durations. Doing so with systemtap should be rather simple.

I actually already had a look at this before. I found that the tracepoints that are in the LWLock code right now don't supply enough information in their arguments, so you have to use DWARF debuginfo-based probes, which is a pain. The tranche name alone doesn't let you identify which lock within a tranche is the current target.

I've attached a patch that adds the actual LWLock* to each tracepoint in the LWLock subsystem. That permits different locks to be tracked when handling tracepoint events within a single process.
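
Concretely, the call sites change along these lines (a sketch, not an excerpt from the patch):

    /* Before: only the tranche name and the requested mode are reported. */
    TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

    /* After: also passing the LWLock pointer lets a tracing tool tell
     * individual locks within a tranche apart inside one process. */
    TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), lock, mode);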

Another patch adds tracepoints that were missing from LWLockUpdateVar and LWLockWaitForVar. And another removes a stray TRACE_POSTGRESQL_LWLOCK_ACQUIRE() in LWLockWaitForVar() which should not have been there, since the lock is not actually acquired by LWLockWaitForVar().

I'd hoped to add some sort of "index within the tranche" to tracepoints, but it looks like it's not feasible. It turns out to be basically impossible to get a stable identifier for an individual LWLock that is valid across different backends. A LWLock inside a DSM segment might have a different address in different backends depending on where the DSM segment got mapped. The LWLock subsystem doesn't keep track of them and doesn't have any way to map a LWLock pointer to any sort of cross-process-valid identifier. So that's a bit of a pain when tracing. To provide something stable I think it'd be necessary to add some kind of counter tracked per-tranche and set by LWLockInitialize in the LWLock struct itself, which we sure don't want to do. If this ever becomes important for some reason we can probably look up whether the address is within a DSM segment or static shmem and compute some kind of relative address to report. For now you can't identify and compare individual locks within a tranche, except for individual locks and named tranches.


By the way, the LWLock tracepoints currently evaluate T_NAME(lock), which calls GetLWTrancheName(), for each tracepoint hit, so long as Pg is built with --enable-dtrace, even when nothing is actually tracing them. We might want to consider guarding them with systemtap tracepoint semaphore tests so they just become a predicted-away branch when not active. Doing so requires a small change to how we compile probes.d and the related makefile, but shouldn't be too hard. I haven't done that in this patch set.
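
A minimal sketch of what such a guard could look like, assuming the _ENABLED() test macros that dtrace/systemtap generate for probes with semaphores (not something the attached patches do):

    /* Only evaluate T_NAME(lock) when a tracer has actually enabled the probe. */
    if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
        TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);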

Attachments