Обсуждение: WALWriteLock

Поиск
Список
Период
Сортировка

WALWriteLock

От
Yambu
Дата:
Hello

I'm seeing a lot of WALWriteLocks , is this a bad sign , what might cause this?

regards

Re: WALWriteLock

От
Vijaykumar Jain
Дата:

On Tue, Jul 6, 2021, 7:44 PM Yambu <hyambu@gmail.com> wrote:
Hello

I'm seeing a lot of WALWriteLocks , is this a bad sign , what might cause this?

pg version 
cloud or on prem
When do you see this piling up? Heavy concurrent usage or regular usage but random spike in locks?
Underlying storage shows any anomalies? 
resource utilization normal or high?
pg part of master replica setup with synchronous replication or async ?
Disk iowait high?




Re: WALWriteLock

От
Vijaykumar Jain
Дата:


On Tue, Jul 6, 2021, 7:54 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

On Tue, Jul 6, 2021, 7:44 PM Yambu <hyambu@gmail.com> wrote:
Hello

I'm seeing a lot of WALWriteLocks , is this a bad sign , what might cause this?

pg version 
cloud or on prem
When do you see this piling up? Heavy concurrent usage or regular usage but random spike in locks?
Underlying storage shows any anomalies? 
resource utilization normal or high?
pg part of master replica setup with synchronous replication or async ?
Disk iowait high?

Also, is it during checkpointing ?
how much is the checkpoint spread, 
You can enable logging checkpoints,
to help relate correlation between checkpoint windows and walwrite locks piling up ? 

Re: WALWriteLock

От
Jeff Janes
Дата:
On Tue, Jul 6, 2021 at 10:14 AM Yambu <hyambu@gmail.com> wrote:
Hello

I'm seeing a lot of WALWriteLocks , is this a bad sign , what might cause this?

Are you having a performance problem you are trying to track down?  If so, this probably indicates the cause.  If not, then it is probably not a bad sign.

The likely cause is that multiple sessions are trying to COMMIT at the same time, and are blocking on the slow fsync of the WAL data.  One process will block on WALSync, all the ones queued up behind it will block on WALWrite.

Cheers,

Jeff

Re: WALWriteLock

От
Vijaykumar Jain
Дата:


On Thu, Jul 8, 2021, 1:22 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 6, 2021 at 10:14 AM Yambu <hyambu@gmail.com> wrote:
Hello

I'm seeing a lot of WALWriteLocks , is this a bad sign , what might cause this?

Are you having a performance problem you are trying to track down?  If so, this probably indicates the cause.  If not, then it is probably not a bad sign.

The likely cause is that multiple sessions are trying to COMMIT at the same time, and are blocking on the slow fsync of the WAL data.  One process will block on WALSync, all the ones queued up behind it will block on WALWrite.

Is this reproducible ? I mean I have seen multiple ppl raising a similar issue but I could not really reproduce on my laptop, tried with slow disk, introducing random latency etc. I even tried running gdb on a few processes and trying to force into acquire a walwrite lock but not release it etc but I guess I was not doing it correctly.

If the db is slowed down due to high concurrent usage trying to commit at same time, how do we arrive at a baseline of how much a server can handle before it starts tripping.


Re: WALWriteLock

От
Jeff Janes
Дата:
On Wed, Jul 7, 2021 at 4:14 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:


On Thu, Jul 8, 2021, 1:22 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 6, 2021 at 10:14 AM Yambu <hyambu@gmail.com> wrote:
Hello

I'm seeing a lot of WALWriteLocks , is this a bad sign , what might cause this?

Are you having a performance problem you are trying to track down?  If so, this probably indicates the cause.  If not, then it is probably not a bad sign.

The likely cause is that multiple sessions are trying to COMMIT at the same time, and are blocking on the slow fsync of the WAL data.  One process will block on WALSync, all the ones queued up behind it will block on WALWrite.

Is this reproducible ? I mean I have seen multiple ppl raising a similar issue but I could not really reproduce on my laptop, tried with slow disk, introducing random latency etc. I even tried running gdb on a few processes and trying to force into acquire a walwrite lock but not release it etc but I guess I was not doing it correctly.

It is easy to reproduce if you have WAL on a slow disk, but you need high concurrency to do it.  This for example should show it nicely:

pgbench -i -s20   ## set it up
pgbench -T600 -c10 -j10  ## run the test

If you initialize with the default scale of 1, then you won't get enough concurrency to show the problem, as all process will try to update the same row in pgbench_branches, which means only one process will commit at a time, with the rest blocking on the row lock, rather than on the WAL.

If the db is slowed down due to high concurrent usage trying to commit at same time, how do we arrive at a baseline of how much a server can handle before it starts tripping.

 The above benchmark should work for that, too.

Cheers,

Jeff