Thread: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5


Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Tom Dearman
Date:
We have a performance problem when our postgres is under high load.  The CPU usage is very low: we have 48 cores for our postgres and the idle time averages 90%.  The problem is that we get spikes in our transaction times which don’t appear with any obvious regularity, and when we get the larger spikes the postgres log shows locking such as 'process 41915 acquired ExclusiveLock on extension of relation 27177 of database 26192'.  The actual relation changes: one time it might be one table and another time another, though they are always big tables.  I have looked at various previous threads and the only suggestions are either that the disk IO is maxed out, which from our observations we don’t believe is the case for us, or that shared_buffers is too large - so we have reduced it right down to 1G.  In the previous threads there was an indication that the underlying problem was a lock which I believe has been either removed or much improved in 9.5 (see the lock scalability improvements), however we have not seen any improvement in the relation extension locking problem that we see.  The version of 9.5 that we have tested is beta1.  Any help in showing us how to improve this would be greatly appreciated.
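For reference, a minimal sketch of a query to run while one of the spikes is happening, to see which backends are waiting on a relation extension lock and on which table (this assumes only the standard pg_locks and pg_stat_activity catalogs; the waiting column exists in 9.3-9.5 and was replaced by wait_event in later releases):

-- who is waiting on (or currently holds) a relation extension lock
SELECT a.pid,
       l.relation::regclass AS table_name,   -- resolves only in the current database
       l.granted,
       a.waiting,                            -- boolean in 9.3-9.5
       a.query
FROM   pg_locks l
JOIN   pg_stat_activity a ON a.pid = l.pid
WHERE  l.locktype = 'extend'
ORDER  BY l.granted, a.pid;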





Re: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Tom Dearman
Date:
The processes involved are always client (java) processes which normally finish quickly.  The actual process can be any of the client processes; the point is that an extension of the relation is going on and it seems to involve excessive locking.  You typically see a whole series of processes all contending for the same extension lock; I have included an example below.  It is also interesting that a later attempt to get the exclusive lock by process 41911 says it is waiting for pid 41907 even though, according to the log, other processes have already acquired the lock.

2015-10-28 15:42:26 GMT [41907]: [9-1] db=betwave,user=betwave_app_user LOG:  process 41907 acquired ExclusiveLock on extension of relation 26789 of database 26192 after 2135.238 ms

2015-10-28 15:42:26 GMT [41955]: [4-1] db=betwave,user=betwave_app_user LOG:  process 41955 acquired ExclusiveLock on extension of relation 26789 of database 26192 after 2134.035 ms
.......
2015-10-28 15:42:26 GMT [41700]: [9-1] db=betwave,user=betwave_app_user LOG:  process 41700 acquired ExclusiveLock on extension of relation 26789 of database 26192 after 2104.766 ms
.......
2015-10-28 15:42:26 GMT [42006]: [19-1] db=betwave,user=betwave_app_user LOG:  process 42006 acquired ExclusiveLock on extension of relation 26789 of database 26192 after 2087.745 ms
.......
2015-10-28 15:46:37 GMT [41911]: [25-1] db=betwave,user=betwave_app_user LOG:  process 41911 still waiting for ExclusiveLock on extension of relation 26789 of database 26192 after 1000.193 ms
2015-10-28 15:46:37 GMT [41911]: [26-1] db=betwave,user=betwave_app_user DETAIL:  Process holding the lock: 41907. Wait queue: 41911, 41918, 41929.
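For reference, the OIDs in these log lines can be resolved to names with something like the following sketch (26789 and 26192 are simply the OIDs quoted in the log above; the regclass cast only resolves while connected to that database):

SELECT 26789::regclass AS relation_name;             -- the table being extended
SELECT datname FROM pg_database WHERE oid = 26192;   -- which database the log refers to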

On 28 Oct 2015, at 16:56, Igor Neyman <ineyman@perceptron.com> wrote:

 
 
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Dearman
Sent: Wednesday, October 28, 2015 11:44 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5
 
We have a performance problem when our postgres is under high load.  The CPU usage is very low: we have 48 cores for our postgres and the idle time averages 90%.  The problem is that we get spikes in our transaction times which don’t appear with any obvious regularity, and when we get the larger spikes the postgres log shows locking such as 'process 41915 acquired ExclusiveLock on extension of relation 27177 of database 26192'.  The actual relation changes: one time it might be one table and another time another, though they are always big tables.  I have looked at various previous threads and the only suggestions are either that the disk IO is maxed out, which from our observations we don’t believe is the case for us, or that shared_buffers is too large - so we have reduced it right down to 1G.  In the previous threads there was an indication that the underlying problem was a lock which I believe has been either removed or much improved in 9.5 (see the lock scalability improvements), however we have not seen any improvement in the relation extension locking problem that we see.  The version of 9.5 that we have tested is beta1.  Any help in showing us how to improve this would be greatly appreciated.
 
 
Do you know what “process 41915” is?  And what it was doing to cause ExclusiveLock?
 
Regards,
Igor Neyman
 
 
 

Re: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Jim Nasby
Date:
On 10/28/15 12:11 PM, Tom Dearman wrote:
> It is also interesting that a later attempt to get the exclusive lock by
> process 41911 says it is waiting for id 41907 even though according to
> the log other processes have already acquired the lock.

Those would be different acquisitions of the same lock.

It's important to understand that this lock is an *internal* lock that is
used to protect extending a file on disk. There are 20 places in the code
that attempt to acquire this lock (search for LockRelationForExtension),
but the case in access/heap/hio.c is probably pretty representative. It
gets the extension lock, finds a buffer to extend the relation into,
locks the buffer that holds the existing tuple (for an UPDATE only), and
then releases the extension lock.

Looking through there, the obvious places for things to get hung up are
either extending the file on disk or finding a place to put the new
buffer in the buffer pool. The latter is why there are recommendations
to shrink shared_buffers: the larger shared_buffers is, the longer it
takes to scan it.

But there's a flip side: if you set shared_buffers too small, the system
has to work very hard to keep everything it needs in the few buffers it
has, which makes it even more expensive to find a victim buffer.
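As a rough way to see how much of shared_buffers is currently hot (high usage count) versus dirty, here is a sketch against pg_buffercache (this assumes the extension is installed, and 8192 assumes the default 8kB block size):

-- distribution of buffers by usage count and dirty flag
SELECT usagecount,
       isdirty,
       count(*)                        AS buffers,
       pg_size_pretty(count(*) * 8192) AS size
FROM   pg_buffercache
GROUP  BY usagecount, isdirty
ORDER  BY usagecount, isdirty;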

What was shared_buffers originally set to, and how large is this database?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Igor Neyman
Date:

 

 

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Dearman
Sent: Wednesday, October 28, 2015 11:44 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

 

We have a performance problem when our postgres is under high load.  The CPU usage is very low: we have 48 cores for our postgres and the idle time averages 90%.  The problem is that we get spikes in our transaction times which don’t appear with any obvious regularity, and when we get the larger spikes the postgres log shows locking such as 'process 41915 acquired ExclusiveLock on extension of relation 27177 of database 26192'.  The actual relation changes: one time it might be one table and another time another, though they are always big tables.  I have looked at various previous threads and the only suggestions are either that the disk IO is maxed out, which from our observations we don’t believe is the case for us, or that shared_buffers is too large - so we have reduced it right down to 1G.  In the previous threads there was an indication that the underlying problem was a lock which I believe has been either removed or much improved in 9.5 (see the lock scalability improvements), however we have not seen any improvement in the relation extension locking problem that we see.  The version of 9.5 that we have tested is beta1.  Any help in showing us how to improve this would be greatly appreciated.

 

 

Do you know what “process 41915” is?  And what it was doing to cause ExclusiveLock?

 

Regards,

Igor Neyman

 

 

 

Re: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Jeff Janes
Date:
On Wed, Oct 28, 2015 at 8:43 AM, Tom Dearman <tom.dearman@gmail.com> wrote:
> We have a performance problem when our postgres is under high load.  The CPU
> usage is very low, we have 48 cores for our postgres and the idle time
> averages at 90%.  The problem is we get spikes in our transaction times
> which don’t appear with any obvious regularity and when we get the larger
> spikes, if I look in the postgres log we see that there is locking on
> 'process 41915 acquired ExclusiveLock on extension of relation 27177 of
> database 26192’.  The actual relation changes one time it might be one table
> and another time another, though they are always big tables.  I have looked
> at various previous threads and the only suggestions are either that the
> disk io is maxed out, which from our observations we don’t believe is the
> case for us,

What are those observations?  Keep in mind that if 20 processes are
all trying to extend the relation at the same time, one will block on
IO (according to top/sar/vmstat etc.) and the other 19 will block on
that first one on a PostgreSQL heavy-weight lock.  So all 20 of them
are effectively blocked on IO, but system monitoring tools won't know
that.

Also, the IO spikes will be transient, so any monitoring that averages
over a longer interval will not pick up on them.
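To put a number on that pile-up while it is happening, a sketch of an aggregate over pg_locks (standard catalogs only; 'extend' is the locktype PostgreSQL reports for relation extension):

-- how many backends are queued behind each relation extension lock
SELECT l.relation::regclass AS table_name,
       count(*)             AS waiters
FROM   pg_locks l
WHERE  l.locktype = 'extend'
AND    NOT l.granted
GROUP  BY l.relation
ORDER  BY waiters DESC;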


> or that ‘shared_buffers’ is too large - so we have reduced this
> right down to 1G.  In the previous threads there was an indication that the
> underlying problem was a lock which I believe has been either removed or
> much improved in 9.5 (see Lock scalability improvements), however we have
> not seen any improvement in the relation extension locking problem that we
> see.  The version of 9.5 that we have tested is beta1.  Any help in showing
> us how to improve this would be greatly appreciated.

I don't believe any of the improvements made were to this area.

Your latency spikes seem to be happening at a 20 minute interval.
That would make me think they are lined up with end-of-checkpoint
fsync activity, except those should be happening every 5 minutes as
your conf has not changed checkpoint_timeout away from the default.
Since you have log_checkpoints on, what do you see in the log files
about how often they occur, and what the checkpoint write time, sync
time, etc. are?
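For reference, a quick sketch to confirm the checkpoint-related settings on the running server (standard pg_settings view; checkpoint_segments only exists up to 9.4 and max_wal_size only from 9.5, so names that don't exist in a given version simply won't match):

SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('checkpoint_timeout',
                'checkpoint_completion_target',
                'checkpoint_segments',   -- 9.3/9.4
                'max_wal_size',          -- 9.5+
                'log_checkpoints',
                'shared_buffers');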

Cheers,

Jeff


Re: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Tom Dearman
Date:
Thanks for the prompt replies so far.  I have done some more investigation to be able to clearly answer some of the questions.

The original shared_buffers setting was 8G and I have done another run on Friday using this old value instead of my more recent 1G limit.  There was no noticeable improvement.  I also installed the pg_buffercache extension and, following some articles such as:

http://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers/

ran the following query:

SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE usagecount >= 3;


My system under load is using just over 500M of shared_buffers at usage count >= 3.  Our system is very write heavy, with all of the big tables written to but not read from (at least during the load test run).  Although our db will grow (under load) to 100G in a few hours and keep growing, the amount of data in shared_buffers - according to my observations above - seems low.  We have the WAL on a different disk from the main tables.  We have the following representative TPS figures for the disks (from sar) when under our load test:

06:34:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
06:35:01 PM    dev8-0    176.15     25.89   8773.98     49.96      0.83      4.73      3.79     66.85
06:35:01 PM    dev8-1    174.74      0.13   8746.49     50.06      0.81      4.64      3.82     66.81
06:35:01 PM    dev8-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
06:35:01 PM    dev8-5      1.41     25.76     27.49     37.74      0.02     14.96      2.31      0.33
06:35:01 PM   dev8-16     15.02      0.00   3994.82    265.95      1.27     84.88      2.76      4.14
06:35:01 PM   dev8-17     15.02      0.00   3994.82    265.95      1.27     84.88      2.76      4.14


dev8-0 to dev8-5 represent the disk where the WAL is; dev8-16 and dev8-17 are where the main postgres data directory is located.

We have run bonnie against the WAL disks and the sar log for that shows the following (examples for a heavy write and a heavy read/write):

heavy write
01:28:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
01:29:01 PM    dev8-0    371.85     12.11 377771.45   1015.97    122.93    330.46      2.69    100.00
01:29:01 PM    dev8-1    371.85     12.11 377771.45   1015.97    122.93    330.46      2.69    100.00
01:29:01 PM    dev8-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM    dev8-5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM   dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM   dev8-17      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00


heavy read/write
12:56:02 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:57:01 PM    dev8-0    456.32 355622.67 109942.01   1020.26     58.34    126.70      2.17     99.00
12:57:01 PM    dev8-1    456.32 355622.67 109942.01   1020.26     58.34    126.70      2.17     99.00
12:57:01 PM    dev8-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM    dev8-5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM   dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM   dev8-17      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00


To me this shows that we are not IO limited, or at least it is not obvious to me how we could be.  I have also, previously, run our load test (purely as a sanity check) with the WAL written to a RAM disk, just to be sure that the IO was not a bottleneck, and we still had the same problem with waiting on exclusive locks.

As far as the checkpoint goes, it does happen every 5 minutes and takes about 4.5 minutes, which corresponds to the 0.9 checkpoint_completion_target we have set.
 
On 28 Oct 2015, at 19:20, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Oct 28, 2015 at 8:43 AM, Tom Dearman <tom.dearman@gmail.com> wrote:
We have a performance problem when our postgres is under high load.  The CPU
usage is very low, we have 48 cores for our postgres and the idle time
averages at 90%.  The problem is we get spikes in our transaction times
which don’t appear with any obvious regularity and when we get the larger
spikes, if I look in the postgres log we see that there is locking on
'process 41915 acquired ExclusiveLock on extension of relation 27177 of
database 26192’.  The actual relation changes one time it might be one table
and another time another, though they are always big tables.  I have looked
at various previous threads and the only suggestions are either that the
disk io is maxed out, which from our observations we don’t believe is the
case for us,

What are those observations?  Keep in mind that if 20 processes are
all trying to extend the relation at the same time, one will block on
IO (according to top/sar/vmstat etc.) and the other 19 will block on
that first one on a PostgreSQL heavy-weight lock.  So all 20 of them
are effectively blocked on IO, but system monitoring tools won't know
that.

Also, the IO spikes will be transient, so any monitoring that
over-averages will not pick up on them.


or that ‘shared_buffers’ is too large - so we have reduced this
right down to 1G.  In the previous threads there was an indication that the
underlying problem was a lock which I believe has been either removed or
much improved in 9.5 (see Lock scalability improvements), however we have
not seen any improvement in the relation extension locking problem that we
see.  The version of 9.5 that we have tested is beta1.  Any help in showing
us how to improve this would be greatly appreciated.

I don't believe any of the improvements made were to this area.

Your latency spikes seem to be happening at a 20 minute interval.
That would make me think they are lined up with end-of-checkpoint
fsync activity, except those should be happening every 5 minutes as
your conf has not changed checkpoint_timeout away from the default.
Since you have log_checkpoints on, what do you see in the log files
about how often they occur, and what the checkpoint write time, sync
time, etc. are?

Cheers,

Jeff

Re: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Jim Nasby
Date:
On 11/2/15 9:32 AM, Tom Dearman wrote:
> My system under load is using just over 500M of the shared_buffer at
> usage count >= 3.  Our system is very write heavy, with all of the big
> tables written to but not read from (at least during the load test run).
>   Although our db will grow (under load) to 100G in a few hours and keep
> growing, the data in shared_buffers - according to my observations above
> - seems low.

That query won't tell you much for this particular case. When you have a
workload that's creating tons of new pages, it's going to very quickly
fill shared_buffers with a ton of dirty pages that need to be written
out. I suspect you're generating more dirty pages than the bgwriter can
keep up with. That means that the scan to find an available buffer that
I mentioned in my other email is going to become insanely expensive:
first it has to run the clock sweep long enough to identify a victim,
then it has to wait until that victim buffer is flushed to disk. That
flush to disk also can't happen until the WAL is flushed, so it might
have to wait on that as well.

Can you run perf or something equivalent? That would give us a much
better idea where all the time is actually being spent.
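One cheap data point to gather alongside perf: a sketch over the standard pg_stat_bgwriter view. High buffers_backend and buffers_backend_fsync counts relative to buffers_clean would support the theory that backends are having to write out their own victim buffers.

SELECT checkpoints_timed,
       checkpoints_req,
       checkpoint_write_time,   -- ms spent writing files during checkpoints
       checkpoint_sync_time,    -- ms spent syncing files during checkpoints
       buffers_checkpoint,
       buffers_clean,           -- written by the bgwriter
       maxwritten_clean,
       buffers_backend,         -- written directly by backends
       buffers_backend_fsync,
       buffers_alloc
FROM   pg_stat_bgwriter;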

> dev8-0 to dev8-5 are the represent the disk where the WAL is.  dev8-16

Are you sure that's not dev8-0 and dev8-1? Because the patterns for your
bonnie++ tests show no activity on 8-2 through 8-5.

Also, keep in mind that for the question of how long you'll wait for the
extension lock, bandwidth and throughput (and even TPS) don't matter:
it's all a question of how long it takes for disk requests to be
serviced (and unfortunately iostat really sucks at measuring that).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5

From
Jeff Janes
Date:
On Mon, Nov 2, 2015 at 7:32 AM, Tom Dearman <tom.dearman@gmail.com> wrote:
> Thanks for the prompt replies so far, I have done some more investigation to
> be able to clearly answer some of the question.
>
> The original shared-buffers was 8G and I have done another run on Friday
> using this old value instead of my more recent 1G limit.  There was no
> noticeable improvement.  I also installed the extension pg_buffercache and
> following some articles such as:
>
>  http://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers/
>
> using:
>
> SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
> FROM pg_class c
> INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
> INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname =
> current_database())
> WHERE usagecount >= 3;
>
>
> My system under load is using just over 500M of the shared_buffer at usage
> count >= 3.  Our system is very write heavy, with all of the big tables
> written to but not read from (at least during the load test run).  Although
> our db will grow (under load) to 100G in a few hours and keep growing, the
> data in shared_buffers - according to my observations above - seems low.  We
> have the WAL on a different disk from the main tables.

What fraction of that is dirty?
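A sketch of one way to answer that with pg_buffercache (the CASE form keeps it valid on 9.3, which lacks the aggregate FILTER clause; 8192 assumes the default block size):

-- total vs dirty buffers currently in shared_buffers
SELECT count(*)                                                        AS buffers_total,
       sum(CASE WHEN isdirty THEN 1 ELSE 0 END)                        AS buffers_dirty,
       pg_size_pretty(sum(CASE WHEN isdirty THEN 1 ELSE 0 END) * 8192) AS dirty_size
FROM   pg_buffercache;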

Is your data loading done by INSERT statements in a loop?  Or by COPY?
 INSERT in a loop will insert rows into a chosen buffer one by one,
increasing the usage_count each time.  As soon as the buffer is full,
it becomes completely cold in reality, but it is still very hot
according to the usage count.

> We have the
> following representative TPS for the disks (from SAR) when under our load
> test:
>
> 06:34:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
> 06:35:01 PM    dev8-0    176.15     25.89   8773.98     49.96      0.83      4.73      3.79     66.85
> 06:35:01 PM    dev8-1    174.74      0.13   8746.49     50.06      0.81      4.64      3.82     66.81
> 06:35:01 PM    dev8-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> 06:35:01 PM    dev8-5      1.41     25.76     27.49     37.74      0.02     14.96      2.31      0.33
> 06:35:01 PM   dev8-16     15.02      0.00   3994.82    265.95      1.27     84.88      2.76      4.14
> 06:35:01 PM   dev8-17     15.02      0.00   3994.82    265.95      1.27     84.88      2.76      4.14


I'm no expert in `sar -d`, but isn't 84.88 pretty high for await?


> As far as the checkpoint goes, it does happen every 5 minutes and takes
> about 4.5 mins which corresponds to the 0.9 checkpoint_completion_target we
> have set.

So, any idea what is happening at 20 minute intervals?

Cheers,

Jeff