Discussion: Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5
On 10/28/15 12:11 PM, Tom Dearman wrote:
> It is also interesting that a later attempt to get the exclusive lock by
> process 41911 says it is waiting for id 41907 even though according to
> the log other processes have already acquired the lock.

Those would be different acquisitions of the same lock. It's important to understand that this lock is an *internal* lock used to protect extending a file on disk. There are 20 places in the code that attempt to acquire this lock (search for LockRelationForExtension), but the case in access/heap/hio.c is probably pretty representative. It gets the extension lock, finds a buffer to extend the relation into, locks the buffer that holds the existing tuple (for an UPDATE only), and then releases the extension lock.

Looking through there, the obvious places for things to get hung up are either extending the file on disk, or finding a place to put the new buffer in the buffer pool. The latter is why there are recommendations to shrink shared_buffers: the larger shared_buffers is, the longer it takes to scan. But there's a flip side: if you set shared_buffers too small, the system has to work very hard to keep ALL the buffers it needs in shared_buffers, which makes it even more expensive to find a victim buffer.

What was shared_buffers originally set to, and how large is this database?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
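The buffer-replacement scan described above can be sketched roughly as follows. This is a simplified Python model for illustration only (PostgreSQL's real implementation is the clock-sweep in src/backend/storage/buffer/freelist.c, written in C); the point is simply that the worst-case search grows with the size of the pool:

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    usage_count: int = 0
    pinned: bool = False

def clock_sweep(buffers, hand=0):
    """Find a victim buffer: decrement usage_count as the hand passes,
    and take the first unpinned buffer whose count has reached zero.
    Assumes at least one unpinned buffer exists. The worst case scans
    the pool several times, so a larger shared_buffers means a longer
    search for a victim."""
    n = len(buffers)
    steps = 0
    while True:
        buf = buffers[hand]
        if not buf.pinned:
            if buf.usage_count == 0:
                return hand, steps
            buf.usage_count -= 1
        hand = (hand + 1) % n
        steps += 1
```

With every buffer recently used (usage_count > 0), the sweep must circle the whole pool at least once before it finds a victim, which is why a very large, very busy pool makes each allocation under the extension lock slower.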
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Dearman
Sent: Wednesday, October 28, 2015 11:44 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Waiting on ExclusiveLock on extension 9.3, 9.4 and 9.5
We have a performance problem when our postgres is under high load. The CPU usage is very low: we have 48 cores for our postgres, and idle time averages 90%. The problem is that we get spikes in our transaction times which don't appear with any obvious regularity, and when we get the larger spikes, the postgres log shows locking such as 'process 41915 acquired ExclusiveLock on extension of relation 27177 of database 26192'. The actual relation changes: one time it might be one table, another time a different one, though they are always big tables. I have looked at various previous threads, and the only suggestions are either that the disk IO is maxed out, which from our observations we don't believe is the case for us, or that shared_buffers is too large, so we have reduced it right down to 1G. In the previous threads there was an indication that the underlying problem was a lock which I believe has been either removed or much improved in 9.5 (see "Lock scalability improvements"), however we have not seen any improvement in the relation extension locking problem that we see. The version of 9.5 that we have tested is beta1. Any help in showing us how to improve this would be greatly appreciated.
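A small script can check whether those extension-lock log messages actually cluster around the latency spikes. The timestamp prefix in the regex below is an assumption about log_line_prefix, so adjust it to the actual log format:

```python
import re
from collections import Counter

# Matches log lines of the form quoted in the thread, e.g.:
#   2015-10-28 16:35:02 UTC LOG:  process 41915 acquired ExclusiveLock
#   on extension of relation 27177 of database 26192
# The leading timestamp is an assumption about log_line_prefix.
PATTERN = re.compile(
    r'^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}).*'
    r'acquired ExclusiveLock on extension of relation (?P<rel>\d+)'
)

def extension_lock_histogram(lines):
    """Count extension-lock acquisitions per minute, to see whether
    they line up with the observed transaction-time spikes."""
    hist = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            hist[m.group('minute')] += 1
    return hist
```

If the per-minute counts spike at the same times as the slow transactions, that strengthens the case that relation extension is the bottleneck rather than a coincidence in the log.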
Do you know what “process 41915” is? And what it was doing to cause ExclusiveLock?
Regards,
Igor Neyman
On Wed, Oct 28, 2015 at 8:43 AM, Tom Dearman <tom.dearman@gmail.com> wrote:
> We have a performance problem when our postgres is under high load. The CPU
> usage is very low, we have 48 cores for our postgres and the idle time
> averages at 90%. The problem is we get spikes in our transaction times
> which don’t appear with any obvious regularity and when we get the larger
> spikes, if I look in the postgres log we see that there is locking on
> 'process 41915 acquired ExclusiveLock on extension of relation 27177 of
> database 26192’. The actual relation changes one time it might be one table
> and another time another, though they are always big tables. I have looked
> at various previous threads and the only suggestions are either that the
> disk io is maxed out, which from our observations we don’t believe is the
> case for us,

What are those observations? Keep in mind that if 20 processes are all trying to extend the relation at the same time, one will block on IO (according to top/sar/vmstat etc.) and the other 19 will block on that first one on a PostgreSQL heavy-weight lock. So all 20 of them are effectively blocked on IO, but system monitoring tools won't know that.

Also, the IO spikes will be transient, so any monitoring that over-averages will not pick up on them.

> or that ‘shared_buffers’ is to large - so we have reduced this
> right down to 1G. In the previous threads there was an indication that the
> underlying problem was a lock which I believe has been either removed or
> much improved in 9.5 (see Lock scalability improvements), however we have
> not seen any improvement in the relation extension locking problem that we
> see. The version of 9.5 that we have tested is beta1. Any help in showing
> us how to improve this would be greatly appreciated.

I don't believe any of the improvements made were to this area.

Your latency spikes seem to be happening at a 20 minute interval. That would make me think they are lined up with end-of-checkpoint fsync activity, except those should be happening every 5 minutes as your conf has not changed checkpoint_timeout away from the default. Since you have log_checkpoints on, what do you see in the log files about how often they occur, and what the checkpoint write time, sync time, etc. are?

Cheers,
Jeff
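The over-averaging point is easy to demonstrate with a toy calculation (illustrative numbers only): a short 100%-utilisation burst all but disappears once it is averaged over a typical one-minute monitoring window.

```python
def windowed_average(samples, window):
    """Average per-second samples over fixed windows, the way a
    coarse-grained monitor (e.g. one sar line per minute) would."""
    return [sum(samples[i:i + window]) / window
            for i in range(0, len(samples), window)]

# 60 seconds of disk utilisation: mostly idle, with a 3-second 100% spike.
samples = [5.0] * 60
samples[20:23] = [100.0] * 3

peak_1s = max(samples)                      # what actually happened: 100%
avg_60s = windowed_average(samples, 60)[0]  # what a per-minute monitor reports
```

Here the per-minute view reports under 10% utilisation for an interval that contained a 3-second stall at 100%, which is exactly the kind of transient that can serialise 20 backends behind one extension lock without ever looking like saturated IO.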
On 11/2/15 9:32 AM, Tom Dearman wrote:
> My system under load is using just over 500M of the shared_buffer at
> usage count >= 3. Our system is very write heavy, with all of the big
> tables written to but not read from (at least during the load test run).
> Although our db will grow (under load) to 100G in a few hours and keep
> growing, the data in shared_buffers - according to my observations above
> - seems low.

That query won't tell you much for this particular case. A workload that's creating tons of new pages is going to very quickly fill shared_buffers with a ton of dirty pages that need to be written out. I suspect you're generating more dirty pages than the bgwriter can keep up with. That means the scan to find an available buffer that I mentioned in my other email is going to become insanely expensive: first it has to run the clock sweep long enough to identify a victim, then it has to wait until that victim buffer is flushed to disk. That flush to disk also can't happen until the WAL is flushed, so it might have to wait on that as well.

Can you run perf or something equivalent? That would give us a much better idea of where all the time is actually being spent.

> dev8-0 to dev8-5 represent the disk where the WAL is. dev8-16

Are you sure that's not dev8-0 and dev8-1? Because the patterns for your bonnie++ tests show no activity on 8-2 through 8-5.

Also, keep in mind that for the question of how long you'll wait for the extension lock, bandwidth and throughput (and even TPS) don't matter: it's all a question of how long it takes for disk requests to be serviced (and unfortunately iostat really sucks at measuring that).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
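The double stall on eviction (WAL flush before the dirty page write) can be sketched like this. The dict fields and callbacks are illustrative stand-ins, not PostgreSQL's actual buffer-manager API:

```python
def evict_victim(buf, wal_flushed_to, flush_wal, write_page):
    """Sketch of the WAL-before-data rule: a dirty victim buffer cannot
    be written out until the WAL up to its LSN is durable, so evicting
    one victim can pay for two synchronous I/Os in a row."""
    if buf["dirty"]:
        if buf["lsn"] > wal_flushed_to:
            wal_flushed_to = flush_wal(buf["lsn"])  # first stall: WAL fsync
        write_page(buf)                             # second stall: page write
        buf["dirty"] = False
    return wal_flushed_to
```

When nearly every buffer in the pool is dirty, almost every victim the clock sweep finds takes this slow path, which is how a write-heavy load turns buffer allocation (and, behind it, the extension lock) into an IO-bound operation.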
On Mon, Nov 2, 2015 at 7:32 AM, Tom Dearman <tom.dearman@gmail.com> wrote:
> Thanks for the prompt replies so far, I have done some more investigation
> to be able to clearly answer some of the questions.
>
> The original shared_buffers was 8G and I have done another run on Friday
> using this old value instead of my more recent 1G limit. There was no
> noticeable improvement. I also installed the extension pg_buffercache and,
> following some articles such as:
>
> http://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers/
>
> used:
>
> SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
> FROM pg_class c
> INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
> INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
> WHERE usagecount >= 3;
>
> My system under load is using just over 500M of the shared_buffer at usage
> count >= 3. Our system is very write heavy, with all of the big tables
> written to but not read from (at least during the load test run). Although
> our db will grow (under load) to 100G in a few hours and keep growing, the
> data in shared_buffers - according to my observations above - seems low.
> We have the WAL on a different disk from the main tables.

What fraction of that is dirty?

Is your data loading done by INSERT statements in a loop, or by COPY? INSERT in a loop will insert rows into a chosen buffer one by one, increasing the usage_count each time. As soon as the buffer is full, it becomes completely cold in reality, but it is still very hot according to the usage count.
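The usage_count inflation Jeff describes can be illustrated with a tiny sketch. PostgreSQL caps usage_count at 5 (BM_MAX_USAGE_COUNT); the function itself is illustrative, not real server code:

```python
BM_MAX_USAGE_COUNT = 5  # PostgreSQL's cap on a buffer's usage_count

def fill_page_with_inserts(rows_per_page):
    """Each single-row INSERT touches the same buffer and bumps its
    usage_count (capped at 5), so a freshly filled page that will never
    be read again still looks maximally 'hot' to the clock sweep."""
    usage_count = 0
    for _ in range(rows_per_page):
        usage_count = min(usage_count + 1, BM_MAX_USAGE_COUNT)
    return usage_count
```

This is why the `usagecount >= 3` query above can report a small "hot" set that is actually full of pages which are cold in reality: bulk loading via COPY touches each buffer far fewer times per row than an INSERT loop does.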
> We have the following representative TPS for the disks (from SAR) when
> under our load test:
>
> 06:34:01 PM      DEV     tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await   svctm   %util
> 06:35:01 PM   dev8-0  176.15     25.89   8773.98     49.96      0.83    4.73    3.79   66.85
> 06:35:01 PM   dev8-1  174.74      0.13   8746.49     50.06      0.81    4.64    3.82   66.81
> 06:35:01 PM   dev8-2    0.00      0.00      0.00      0.00      0.00    0.00    0.00    0.00
> 06:35:01 PM   dev8-5    1.41     25.76     27.49     37.74      0.02   14.96    2.31    0.33
> 06:35:01 PM  dev8-16   15.02      0.00   3994.82    265.95      1.27   84.88    2.76    4.14
> 06:35:01 PM  dev8-17   15.02      0.00   3994.82    265.95      1.27   84.88    2.76    4.14

I'm no expert in sar -d, but isn't 84.88 pretty high for await?

> As far as the checkpoint goes, it does happen every 5 minutes and takes
> about 4.5 minutes, which corresponds to the 0.9 checkpoint_completion_target
> we have set.

So, any idea what is happening at 20 minute intervals?

Cheers,
Jeff
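For reference, the 4.5-minute figure follows directly from the settings mentioned in the thread; a quick back-of-the-envelope check, assuming checkpoint_timeout is at its default of 5 minutes:

```python
# Settings from the thread: default checkpoint_timeout (5 min) and
# checkpoint_completion_target = 0.9.
checkpoint_timeout_s = 300
checkpoint_completion_target = 0.9

# The checkpoint write phase is spread over timeout * target seconds,
# so a checkpoint starting every 5 minutes spends about 270 seconds
# (roughly 4.5 minutes) writing, matching the observed behaviour.
write_phase_s = checkpoint_timeout_s * checkpoint_completion_target
```

So the checkpoints themselves line up with the 5-minute cadence; the 20-minute latency spikes still need a separate explanation.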