Re: Checkpoints and slow queries

From: Tomas Vondra
Subject: Re: Checkpoints and slow queries
Date:
Msg-id: 535D88B6.5090002@fuzzy.cz
In response to: Checkpoints and slow queries  (Elanchezhiyan Elango <elanelango@gmail.com>)
Responses: Re: Checkpoints and slow queries  (Elanchezhiyan Elango <elanelango@gmail.com>)
List: pgsql-performance
On 27.4.2014 23:01, Elanchezhiyan Elango wrote:
> (I am resending this question after waiting for several hours because
> my previous mail got stalled probably because I didn't confirm my
> email address after subscribing. So resending the mail. Sorry if this
> is causing a double post.)
>
> *Problem description:*
> After a few days of running in my test environment, a query timed out
> (query timeout=4mins). Also in general the queries were taking a lot
> longer than expected. The workload in my database is a write intensive
> workload. And the writes happen in a burst every 5 minutes. There are a
> whole bunch of insert and update queries that run every 5 minutes. When
> I analyzed the situation (by enabling more postgres logs), I noticed
> that postgres checkpoints were triggering approximately every 5 minutes
> and based on my online research I suspected the i/o overhead of
> checkpoints was affecting the query performance. The checkpoint related
> settings were:
> checkpoint_segments = 30
> checkpoint_timeout = 15min
>
> I modified these settings to the following:
> checkpoint_segments = 250
> checkpoint_timeout = 1h
> checkpoint_completion_target = 0.9

The problem is that while this makes the checkpoints less frequent, it
accumulates more changes that need to be written to disk during the
checkpoint, which makes the impact more severe.

The only case where this is not true is when you repeatedly modify the
same subset of the data (say, a few data blocks), because those changes
merge into a single write during the checkpoint.

> After I tweaked these settings, checkpoints were happening only once in
> an hour and that improved the query performance. However, when the
> checkpoint happens every hour, the query performance is still very poor.
> This is still undesirable to my system.

So, can you share a few of the checkpoint log messages, so that we get an
idea of how much data needs to be synced to disk?
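
If checkpoint logging is not enabled yet, a minimal way to capture those
messages (a sketch; adjust the data directory path to your install) is to
turn on log_checkpoints and reload:

  # in postgresql.conf (off by default on 9.1):
  #   log_checkpoints = on
  $ pg_ctl reload -D /path/to/data

The "checkpoint complete" lines in the server log then report how many
buffers were written and how long the write and sync phases took.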

> I also tried editing dirty_background_ratio and dirty_expire_centisecs
> in /etc/sysctl.conf. All dirty related kernel settings:
>
>># sysctl -a | grep dirty
>
> vm.dirty_background_ratio = 1
> vm.dirty_background_bytes = 0
> vm.dirty_ratio = 20
> vm.dirty_bytes = 0
> vm.dirty_writeback_centisecs = 500
> vm.dirty_expire_centisecs = 500
>
> This also didn't improve the situation.

Can you monitor the amount of dirty data in the page cache, i.e. data that
still needs to be written to disk? Wait for a checkpoint and sample
/proc/meminfo a few times:

$ cat /proc/meminfo | grep Dirty

Also, watch "iostat -x -k 1" or something similar to see disk activity.
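
For example, a trivial loop (any POSIX shell will do) that samples the
relevant counters every couple of seconds while the checkpoint runs:

  $ while true; do date; grep -E 'Dirty|Writeback' /proc/meminfo; sleep 2; done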

> My question is how to track down the reason for the poor performance
> during checkpoints and improve the query performance when the
> checkpoints happen?
>
>   * *EXPLAIN ANALYZE:*
>       o http://explain.depesz.com/s/BNva - An insert query inserting
>         just 129 rows takes 20 seconds.
>       o http://explain.depesz.com/s/5hA - An update query updating 43926
>         rows takes 55 seconds.
>   * *History:* It gets slower after a few days of the system running.
>
> *Table Metadata*:
>
>   * The tables get updated every 5 minutes. At most 50000 rows in a table
>     get updated every 5 minutes. About 50000 rows get inserted every 1 hour.
>   * There are 90 tables in the DB. 43 of these are updated every 5
>     minutes. 8/90 tables receive a high update traffic of 50000
>     updates/5mins. Remaining tables receive an update traffic of 2000
>     updates/5min. 43/90 tables are updated every 1 hour.

So how much data in total are we talking about?
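
For instance (a rough sketch, assuming you can run psql against the
database shown below), the total on-disk size is easy to check:

  $ psql managed_target_stats -c \
      "SELECT pg_size_pretty(pg_database_size(current_database()));"
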

> *PostgreSQL version: *PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu,
> compiled by gcc (GCC) 4.6.x-google 20120601 (prerelease), 64-bit
>
> *How you installed PostgreSQL: *Compiled from source and installed.
>
> *Changes made to the settings in the postgresql.conf file:*

Seems fine to me, except for the following changes:

>              name             |    current_setting     |        source
> ------------------------------+------------------------+----------------------
>  maintenance_work_mem         | 64MB                   | configuration file
>  temp_buffers                 | 256MB                  | configuration file
>  wal_buffers                  | 1MB                    | configuration file
>  work_mem                     | 128MB                  | configuration file

Any particular reason for setting work_mem > maintenance_work_mem? And why
have you modified wal_buffers and temp_buffers?

I doubt these are related to the issues you're seeing, though.

>   * *Relevant Schema*: All tables referenced in this question have this
>     same schema
>
> managed_target_stats=> \d  stat_300_3_1
>
> Table "public.stat_300_40110_1"
>
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  ts     | integer |
>  target | bigint  |
>  port   | integer |
>  data   | real[]  |
>
> Indexes:
>     "unique_stat_300_40110_1" UNIQUE CONSTRAINT, btree (ts, target, port)
>     "idx_port_stat_300_40110_1" btree (port)
>     "idx_target_stat_300_40110_1" btree (target)
>     "idx_ts_stat_300_40110_1" btree (ts)

OK, so there are multiple tables, and you're updating 50k rows in all
tables in total? Can you post \dt+ and \di+ so that we get an idea of
table/index sizes?
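
For example (a sketch; the stats tables appear to live in the public
schema), the listings can be dumped to a file like this:

  $ psql managed_target_stats -c "\dt+ public.*"  > sizes.txt
  $ psql managed_target_stats -c "\di+ public.*" >> sizes.txt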

>   * *Hardware*:
>       o CPU:  Intel(R) Xeon(R) CPU E5205  @ 1.86GHz
>       o Memory: 6GB
>       o Storage Details:
>
> There are 2 500GB disks (/dev/sda, /dev/sdb) with the following 6
> partitions on each disk.
>
> *Number  Start   End     Size    Type      File system     Flags*
>
>  1      512B    24.7MB  24.7MB  primary                   boot
>  2      24.7MB  6473MB  6449MB  primary   linux-swap(v1)
>  3      6473MB  40.8GB  34.4GB  primary   ext3
>  4      40.8GB  500GB   459GB   extended                  lba
>  5      40.8GB  408GB   367GB   logical   ext3
>  6      408GB   472GB   64.4GB  logical   ext3

The first problem here is ext3. Its behavior when performing fsync is
really terrible. See

  http://blog.2ndquadrant.com/linux_filesystems_and_postgres/

for more details. So, the first thing you should do is switch to ext4 or
xfs.


> *Disk model and details:*
>
> Model Family:     Western Digital RE3 Serial ATA family

Regular 7.2k SATA disk, not the most powerful piece of hardware.


> The postgres data is stored on a software RAID10 on partition 5 of
> both these disks.

So essentially RAID0, as you only have 2 drives.
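
It might also be worth including the array layout in your reply (assuming
this is Linux md software RAID), so we can see exactly how the two devices
are combined:

  $ cat /proc/mdstat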

> * *Maintenance Setup*: autovacuum is running with default settings.
> Old records are deleted every night. I also do 'vacuum full' on a 12
> tables that receive large number of updates every night at 1AM. I
> have noticed that these 'vacuum full' also time out. (I am planning
> to post a separate question regarding my vacuuming strategy).

My bet is that it's related. If the system is I/O bound, it's natural that
the VACUUM FULL runs perform badly too.
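
If you want to quantify how much dead space the nightly deletes leave
behind before resorting to VACUUM FULL, something like this (a rough
sketch using the standard statistics views) is a good start:

  $ psql managed_target_stats -c "SELECT relname, n_live_tup, n_dead_tup,
      last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"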

>   * *WAL Configuration*: The WAL is in the same disk.

Which is not helping, because the WAL writes interfere with the other I/O.
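
If you ever add a separate device for WAL, the usual approach on 9.1 (a
sketch; server stopped, paths adjusted to your layout) is to move pg_xlog
and symlink it back:

  $ pg_ctl stop -D /path/to/data
  $ mv /path/to/data/pg_xlog /mnt/wal-disk/pg_xlog
  $ ln -s /mnt/wal-disk/pg_xlog /path/to/data/pg_xlog
  $ pg_ctl start -D /path/to/data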

regards
Tomas

