Discussion: Performances issues with SSD volume ?


Performances issues with SSD volume ?

From: Thomas SIMON
Date:
Hi everyone,

Due to the increasing load on my application, I recently moved my
production server from a server with SAS 15k HDDs to a server with
Intel SSDs (only for the PostgreSQL data partition).
Both servers use a RAID 10 configuration.
The rest of the hardware is approximately the same (maybe a little
more recent on the new server).
I did not change postgresql.conf.

At first, I ran the new server as a hot standby slave for several
weeks. (I have a master/slave configuration.)

Since everything looked good in terms of performance in this slave
role (latency on the /data partition decreased from ~60ms to ~3ms, and
utilisation of the device decreased significantly too), I decided to
swap the server roles and promote the server with SSD drives to
master. (I also ran several benchmark tests on both configurations,
and the SSD config was always better than the old HDD config.)

After the change, I saw the following behaviour, and I don't
understand why: everything seems to work fine (load is ~6/7, when it
was previously ~25/30 on the HDD server), so the SSD server is faster
than the HDD one, and my apps run faster too; but after some time
(anywhere from 5 minutes to 2 hours), the load average increases
suddenly (it can reach 150!) and does not decrease, so Postgres and my
application become almost unusable (even small requests hit
statement_timeout).

The only way to decrease the load is to restart the pooler (pgpool) so
that all requests are stopped; Postgres is then accessible again, for
a while...

I tried adapting the postgresql.conf parameters to the SSD config
(random_page_cost = 1), but this did not change the problem.

When I look at disk IO, the RAID volume is 100% busy, which is weird
because it was around 20-30% just before.
Transactions per second are at that moment very low (<10)
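For reference, I watch this with e.g. iostat (device name from my
setup); the %util column sits near 100 while tps collapses:

    iostat -x /dev/sdc 1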

I suspect a cache issue, but I can't explain why loading data should
take longer on SSDs than on HDDs, and I don't know how to find the
origin of the problem.


Does anyone have an idea :) ?

Sorry for my poor English.
thanks,
Thomas



Re: Performances issues with SSD volume ?

From: Matheus de Oliveira
Date:

On Mon, May 18, 2015 at 10:05 AM, Thomas SIMON <tsimon@neteven.com> wrote:
After the change, I had the following behavior, and I don't understand why : everything seems to work fine (load is ~6/7, when it was previously ~25/30 on the HDD server), so the SSD server is faster than the HDD one, and my apps run faster too, but after some time (can be 5 minutes or 2 hours), the load average increases suddently (can reach 150 !) and does not decrease, so postgres and my application are almost unusable. (even small requests are in statement timeout)

Before we go mad, what is the OS/kernel/distro that you are using? Did it change from the previous one?

If Linux, have you disabled transparent_huge_pages and zone_reclaim_mode (two known problematic Linux features)?
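For reference, both can be checked and turned off on most Linux
systems like this (the exact sysfs path may vary by distro/kernel):

    cat /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/enabled

    sysctl vm.zone_reclaim_mode
    sysctl -w vm.zone_reclaim_mode=0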

Best regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

Re: Performances issues with SSD volume ?

From: Thomas SIMON
Date:
Hi Matheus,
thanks for the reply.

The kernels are slightly different: a 3.10.23-xxxx-std-ipv6-64 kernel (OVH kernel) on the HDD server, and a 3.2.65-1+deb7u1 kernel on the SSD server.
I'm using Debian wheezy on both servers.

I haven't changed the two parameters you mention.

Thanks,
Thomas
On 18/05/2015 15:13, Matheus de Oliveira wrote:

On Mon, May 18, 2015 at 10:05 AM, Thomas SIMON <tsimon@neteven.com> wrote:
After the change, I had the following behavior, and I don't understand why : everything seems to work fine (load is ~6/7, when it was previously ~25/30 on the HDD server), so the SSD server is faster than the HDD one, and my apps run faster too, but after some time (can be 5 minutes or 2 hours), the load average increases suddently (can reach 150 !) and does not decrease, so postgres and my application are almost unusable. (even small requests are in statement timeout)

Before we go mad, what is the OS/kernel/distro that you are using? Did it change from the previous one?

If Linux, have you disabled transparent_huge_pages and zone_reclaim_mode (two known problematic Linux features)?

Best regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: Performances issues with SSD volume ?

From: Koray Eyidoğan
Date:

Hi Thomas,

3.2 kernel may be #1 cause of your I/O load problem:

http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html

https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4

Have a nice day.

Koray

On Mon, May 18, 2015 at 4:35 PM, Thomas SIMON <tsimon@neteven.com> wrote:
Hi matheus,
thanks for reply.

Kernel is slightly different ; with a 3.10.23-xxxx-std-ipv6-64 kernel (OVH kernel) on HDD server, and  3.2.65-1+deb7u1 kernel on SSD server.
I'm using debian wheezy on both servers.

I didn't changed the 2 parameters you talk about.

Thanks,
Thomas
On 18/05/2015 15:13, Matheus de Oliveira wrote:

On Mon, May 18, 2015 at 10:05 AM, Thomas SIMON <tsimon@neteven.com> wrote:
After the change, I had the following behavior, and I don't understand why : everything seems to work fine (load is ~6/7, when it was previously ~25/30 on the HDD server), so the SSD server is faster than the HDD one, and my apps run faster too, but after some time (can be 5 minutes or 2 hours), the load average increases suddently (can reach 150 !) and does not decrease, so postgres and my application are almost unusable. (even small requests are in statement timeout)

Before we go mad, what is the OS/kernel/distro that you are using? Did it change from the previous one?

If Linux, have you disabled transparent_huge_pages and zone_reclaim_mode (two known problematic Linux features)?

Best regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres



Re: Performances issues with SSD volume ?

From: Wei Shan
Date:

Hi Thomas,

This is just a wild guess. Are you using hardware RAID? Is the cache battery still working?

Cheers

On 18 May 2015 9:36 pm, "Thomas SIMON" <tsimon@neteven.com> wrote:
Hi matheus,
thanks for reply.

Kernel is slightly different ; with a 3.10.23-xxxx-std-ipv6-64 kernel (OVH kernel) on HDD server, and  3.2.65-1+deb7u1 kernel on SSD server.
I'm using debian wheezy on both servers.

I didn't changed the 2 parameters you talk about.

Thanks,
Thomas
On 18/05/2015 15:13, Matheus de Oliveira wrote:

On Mon, May 18, 2015 at 10:05 AM, Thomas SIMON <tsimon@neteven.com> wrote:
After the change, I had the following behavior, and I don't understand why : everything seems to work fine (load is ~6/7, when it was previously ~25/30 on the HDD server), so the SSD server is faster than the HDD one, and my apps run faster too, but after some time (can be 5 minutes or 2 hours), the load average increases suddently (can reach 150 !) and does not decrease, so postgres and my application are almost unusable. (even small requests are in statement timeout)

Before we go mad, what is the OS/kernel/distro that you are using? Did it change from the previous one?

If Linux, have you disabled transparent_huge_pages and zone_reclaim_mode (two known problematic Linux features)?

Best regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: Performances issues with SSD volume ?

From: Thomas SIMON
Date:
Wow, thanks a lot for the advice, Koray!
That is a huge difference indeed!
I will start by upgrading my kernel ASAP :)
Thomas
On 18/05/2015 15:51, Koray Eyidoğan wrote:

On Mon, May 18, 2015 at 4:35 PM, Thomas SIMON <tsimon@neteven.com> wrote:
Hi matheus,
thanks for reply.

Kernel is slightly different ; with a 3.10.23-xxxx-std-ipv6-64 kernel (OVH kernel) on HDD server, and  3.2.65-1+deb7u1 kernel on SSD server.
I'm using debian wheezy on both servers.

I didn't changed the 2 parameters you talk about.

Thanks,
Thomas
On 18/05/2015 15:13, Matheus de Oliveira wrote:

On Mon, May 18, 2015 at 10:05 AM, Thomas SIMON <tsimon@neteven.com> wrote:
After the change, I had the following behavior, and I don't understand why : everything seems to work fine (load is ~6/7, when it was previously ~25/30 on the HDD server), so the SSD server is faster than the HDD one, and my apps run faster too, but after some time (can be 5 minutes or 2 hours), the load average increases suddently (can reach 150 !) and does not decrease, so postgres and my application are almost unusable. (even small requests are in statement timeout)

Before we go mad, what is the OS/kernel/distro that you are using? Did it change from the previous one?

If Linux, have you disabled transparent_huge_pages and zone_reclaim_mode (two known problematic Linux features)?

Best regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres




Re: Performances issues with SSD volume ?

From: Thomas SIMON
Date:
Hi Wei,
Yes, I'm using hardware RAID (a MegaCli-managed card); the battery is OK, I had already checked the BBU :)
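For reference, I checked it with the usual MegaCli status call (the
binary name may differ on your install, e.g. MegaCli64):

    MegaCli -AdpBbuCmd -GetBbuStatus -aALL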

Cheers
Thomas
On 18/05/2015 15:52, Wei Shan wrote:

Hi Thomas,

This is just a wild guess. Are you using hardware RAID? Is the cache battery still working?

Cheers

On 18 May 2015 9:36 pm, "Thomas SIMON" <tsimon@neteven.com> wrote:
Hi matheus,
thanks for reply.

Kernel is slightly different ; with a 3.10.23-xxxx-std-ipv6-64 kernel (OVH kernel) on HDD server, and  3.2.65-1+deb7u1 kernel on SSD server.
I'm using debian wheezy on both servers.

I didn't changed the 2 parameters you talk about.

Thanks,
Thomas
On 18/05/2015 15:13, Matheus de Oliveira wrote:

On Mon, May 18, 2015 at 10:05 AM, Thomas SIMON <tsimon@neteven.com> wrote:
After the change, I had the following behavior, and I don't understand why : everything seems to work fine (load is ~6/7, when it was previously ~25/30 on the HDD server), so the SSD server is faster than the HDD one, and my apps run faster too, but after some time (can be 5 minutes or 2 hours), the load average increases suddently (can reach 150 !) and does not decrease, so postgres and my application are almost unusable. (even small requests are in statement timeout)

Before we go mad, what is the OS/kernel/distro that you are using? Did it change from the previous one?

If Linux, have you disabled transparent_huge_pages and zone_reclaim_mode (two known problematic Linux features)?

Best regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres



Re: Performances issues with SSD volume ?

From: Glyn Astill
Date:
> From: Koray Eyidoğan <korayey@gmail.com>
>To: Thomas SIMON <tsimon@neteven.com>
>Cc: pgsql-admin@postgresql.org
>Sent: Monday, 18 May 2015, 14:51
>Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
>
>
>Hi Thomas,
>
>
>3.2 kernel may be #1 cause of your I/O load problem:
>
>
>http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>

>https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>
>
>
>Have a nice day.
>
>
>Koray


Likely 3.2 kernel isn't helping, but I think we need much more information before jumping to conclusions.

You say you're IO bound, so some output from sar / iostat / dstat and pg_stat_activity etc before and during the issue would be of use.

Also:
    System memory size

    SSD Model numbers and how many
    Raid controller
    Partition alignments and stripe sizes
    Kernel options
    Filesystem used and mount options
    IO Scheduler
    Postgresql version and configuration
    Connection pool sizing


Perhaps you could throw us the output of some of these:

    fdisk -l
    cat /etc/mtab
    cat /sys/block/<ssd device>/queue/scheduler
    sysctl kernel | grep sched
    sysctl vm
    select * from pg_stat_activity
    select name, setting from pg_settings
    lspci | grep -E 'RAID|SCSI|IDE|SATA'


Re: Performances issues with SSD volume ?

From: Thomas SIMON
Date:
Hi Glyn,

I'll try to answer these points.

I've run some benchmarks, and indeed 3.2 is not helping. Not helping at all.
I changed to 3.14 and the gap is quite big!
With the pgbench RW test: 3.2 --> 4200 TPS; 3.14 --> 6900 TPS under
the same conditions.
With the pgbench RO test: 3.2 --> 37000 TPS; 3.14 --> 95000 TPS, same
conditions too.
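For reference, the runs were along these lines (the scale and client
counts here are illustrative, not the exact figures used):

    pgbench -i -s 1000 bench              # initialise the test database
    pgbench -c 64 -j 8 -T 300 bench       # RW test (default TPC-B-like mix)
    pgbench -c 64 -j 8 -T 300 -S bench    # RO test (SELECT-only)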

So it should be better now; but when the server was in production,
even with the bad kernel, performance was already quite good before it
suddenly dropped.
So I too think I have another configuration problem.



You say you're IO bound, so some output from sar / iostat / dstat and pg_stat_activity etc before and during the issue would be of use.

-> My server is not in production right now, so it is difficult to
replay the production load and get some useful metrics.
The best way I've found is to replay traffic from the logs with pgreplay.
I hoped the server would fall over again while replaying this traffic,
but it never happens ... another thing I can't understand ...
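For reference, the replay itself is roughly this, assuming CSV-format
server logs (see the pgreplay docs for the parse options that match
your log format):

    pgreplay -c -h 127.0.0.1 -p 5432 postgresql.csv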

Below is my dstat output while I replay this traffic (i.e. while the
server runs normally).
Unfortunately I have no output from the periods when the server's
performance degraded.

[17:31:12]root@db10:~$ dstat -alrmy
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- --io/total- ------memory-usage----- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | 1m   5m  15m | read  writ| used  buff  cach  free| int   csw
   1   1  98   0   0   0|4870k   16M|   0     0 |   0     0 |4666    35k|4.68 1.48 0.56| 419   774 |6204M 47.1M 70.9G  175G|4666    35k
  10   4  81   5   0   0| 177M  368k| 510B 9674B|   0     0 | 32k    81k|5.11 1.62 0.61|14.7k 54.0 |6232M 47.1M 71.1G  175G| 32k    81k
   3   2  92   3   0   0|  81M 7568k| 224B  738B|   0     0 | 12k    28k|5.11 1.62 0.61|8687   822 |6233M 47.1M 71.1G  175G| 12k    28k
   4   2  92   2   0   0|  87M 7844k|1046B   10k|   0     0 | 12k    25k|5.11 1.62 0.61|8031   837 |6235M 47.1M 71.2G  175G| 12k    25k
  14   5  78   3   0   0| 155M 2972k| 556B 1084B|   0     0 | 28k    69k|5.11 1.62 0.61|12.7k  176 |6645M 47.1M 71.4G  174G| 28k    70k
  19   4  73   4   0   0| 172M 3012k|1082B 9182B|   0     0 | 33k    73k|5.11 1.62 0.61|12.5k  248 |6727M 47.1M 71.5G  174G| 33k    72k
  23   5  66   6   0   0| 247M   20M| 252B  800B|   0     0 | 42k    88k|5.42 1.75 0.65|20.1k  426 |6827M 47.1M 71.8G  173G| 42k    88k
  16   3  78   4   0   0| 119M 1384k| 744B 7680B|   0     0 | 28k    47k|5.42 1.75 0.65|11.3k  113 |6883M 47.1M 71.9G  173G| 28k    47k
   5   1  92   2   0   0|  55M 5952k|6960B 8210B|   0     0 | 16k    25k|5.42 1.75 0.65|6304   676 |6890M 47.1M 72.0G  173G| 16k    25k
   5   2  91   2   0   1|  56M  568k|9593B   13k|   0     0 | 16k    26k|5.42 1.75 0.65|6383  10.0 |6892M 47.1M 72.0G  173G| 16k    26k
   5   1  92   2   0   0|  51M    0 | 546B  888B|   0     0 | 14k    24k|5.42 1.75 0.65|6091     0 |6891M 47.1M 72.1G  173G| 14k    24k
  21   5  71   4   0   0| 110M 1416k| 384B 6874B|   0     0 | 31k    58k|6.18 1.97 0.73|11.1k  203 |7023M 47.1M 72.2G  173G| 31k    58k
  19   4  74   2   0   0| 110M 2364k| 312B  846B|   0     0 | 23k    44k|6.18 1.97 0.73|9399   221 |7055M 47.2M 72.3G  173G| 22k    44k
  26   6  65   3   0   0| 110M 6040k|1770B 7068B|   0     0 | 31k    59k|6.18 1.97 0.73|10.5k  781 |7255M 47.2M 72.4G  172G| 31k    59k
  23   5  69   3   0   0| 128M 2912k|1146B 1350B|   0     0 | 27k    52k|6.18 1.97 0.73|11.2k  240 |7314M 47.2M 72.5G  172G| 27k    52k


Other things you asked:

     System memory size : 256 GB
     SSD model numbers and how many : 4 SSD disks; RAID 10; model
INTEL SSDSC2BB480G4
     Raid controller : MegaRAID SAS 2208
     Partition alignments and stripe sizes : see fdisk below
     Kernel options : the config file is here :
ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
     Filesystem used and mount options : ext4, see mtab below
     IO Scheduler : noop [deadline] cfq for my SSD raid volume
     PostgreSQL version and configuration : 9.3.5

max_connections=1800
shared_buffers=8GB
temp_buffers=32MB
work_mem=100MB
maintenance_work_mem=12GB
bgwriter_lru_maxpages=200
effective_io_concurrency=4
wal_level=hot_standby
wal_sync_method=fdatasync
wal_writer_delay=2000ms
commit_delay=1000
checkpoint_segments=80
checkpoint_timeout=15min
checkpoint_completion_target=0.7
archive_command='rsync ....'
max_wal_senders=10
wal_keep_segments=38600
vacuum_defer_cleanup_age=100
hot_standby = on
max_standby_archive_delay = 5min
max_standby_streaming_delay = 5min
hot_standby_feedback = on
random_page_cost = 1.0
effective_cache_size = 240GB
log_min_error_statement = warning
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'
log_lock_waits = on
log_statement = 'all'
log_timezone = 'localtime'
track_activities = on
track_functions = pl
track_activity_query_size = 8192
autovacuum_max_workers = 5
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 40
autovacuum_analyze_threshold = 20
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.10
autovacuum_vacuum_cost_delay = 5ms
default_transaction_isolation = 'read committed'
max_locks_per_transaction = 128



     Connection pool sizing (pgpool2)
num_init_children = 1790
max_pool = 1

I also add the MegaCli parameters:

Virtual Drive: 2 (Target Id: 2)
Name                :datassd
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 893.25 GB
Sector Size         : 512
Is VD emulated      : Yes
Mirror Data         : 893.25 GB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives per span:2
Span Depth          : 2
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Bad Blocks Exist: No
PI type: No PI

Is VD Cached: No


Other outputs :
     fdisk -l

Disk /dev/sdc: 959.1 GB, 959119884288 bytes
255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000



     cat /etc/mtab
/dev/mapper/vg_datassd-lv_datassd /datassd ext4
rw,relatime,discard,nobarrier,data=ordered 0 0
(I added nobarrier option)


     cat /sys/block/sdc/queue/scheduler
noop [deadline] cfq


     sysctl kernel | grep sched
kernel.sched_child_runs_first = 0
kernel.sched_rr_timeslice_ms = 25
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000

I've read some advice about setting kernel.sched_autogroup_enabled=0
and kernel.sched_migration_cost_ns=5000000, but these parameters are
not recognized by my kernel, so I don't know what to do with that...
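The advice amounts to something like this in /etc/sysctl.conf (only
meaningful on kernels that actually expose these knobs):

    kernel.sched_autogroup_enabled = 0
    kernel.sched_migration_cost_ns = 5000000   # called sched_migration_cost on 3.2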

     sysctl vm
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 8388608
vm.dirty_background_ratio = 0
vm.dirty_bytes = 67108864
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 3
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256    256    32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 65008
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.scan_unevictable_pages = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0


     select * from pg_stat_activity
I've got hundreds of entries for that when I'm in production, and I
can't paste them here due to confidentiality.
Per day it is usually around 50 million queries (35% selects; 55%
updates & 5% inserts).
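What I can share is an anonymised summary instead, e.g. (column names
as of 9.3):

    select state, waiting, count(*)
    from pg_stat_activity
    group by 1, 2
    order by 3 desc;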


     lspci | grep -E 'RAID|SCSI|IDE|SATA'
00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset
6-Port SATA AHCI Controller (rev 06)
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208
[Thunderbolt] (rev 05)
07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset
4-Port SATA Storage Control Unit (rev 06)

Thanks

Thomas

On 18/05/2015 16:29, Glyn Astill wrote:
>> From: Koray Eyidoğan <korayey@gmail.com>
>> To: Thomas SIMON <tsimon@neteven.com>
>> Cc: pgsql-admin@postgresql.org
>> Sent: Monday, 18 May 2015, 14:51
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>
>>
>> Hi Thomas,
>>
>>
>> 3.2 kernel may be #1 cause of your I/O load problem:
>>
>>
>> http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>>
>>
>> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>
>>
>>
>> Have a nice day.
>>
>>
>> Koray
>
> Likely 3.2 kernel isn't helping, but I think we need much more information before jumping to conclusions.
>
> You say you're IO bound, so some output from sar / iostat / dstat and pg_stat_activity etc before and during the issue would be of use.
>
> Also:
>      System memory size
>
>      SSD Model numbers and how many
>      Raid controller
>      Partition allignments and stripe sizes
>      Kernel options
>      Filesystem used and mount options
>      IO Scheduler
>      Postgresql version and configuration
>      Connection pool sizing
>
>
> Perhaps you could thow us the output of some of these:
>
>      fdisk -l
>      cat /etc/mtab
>      cat /sys/block/<ssd device>/queue/scheduler
>      sysctl kernel | grep sched
>      sysctl vm
>      select * from pg_stat_activity
>      select name, setting from pg_settings
>      lspci | grep -E 'RAID|SCSI|IDE|SATA'



Re: Performances issues with SSD volume ?

From: Glyn Astill
Date:

> From: Thomas SIMON <tsimon@neteven.com>
> To: glynastill@yahoo.co.uk
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Wednesday, 20 May 2015, 16:41
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
> Hi Glyn,
>
> I'll try to answer this points.
>
> I've made some benchs, and indeed 3.2 not helping. not helping at all.
> I changed to 3.14 and gap is quite big !
> With pgbench RW test, 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS in same
> conditions
> With pgbench RO test, 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
> conditions too.

>


That's a start then.

> It should so be better, but when server was in production, and ever with
> bad kernel, performances was already quite good before they quickly
> decreased.
> So i think too I have another configuration problem.
>
> You say you're IO bound, so some output from sar / iostat / dstat and
> pg_stat_activity etc before and during the issue would be of use.
>
> -> My server is not in production right now, so it is difficult to
> replay production load and have some useful metrics.
> The best way I've found is to replay trafic from logs with pgreplay.
> I hoped that the server falls back by replaying this traffic, but it
> never happens ... Another thing I can't understand ...
>
> Below is my dstat output when I replay this traffic (and so when server
> runs normally)
> I have unfortunately no more outputs when server's performances decreased.

>

It's a shame we can't get any insight into activity on the server during the issues.
>
>
> Other things you asked
>
>      System memory size : 256 Go
>      SSD Model numbers and how many : 4 SSd disks ; RAID 10 ; model
> INTEL SSDSC2BB480G4
>      Raid controller : MegaRAID SAS 2208
>      Partition alignments and stripe sizes : see fdisk delow
>      Kernel options : the config file is here :
> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>      Filesystem used and mount options : ext4, see mtab below
>      IO Scheduler : noop [deadline] cfq for my ssd raid volume
>      Postgresql version and configuration : 9.3.5
>
> max_connections=1800
> shared_buffers=8GB
> temp_buffers=32MB
> work_mem=100MB
> maintenance_work_mem=12GB
> bgwriter_lru_maxpages=200
> effective_io_concurrency=4
> wal_level=hot_standby
> wal_sync_method=fdatasync
> wal_writer_delay=2000ms
> commit_delay=1000
> checkpoint_segments=80
> checkpoint_timeout=15min
> checkpoint_completion_target=0.7
> archive_command='rsync ....'
> max_wal_senders=10
> wal_keep_segments=38600
> vacuum_defer_cleanup_age=100
> hot_standby = on
> max_standby_archive_delay = 5min
> max_standby_streaming_delay = 5min
> hot_standby_feedback = on
> random_page_cost = 1.0
> effective_cache_size = 240GB
> log_min_error_statement = warning
> log_min_duration_statement = 0
> log_checkpoints = on
> log_connections = on
> log_disconnections = on
> log_line_prefix = '%m|%u|%d|%c|'
> log_lock_waits = on
> log_statement = 'all'
> log_timezone = 'localtime'
> track_activities = on
> track_functions = pl
> track_activity_query_size = 8192
> autovacuum_max_workers = 5
> autovacuum_naptime = 30s
> autovacuum_vacuum_threshold = 40
> autovacuum_analyze_threshold = 20
> autovacuum_vacuum_scale_factor = 0.10
> autovacuum_analyze_scale_factor = 0.10
> autovacuum_vacuum_cost_delay = 5ms
> default_transaction_isolation = 'read committed'
> max_locks_per_transaction = 128
>
>
>
>      Connection pool sizing (pgpool2)
> num_init_children = 1790
> max_pool = 1

>


1800 is quite a lot of connections, and with max_pool=1 in pgpool you're effectively just using pgpool as a proxy (as I recall, my memory is a little fuzzy on pgpool now).  Unless your app is stateful in some way or has unique users for each of those 1800 connections you should lower the quantity of active connections.  A general starting point is usually cpu cores * 2, so you could up max_pool and divide num_init_children by the same amount.

Hard to say what you need to do without knowing what exactly you're doing though.  What's the nature of the app(s)?
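Just to illustrate that sizing suggestion with hypothetical numbers
(not a recommendation for your workload):

    # pgpool2 sizing sketch: divide num_init_children, raise max_pool
    num_init_children = 448    # was 1790, divided by ~4
    max_pool = 4               # was 1, raised by the same factor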

> I also add megacli parameters :
>
> Virtual Drive: 2 (Target Id: 2)
> Name                :datassd
> RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
> Size                : 893.25 GB
> Sector Size         : 512
> Is VD emulated      : Yes
> Mirror Data         : 893.25 GB
> State               : Optimal
> Strip Size          : 256 KB
> Number Of Drives per span:2
> Span Depth          : 2
> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
> Cache if Bad BBU
> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
> Cache if Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy   : Enabled
> Encryption Type     : None
> Bad Blocks Exist: No
> PI type: No PI
>
> Is VD Cached: No

>


Not using your raid controller's write cache then?  Not sure just how important that is with SSDs these days, but if you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU" if you do that.
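From memory the MegaCli incantation is along these lines (check it
against your docs; -L2 should be your "datassd" virtual drive, -a0 the
adapter):

    MegaCli -LDSetProp WB -L2 -a0               # enable write-back
    MegaCli -LDSetProp NoCachedBadBBU -L2 -a0   # no write cache if BBU fails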


>
> Other outputs :
>      fdisk -l
>
> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
>
> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
>
>
>      cat /etc/mtab
> /dev/mapper/vg_datassd-lv_datassd /datassd ext4
> rw,relatime,discard,nobarrier,data=ordered 0 0
> (I added nobarrier option)
>
>
>      cat /sys/block/sdc/queue/scheduler
> noop [deadline] cfq
>

>


You could swap relatime for noatime,nodiratime.
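i.e. in /etc/fstab something like:

    /dev/mapper/vg_datassd-lv_datassd /datassd ext4 rw,noatime,nodiratime,discard,nobarrier,data=ordered 0 0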

>      sysctl kernel | grep sched
> kernel.sched_child_runs_first = 0
> kernel.sched_rr_timeslice_ms = 25
> kernel.sched_rt_period_us = 1000000
> kernel.sched_rt_runtime_us = 950000
>
> I've read some advices about changing kernel.sched_autogroup_enabled=0
> and kernel.sched_migration_cost_ns=5000000, but this parameters are not
> recognized by my kernel. So I don't know what to do with that...

>

sched_migration_cost_ns would be called sched_migration_cost in your old 3.2 kernel, not sure why
sched_autogroup_enabled wouldn't be recognized though.


>      sysctl vm
> vm.admin_reserve_kbytes = 8192
> vm.block_dump = 0
> vm.dirty_background_bytes = 8388608
> vm.dirty_background_ratio = 0
> vm.dirty_bytes = 67108864
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 0
> vm.dirty_writeback_centisecs = 500
> vm.drop_caches = 3
> vm.laptop_mode = 0
> vm.legacy_va_layout = 0
> vm.lowmem_reserve_ratio = 256    256    32
> vm.max_map_count = 65530
> vm.memory_failure_early_kill = 0
> vm.memory_failure_recovery = 1
> vm.min_free_kbytes = 65008
> vm.min_slab_ratio = 5
> vm.min_unmapped_ratio = 1
> vm.mmap_min_addr = 4096
> vm.nr_pdflush_threads = 0
> vm.numa_zonelist_order = default
> vm.oom_dump_tasks = 1
> vm.oom_kill_allocating_task = 0
> vm.overcommit_kbytes = 0
> vm.overcommit_memory = 2
> vm.overcommit_ratio = 50
> vm.page-cluster = 3
> vm.panic_on_oom = 0
> vm.percpu_pagelist_fraction = 0
> vm.scan_unevictable_pages = 0
> vm.stat_interval = 1
> vm.swappiness = 60
> vm.user_reserve_kbytes = 131072
> vm.vfs_cache_pressure = 100
> vm.zone_reclaim_mode = 0
>
>
>      select * from pg_stat_activity
> I've got hundred of entries for that when i'm in production, and I
> can't
> paste it here due to confidentiality.
> By day, it is around 50 millions queries usually. (35% selects ; 55%
> updates & 5% inserts)
>
>
>      lspci | grep -E 'RAID|SCSI|IDE|SATA'
> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset
> 6-Port SATA AHCI Controller (rev 06)
> 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208
> [Thunderbolt] (rev 05)
> 07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset
> 4-Port SATA Storage Control Unit (rev 06)
>
> Thanks
>
> Thomas
>
>
> Le 18/05/2015 16:29 — On 18/05/2015 16:29, Glyn Astill wrote:
>>>  From: Koray Eyidoğan <korayey@gmail.com>
>>>  To: Thomas SIMON <tsimon@neteven.com>
>>>  Cc: pgsql-admin@postgresql.org
>>>  Sent: Monday, 18 May 2015, 14:51
>>>  Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>>
>>>
>>>  Hi Thomas,
>>>
>>>
>>>  3.2 kernel may be #1 cause of your I/O load problem:
>>>
>>>
>>>
> http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>>>
>>>
>
> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>>
>>>
>>>
>>>  Have a nice day.
>>>
>>>
>>>  Koray
>>
>>  Likely 3.2 kernel isn't helping, but I think we need much more
> information before jumping to conclusions.
>>
>>  You say you're IO bound, so some output from sar / iostat / dstat and
> pg_stat_activity etc before and during the issue would be of use.
>>
>>  Also:
>>       System memory size
>>
>>       SSD Model numbers and how many
>>       Raid controller
>>       Partition allignments and stripe sizes
>>       Kernel options
>>       Filesystem used and mount options
>>       IO Scheduler
>>       Postgresql version and configuration
>>       Connection pool sizing
>>
>>
>>  Perhaps you could thow us the output of some of these:
>>
>>       fdisk -l
>>       cat /etc/mtab
>>       cat /sys/block/<ssd device>/queue/scheduler
>>       sysctl kernel | grep sched
>>       sysctl vm
>>       select * from pg_stat_activity
>>       select name, setting from pg_settings
>>       lspci | grep -E 'RAID|SCSI|IDE|SATA'
>
>
>
>


Re: Performances issues with SSD volume ?

From: Glyn Astill
Date:
----- Original Message -----
> From: Glyn Astill <glynastill@yahoo.co.uk>
> To: Thomas SIMON <tsimon@neteven.com>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Wednesday, 20 May 2015, 17:50
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
>
>
>>  From: Thomas SIMON <tsimon@neteven.com>
>>  To: glynastill@yahoo.co.uk
>>  Cc: "pgsql-admin@postgresql.org"
> <pgsql-admin@postgresql.org>
>>  Sent: Wednesday, 20 May 2015, 16:41
>>  Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>  Hi Glyn,
>>
>>  I'll try to answer this points.
>>
>>  I've made some benchs, and indeed 3.2 not helping. not helping at all.
>>  I changed to 3.14 and gap is quite big !
>>  With pgbench RW test, 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS in same
>>  conditions
>>  With pgbench RO test, 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
>>  conditions too.
>
>>
>
>
> That's a start then.
>
>>  It should so be better, but when server was in production, and ever with
>>  bad kernel, performances was already quite good before they quickly
>>  decreased.
>>  So i think too I have another configuration problem.
>>
>>  You say you're IO bound, so some output from sar / iostat / dstat and
>>  pg_stat_activity etc before and during the issue would be of use.
>>
>>  -> My server is not in production right now, so it is difficult to
>>  replay production load and have some useful metrics.
>>  The best way I've found is to replay trafic from logs with pgreplay.
>>  I hoped that the server falls back by replaying this traffic, but it
>>  never happens ... Another thing I can't understand ...
>>
>>  Below is my dstat output when I replay this traffic (and so when server
>>  runs normally)
>>  I have unfortunately no more outputs when server's performances
> decreased.
>
>>
>
> It's a shame we can't get any insight into activity on the server during
> the issues.
>>
>>
>>  Other things you asked
>>
>>       System memory size : 256 Go
>>       SSD Model numbers and how many : 4 SSd disks ; RAID 10 ; model
>>  INTEL SSDSC2BB480G4
>>       Raid controller : MegaRAID SAS 2208
>>       Partition alignments and stripe sizes : see fdisk delow
>>       Kernel options : the config file is here :
>>
> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>>       Filesystem used and mount options : ext4, see mtab below
>>       IO Scheduler : noop [deadline] cfq for my ssd raid volume
>>       Postgresql version and configuration : 9.3.5
>>
>>  max_connections=1800
>>  shared_buffers=8GB
>>  temp_buffers=32MB
>>  work_mem=100MB
>>  maintenance_work_mem=12GB
>>  bgwriter_lru_maxpages=200
>>  effective_io_concurrency=4
>>  wal_level=hot_standby
>>  wal_sync_method=fdatasync
>>  wal_writer_delay=2000ms
>>  commit_delay=1000
>>  checkpoint_segments=80
>>  checkpoint_timeout=15min
>>  checkpoint_completion_target=0.7
>>  archive_command='rsync ....'
>>  max_wal_senders=10
>>  wal_keep_segments=38600
>>  vacuum_defer_cleanup_age=100
>>  hot_standby = on
>>  max_standby_archive_delay = 5min
>>  max_standby_streaming_delay = 5min
>>  hot_standby_feedback = on
>>  random_page_cost = 1.0
>>  effective_cache_size = 240GB
>>  log_min_error_statement = warning
>>  log_min_duration_statement = 0
>>  log_checkpoints = on
>>  log_connections = on
>>  log_disconnections = on
>>  log_line_prefix = '%m|%u|%d|%c|'
>>  log_lock_waits = on
>>  log_statement = 'all'
>>  log_timezone = 'localtime'
>>  track_activities = on
>>  track_functions = pl
>>  track_activity_query_size = 8192
>>  autovacuum_max_workers = 5
>>  autovacuum_naptime = 30s
>>  autovacuum_vacuum_threshold = 40
>>  autovacuum_analyze_threshold = 20
>>  autovacuum_vacuum_scale_factor = 0.10
>>  autovacuum_analyze_scale_factor = 0.10
>>  autovacuum_vacuum_cost_delay = 5ms
>>  default_transaction_isolation = 'read committed'
>>  max_locks_per_transaction = 128
>>
>>
>>
>>       Connection pool sizing (pgpool2)
>>  num_init_children = 1790
>>  max_pool = 1
>
>>
>
>
> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're
> effectively just using pgpool as a proxy (as I recall, my memory is a little
> fuzzy on pgpool now).  Unless your app is stateful in some way or has unique
> users for each of those 1800 connections you should lower the quantity of active
> connections.  A general starting point is usually cpu cores * 2, so you could up
> max_pool and divide num_init_children by the same amount.
>
> Hard to say what you need to do without knowing what exactly you're doing
> though.  What's the nature of the app(s)?
>
>>  I also add megacli parameters :
>>
>>  Virtual Drive: 2 (Target Id: 2)
>>  Name                :datassd
>>  RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
>>  Size                : 893.25 GB
>>  Sector Size         : 512
>>  Is VD emulated      : Yes
>>  Mirror Data         : 893.25 GB
>>  State               : Optimal
>>  Strip Size          : 256 KB
>>  Number Of Drives per span:2
>>  Span Depth          : 2
>>  Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>  Cache if Bad BBU
>>  Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>  Cache if Bad BBU
>>  Default Access Policy: Read/Write
>>  Current Access Policy: Read/Write
>>  Disk Cache Policy   : Enabled
>>  Encryption Type     : None
>>  Bad Blocks Exist: No
>>  PI type: No PI
>>
>>  Is VD Cached: No
>
>>
>
>
> Not using your raid controllers write cache then?  Not sure just how important
> that is with SSDs these days, but if you've got a BBU set it to
> "WriteBack". Also change "Cache if Bad BBU" to "No
> Write Cache if Bad BBU" if you do that.
>
>
>>
>>  Other outputs :
>>       fdisk -l
>>
>>  Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>>  255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>>  Units = sectors of 1 * 512 = 512 bytes
>>  Sector size (logical/physical): 512 bytes / 4096 bytes
>>  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>  Disk identifier: 0x00000000
>>
>>  Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>>  255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>>  Units = sectors of 1 * 512 = 512 bytes
>>  Sector size (logical/physical): 512 bytes / 4096 bytes
>>  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>  Disk identifier: 0x00000000
>>
>>
>>       cat /etc/mtab
>>  /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>>  rw,relatime,discard,nobarrier,data=ordered 0 0
>>  (I added nobarrier option)
>>
>>
>>       cat /sys/block/sdc/queue/scheduler
>>  noop [deadline] cfq
>>
>
>>
>
>
> You could swap relatime for noatime,nodiratime.
>


You could also see if the noop scheduler makes any improvement.

Re: Performances issues with SSD volume ?

From: Thomas SIMON
Date:
On 20/05/2015 18:50, Glyn Astill wrote:
>
>> From: Thomas SIMON <tsimon@neteven.com>
>> To: glynastill@yahoo.co.uk
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Wednesday, 20 May 2015, 16:41
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>> Hi Glyn,
>>
>> I'll try to answer this points.
>>
>> I've made some benchs, and indeed 3.2 not helping. not helping at all.
>> I changed to 3.14 and gap is quite big !
>> With pgbench RW test, 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS in same
>> conditions
>> With pgbench RO test, 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
>> conditions too.
>
> That's a start then.
>
>> It should so be better, but when server was in production, and ever with
>> bad kernel, performances was already quite good before they quickly
>> decreased.
>> So i think too I have another configuration problem.
>>
>> You say you're IO bound, so some output from sar / iostat / dstat and
>> pg_stat_activity etc before and during the issue would be of use.
>>
>> -> My server is not in production right now, so it is difficult to
>> replay production load and have some useful metrics.
>> The best way I've found is to replay trafic from logs with pgreplay.
>> I hoped that the server falls back by replaying this traffic, but it
>> never happens ... Another thing I can't understand ...
>>
>> Below is my dstat output when I replay this traffic (and so when server
>> runs normally)
>> I have unfortunately no more outputs when server's performances decreased.
> It's a shame we can't get any insight into activity on the server during the issues.
>>
>> Other things you asked
>>
>>       System memory size : 256 Go
>>       SSD Model numbers and how many : 4 SSd disks ; RAID 10 ; model
>> INTEL SSDSC2BB480G4
>>       Raid controller : MegaRAID SAS 2208
>>       Partition alignments and stripe sizes : see fdisk delow
>>       Kernel options : the config file is here :
>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>>       Filesystem used and mount options : ext4, see mtab below
>>       IO Scheduler : noop [deadline] cfq for my ssd raid volume
>>       Postgresql version and configuration : 9.3.5
>>
>> max_connections=1800
>> shared_buffers=8GB
>> temp_buffers=32MB
>> work_mem=100MB
>> maintenance_work_mem=12GB
>> bgwriter_lru_maxpages=200
>> effective_io_concurrency=4
>> wal_level=hot_standby
>> wal_sync_method=fdatasync
>> wal_writer_delay=2000ms
>> commit_delay=1000
>> checkpoint_segments=80
>> checkpoint_timeout=15min
>> checkpoint_completion_target=0.7
>> archive_command='rsync ....'
>> max_wal_senders=10
>> wal_keep_segments=38600
>> vacuum_defer_cleanup_age=100
>> hot_standby = on
>> max_standby_archive_delay = 5min
>> max_standby_streaming_delay = 5min
>> hot_standby_feedback = on
>> random_page_cost = 1.0
>> effective_cache_size = 240GB
>> log_min_error_statement = warning
>> log_min_duration_statement = 0
>> log_checkpoints = on
>> log_connections = on
>> log_disconnections = on
>> log_line_prefix = '%m|%u|%d|%c|'
>> log_lock_waits = on
>> log_statement = 'all'
>> log_timezone = 'localtime'
>> track_activities = on
>> track_functions = pl
>> track_activity_query_size = 8192
>> autovacuum_max_workers = 5
>> autovacuum_naptime = 30s
>> autovacuum_vacuum_threshold = 40
>> autovacuum_analyze_threshold = 20
>> autovacuum_vacuum_scale_factor = 0.10
>> autovacuum_analyze_scale_factor = 0.10
>> autovacuum_vacuum_cost_delay = 5ms
>> default_transaction_isolation = 'read committed'
>> max_locks_per_transaction = 128
>>
>>
>>
>>       Connection pool sizing (pgpool2)
>> num_init_children = 1790
>> max_pool = 1
>
> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're effectively just using pgpool as a proxy (as I recall, my memory is a little fuzzy on pgpool now).  Unless your app is stateful in some way or has unique users for each of those 1800 connections you should lower the quantity of active connections.  A general starting point is usually cpu cores * 2, so you could up max_pool and divide num_init_children by the same amount.
>
> Hard to say what you need to do without knowing what exactly you're doing though.  What's the nature of the app(s)?
Yes, we just use it as a proxy for now.
We have approximately 100 different active users, each opening a
varying number of connections (Twisted + Zope apps).
The result is ~900 idle connections for ~60 active connections, but
sometimes (when stopping/starting prod) we need almost double the
connections, because some Twisted services don't close their
connections immediately.
But this is the current (working) configuration, and I don't think my
disk performance problem is related to it.
>
>> I also add megacli parameters :
>>
>> Virtual Drive: 2 (Target Id: 2)
>> Name                :datassd
>> RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
>> Size                : 893.25 GB
>> Sector Size         : 512
>> Is VD emulated      : Yes
>> Mirror Data         : 893.25 GB
>> State               : Optimal
>> Strip Size          : 256 KB
>> Number Of Drives per span:2
>> Span Depth          : 2
>> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>> Cache if Bad BBU
>> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>> Cache if Bad BBU
>> Default Access Policy: Read/Write
>> Current Access Policy: Read/Write
>> Disk Cache Policy   : Enabled
>> Encryption Type     : None
>> Bad Blocks Exist: No
>> PI type: No PI
>>
>> Is VD Cached: No
>
> Not using your raid controllers write cache then?  Not sure just how important that is with SSDs these days, but if you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU" if you do that.
No, I had read some MegaCli-related docs about SSDs, and the advice was
to use WriteThrough on the disks (see
http://wiki.mikejung.biz/LSI#Configure_LSI_Card_for_SSD_RAID, last section).
The disks are already in "No Write Cache if Bad BBU" mode (it is
written on a wrapped line in my extract).
>
>
>> Other outputs :
>>       fdisk -l
>>
>> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>>
>>       cat /etc/mtab
>> /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>> rw,relatime,discard,nobarrier,data=ordered 0 0
>> (I added nobarrier option)
>>
>>
>>       cat /sys/block/sdc/queue/scheduler
>> noop [deadline] cfq
>>
>
> You could swap relatime for noatime,nodiratime.
I'll swap to noatime, thanks.
>
>>       sysctl kernel | grep sched
>> kernel.sched_child_runs_first = 0
>> kernel.sched_rr_timeslice_ms = 25
>> kernel.sched_rt_period_us = 1000000
>> kernel.sched_rt_runtime_us = 950000
>>
>> I've read some advices about changing kernel.sched_autogroup_enabled=0
>> and kernel.sched_migration_cost_ns=5000000, but this parameters are not
>> recognized by my kernel. So I don't know what to do with that...
> sched_migration_cost_ns would be called sched_migration_cost in your old 3.2 kernel, not sure why
> sched_autogroup_enabled wouldn't be recognized though.
I've found in the config file that "CONFIG_SCHED_AUTOGROUP is not set"
for this kernel.
So I guess it is the same thing as having it set to 0, right?
I haven't found any parameter related to migration_cost in the config file.
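For reference, I checked with something like this against the config
file linked above:

    grep -E 'SCHED_AUTOGROUP|MIGRATION' config-3.14.43-xxxx-std-ipv6-64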


>
>>       sysctl vm
>> vm.admin_reserve_kbytes = 8192
>> vm.block_dump = 0
>> vm.dirty_background_bytes = 8388608
>> vm.dirty_background_ratio = 0
>> vm.dirty_bytes = 67108864
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 0
>> vm.dirty_writeback_centisecs = 500
>> vm.drop_caches = 3
>> vm.laptop_mode = 0
>> vm.legacy_va_layout = 0
>> vm.lowmem_reserve_ratio = 256    256    32
>> vm.max_map_count = 65530
>> vm.memory_failure_early_kill = 0
>> vm.memory_failure_recovery = 1
>> vm.min_free_kbytes = 65008
>> vm.min_slab_ratio = 5
>> vm.min_unmapped_ratio = 1
>> vm.mmap_min_addr = 4096
>> vm.nr_pdflush_threads = 0
>> vm.numa_zonelist_order = default
>> vm.oom_dump_tasks = 1
>> vm.oom_kill_allocating_task = 0
>> vm.overcommit_kbytes = 0
>> vm.overcommit_memory = 2
>> vm.overcommit_ratio = 50
>> vm.page-cluster = 3
>> vm.panic_on_oom = 0
>> vm.percpu_pagelist_fraction = 0
>> vm.scan_unevictable_pages = 0
>> vm.stat_interval = 1
>> vm.swappiness = 60
>> vm.user_reserve_kbytes = 131072
>> vm.vfs_cache_pressure = 100
>> vm.zone_reclaim_mode = 0
>>
>>
>>       select * from pg_stat_activity
>> I've got hundred of entries for that when i'm in production, and I
>> can't
>> paste it here due to confidentiality.
>> By day, it is around 50 millions queries usually. (35% selects ; 55%
>> updates & 5% inserts)
>>
>>
>>       lspci | grep -E 'RAID|SCSI|IDE|SATA'
>> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset
>> 6-Port SATA AHCI Controller (rev 06)
>> 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208
>> [Thunderbolt] (rev 05)
>> 07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset
>> 4-Port SATA Storage Control Unit (rev 06)
>>
>> Thanks
>>
>> Thomas
>>
>>
>> On 18/05/2015 16:29, Glyn Astill wrote:
>>>>   From: Koray Eyidoğan <korayey@gmail.com>
>>>>   To: Thomas SIMON <tsimon@neteven.com>
>>>>   Cc: pgsql-admin@postgresql.org
>>>>   Sent: Monday, 18 May 2015, 14:51
>>>>   Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>>
>>>>
>>>>
>>>>   Hi Thomas,
>>>>
>>>>
>>>>   3.2 kernel may be #1 cause of your I/O load problem:
>>>>
>>>>
>>>>
>> http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>>>>
>>
>> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>>>
>>>>
>>>>   Have a nice day.
>>>>
>>>>
>>>>   Koray
>>>   Likely 3.2 kernel isn't helping, but I think we need much more
>> information before jumping to conclusions.
>>>   You say you're IO bound, so some output from sar / iostat / dstat and
>> pg_stat_activity etc before and during the issue would be of use.
>>>   Also:
>>>        System memory size
>>>
>>>        SSD Model numbers and how many
>>>        Raid controller
>>>        Partition allignments and stripe sizes
>>>        Kernel options
>>>        Filesystem used and mount options
>>>        IO Scheduler
>>>        Postgresql version and configuration
>>>        Connection pool sizing
>>>
>>>
>>>   Perhaps you could thow us the output of some of these:
>>>
>>>        fdisk -l
>>>        cat /etc/mtab
>>>        cat /sys/block/<ssd device>/queue/scheduler
>>>        sysctl kernel | grep sched
>>>        sysctl vm
>>>        select * from pg_stat_activity
>>>        select name, setting from pg_settings
>>>        lspci | grep -E 'RAID|SCSI|IDE|SATA'
>>
>>
>>



Re: Performances issues with SSD volume ?

From: Thomas SIMON
Date:
disk was already in noop mode

cat /sys/block/sdc/queue/scheduler
  noop [deadline] cfq


Thomas

On 20/05/2015 19:03, Glyn Astill wrote:
> ----- Original Message -----
>> From: Glyn Astill <glynastill@yahoo.co.uk>
>> To: Thomas SIMON <tsimon@neteven.com>
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Wednesday, 20 May 2015, 17:50
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>
>>
>>>   From: Thomas SIMON <tsimon@neteven.com>
>>>   To: glynastill@yahoo.co.uk
>>>   Cc: "pgsql-admin@postgresql.org"
>> <pgsql-admin@postgresql.org>
>>>   Sent: Wednesday, 20 May 2015, 16:41
>>>   Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>>   Hi Glyn,
>>>
>>>   I'll try to answer this points.
>>>
>>>   I've made some benchs, and indeed 3.2 not helping. not helping at all.
>>>   I changed to 3.14 and gap is quite big !
>>>   With pgbench RW test, 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS in same
>>>   conditions
>>>   With pgbench RO test, 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
>>>   conditions too.
>>
>> That's a start then.
>>
>>>   It should so be better, but when server was in production, and ever with
>>>   bad kernel, performances was already quite good before they quickly
>>>   decreased.
>>>   So i think too I have another configuration problem.
>>>
>>>   You say you're IO bound, so some output from sar / iostat / dstat and
>>>   pg_stat_activity etc before and during the issue would be of use.
>>>
>>>   -> My server is not in production right now, so it is difficult to
>>>   replay production load and have some useful metrics.
>>>   The best way I've found is to replay trafic from logs with pgreplay.
>>>   I hoped that the server falls back by replaying this traffic, but it
>>>   never happens ... Another thing I can't understand ...
>>>
>>>   Below is my dstat output when I replay this traffic (and so when server
>>>   runs normally)
>>>   I have unfortunately no more outputs when server's performances
>> decreased.
>>
>> It's a shame we can't get any insight into activity on the server during
>> the issues.
>>>
>>>   Other things you asked
>>>
>>>        System memory size : 256 Go
>>>        SSD Model numbers and how many : 4 SSd disks ; RAID 10 ; model
>>>   INTEL SSDSC2BB480G4
>>>        Raid controller : MegaRAID SAS 2208
>>>        Partition alignments and stripe sizes : see fdisk delow
>>>        Kernel options : the config file is here :
>>>
>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>>>        Filesystem used and mount options : ext4, see mtab below
>>>        IO Scheduler : noop [deadline] cfq for my ssd raid volume
>>>        Postgresql version and configuration : 9.3.5
>>>
>>>   max_connections=1800
>>>   shared_buffers=8GB
>>>   temp_buffers=32MB
>>>   work_mem=100MB
>>>   maintenance_work_mem=12GB
>>>   bgwriter_lru_maxpages=200
>>>   effective_io_concurrency=4
>>>   wal_level=hot_standby
>>>   wal_sync_method=fdatasync
>>>   wal_writer_delay=2000ms
>>>   commit_delay=1000
>>>   checkpoint_segments=80
>>>   checkpoint_timeout=15min
>>>   checkpoint_completion_target=0.7
>>>   archive_command='rsync ....'
>>>   max_wal_senders=10
>>>   wal_keep_segments=38600
>>>   vacuum_defer_cleanup_age=100
>>>   hot_standby = on
>>>   max_standby_archive_delay = 5min
>>>   max_standby_streaming_delay = 5min
>>>   hot_standby_feedback = on
>>>   random_page_cost = 1.0
>>>   effective_cache_size = 240GB
>>>   log_min_error_statement = warning
>>>   log_min_duration_statement = 0
>>>   log_checkpoints = on
>>>   log_connections = on
>>>   log_disconnections = on
>>>   log_line_prefix = '%m|%u|%d|%c|'
>>>   log_lock_waits = on
>>>   log_statement = 'all'
>>>   log_timezone = 'localtime'
>>>   track_activities = on
>>>   track_functions = pl
>>>   track_activity_query_size = 8192
>>>   autovacuum_max_workers = 5
>>>   autovacuum_naptime = 30s
>>>   autovacuum_vacuum_threshold = 40
>>>   autovacuum_analyze_threshold = 20
>>>   autovacuum_vacuum_scale_factor = 0.10
>>>   autovacuum_analyze_scale_factor = 0.10
>>>   autovacuum_vacuum_cost_delay = 5ms
>>>   default_transaction_isolation = 'read committed'
>>>   max_locks_per_transaction = 128
>>>
>>>
>>>
>>>        Connection pool sizing (pgpool2)
>>>   num_init_children = 1790
>>>   max_pool = 1
>>
>> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're
>> effectively just using pgpool as a proxy (as I recall, my memory is a little
>> fuzzy on pgpool now).  Unless your app is stateful in some way or has unique
>> users for each of those 1800 connections you should lower the quantity of active
>> connections.  A general starting point is usually cpu cores * 2, so you could up
>> max_pool and divide num_init_children by the same amount.
>>
>> Hard to say what you need to do without knowing what exactly you're doing
>> though.  What's the nature of the app(s)?
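[As a rough sketch of that sizing advice only, assuming the 40 cores reported later in the thread (so cores * 2 = ~80 active backends), the pgpool.conf values might become something like:

    # hypothetical pgpool.conf sizing for a 40-core box
    num_init_children = 80    # down from 1790
    max_pool = 1              # or e.g. num_init_children = 40, max_pool = 2

max_connections on the postgres side could then drop well below 1800 as well.]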
>>
>>>   I also add the megacli parameters :
>>>
>>>   Virtual Drive: 2 (Target Id: 2)
>>>   Name                :datassd
>>>   RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
>>>   Size                : 893.25 GB
>>>   Sector Size         : 512
>>>   Is VD emulated      : Yes
>>>   Mirror Data         : 893.25 GB
>>>   State               : Optimal
>>>   Strip Size          : 256 KB
>>>   Number Of Drives per span:2
>>>   Span Depth          : 2
>>>   Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>>   Cache if Bad BBU
>>>   Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>>   Cache if Bad BBU
>>>   Default Access Policy: Read/Write
>>>   Current Access Policy: Read/Write
>>>   Disk Cache Policy   : Enabled
>>>   Encryption Type     : None
>>>   Bad Blocks Exist: No
>>>   PI type: No PI
>>>
>>>   Is VD Cached: No
>>
>> Not using your raid controller's write cache then?  Not sure just how important
>> that is with SSDs these days, but if you've got a BBU set it to
>> "WriteBack". Also change "Cache if Bad BBU" to "No
>> Write Cache if Bad BBU" if you do that.
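[For reference, flipping those cache policies with MegaCli usually looks something like this; the -L2 / -a0 numbering matches the virtual drive shown above, but verify against your own adapter:

    MegaCli -LDSetProp WB -L2 -a0               # enable write-back (needs a good BBU)
    MegaCli -LDSetProp NoCachedBadBBU -L2 -a0   # fall back to write-through if the BBU fails]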
>>
>>
>>>   Other outputs :
>>>        fdisk -l
>>>
>>>   Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>>>   255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>>>   Units = sectors of 1 * 512 = 512 bytes
>>>   Sector size (logical/physical): 512 bytes / 4096 bytes
>>>   I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>   Disk identifier: 0x00000000
>>>
>>>   Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>>>   255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>>>   Units = sectors of 1 * 512 = 512 bytes
>>>   Sector size (logical/physical): 512 bytes / 4096 bytes
>>>   I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>   Disk identifier: 0x00000000
>>>
>>>
>>>        cat /etc/mtab
>>>   /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>>>   rw,relatime,discard,nobarrier,data=ordered 0 0
>>>   (I added nobarrier option)
>>>
>>>
>>>        cat /sys/block/sdc/queue/scheduler
>>>   noop [deadline] cfq
>>>
>>
>> You could swap relatime for noatime,nodiratime.
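[Based on the mtab line above, the corresponding /etc/fstab entry would then read something like:

    /dev/mapper/vg_datassd-lv_datassd /datassd ext4 noatime,nodiratime,discard,nobarrier,data=ordered 0 0]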
>>
>
> You could also see if the noop scheduler makes any improvement.
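[Switching schedulers on the fly for a quick test is just the following; note it reverts at boot unless made persistent, e.g. via the elevator= kernel parameter:

    echo noop > /sys/block/sdc/queue/scheduler
    cat /sys/block/sdc/queue/scheduler     # should now show: [noop] deadline cfq]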
>
>



Re: Performances issues with SSD volume ?

From
Glyn Astill
Date:
> From: Thomas SIMON <tsimon@neteven.com>
>To: Glyn Astill <glynastill@yahoo.co.uk>
>Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>Sent: Thursday, 21 May 2015, 13:28
>Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
>
>disk was already in noop mode
>
>cat /sys/block/sdc/queue/scheduler
>  noop [deadline] cfq
>
>


No, the output shows the i/o scheduler between the brackets. That says it's deadline for sdc.

Of course deadline may well be the better option for you, but it's worth testing.


Re: Performances issues with SSD volume ?

From
Glyn Astill
Date:
> From: Thomas SIMON <tsimon@neteven.com>

> To: Glyn Astill <glynastill@yahoo.co.uk>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Thursday, 21 May 2015, 13:12
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
> On 20/05/2015 18:50, Glyn Astill wrote:
>> [earlier quoted exchange trimmed]

>>  Hard to say what you need to do without knowing what exactly you're

>> doing though.  What's the nature of the app(s)?
>
> Yes, we just use it as a proxy for now.
> We have approximately 100 different active users, each opening a varying
> number of connections (twisted + zope apps); the result is ~900 idle
> connections for ~60 active ones, but sometimes (when stopping/starting
> prod) we need almost double that, because some twisted services don't
> close their connections immediately.
> But this is the current (working) configuration, and I don't think my
> disk performance problem is related to this.

I think at this point you could do with going back and trying to reproduce the issue, then trace back up to
pg_stat_activity to see what activity could be causing the disk i/o.  I assume you've tried to reproduce the disk
issues with a simple disk benchmark like bonnie++?
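[A minimal bonnie++ run against the data volume might look like this; directory and user are illustrative:

    bonnie++ -d /datassd/bench -u postgres]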

Re: Performances issues with SSD volume ?

From
Thomas SIMON
Date:
On 21/05/2015 14:57, Glyn Astill wrote:
>> From: Thomas SIMON <tsimon@neteven.com>
>> To: Glyn Astill <glynastill@yahoo.co.uk>
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Thursday, 21 May 2015, 13:12
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>> [earlier quoted exchange trimmed]
> I think at this point you could do with going back and trying to reproduce the issue, then trace back up to
> pg_stat_activity to see what activity could be causing the disk i/o.  I assume you've tried to reproduce the disk
> issues with a simple disk benchmark like bonnie++?
Yes, I think the same thing. I will probably do this tomorrow early
in the morning.
I tried to reproduce the disk issues with different stress tests like
bonnie, fio and tsung, and I used a more realistic scenario with pgreplay
to replay my production traffic from the postgresql logfile.
However, I'm not sure how to diagnose performance issues.
I mean, if I see the SSDs are 100% busy, how can I figure out why their
behaviour changes?

I'm asking myself another question about the master/slave configuration.
For my test, I will set up my SSD server as a slave of the HDD server.
After that, I will promote it to master.
If I still have performance issues and must roll back, am I
necessarily forced to rebuild the new slave (HDD) completely with
pg_basebackup (and wait some hours for the files to transfer), or can I
directly demote the old master to a slave without spending time on the
rebuild (since the files should be the same on both servers)?

Thanks
Thomas


Re: Performances issues with SSD volume ?

From
Glyn Astill
Date:

>>  I think at this point you could do with going back and trying to reproduce
>> the issue, then trace back up to pg_stat_activity to see what activity could be
>> causing the disk i/o.  I assume you've tried to reproduce the disk issues

>> with a simple disk benchmark like bonnie++?
>
> Yes, I think the same thing. I will probably do this tomorrow early
> in the morning.
> I tried to reproduce the disk issues with different stress tests like
> bonnie, fio and tsung, and I used a more realistic scenario with pgreplay
> to replay my production traffic from the postgresql logfile.
> However, I'm not sure how to diagnose performance issues.
> I mean, if I see the SSDs are 100% busy, how can I figure out why their
> behaviour changes?
>

Well the disk benchmarks are purely to see what your disks are capable of, and help with your initial tuning.


You need to trace back which processes are causing most of the IO you're seeing, as well as checking the postgresql
logs; something like iotop, or dstat with the --top-bio option, might help you there.
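[For example, both tools ship with most distributions:

    iotop -obP            # batch mode, only processes currently doing i/o
    dstat --top-bio 5     # heaviest block-i/o process, every 5 seconds]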


You could also look at the pg_statio_user_tables view to narrow down which tables are being hit the hardest, which
might give you some clues.
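[A query along these lines, with an illustrative database name, would surface the tables with the most blocks read from outside shared buffers:

    psql -d yourdb -c "SELECT relname, heap_blks_read, heap_blks_hit
                       FROM pg_statio_user_tables
                       ORDER BY heap_blks_read DESC LIMIT 8;"]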


Also see here:
    https://wiki.postgresql.org/wiki/Performance_Analysis_Tools
    https://wiki.postgresql.org/wiki/Monitoring

> I'm asking myself another question about the master/slave configuration.
> For my test, I will set up my SSD server as a slave of the HDD server.


Unless you've got a mainly read-only workload, you can't really test the slave properly that way, as all it's doing
is replaying the WAL.

> After that, I will promote it to master.
> If I still have performance issues and must roll back, am I
> necessarily forced to rebuild the new slave (HDD) completely with
> pg_basebackup (and wait some hours for the files to transfer), or can I
> directly demote the old master to a slave without spending time on the
> rebuild (since the files should be the same on both servers)?
>


Yes you will need to rebuild it or look at pg_rewind:


    https://github.com/vmware/pg_rewind
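[A sketch of a rewind invocation; paths and connection string are illustrative, and the README is worth checking for prerequisites such as data checksums:

    pg_rewind --target-pgdata=/datassd/9.3/main \
              --source-server='host=new-master port=5432 user=postgres']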


Re: Performances issues with SSD volume ?

From
Thomas SIMON
Date:
On 21/05/2015 14:46, Glyn Astill wrote:
>> From: Thomas SIMON <tsimon@neteven.com>
>> To: Glyn Astill <glynastill@yahoo.co.uk>
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Thursday, 21 May 2015, 13:28
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>
>> disk was already in noop mode
>>
>> cat /sys/block/sdc/queue/scheduler
>>   noop [deadline] cfq
>>
>>
>
> No, the output shows the i/o scheduler between the brackets. That says it's deadline for sdc.
>
> Of course deadline may well be the better option for you, but it's worth testing.
You're totally right. I changed the parameter to noop and redid the pgbench
test, but I've seen no significant difference.



Re: Performances issues with SSD volume ?

From
Thomas SIMON
Date:
On 21/05/2015 16:30, Glyn Astill wrote:
>
>>>   I think at this point you could do with going back and trying to reproduce
>>> the issue, then trace back up to pg_stat_activity to see what activity could be
>>> causing the disk i/o.  I assume you've tried to reproduce the disk issues
>>> with a simple disk benchmark like bonnie++?
>> Yes, I think the same thing. I will probably do this tomorrow early
>> in the morning.
>> I tried to reproduce the disk issues with different stress tests like
>> bonnie, fio and tsung, and I used a more realistic scenario with pgreplay
>> to replay my production traffic from the postgresql logfile.
>> However, I'm not sure how to diagnose performance issues.
>> I mean, if I see the SSDs are 100% busy, how can I figure out why their
>> behaviour changes?
>>
> Well the disk benchmarks are purely to see what your disks are capable of, and help with your initial tuning.
>
>
> You need to trace back which processes are causing most of the IO you're seeing, as well as checking the postgresql
> logs; something like iotop, or dstat with the --top-bio option, might help you there.
>
>
> You could also look at the pg_statio_user_tables view to narrow down which tables are being hit the hardest, which
> might give you some clues.
Is there something to activate to see data in this view?
Because it's empty on my production server:

postgres=# select * from pg_statio_user_tables;
  relid | schemaname | relname | heap_blks_read | heap_blks_hit |
idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit |
tidx_blks_read | tidx_blks_hit

-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
(0 rows)

>
>
> Also see here:
>      https://wiki.postgresql.org/wiki/Performance_Analysis_Tools
>      https://wiki.postgresql.org/wiki/Monitoring
>
>>> I'm asking myself another question about the master/slave configuration.
>> For my test, I will set up my SSD server as a slave of the HDD server.
>
> Unless you've got a mainly read-only workload, you can't really test the slave properly that way, as all it's doing
> is replaying the WAL.
Sorry, I misspoke.
I meant that before promoting my current SSD test server to master, I
first had to set it up as a slave.
>
>> After that, I will promote it to master.
>> If I still have performance issues and must roll back, am I
>> necessarily forced to rebuild the new slave (HDD) completely with
>> pg_basebackup (and wait some hours for the files to transfer), or can I
>> directly demote the old master to a slave without spending time on the
>> rebuild (since the files should be the same on both servers)?
>>
>
> Yes you will need to rebuild it or look at pg_rewind:
>
>
>      https://github.com/vmware/pg_rewind
Thanks for the reference. It could do the trick.
I'll read it with interest.
>
>



Re: Performances issues with SSD volume ?

From
Glyn Astill
Date:
> From: Thomas SIMON <tsimon@neteven.com>

> To: Glyn Astill <glynastill@yahoo.co.uk>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Thursday, 21 May 2015, 17:56
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
> On 21/05/2015 16:30, Glyn Astill wrote:
>> [earlier quoted exchange trimmed]
>>
>> You could also look at the pg_statio_user_tables view to narrow down which
>> tables are being hit the hardest, which might give you some clues.
> Is there something to activate to see data in this view?
> Because it's empty on my production server:
>
> postgres=# select * from pg_statio_user_tables;
>   relid | schemaname | relname | heap_blks_read | heap_blks_hit |
> idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit |
> tidx_blks_read | tidx_blks_hit
>
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
> (0 rows)
>


Looks like you need to set track_counts=on then.  In fact, if you've got track_counts off then you're also not running
autovacuum; that's a warning flag unless it's intentional.
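[To verify:

    psql -c "SHOW track_counts;"    # should return: on
    # if not, set track_counts = on in postgresql.conf and reload]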


Re: Performances issues with SSD volume ?

From
Thomas SIMON
Date:
Thomas

On 22/05/2015 11:37, Glyn Astill wrote:
>> From: Thomas SIMON <tsimon@neteven.com>
>> To: Glyn Astill <glynastill@yahoo.co.uk>
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Thursday, 21 May 2015, 17:56
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>> [earlier quoted exchange trimmed]
>
> Looks like you need to set track_counts=on then.  In fact, if you've got track_counts off then you're also not
> running autovacuum; that's a warning flag unless it's intentional.
Ah no, it's just me doing dumb things ...
I was in the postgres database, not in my application's database ...


Here is the result for my top tables:

   relid  | schemaname | relname |  heap_blks_read |  heap_blks_hit | idx_blks_read |  idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
----------+------------+---------+-----------------+----------------+---------------+---------------+-----------------+----------------+----------------+---------------
  2056411 | public     | table 1 |    326588572770 |   432317896616 |   12839634494 |   33230958888 |        52298815 |       20020628 |        9406665 |     153557966
  2059324 | public     | table 2 |    223323685944 |    72194867978 |   10319078550 |  103672304590 |          189135 |         262100 |          27815 |        207212
  2056368 | public     | table 3 |    198706578066 |  1259514828344 |   21083719219 | 1456776226532 |           90446 |         331680 |          30850 |        367441
  2056347 | public     | table 4 |    124086575669 |   554775474730 |    5236764894 |  614913194098 |        14978587 |       67085822 |         751951 |     134187396
  2057296 | public     | table 5 |     68120702371 |    18134568265 |    1243107028 |   10537765277 |       350646003 |       87888306 |      119113944 |     629790819
  2058460 | public     | table 6 |     35119995405 |   158177258403 |    1687014495 | 1395923096049 |          738542 |         214574 |         325765 |       1318187
  2058900 | public     | table 7 |     34205345184 |   271135546976 |   21050227933 |  252099340046 |            3924 |           6567 |            422 |         16333
  2059195 | public     | table 8 |     29169256266 |   152937546457 |    2263998316 |  127782237565 |               0 |              0 |              0 |             0

How do you interpret these values?
Do you check the ratio between heap_blks_read and heap_blks_hit?
I've computed the ratio; it's about 89%.
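[For reference, that ratio can be computed directly; a sketch, with an illustrative database name:

    psql -d yourdb -c "SELECT round(100.0 * sum(heap_blks_hit) /
                         nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 1) AS heap_hit_pct
                       FROM pg_statio_user_tables;"]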



Re: Performances issues with SSD volume ?

From
Glyn Astill
Date:
----- Original Message -----

> From: Thomas SIMON <tsimon@neteven.com>
> To: Glyn Astill <glynastill@yahoo.co.uk>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Friday, 22 May 2015, 16:29
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
>
> Thomas
>
> On 22/05/2015 11:37, Glyn Astill wrote:
> [earlier quoted exchange trimmed]
>
> How do you interpret these values?
> Do you check the ratio between heap_blks_read and heap_blks_hit?
> I've computed the ratio; it's about 89%.

>


It shows blocks fetched from disk, or possibly from the OS buffer cache, as "read", and blocks already in shared
buffers as "hit".


See here: http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STATIO-ALL-TABLES-VIEW


Also, this thread appears to now be split over diagnosing the issue from two different standpoints: 1) Is the i/o
performance (or some other aspect) of your new SSD server worse than that of your old live spinning-disk system? 2) Are
there improvements to be had with your database and application setup?

For 1) Maybe you could post the output of bonnie++ results from the data volumes on both servers when nothing else
is going on on them.  If you can't do that on your production server, perhaps just the output from sar -d during a busy
period might help.  Hopefully that would help to clarify the specific disk performance differences for us.  Other
interesting tests would be the output of Greg Smith's stream-scaling script
(https://github.com/gregs1104/stream-scaling).  We're still missing the exact specs of your old system and the CPUs in
the new one.
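[For instance, sampling the data volume once a minute for an hour during production load:

    sar -d -p 60 60    # -p prints device names rather than devM-N]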


For 2) I think we're still missing a lot of information about what's happening in your database to be able to say much,
i.e. what's the size of the database, tables 1 - 8, typical read / write patterns, and transaction rate?  Perhaps you
could give a simple description of what your apps(s) do on a daily basis, and then post a diff of the outputs of the
pg_stat_database, pg_stat_user_tables and pg_statio_user_tables views over a specific period of time.

You really need to dig into each of these yourself to try and see where your new system might be lacking.
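[One way to capture such a diff is to snapshot a view into a table and compare later; a sketch, with an illustrative table and database name:

    psql -d yourdb -c "CREATE TABLE statio_snap AS
                       SELECT now() AS ts, * FROM pg_statio_user_tables;"
    # ... after the interval of interest:
    psql -d yourdb -c "INSERT INTO statio_snap
                       SELECT now(), * FROM pg_statio_user_tables;"]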


Re: Performances issues with SSD volume ?

From
Thomas SIMON
Date:
Thomas

On 22/05/2015 18:21, Glyn Astill wrote:
> ----- Original Message -----
>
>> From: Thomas SIMON<tsimon@neteven.com>
>> To: Glyn Astill<glynastill@yahoo.co.uk>
>> Cc:"pgsql-admin@postgresql.org"  <pgsql-admin@postgresql.org>
>> Sent: Friday, 22 May 2015, 16:29
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>> [earlier quoted exchange trimmed]
>
> It shows blocks fetched from disk, or possibly from the OS buffer cache, as "read", and blocks already in shared
> buffers as "hit".
>
> See here: http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STATIO-ALL-TABLES-VIEW
>
> Also, this thread appears to now be split over diagnosing the issue from two different standpoints: 1) Is the i/o
> performance (or some other aspect) of your new SSD server worse than that of your old live spinning-disk system?
> 2) Are there improvements to be had with your database and application setup?
>
> For 1) Maybe you could post the output of bonnie++ results from the data volumes on both servers when nothing else
> is going on on them.  If you can't do that on your production server, perhaps just the output from sar -d during a
> busy period might help.  Hopefully that would help to clarify the specific disk performance differences for us.
> Other interesting tests would be the output of Greg Smith's stream-scaling script
> (https://github.com/gregs1104/stream-scaling).  We're still missing the exact specs of your old system and the CPUs
> in the new one.
I'll stay on this first point for now, because that's where the origin of
my problem lies, and it's the most urgent.

I can't run bonnie++ now, because I've already prepared the server as a
slave, and I don't have sufficient disk space available (the program says
the file size should be double the RAM for good results).

My current sar output on the production (HDD) server is:
14:25:01    DEV                    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util

11:05:01    vg_data-lv_data     954.37   2417.35  18941.77     22.38     31.10  32.46   0.34  32.70
11:15:01    vg_data-lv_data    1155.79   8716.91  21995.77     26.57     25.15  21.74   0.40  46.70
11:25:01    vg_data-lv_data    1250.62   6478.67  23450.07     23.93     39.77  31.78   0.41  51.34
11:35:01    vg_data-lv_data     842.48   2051.11  17120.92     22.76     15.63  18.53   0.29  24.04
11:45:01    vg_data-lv_data     666.21   1403.32  14174.47     23.38     10.11  15.12   0.24  15.79
11:55:01    vg_data-lv_data     923.51   6763.36  15337.58     23.93     13.07  14.14   0.35  32.63
12:05:01    vg_data-lv_data     989.86   9148.71  16252.59     25.66     19.42  19.56   0.45  44.21
12:15:01    vg_data-lv_data    1369.24   8631.93  24737.60     24.37     35.04  25.54   0.45  61.33
12:25:01    vg_data-lv_data    1776.12   7070.01  39851.34     26.42     74.81  42.05   0.44  77.29
12:35:01    vg_data-lv_data    1529.15   6635.80  85865.14     60.49     54.11  35.34   0.48  72.89
12:45:01    vg_data-lv_data    1187.43   4528.74  40366.95     37.81     36.07  30.36   0.39  45.81
12:55:01    vg_data-lv_data     984.48   3520.06  21539.36     25.45     17.91  18.17   0.31  30.20
13:05:01    vg_data-lv_data     926.54   6304.44  16688.94     24.82     17.36  18.69   0.41  38.05
13:15:01    vg_data-lv_data    1232.46   7199.65  29852.49     30.06     40.17  32.53   0.42  51.60
13:25:01    vg_data-lv_data    1223.46   3945.05  27448.15     25.66     31.07  25.31   0.35  42.65
13:35:01    vg_data-lv_data    1126.91   2811.70  22067.19     22.08     24.33  21.55   0.32  36.00
13:45:01    vg_data-lv_data     833.33   1805.26  17274.43     22.90     24.40  29.25   0.30  25.41
13:55:02    vg_data-lv_data    1085.88   7616.75  19140.67     24.64     17.48  16.06   0.39  42.15
14:05:01    vg_data-lv_data     691.52   3852.50  13125.53     24.55      7.75  11.15   0.30  20.74
14:15:01    vg_data-lv_data    1288.88   5390.41  24171.07     22.94     33.31  25.76   0.36  46.31
14:25:01    vg_data-lv_data    1592.88   3637.77  29836.89     21.02     76.45  47.94   0.40  63.28
14:35:01    vg_data-lv_data    1652.78   9502.87  31587.68     24.86     58.97  35.58   0.44  72.46
14:45:01    vg_data-lv_data    1623.82   6249.52  34148.46     24.88     53.47  32.83   0.40  65.19
14:55:01    vg_data-lv_data    1330.44   6516.11  26828.59     25.06     55.66  41.81   0.42  55.46
Average:    vg_data-lv_data    1176.55   5508.02  26324.37     27.06     33.86  28.72   0.39  45.59

The output of stream-scaling is:
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 25600K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 25600K (Unified)
... (for 40 cpus)


Total CPU system cache: 1060372480 bytes
Suggested minimum array elements needed: 481987490
WARNING: Array size may not fit into a 32 bit structure.
If stream fails to compile, you may need to uncomment the
line in the script labeled and described by the "Size
clamp code" comments in the stream-scaling script.
Array elements used: 481987490

=== CPU Core Summary ===
processor    : 39
model name    : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
cpu MHz        : 1273.925
siblings    : 20

=== Check and build stream ===
/tmp/ccuVZwf5.o: In function `tuned_STREAM_Triad._omp_fn.0':
stream.c:(.text+0x5a): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x8b): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x98): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0xe0): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0xe7): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x14b): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x15b): relocation truncated to fit: R_X86_64_32S
against `.bss'
/tmp/ccuVZwf5.o: In function `tuned_STREAM_Add._omp_fn.1':
stream.c:(.text+0x1f4): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x1fd): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x23b): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x242): additional relocation overflows omitted from the
output
collect2: error: ld returned 1 exit status
=== Trying large memory model ===
(this can take a while to compile)

=== Testing up to 40 cores ===

-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 481987490, Offset = 0
Total memory required = 11031.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 520522 microseconds.
    (= 520522 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        9552.4291       0.8205       0.8073       0.8319
Scale:       9818.6724       0.8128       0.7854       0.8333
Add:        10600.6443       1.1113       1.0912       1.1557
Triad:      10598.8774       1.1073       1.0914       1.1301
-------------------------------------------------------------
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000
-------------------------------------------------------------

Number of Threads requested = 2
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      26541.4003       0.4667       0.4358       0.5130
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000

Number of Threads requested = 3
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      30890.3475       0.3798       0.3745       0.3842
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000

...

Number of Threads requested = 40
Function      Rate (MB/s)   Avg time     Min time     Max time
Triad:      53770.7791       0.2186       0.2151       0.2258
Failed Validation on array c[]
         Expected  : 148233746338593734656.000000
         Observed  : 148233747862815801344.000000



For hardware specs, both servers have the same CPU & RAM: 40 ×
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz cores, and 256 GB RAM.

I'll do the switch between master & slave tomorrow morning, so I will
probably have more information on the SSD server at that point (sar,
dstat reports, ...).
>
> For 2) I think we're still missing a lot of information about what's happening in your database to be able to say
> much, i.e. what's the size of the database, tables 1 - 8, typical read / write patterns, and transaction rate?
> Perhaps you could give a simple description of what your apps(s) do on a daily basis, and then post a diff of the
> outputs of the pg_stat_database, pg_stat_user_tables and pg_statio_user_tables views over a specific period of time.
>
> You really need to dig into each of these yourself to try and see where your new system might be lacking.
Quick answer on the main values for the db:
size ~ 150 GB.
Typical behaviour is 55% update, 40% select & 10% insert; around
170 TPS.

I've set up recording of pg_stat_database to gather stats, and will try
to analyze the other views.
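[One quick way to estimate the overall transaction rate from pg_stat_database, for example:

    psql -Atc "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database;"
    sleep 60
    psql -Atc "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database;"
    # (second value - first value) / 60 ~= TPS]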






Re: Performances issues with SSD volume ?

From
Glyn Astill
Date:
> From: Thomas SIMON <tsimon@neteven.com>
>To: Glyn Astill <glynastill@yahoo.co.uk>
>Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>Sent: Tuesday, 26 May 2015, 15:44
>Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
>I can't run bonnie++ now, because I've already prepared the server as a
>slave, and I don't have sufficient disk space available (the program says
>the file size should be double the RAM for good results).
>
>My current sar output on the production (HDD) server is:
>[sar output trimmed; quoted in full in the previous message]


So the i/o is read heavy; it would be interesting to see why that might be, and some insight into running queries
would go a long way there.


>Quick answer on the main values for the db:
>size ~ 150 GB.
>Typical behaviour is 55% update, 40% select & 10% insert; around
>170 TPS.
>


Again, if this is a dedicated postgres server I'd be hoping for there to be pretty much 0 reads from disk, as your
whole database should fit in the OS buffer cache.  Only 170 TPS; are these large operations like bulk loads and big
selects?


Can you post the ouput of "free -m" and "cat /proc/meminfo"?

Re: Performances issues with SSD volume ?

From
Glyn Astill
Date:
----- Original Message -----

> From: Glyn Astill <glynastill@yahoo.co.uk>
> To: Thomas SIMON <tsimon@neteven.com>
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Tuesday, 26 May 2015, 16:27
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
>>  From: Thomas SIMON <tsimon@neteven.com>
>> To: Glyn Astill <glynastill@yahoo.co.uk>
>> Cc: "pgsql-admin@postgresql.org"
> <pgsql-admin@postgresql.org>
>> Sent: Tuesday, 26 May 2015, 15:44
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>> [quoted bonnie++ note and sar output trimmed]
>
>
> So the I/O is read heavy; it would be interesting to see why that might be,
> and some insight into the running queries would go a long way there.
>

>


Scratch that, I've obviously misaligned the output when reading it and I was looking at writes when I thought I was
looking at reads.  It would still be nice to see what the reads are, though.
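A per-table breakdown would narrow it down; something like this (an untested sketch, run it in the busy
database; the counters are cumulative since the last stats reset):

    SELECT relname,
           heap_blks_read,   -- table blocks read from disk/OS cache
           idx_blks_read     -- index blocks read from disk/OS cache
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read + coalesce(idx_blks_read, 0) DESC
    LIMIT 10;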

Re: Performances issues with SSD volume ?

От
Thomas SIMON
Дата:
Hi everyone,

Coming back to you after my server switch: the results are very positive!

To answer your last question Glyn, I figured out where the reads were coming
from: it's the WAL senders streaming to the slave, so it is "normal". No more
significant reads when nothing is being shipped to the slave.
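(For reference, I confirmed it with a query along these lines; the column names
below are from our 9.x, newer versions renamed sent_location/replay_location to
sent_lsn/replay_lsn:)

    SELECT pid,
           application_name,
           state,            -- 'streaming' when the slave is caught up
           sent_location,
           replay_location
    FROM pg_stat_replication;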

It seems the kernel change combined with the RAID optimisations has resolved
the problem.
No more crazy load peaks or I/O-bound problems on the SSD partition; SSD
performance is great for now, and pgBadger shows total query duration down
about 35% since I switched servers.

So I guess this topic is solved now. Thanks to the many contributors who
helped me, and special thanks to Glyn, who spent a lot of time answering me
and helped me a lot!

Thomas
