From: Mark Kirkwood
Subject: Raid 10 chunksize
Date:
Msg-id: 49C98433.5080709@paradise.net.nz
Replies: Re: Raid 10 chunksize  (Scott Carey)
Re: Raid 10 chunksize  (Scott Marlowe)
Re: Raid 10 chunksize  (Greg Smith)
Re: Raid 10 chunksize  (Stef Telford)
List: pgsql-performance


I'm trying to pin down some performance issues with a machine where I
work: we are seeing (read-only) query response times blow out by an
order of magnitude or more at busy times. Initially we blamed
autovacuum, but after a tweak of the cost_delay it is *not* the
problem. Then I looked at checkpoints... and although there was some
correlation between them and the query response times, I'm now thinking
that the raid chunksize may well be the issue.

Fortunately there is an identical DR box, so I could do a little
testing. Details follow:

Sun 4140, 2x quad-core Opteron 2356, 16G RAM, 6x 15K 140G SAS
Debian Lenny
Pg 8.3.6

The disk is laid out using software (md) raid:

4 drives raid 10 *4K* chunksize with database files (ext3 ordered, noatime)
2 drives raid 1 with database transaction logs (ext3 ordered, noatime)
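For concreteness, rebuilding the data array with a larger chunk would look roughly like the following. This is a hypothetical sketch, not the commands actually used: the device names, partition layout, mount point, and the 256K chunk value are all assumptions (and --create is destructive, so it only applies when recreating the array from scratch).

```shell
# Sketch only: recreate the 4-disk raid 10 with a 256K chunk
# (mdadm's --chunk is in kilobytes; devices/mount point assumed).
mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=256 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mkfs.ext3 /dev/md0
mount -o noatime,data=ordered /dev/md0 /var/lib/postgresql
```

The current chunk size of an existing array can be checked non-destructively with `mdadm --detail /dev/md0` or by reading /proc/mdstat.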

The relevant non default .conf params are:

shared_buffers = 2048MB
work_mem = 4MB
maintenance_work_mem = 1024MB
max_fsm_pages = 153600
bgwriter_lru_maxpages = 200
wal_buffers = 2MB
checkpoint_segments = 32
effective_cache_size = 4096MB
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 60    # This is high, but seemed to help...

I've run pgbench:

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 655.335102 (including connections establishing)
tps = 655.423232 (excluding connections establishing)
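For reference, a run matching those figures would be invoked along these lines. The flags are inferred from the reported output (scale 100, 24 clients, 12000 transactions per client), not quoted from the original, and "bench" is a placeholder database name.

```shell
# One-time initialization at scaling factor 100 (inferred from output)
pgbench -i -s 100 bench
# The TPC-B-like run: 24 clients, 12000 transactions each
pgbench -c 24 -t 12000 bench
```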


Looking at iostat while it is running shows (note sda-sdd raid10, sde
and sdf raid 1):

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    56.80    0.00  579.00     0.00     2.47     8.74   133.76  235.10   1.73 100.00
sdb               0.00    45.60    0.00  583.60     0.00     2.45     8.59    52.65   90.03   1.71 100.00
sdc               0.00    49.00    0.00  579.80     0.00     2.45     8.66    72.56  125.09   1.72 100.00
sdd               0.00    58.40    0.00  565.00     0.00     2.42     8.79   135.31  235.52   1.77 100.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    12.80    0.00   23.40     0.00     0.15    12.85     3.04  103.38   4.27  10.00
sdb               0.00    12.80    0.00   22.80     0.00     0.14    12.77     2.31   73.51   3.58   8.16
sdc               0.00    12.80    0.00   21.40     0.00     0.13    12.86     2.38   79.21   3.63   7.76
sdd               0.00    12.80    0.00   21.80     0.00     0.14    12.70     2.66   90.02   3.93   8.56
sde               0.00  2546.80    0.00  146.80     0.00    10.53   146.94     0.97    6.38   5.34  78.40
sdf               0.00  2546.80    0.00  146.60     0.00    10.53   147.05     0.97    6.38   5.53  81.04

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   231.40    0.00  566.80     0.00     3.16    11.41   124.92  228.26   1.76  99.52
sdb               0.00   223.00    0.00  558.00     0.00     3.06    11.23    46.64   83.55   1.70  94.88
sdc               0.00   230.60    0.00  551.60     0.00     3.07    11.40    94.38  171.54   1.76  96.96
sdd               0.00   231.40    0.00  528.60     0.00     2.94    11.37   122.55  220.81   1.83  96.48
sde               0.00  1495.80    0.00   99.00     0.00     6.23   128.86     0.81    8.15   7.76  76.80
sdf               0.00  1495.80    0.00   99.20     0.00     6.26   129.24     0.73    7.40   7.10  70.48

Top looks like:

Cpu(s):  2.5%us,  1.9%sy,  0.0%ni, 71.9%id, 23.4%wa,  0.2%hi,  0.2%si,  0.0%st
Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached

It looks to me like we are maxing out the raid 10 array, and I suspect
the chunksize (4K) is the culprit. However, as this is a pest to change
(!) I'd like some opinions on whether I'm jumping to conclusions. I'd
also appreciate comments on what chunksize to use (I've tended to use
256K in the past, but what are folks preferring these days?).

regards

Mark



