Thread: Using pgiosim realistically

Using pgiosim realistically

From: John Rouillard
Date: Fri, May 13, 2011 21:09:41 +0000
Hi all:

I am adding pgiosim to our testing for new database hardware, and I am
seeing something I don't quite understand. I think it's because I am
using pgiosim incorrectly.

Specs:

  OS: centos 5.5 kernel: 2.6.18-194.32.1.el5
  memory: 96GB
  cpu: 2x Intel(R) Xeon(R) X5690  @ 3.47GHz (6 core, ht enabled)
  disks: WD2003FYYS RE4
  raid: lsi - 9260-4i with 8 disks in raid 10 configuration
              1MB stripe size
              raid cache enabled w/ bbu
              disk caches disabled
  filesystem: ext3 created with -E stride=256

I am seeing really poor iops (~70) with pgiosim. According to the
database benchmark at
http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html, they are
seeing ~170 iops on a single disk for these drives. I would expect an
8-disk RAID 10 to do better than 3x the single-disk rate (assuming the
data is randomly distributed).

To test I am using 5 100GB files with

    sudo ~/pgiosim -c -b 100G -v file?

I am using 100G sizes to make sure that the amount of data read and the
file sizes both exceed the memory size of the system.
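
Test files like these can be created with dd (a sketch, assuming GNU dd
and the file? naming used above; zero-filled files are fine for a pure
read test):

    # create five 100GB files named file1..file5
    for i in 1 2 3 4 5; do
        dd if=/dev/zero of=file$i bs=1M count=102400
    done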

However, if I use 5 1GB files (still reading 100GB of data) I see 200+
to 400+ iops by the time 50% of the 100GB has been read, which I assume
means the data is cached in the OS cache and I am not really measuring
hard drive/RAID iops.

However, IIUC postgres will never have an index file greater than 1GB
in size
(http://www.postgresql.org/docs/8.4/static/storage-file-layout.html)
and will just add 1GB segments, so 1GB files seem more realistic.

So do I want 100 (or probably two or three times more, say 300) 1GB
files to feed pgiosim? That way I will have enough data that not all of
it can be cached in memory, and the file sizes (and file operations:
open/close) more closely match what postgres does with index files.
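
Building that file set is mechanical (a sketch; the seg* names are
arbitrary, chosen to mimic postgres's 1GB segments):

    # 300 x 1GB files, roughly 3x the 96GB of RAM
    for i in $(seq 1 300); do
        dd if=/dev/zero of=seg$i bs=1M count=1024
    done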

Also in the output of pgiosim I see:

  25.17%,   2881 read,      0 written, 2304.56kB/sec  288.07 iops

which I interpret (left to right) as the % of the 100GB that has been
read, the number of read operations over some reporting interval, the
number of write operations, the read rate in kB/sec, and the I/O
operations per second. The iops value always seems to be 1/10th of the
read count (rounded up to an integer). Is this expected, and if so,
does anybody know why?

While this is running if I also run "iostat -p /dev/sdc 5" I see:

  Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc             166.40      2652.80         4.80      13264         24
  sdc1           2818.80         1.20       999.20          6       4996

which I am interpreting as 2818 read I/O operations per second
(corresponding more or less to the read count in the pgiosim output)
against the partition, of which only 166 are actually going to the
drive, with the rest handled from the OS cache???

However, the tps isn't increasing even when I see pgiosim reporting:

   48.47%,   4610 read,      0 written, 3687.62kB/sec  460.95 iops

while an iostat 5 output from around the same time reports:

  Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc             165.87      2647.50         4.79      13264         24
  sdc1           2812.97         0.60       995.41          3       4987

so I am not sure there is a correlation between the pgiosim read count
and the iostat tps values.

Also, I am assuming the blocks written are filesystem metadata,
although that seems like a lot of data.

If I stop pgiosim, the iostat numbers drop to 0 writes and reads, as
expected.
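
As a cross-check (a suggestion, not part of the run above), iostat's
extended mode reports per-device request sizes, queue depth, and wait
times, which helps separate cached reads from real drive I/O - assuming
your sysstat version supports combining -x with -p:

    iostat -x -p sdc 5   # adds avgrq-sz, avgqu-sz, await columns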

So does anybody have any comments on how to test with pgiosim and how
to correlate the iostat and pgiosim outputs?

Thanks for your feedback.
--
                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111

Re: Using pgiosim realistically

From: "ktm@rice.edu"
Date: Sat, May 14, 2011 12:07:02 -0500
On Fri, May 13, 2011 at 09:09:41PM +0000, John Rouillard wrote:
> Hi all:
>
> I am adding pgiosim to our testing for new database hardware, and I am
> seeing something I don't quite understand. I think it's because I am
> using pgiosim incorrectly.
>
>   disks: WD2003FYYS RE4
>   raid: lsi - 9260-4i with 8 disks in raid 10 configuration
>
> [remaining specs and test details snipped]
>
> I am seeing really poor iops (~70) with pgiosim. According to the
> database benchmark at
> http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html, they are
> seeing ~170 iops on a single disk for these drives. I would expect an
> 8-disk RAID 10 to do better than 3x the single-disk rate (assuming the
> data is randomly distributed).
>
> [rest of message snipped]
>

Hi John,

Those drives are 7200 rpm drives, which would give you a maximum write
rate of 120/sec at best with the cache disabled (7200 rpm works out to
120 revolutions per second, so ~120 random I/Os per second is roughly
the rotational ceiling). I actually think your 70/sec is closer to
reality and what you should anticipate in real use. I do not see how
they could make 170/sec. Did they strap a jet engine to the drive? :)

Regards,
Ken

Re: Using pgiosim realistically

From: John Rouillard
Date: Mon, May 16, 2011 9:17 AM
On Sat, May 14, 2011 at 12:07:02PM -0500, ktm@rice.edu wrote:
> On Fri, May 13, 2011 at 09:09:41PM +0000, John Rouillard wrote:
> > I am adding pgiosim to our testing for new database hardware, and I am
> > seeing something I don't quite understand. I think it's because I am
> > using pgiosim incorrectly.
> >
> > [specs snipped]
> >
> > I am seeing really poor iops (~70) with pgiosim. According to the
> > database benchmark at
> > http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html, they are
> > seeing ~170 iops on a single disk for these drives. I would expect an
> > 8-disk RAID 10 to do better than 3x the single-disk rate (assuming the
> > data is randomly distributed).
> Those drives are 7200 rpm drives, which would give you a maximum write
> rate of 120/sec at best with the cache disabled. I actually think your
> 70/sec is closer to reality and what you should anticipate in real use.
> I do not see how they could make 170/sec. Did they strap a jet engine
> to the drive? :)

Hmm, I stated the disk cache was disabled. I should have said the disk
write cache; it's possible the readahead cache is disabled as well (I'm
not quite sure how to tell on the LSI cards). Also, there isn't a lot
of detail on what the database test mix is, and I haven't researched
the site to see if they specify the exact test. If it included a lot of
writes that were being absorbed by a cache, that could explain it.
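
For what it's worth, on LSI MegaRAID cards the cache policies can
usually be queried with the MegaCli utility (invocations from memory,
so flags may vary by version):

    MegaCli64 -LDGetProp -Cache -LALL -aALL     # logical drive cache policy
    MegaCli64 -LDGetProp -DskCache -LALL -aALL  # physical disk cache setting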

However, in my case I have an 8-disk RAID 10 with a read-only load (in
this testing configuration). Shouldn't I expect more iops than a
single disk can provide? Maybe pgiosim is hitting some boundary other
than I/O?

Also, it turns out that pgiosim can only handle 64 files. I haven't
checked whether this is a compile-time changeable item or not.

--
                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111

Re: Using pgiosim realistically

From: Jeff
Date: Mon, May 16, 2011 12:23:13 -0400
On May 16, 2011, at 9:17 AM, John Rouillard wrote:

>>> I am seeing really poor iops (~70) with pgiosim. [...] I would
>>> expect an 8-disk RAID 10 to do better than 3x the single-disk rate
>>> (assuming the data is randomly distributed).
>> Those drives are 7200 rpm drives, which would give you a maximum write
>> rate of 120/sec at best with the cache disabled. I actually think your
>> 70/sec is closer to reality and what you should anticipate in real use.
>> I do not see how they could make 170/sec. Did they strap a jet engine
>> to the drive? :)
>

Also, you are reading with a worst-case scenario for a mechanical
disk - randomly seeking around everywhere - which will lower
performance drastically.

> Hmm, I stated the disk cache was disabled. I should have said the disk
> write cache; it's possible the readahead cache is disabled as well (I'm
> not quite sure how to tell on the LSI cards). Also, there isn't a lot
> of detail on what the database test mix is, and I haven't researched
> the site to see if they specify the exact test. If it included a lot of
> writes that were being absorbed by a cache, that could explain it.
>

You'll get some extra from the OS readahead and the drive's own
potential readahead.
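
The OS readahead is easy to inspect and tune with standard util-linux
tooling (values are in 512-byte sectors):

    blockdev --getra /dev/sdc      # show current readahead
    blockdev --setra 256 /dev/sdc  # set it (256 sectors = 128kB)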


> However, in my case I have an 8-disk RAID 10 with a read-only load (in
> this testing configuration). Shouldn't I expect more iops than a
> single disk can provide? Maybe pgiosim is hitting some boundary other
> than I/O?
>

Given your command line, you are only running a single thread - use the
-t argument to add more threads, which will increase concurrency. A
single process can only have so much I/O in flight at once; with
multiple threads requesting different things, the drive array will
actually be able to respond faster since it has more work to queue up.
I tend to test at various levels - usually single-threaded (-t 1, the
default) to get a baseline, then -t (#drives / 2) and -t (#drives), up
to probably 4x the drive count (you'll see iops level off).
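
A sweep over thread counts might look like this (a sketch for the
8-drive array in question; file names assumed):

    # baseline, drives/2, #drives, 2x, 4x
    for t in 1 4 8 16 32; do
        echo "== $t threads =="
        ./pgiosim -c -b 100G -t $t file*
    done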

> Also, it turns out that pgiosim can only handle 64 files. I haven't
> checked whether this is a compile-time changeable item or not.
>

That is a #define in pgiosim.c.
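
To lift the limit (a sketch; the macro's actual name in the 0.5 source
isn't quoted in this thread, so grep for it):

    grep -n define pgiosim.c   # locate the 64-file constant
    # raise the value, then rebuild (assuming the stock Makefile):
    make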

Also, are you running the latest pgiosim from pgfoundry?

The -w param to pgiosim has it rewrite blocks out as it runs (it is a
percentage).
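
For example, a mixed workload with 20% rewrites might be invoked like
this (an illustration, not a command from this thread):

    ./pgiosim -c -b 100G -t 8 -w 20 file*   # rewrite 20% of the blocks read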

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/

Re: Using pgiosim realistically

From: John Rouillard
Date: Mon, May 16, 2011 1:06 PM
On Mon, May 16, 2011 at 12:23:13PM -0400, Jeff wrote:
> On May 16, 2011, at 9:17 AM, John Rouillard wrote:
> > However, in my case I have an 8-disk RAID 10 with a read-only load (in
> > this testing configuration). Shouldn't I expect more iops than a
> > single disk can provide? Maybe pgiosim is hitting some boundary other
> > than I/O?
> >
>
> Given your command line, you are only running a single thread - use
> the -t argument to add more threads, which will increase concurrency.
> A single process can only have so much I/O in flight at once; with
> multiple threads requesting different things, the drive array will
> actually be able to respond faster since it has more work to queue up.
> I tend to test at various levels - usually single-threaded (-t 1, the
> default) to get a baseline, then -t (#drives / 2) and -t (#drives),
> up to probably 4x the drive count (you'll see iops level off).

OK, cool. I'll try that.

> > Also, it turns out that pgiosim can only handle 64 files. I haven't
> > checked whether this is a compile-time changeable item or not.
>
> That is a #define in pgiosim.c.

So which is a better test: modifying the #define to allow specifying
200-300 1GB files, or using 64 files but increasing their size to 2-3GB
so that the total file size is two or three times the memory in my
server (96GB)?
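
For the 64-file variant the arithmetic works out to 64 x 3GB = 192GB,
twice the 96GB of RAM. A sketch of building it:

    for i in $(seq 1 64); do
        dd if=/dev/zero of=file$i bs=1M count=3072   # 3GB each
    done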

> also, are you running the latest pgiosim from pgfoundry?

Yup, version 0.5 from the foundry.

> The -w param to pgiosim has it rewrite blocks out as it runs (it is
> a percentage).

Yup, I was running with that and getting low enough numbers that I
switched to pure read tests. It looks like I just need multiple threads
so I can have multiple reads/writes in flight at the same time.

--
                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111

Re: Using pgiosim realistically

From: Jeff
Date: Mon, May 16, 2011 13:54:06 -0400
On May 16, 2011, at 1:06 PM, John Rouillard wrote:

>> That is a #define in pgiosim.c.
>
> So which is a better test: modifying the #define to allow specifying
> 200-300 1GB files, or using 64 files but increasing their size to
> 2-3GB so that the total file size is two or three times the memory in
> my server (96GB)?
>

I tend to make 10G chunks with dd and run pgiosim over that:

    dd if=/dev/zero of=bigfile bs=1M count=10240
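
Several chunks can be built the same way when you need more data than
RAM (a sketch):

    # e.g. ten 10GB chunks = 100GB total, exceeding 96GB of RAM
    for i in $(seq 1 10); do
        dd if=/dev/zero of=bigfile$i bs=1M count=10240
    done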

>> The -w param to pgiosim has it rewrite blocks out as it runs (it is
>> a percentage).
>
> Yup, I was running with that and getting low enough numbers that I
> switched to pure read tests. It looks like I just need multiple threads
> so I can have multiple reads/writes in flight at the same time.
>
>

Yep - you need multiple threads to get the maximum throughput out of
your I/O.

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/

Re: Using pgiosim realistically

From: John Rouillard
Date:
On Mon, May 16, 2011 at 01:54:06PM -0400, Jeff wrote:
> Yep - you need multiple threads to get the maximum throughput out of
> your I/O.

I am running:

   ~/pgiosim -c -b 100G -v -t4 file[0-9]*

Will each thread move 100GB of data? I am seeing:

  158.69%,   4260 read,      0 written, 3407.64kB/sec  425.95 iops

Maybe the completion percentage is thrown off by the multiple threads?

--
                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111