Discussion: fsync on ext4 does not work
Hi all,
For some reason fsync does not work for me.
PgSql 9.1.2
Debian, 2.6.32 kernel
WAL filesystem: ext4 with defaults
config:
fsync=on
sync_commit=on
wal_sync_method=fsync
Even so, pgbench reports about 700 TPS with 1 client.
I have tried other sync methods (fdatasync, open_sync), but the results are all similar.
Should I disable the write cache on the HDD to make it work?
Do you have any idea why?
Thanks,
Otto
On 19 December 2011, 16:52, Havasvölgyi Ottó wrote:
> config:
> fsync=on
> sync_commit=on
> wal_sync_method=fsync
I don't think you need to set wal_sync_method, comment it out.
> Even though the TPS in pgbench about 700 with 1 client.
> I have tried other sync methods (fdatasync, open_sync), but all are
> similar.
> Should I disable write cache on HDD to make it work?
Yes, disable that.
> Have you any idea why?
What scale factor have you used with pgbench? And how long are the pgbench
runs? The smaller the data set, the more it will be affected by the write
cache.
Tomas
* Havasvölgyi Ottó:
> Even though the TPS in pgbench about 700 with 1 client.
> I have tried other sync methods (fdatasync, open_sync), but all are similar.
> Should I disable write cache on HDD to make it work?
Did you mount your ext4 file system with the nobarrier option?
By default, ext4 is supposed to cope properly with hard disk caches,
unless the drive is lying about completing writes (but in that case,
disabling write caching is probably not going to help much with
reliability, either).
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
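One way to answer the nobarrier question without reading the mount table by eye is a small script. This is only a sketch: it assumes the usual /proc/mounts field layout and treats the ext4 options `nobarrier` and `barrier=0` as "barriers disabled". The function name and the sample mount lines (including the `/wal` mount point) are made up for illustration.

```python
def ext4_barrier_status(mounts_text):
    """Map each ext4 mount point to True if barriers look disabled."""
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts layout: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        status[fields[1]] = "nobarrier" in options or "barrier=0" in options
    return status


# Hypothetical sample text; on a real system pass open("/proc/mounts").read().
sample = (
    "/dev/xvda1 / ext4 rw,relatime,barrier=1,data=ordered 0 0\n"
    "/dev/xvde1 /wal ext4 rw,relatime,nobarrier,data=ordered 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec 0 0\n"
)
print(ext4_barrier_status(sample))
```

A result of False only says that the file system was not asked to skip barriers. It says nothing about whether the layers below (RAID, hypervisor, drive) honor them.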
2011/12/19 Tomas Vondra <tv@fuzzy.cz>
The scale factor was 1, the client count was 1, and I ran it for 100 seconds. I just wanted to check that the commit rate does not exceed 120 per second (7200 rpm HDD).
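The arithmetic behind the 120 figure, as a quick sketch. It assumes that a single client has to wait for roughly one platter rotation per durable commit. That is a simplification and ignores group commit and any battery-backed cache. The function name is illustrative only.

```python
def max_commits_per_second(rpm):
    """Rough single-client ceiling: one WAL flush per platter rotation."""
    return rpm / 60.0


ceiling = max_commits_per_second(7200)  # rotations per second of a 7200 rpm drive
rotation_ms = 1000.0 / ceiling          # time for one rotation in milliseconds
observed_tps = 700                      # pgbench result reported in this thread

print(ceiling)                            # 120.0
print(round(rotation_ms, 2))              # 8.33
print(round(observed_tps / ceiling, 1))   # 5.8
```

At about 5.8 times the rotational ceiling, the 700 TPS result suggests that the flushes are being acknowledged before the data reaches the platters.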
2011/12/19 Florian Weimer <fweimer@bfk.de>
It is mounted with the defaults, with no other options so far, so it should flush.
These HDDs are 7200 rpm SATA drives in some low-level software RAID1.
I cannot understand why disabling the HDD write cache does not help either. Could you please explain?
There is also an InnoDB transaction log on this partition, but its commit time is considerably longer. On the same workload PgSql's commit takes about 1 ms, but InnoDB's takes about 4-7 ms. I think 4-7 ms is also too short to flush something to such a disk, am I right? Or does it perhaps do something different? It is set to fsync synchronously. Another difference is that as I increase concurrency, InnoDB's average commit time goes up quite quickly, whereas PgSql's rises rather slowly. I wonder if this is because InnoDB really flushes to disk, or just because PostgreSQL is better :).
Best regards,
Otto
* Havasvölgyi Ottó:
> 2011/12/19 Florian Weimer <fweimer@bfk.de>
>> * Havasvölgyi Ottó:
>> > Even though the TPS in pgbench about 700 with 1 client.
>> > I have tried other sync methods (fdatasync, open_sync), but all are similar.
>> > Should I disable write cache on HDD to make it work?
>> Did you mount your ext4 file system with the nobarrier option?
>> By default, ext4 is supposed to cope properly with hard disk caches,
>> unless the drive is lying about completing writes (but in that case,
>> disabling write caching is probably not going to help much with
>> reliability, either).
> It is mounted with defaults, no other option yet, so it should flush.
> These HDDs are 7200 rpm SATA with some low level software RAID1.
> I cannot understand why disabling HDD write cache does not help either.
> Could you explain please?
The drive appears to be fundamentally broken. Disabling the cache won't
change that.
But you mention software RAID1. Perhaps your version of the RAID code
doesn't pass down the barriers to the disk?
> There is also an InnoDB transaction log on this partition, but its commit
> time is quite longer. On the same workload PgSql's commit is about 1 ms,
> but InnoDB's is about 4-7 ms. I think 4-7 is also too short to flush
> something to such disk, am I right?
Yes, it's still too low, unless multiple commits are grouped together.
--
Florian Weimer <fweimer@bfk.de>
On 12/19/2011 10:52 AM, Havasvölgyi Ottó wrote:
> PgSql 9.1.2
> Debian, 2.6.32 kernel
> WAL filesystem: ext4 with defaults
There's a pg_test_fsync program included with the postgresql-contrib package that might help you sort out what's going on here. This will eliminate the possibility that you're doing something wrong with pgbench, and give an easy to interpret number relative to the drive RPM rate.
You said default settings, which eliminated "nobarrier" as a cause here. The only other thing I know of that can screw up fsync here is using one of the incompatible LVM features to build your filesystem. I don't know which currently work and don't work, but last I checked there were a few ways you could set LVM up that would eliminate filesystem barriers from working properly. You might check:
dmesg | grep barrier
To see if you have any kernel messages related to this.
Here's a pg_test_fsync example from a Debian system on 2.6.32 with ext4 filesystem and 7200 RPM drive, default mount parameters and no LVM:
$ ./pg_test_fsync
2000 operations per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                          n/a
        fdatasync                          113.901 ops/sec
        fsync                               28.794 ops/sec
        fsync_writethrough                     n/a
        open_sync                          111.726 ops/sec
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                          n/a
        fdatasync                          112.637 ops/sec
        fsync                               28.641 ops/sec
        fsync_writethrough                     n/a
        open_sync                           55.546 ops/sec
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
        16kB open_sync write               111.909 ops/sec
         8kB open_sync writes               55.278 ops/sec
         4kB open_sync writes               28.026 ops/sec
         2kB open_sync writes               14.002 ops/sec
         1kB open_sync writes                7.011 ops/sec
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close                 28.836 ops/sec
        write, close, fsync                 28.890 ops/sec
Non-Sync'ed 8kB writes:
        write                           112113.908 ops/sec
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
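If pg_test_fsync is not at hand, the same kind of number can be approximated with a few lines of Python. This is a rough sketch and not how pg_test_fsync itself is implemented: it only times write-then-fsync of one 8kB block, and the function name, the default of 200 operations and the use of the system temp directory are all arbitrary choices for illustration. Point `directory` at the WAL partition to test that file system.

```python
import os
import tempfile
import time


def fsync_ops_per_sec(directory, operations=200, block_size=8192):
    """Time repeated 'write one block, then fsync' cycles in `directory`."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(operations):
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block each time
            os.write(fd, block)
            os.fsync(fd)                  # ask the OS to push it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return operations / elapsed


# Small run against the temp directory, just to show the call.
rate = fsync_ops_per_sec(tempfile.gettempdir(), operations=20)
print("%.1f fsync ops/sec" % rate)
```

On a single 7200 RPM drive with working barriers, a result far above roughly 120 ops/sec would point to a cache somewhere acknowledging the flush early.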
Thank you guys for the ideas and suggestions, I will check them.
Best regards,
Otto
I have run fsync_test on this partition, and I got 2500+ for every sync method.
dmesg says:
blkfront: xvde: barriers enabled
blkfront: xvda: barriers enabled
One thing I haven't mentioned yet is that this is a VM virtualized with Xen. Perhaps this has some effect.
Thanks,
Otto