Discussion: fsync on ext4 does not work

fsync on ext4 does not work

From: Havasvölgyi Ottó
Date:
Hi all,

For some reason, fsync does not work for me.

PgSql 9.1.2

Debian, 2.6.32 kernel

WAL filesystem: ext4 with defaults

config:
fsync=on
synchronous_commit=on
wal_sync_method=fsync


Even so, pgbench reports about 700 TPS with 1 client.
I have tried other sync methods (fdatasync, open_sync), but all give similar results.
Should I disable the write cache on the HDD to make it work?

Have you any idea why?

Thanks,
Otto

Re: fsync on ext4 does not work

From: Tomas Vondra
Date:
On 19 December 2011, 16:52, Havasvölgyi Ottó wrote:
> config:
> fsync=on
> synchronous_commit=on
> wal_sync_method=fsync

I don't think you need to set wal_sync_method; comment it out.

> Even so, pgbench reports about 700 TPS with 1 client.
> I have tried other sync methods (fdatasync, open_sync), but all give
> similar results.
> Should I disable the write cache on the HDD to make it work?

Yes, disable that.

> Have you any idea why?

What scale factor have you used with pgbench? And how long are the pgbench
runs? The smaller the data set, the more it will be affected by the write
cache.

Tomas


Re: fsync on ext4 does not work

From: Florian Weimer
Date:
* Havasvölgyi Ottó:

> Even so, pgbench reports about 700 TPS with 1 client.
> I have tried other sync methods (fdatasync, open_sync), but all give similar results.
> Should I disable the write cache on the HDD to make it work?

Did you mount your ext4 file system with the nobarrier option?

By default, ext4 is supposed to cope properly with hard disk caches,
unless the drive is lying about completing writes (but in that case,
disabling write caching is probably not going to help much with
reliability, either).

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
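Florian's first question (was the filesystem mounted with nobarrier?) can be checked mechanically. The following is a minimal sketch, assuming Python 3.9+ and input in the /proc/mounts format (device, mount point, type, options, ...); the function name is hypothetical. The kernel does not necessarily list default options, so an empty result only means barriers were not explicitly turned off.

```python
def mounts_without_barriers(mounts_text: str) -> list[str]:
    """Return mount points of ext4 filesystems whose options disable write barriers."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type != "ext4":
            continue  # only ext4 option names are checked here
        opts = options.split(",")
        # ext4 accepts both spellings for turning barriers off
        if "nobarrier" in opts or "barrier=0" in opts:
            flagged.append(mount_point)
    return flagged
```

On a live system you would pass it the contents of /proc/mounts; a flagged WAL mount point would explain fsync calls returning faster than the disk can rotate.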

Re: fsync on ext4 does not work

From: Havasvölgyi Ottó
Date:


2011/12/19 Tomas Vondra <tv@fuzzy.cz>
> What scale factor have you used with pgbench? And how long are the pgbench
> runs? The smaller the data set, the more it will be affected by the write
> cache.

Scale factor was 1, client count 1, and I ran it for 100 seconds. I just wanted to check that the commit rate does not go beyond 120 per second, since a 7200 rpm HDD makes 120 rotations per second.
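The 120 figure is simple arithmetic: 7200 revolutions per minute is 120 per second. A minimal sketch of that calculation, assuming (as the thread does) that a single client's flushed commit must wait roughly one full platter rotation; the function names are illustrative only.

```python
def max_commits_per_second(rpm: float) -> float:
    """Rough single-client ceiling: about one flushed commit per platter rotation."""
    return rpm / 60.0


def rotation_ms(rpm: float) -> float:
    """Duration of one platter rotation in milliseconds."""
    return 60_000.0 / rpm


observed_tps = 700  # pgbench result reported at the start of the thread
ceiling = max_commits_per_second(7200)
print(f"ceiling ~{ceiling:.0f} commits/s, one rotation ~{rotation_ms(7200):.2f} ms")
print(f"observed/ceiling = {observed_tps / ceiling:.1f}x")
```

Seven hundred TPS is roughly six times that ceiling, which is why the result looked suspicious in the first place.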




Re: fsync on ext4 does not work

From: Havasvölgyi Ottó
Date:


2011/12/19 Florian Weimer <fweimer@bfk.de>
> Did you mount your ext4 file system with the nobarrier option?
>
> By default, ext4 is supposed to cope properly with hard disk caches,
> unless the drive is lying about completing writes (but in that case,
> disabling write caching is probably not going to help much with
> reliability, either).

It is mounted with the defaults and no other options yet, so it should flush.
These are 7200 rpm SATA drives in a low-level software RAID1.
I don't understand why disabling the HDD write cache would not help either. Could you please explain?

There is also an InnoDB transaction log on this partition, but its commit time is considerably longer. On the same workload PgSql's commit takes about 1 ms, but InnoDB's takes about 4-7 ms. I think 4-7 ms is also too short to flush anything to such a disk, am I right? Or does it perhaps do something different? It is set to fsync synchronously. Another difference is that as I increase concurrency, InnoDB's average commit time goes up quite quickly, whereas PgSql's rises rather slowly. I wonder whether this is because InnoDB really flushes to disk, or just because PostgreSQL is better :).

Best regards,
Otto



Re: fsync on ext4 does not work

From: Florian Weimer
Date:
* Havasvölgyi Ottó:

> 2011/12/19 Florian Weimer <fweimer@bfk.de>
>
>> * Havasvölgyi Ottó:
>>
>> > Even so, pgbench reports about 700 TPS with 1 client.
>> > I have tried other sync methods (fdatasync, open_sync), but all give
>> > similar results.
>> > Should I disable the write cache on the HDD to make it work?
>>
>> Did you mount your ext4 file system with the nobarrier option?
>>
>> By default, ext4 is supposed to cope properly with hard disk caches,
>> unless the drive is lying about completing writes (but in that case,
>> disabling write caching is probably not going to help much with
>> reliability, either).
>>
>
> It is mounted with the defaults and no other options yet, so it should flush.
> These are 7200 rpm SATA drives in a low-level software RAID1.
> I don't understand why disabling the HDD write cache would not help either.
> Could you please explain?

The drive appears to be fundamentally broken.  Disabling the cache won't
change that.

But you mention software RAID1---perhaps your version of the RAID code
doesn't pass down the barriers to the disk?

> There is also an InnoDB transaction log on this partition, but its commit
> time is considerably longer. On the same workload PgSql's commit takes about
> 1 ms, but InnoDB's takes about 4-7 ms. I think 4-7 ms is also too short to
> flush anything to such a disk, am I right?

Yes, it's still too low, unless multiple commits are grouped together.
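Florian's caveat about grouping can be made concrete. This sketch assumes one WAL flush costs one platter rotation and that the reported "commit time" is elapsed time divided by the number of commits, so commits sharing a flush split its cost; per-transaction latency would not shrink this way. Both function names are hypothetical.

```python
def amortized_commit_ms(rpm: float, commits_per_flush: int = 1) -> float:
    """Elapsed time per commit when `commits_per_flush` commits share one rotation."""
    rotation = 60_000.0 / rpm  # ms for one platter rotation
    return rotation / commits_per_flush


def is_plausible(observed_ms: float, rpm: float, commits_per_flush: int = 1) -> bool:
    """False if commits appear faster than the rotational limit allows."""
    return observed_ms >= amortized_commit_ms(rpm, commits_per_flush)


for ms in (1.0, 4.0, 7.0):
    # with one commit per flush, the floor at 7200 rpm is about 8.3 ms
    print(ms, is_plausible(ms, 7200), is_plausible(ms, 7200, commits_per_flush=2))
```

With one commit per flush, none of the reported figures (1 ms for PostgreSQL, 4-7 ms for InnoDB) clears the roughly 8.3 ms floor of a 7200 rpm drive.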


Re: fsync on ext4 does not work

From: Greg Smith
Date:
On 12/19/2011 10:52 AM, Havasvölgyi Ottó wrote:
> PgSql 9.1.2
> Debian, 2.6.32 kernel
> WAL filesystem: ext4 with defaults

There's a pg_test_fsync program included with the postgresql-contrib
package that might help you sort out what's going on here.  This will
eliminate the possibility that you're doing something wrong with
pgbench, and give an easy-to-interpret number relative to the drive RPM
rate.
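To see what such a measurement involves, here is a minimal sketch in the spirit of pg_test_fsync's plain fsync test: repeatedly overwrite one 8kB block and call fsync, then report operations per second. It is not the tool's actual code, and the function names are hypothetical.

```python
import os
import tempfile
import time


def fsync_ops_per_sec(path: str, iterations: int = 200, block_size: int = 8192) -> float:
    """Overwrite one block at offset 0 and fsync it, `iterations` times; return ops/sec."""
    buf = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same 8kB block each time
            os.write(fd, buf)
            os.fsync(fd)  # ask the kernel to push data and metadata to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / elapsed


def probe_directory(directory: str, iterations: int = 200) -> float:
    """Run the probe on a throwaway file inside `directory`, then delete it."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        return fsync_ops_per_sec(path, iterations)
    finally:
        os.unlink(path)
```

Point `probe_directory` at a directory on the filesystem that holds the WAL. On a single 7200 rpm drive that really flushes, the result should stay at or below roughly 120 ops/sec; thousands per second means something in the stack is acknowledging writes early.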

You said default settings, which eliminated "nobarrier" as a cause
here.  The only other thing I know of that can screw up fsync here is
using one of the incompatible LVM features to build your filesystem.  I
don't know which currently work and don't work, but last I checked there
were a few ways you could set LVM up that would eliminate filesystem
barriers from working properly.  You might check:

dmesg | grep barrier

To see if you have any kernel messages related to this.

Here's a pg_test_fsync example from a Debian system on 2.6.32 with ext4
filesystem and 7200 RPM drive, default mount parameters and no LVM:

$ ./pg_test_fsync
2000 operations per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
         open_datasync                                 n/a
         fdatasync                         113.901 ops/sec
         fsync                              28.794 ops/sec
         fsync_writethrough                            n/a
         open_sync                         111.726 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
         open_datasync                                 n/a
         fdatasync                         112.637 ops/sec
         fsync                              28.641 ops/sec
         fsync_writethrough                            n/a
         open_sync                          55.546 ops/sec

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         16kB open_sync write              111.909 ops/sec
          8kB open_sync writes              55.278 ops/sec
          4kB open_sync writes              28.026 ops/sec
          2kB open_sync writes              14.002 ops/sec
          1kB open_sync writes               7.011 ops/sec

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
         write, fsync, close                28.836 ops/sec
         write, close, fsync                28.890 ops/sec

Non-Sync'ed 8kB writes:
         write                           112113.908 ops/sec
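The open_sync table follows a simple pattern: writing 16kB as n separate synchronous writes takes about n times as long, consistent with every write waiting for its own rotation (that interpretation is an assumption, not something pg_test_fsync reports). A quick check against the figures in the output above; the helper name is illustrative.

```python
# open_sync figures from the pg_test_fsync output above:
# ops/sec for writing 16kB in total, keyed by the size of each individual write in kB.
measured = {16: 111.909, 8: 55.278, 4: 28.026, 2: 14.002, 1: 7.011}


def predicted_ops(write_kb: int, one_write_rate: float = measured[16]) -> float:
    """If each synchronous write costs one rotation, 16kB in n writes takes n rotations."""
    writes_needed = 16 // write_kb
    return one_write_rate / writes_needed


for size, ops in measured.items():
    pred = predicted_ops(size)
    print(f"{size:>2}kB writes: measured {ops:7.3f}  predicted {pred:7.3f}  "
          f"diff {100 * (ops - pred) / pred:+.1f}%")
```

Every row lands within about 1.2% of the prediction, and the single-write rate of about 112 ops/sec sits just under the 120/s rotational ceiling of a 7200 rpm drive.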

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: fsync on ext4 does not work

From: Havasvölgyi Ottó
Date:
Thank you, guys, for the ideas and suggestions; I will check them.

Best regards,
Otto

Re: fsync on ext4 does not work

From: Havasvölgyi Ottó
Date:
I ran pg_test_fsync on this partition and got 2500+ ops/sec for every sync method.

dmesg says:
blkfront: xvde: barriers enabled
blkfront: xvda: barriers enabled

One thing I haven't mentioned yet: this is a VM virtualized with Xen. Perhaps that has some effect.

Thanks,
Otto
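A rate of 2500+ ops/sec is about twenty times the roughly 120/s that a 7200 rpm drive can flush, which matches Florian's description of a stack reporting writes as complete before they are durable. In a Xen guest the early acknowledgement might come from the host side rather than the drive; that is an assumption, not something established in this thread. The sketch below applies the same rule of thumb; `slack` is a hypothetical tolerance, and a non-volatile (battery- or flash-backed) write cache would legitimately exceed the ceiling.

```python
def classify_fsync_rate(ops_per_sec: float, rpm: float, slack: float = 1.5) -> str:
    """Compare a measured sync rate with the ~one-flush-per-rotation ceiling."""
    ceiling = rpm / 60.0  # assumed: at most one durable flush per platter rotation
    if ops_per_sec <= ceiling * slack:
        return f"plausible: {ops_per_sec:.0f} ops/sec vs ceiling ~{ceiling:.0f}"
    return (f"suspicious: {ops_per_sec:.0f} ops/sec is {ops_per_sec / ceiling:.0f}x "
            f"the ~{ceiling:.0f}/s ceiling; a cache is likely acknowledging flushes early")


print(classify_fsync_rate(113.901, 7200))  # fdatasync figure from the bare-metal example
print(classify_fsync_rate(2500, 7200))     # figure measured inside the Xen guest
```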



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general