Re: patch to allow disable of WAL recycling

From: Thomas Munro
Subject: Re: patch to allow disable of WAL recycling
Date:
Msg-id: CAEepm=2QXmF9xDmGDyMtoEeTEH6=jcf=b8--yLzdeVzBfVLVuA@mail.gmail.com
In reply to: Re: patch to allow disable of WAL recycling  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: patch to allow disable of WAL recycling
List: pgsql-hackers
On Mon, Aug 27, 2018 at 10:14 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> zfs (Linux)
> -----------
> On scale 200, there's pretty much no difference.

Speculation: It could be that the dnode and/or indirect blocks that point to data blocks are falling out of memory in my test setup[1] but not in yours.  I don't know, but I guess those blocks compete with regular data blocks in the ARC?  If so, it might come down to ARC size and the amount of other data churning through it.
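
For what it's worth, here is a rough, untested sketch of how one could watch for that (assuming ZFS on Linux; the counter names vary a bit between ZFS releases):

#!/usr/bin/env python3
# Untested sketch: show how much of the ZFS ARC is holding metadata
# (dnodes, indirect blocks) versus file data, and how often demand
# metadata reads miss the cache.  Assumes ZFS on Linux, which exposes
# counters in /proc/spl/kstat/zfs/arcstats; field names differ between
# ZFS versions, so unknown keys are simply skipped.

ARCSTATS = "/proc/spl/kstat/zfs/arcstats"

def read_arcstats(path=ARCSTATS):
    stats = {}
    with open(path) as f:
        for line in f.readlines()[2:]:      # skip the two kstat header lines
            parts = line.split()
            if len(parts) == 3:
                name, _type, value = parts
                stats[name] = int(value)
    return stats

if __name__ == "__main__":
    s = read_arcstats()
    for key in ("size", "arc_meta_used", "arc_meta_limit",
                "demand_metadata_hits", "demand_metadata_misses",
                "demand_data_hits", "demand_data_misses"):
        if key in s:
            print(f"{key:>24} = {s[key]:>16,}")

If demand_metadata_misses climbs while old segment files are being overwritten, that would support the theory.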

Further speculation:  Other filesystems have equivalent data structures, but for example XFS jams that data into the inode itself in a compact "extent list" format[2] if it can, to avoid the need for an external btree.  Hmm, I wonder if that format tends to be used for our segment files.  Since cached inodes are reclaimed in a different way than cached data pages, I wonder if that makes them more sticky in the face of high data churn rates (or I guess less, depending on your Linux vfs_cache_pressure setting and number of active files).  I suppose the combination of those two things, sticky inodes with internalised extent lists, might make it more likely that we can overwrite an old file without having to fault anything in.
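
If someone wanted to check that on an XFS system, something along these lines might do it (untested sketch; assumes xfsprogs is installed and pg_wal is at the guessed path below):

#!/usr/bin/env python3
# Untested sketch: report vfs_cache_pressure and the number of extents in
# each WAL segment on XFS.  A segment with only a few extents can usually
# keep its extent list inside the inode instead of an external btree.
# WAL_DIR is only a guess; point it at your cluster's pg_wal.

import os
import subprocess

WAL_DIR = "/var/lib/postgresql/data/pg_wal"

def extent_count(path):
    # xfs_bmap prints the file name on the first line and one extent
    # (or hole) per line after that.
    out = subprocess.run(["xfs_bmap", path], capture_output=True,
                         text=True, check=True).stdout
    return len(out.strip().splitlines()) - 1

if __name__ == "__main__":
    with open("/proc/sys/vm/vfs_cache_pressure") as f:
        print("vfs_cache_pressure =", f.read().strip())
    for name in sorted(os.listdir(WAL_DIR)):
        path = os.path.join(WAL_DIR, name)
        if os.path.isfile(path):
            print(f"{name}: {extent_count(path)} extent(s)")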

One big difference between your test rig and mine is that your Optane 900P claims to do about half a million random IOPS.  That is about half a million more IOPS than my spinning disks.  (Actually I used my 5400 RPM steam-powered machine deliberately for that test: I disabled fsync so that commit rate wouldn't be slowed down but cache misses would be obvious.  I guess Joyent's storage is somewhere between these two extremes...)

> On scale 2000, the
> throughput actually decreased a bit, by about 5% - from the chart it
> seems disabling the WAL reuse somewhat amplifies impact of checkpoints,
> for some reason.

Huh.

> I have no idea what happened at the largest scale (8000) - on master
> there's a huge drop after ~120 minutes, which somewhat recovers at ~220
> minutes (but not fully). Without WAL reuse there's no such drop,
> although there seems to be some degradation after ~220 minutes (i.e. at
> about the same time the master partially recovers). I'm not sure what to
> think about this, I wonder if it might be caused by almost filling the
> disk space, or something like that. I'm rerunning this with scale 600.

There are lots of reports of ZFS performance degrading when free space gets below something like 20%.
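
In case it helps rule that in or out, here is a trivial sketch for keeping an eye on free space during a run (the path is a guess, and note that this reports per-filesystem numbers, whereas zpool list would show the pool-level view):

#!/usr/bin/env python3
# Untested sketch: warn when the filesystem holding the data directory
# drops below the ~20% free space rule of thumb for ZFS.  DATA_DIR is a
# guess; point it at the mount you care about.

import shutil

DATA_DIR = "/var/lib/postgresql/data"

usage = shutil.disk_usage(DATA_DIR)
pct_free = 100.0 * usage.free / usage.total
print(f"total={usage.total / 2**30:.1f} GiB  "
      f"free={usage.free / 2**30:.1f} GiB  ({pct_free:.1f}% free)")
if pct_free < 20.0:
    print("warning: below ~20% free space")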

[1] https://www.postgresql.org/message-id/CAEepm%3D2pypg3nGgBDYyG0wuCH%2BxTWsAJddvJUGBNsDiyMhcXaQ%40mail.gmail.com
[2] http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/Data_Extents.html

--
Thomas Munro
http://www.enterprisedb.com
