Re: WAL Re-Writes

From Andres Freund
Subject Re: WAL Re-Writes
Date
Msg-id 20160208144639.r425hm2gbbi7w7gi@alap3.anarazel.de
In reply to Re: WAL Re-Writes  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: WAL Re-Writes  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 2016-02-08 10:38:55 +0530, Amit Kapila wrote:
> I think deciding it automatically, without requiring the user to configure
> it, certainly has merits, but what about cases where users can benefit
> from configuring it themselves, such as when we use the
> PG_O_DIRECT flag for WAL (with O_DIRECT, it will bypass OS
> buffers and won't cause misaligned writes even for smaller chunk sizes
> like 512 bytes or so)?  Some googling [1] reveals that other databases
> also provide users with an option to configure the WAL block/chunk size (as
> BLOCKSIZE), although they seem to decide the chunk size based on the
> disk-sector size.

FWIW, you usually can't do writes that small with O_DIRECT. Usually they
have to be 4kB (page size) sized and 4kB aligned. And filesystems that
do support such small writes essentially fall back to doing buffered IO
for them.
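
To illustrate the alignment constraint, here is a minimal sketch in C. It assumes a 4kB alignment unit; the real requirement depends on the kernel, filesystem and device. The names `DIO_ALIGN`, `dio_request_is_aligned` and `dio_round_up` are hypothetical and are not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment unit, for illustration only. */
#define DIO_ALIGN 4096u

/*
 * True if the buffer address, file offset and length are all multiples
 * of align, which is what O_DIRECT typically demands.
 */
bool
dio_request_is_aligned(const void *buf, uint64_t offset, size_t len,
                       size_t align)
{
    /* align must be a non-zero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);

    return ((uintptr_t) buf % align) == 0 &&
           (offset % align) == 0 &&
           len != 0 && (len % align) == 0;
}

/* Round a write of len bytes up to the next multiple of align. */
size_t
dio_round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    return (len + align - 1) & ~(align - 1);
}
```

Under these assumptions a 512-byte WAL write fails the check and would have to be padded to `dio_round_up(512, DIO_ALIGN)`, which is 4096 bytes.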

> An additional thought, not necessarily related to this patch:
> if the user chooses, or we decide, to write in 512-byte chunks,
> which is usually the disk sector size, then can't we think of avoiding
> the CRC for each record in such cases, because each WAL write will
> itself be atomic?  While reading, if we process in wal-chunk-sized
> units, then I think it should be possible to detect end-of-WAL based
> on the data read.

O_DIRECT doesn't give any guarantees that would make something like the
above possible. It doesn't have any ordering or durability implications.
You still need to issue fdatasync() and such.
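
A minimal sketch of what that means in practice, assuming a POSIX system that provides pwrite() and fdatasync(). `write_durably` is a hypothetical helper, not PostgreSQL code. It behaves the same whether or not the descriptor was opened with O_DIRECT: the write is only durable once the explicit flush returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() flags, for the caller */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset, then flush them.
 * Returns 0 on success, -1 on error with errno set.
 */
int
write_durably(int fd, const char *buf, size_t len, off_t offset)
{
    assert(buf != NULL);

    while (len > 0)
    {
        ssize_t n = pwrite(fd, buf, len, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        if (n == 0)
        {
            errno = ENOSPC;     /* no progress; treat as out of space */
            return -1;
        }
        buf += n;
        len -= (size_t) n;
        offset += n;
    }

    /* Durability comes from this call, not from O_DIRECT. */
    return fdatasync(fd);
}
```

Opening the file with O_DSYNC would be an alternative to the explicit fdatasync(); either way, O_DIRECT alone does not provide the flush.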

Besides, with the new CRC implementation, that doesn't really seem like
such a large win anyway.
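
For context on what the per-record CRC does, here is a minimal sketch of a CRC-32C check. The record layout (a payload plus a stored 32-bit CRC) and the function names are hypothetical. The checksum is a plain bitwise CRC-32C, written for clarity; it is much slower than table-driven or hardware-assisted variants and is not how PostgreSQL computes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t
crc32c_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/*
 * Hypothetical reader-side check: a record is accepted only if the
 * CRC stored with it matches the payload. A mismatch is how a reader
 * can recognize a torn or stale record, i.e. the end of valid WAL.
 */
int
record_is_valid(const unsigned char *payload, size_t len,
                uint32_t stored_crc)
{
    return crc32c_bitwise(payload, len) == stored_crc;
}
```

The standard check string "123456789" yields 0xE3069283 for CRC-32C, which is a convenient way to validate any faster implementation against this one.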

Greetings,

Andres Freund


