Re: Write lifetime hints for NVMe

From Tomas Vondra
Subject Re: Write lifetime hints for NVMe
Msg-id 30965a3e-5bde-4f70-dc06-1ff297abca4c@2ndquadrant.com
In response to Write lifetime hints for NVMe  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Write lifetime hints for NVMe
List pgsql-hackers

On 01/27/2018 02:20 PM, Dmitry Dolgov wrote:
> Hi,
> 
> From what I can see, support for write lifetime hints for NVMe multi-streaming
> was merged into the Linux kernel some time ago [1]. In theory it allows data
> written together on the media to be erased together, which minimizes
> garbage collection, resulting in reduced write amplification as well as
> efficient flash utilization [2]. I couldn't find any discussion about that on
> hackers, so I decided to experiment with this feature a bit. My idea was to
> test a quite naive approach in which all file descriptors related to
> temporary files are assigned `RWH_WRITE_LIFE_SHORT`, and the rest
> `RWH_WRITE_LIFE_EXTREME`. The attached patch is a dead simple POC without any
> infrastructure around it to enable/disable hints.
> 
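
For reference, a minimal sketch of how such a hint is attached to a file descriptor on Linux 4.13 or later, via `fcntl()` with `F_SET_RW_HINT`. This is not the attached patch: the helper names `write_hint_for` and `set_write_hint` are hypothetical, and the fallback `#define`s are only there for C libraries whose headers predate the interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes F_SET_RW_HINT in newer glibc */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallbacks for older libc headers; values as in the kernel's
 * include/uapi/linux/fcntl.h. */
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036      /* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif
#ifndef RWH_WRITE_LIFE_EXTREME
#define RWH_WRITE_LIFE_EXTREME 5
#endif

/* Hypothetical policy helper mirroring the naive approach described above:
 * temporary files are short-lived, everything else is "extreme". */
static uint64_t
write_hint_for(bool is_temp_file)
{
    return is_temp_file ? RWH_WRITE_LIFE_SHORT : RWH_WRITE_LIFE_EXTREME;
}

/* Attach the hint to an open descriptor. Returns 0 on success, or -1 with
 * errno set (e.g. EBADF for a bad descriptor, EINVAL on older kernels). */
static int
set_write_hint(int fd, bool is_temp_file)
{
    uint64_t hint = write_hint_for(is_temp_file);

    /* The kernel expects a pointer to a 64-bit hint value. */
    return fcntl(fd, F_SET_RW_HINT, &hint);
}
```

A caller would invoke `set_write_hint(fd, true)` right after opening a temporary file and treat a failure as non-fatal, since the hint is only advisory.
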
> It turns out that it's possible to run benchmarks on some EC2 instance
> types (e.g. c5) with a corresponding version of the kernel, since they expose
> a volume as an NVMe device:
> 
> ```
> # nvme list
> Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1     vol01cdbc7ec86f17346 Amazon Elastic Block Store               1           0.00   B /   8.59  GB    512   B +  0 B   1.0
> ```
> 
> To get some baseline results I ran several rounds of pgbench on these quite
> modest instances (dedicated, with optimized EBS) with a slightly adjusted
> `max_wal_size` and an otherwise default configuration:
> 
> $ pgbench -s 200 -i
> $ pgbench -T 600 -c 2 -j 2
> 
> Analyzing the `strace` output, I can see that during this test there was a
> significant number of operations on pg_stat_tmp and xlogtemp, so I assume
> write lifetime hints should have some effect.
> 
> As a result I got a latency reduction of about 5-8% (but so far these numbers
> are unstable, probably because of virtualization).
> 
> ```
> # without patch
> number of transactions actually processed: 491945
> latency average = 2.439 ms
> tps = 819.906323 (including connections establishing)
> tps = 819.908755 (excluding connections establishing)
> ```
> 
> ```
> # with patch
> number of transactions actually processed: 521805
> latency average = 2.300 ms
> tps = 869.665330 (including connections establishing)
> tps = 869.668026 (excluding connections establishing)
> ```
> 

Aren't those numbers far lower than you'd expect from NVMe storage? I do
have an NVMe drive (Intel 750) in my machine, and I can do thousands of
transactions per second on it with two clients. Seems a bit suspicious.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

