[HACKERS] lseek/read/write overhead becomes visible at scale ..

From Tobias Oberstein
Subject [HACKERS] lseek/read/write overhead becomes visible at scale ..
Msg-id b8748d39-0b19-0514-a1b9-4e5a28e6a208@gmail.com
Replies Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi guys,

Please bear with me, this is my first post here. Please also excuse the
length .. I was trying to do all my homework before posting here ;)

The overhead of lseek/read/write vs pread/pwrite (or even 
preadv/pwritev) was previously discussed here:


https://www.postgresql.org/message-id/flat/CABUevEzZ%3DCGdmwSZwW9oNuf4pQZMExk33jcNO7rseqrAgKzj5Q%40mail.gmail.com#CABUevEzZ=CGdmwSZwW9oNuf4pQZMExk33jcNO7rseqrAgKzj5Q@mail.gmail.com

The thread ends with

"Well, my point remains that I see little value in messing with
long-established code if you can't demonstrate a benefit that's clearly
above the noise level."

I have done lots of benchmarking over the last few days on a massive box, 
and I can provide numbers that I think show that the impact can be 
significant.

Our storage tops out at 9.4 million random 4kB read IOPS.

Storage consists of 8 x Intel P3608 4TB NVMe (which logically is 16 NVMe 
block devices).

The above number was measured with the fio psync engine .. with libaio, 
it's at 9.7 million with much lower CPU load - but this doesn't apply to 
PG, of course.

Switching to the sync engine, it drops to 9.1 million - but the system 
load is then also much higher!

In a way, our massive CPU (4 x E7 8880 with 88 cores / 176 threads) hides 
the impact of sync vs psync.

So, with less CPU, the syscall overhead kicks in (we are CPU bound).

It also becomes much more visible with Linux MD in the mix, because MD 
comes with its own overhead/bottleneck, and our CPU then can no longer 
hide the overhead of sync vs psync:

sync on MD: IOPS=1619k
psync on MD: IOPS=4289k
sync on non-MD: IOPS=9165k
psync on non-MD: IOPS=9410k

Please find all the details here

https://github.com/oberstet/scratchbox/tree/master/cruncher/sync-engines

Note: MD has lock contention (lock_qsc) - I am going down that rabbit 
hole too. But this is only related to PG in that the negative impacts 
multiply.

What I am trying to say is: the syscall overhead of doing 
lseek/read/write instead of pread/pwrite does become visible and hurts 
at a certain point.
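
For illustration, here is a minimal C sketch of the two access patterns being compared. It is not PostgreSQL's actual code, and the function names read_block_seek and read_block_pread are hypothetical, chosen only for this example. The point is just that the first variant costs two system calls per block while the second costs one.

```c
/* Ask for the POSIX.1-2008 interfaces (pread) when compiling in strict mode. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: two system calls per block.
 * lseek() moves the file descriptor's shared offset, then read()
 * reads from that offset and advances it.
 */
ssize_t read_block_seek(int fd, void *buf, size_t len, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;
    return read(fd, buf, len);
}

/*
 * Hypothetical helper: one system call per block.
 * pread() takes the offset as an argument and leaves the file
 * descriptor's own offset untouched.
 */
ssize_t read_block_pread(int fd, void *buf, size_t len, off_t offset)
{
    return pread(fd, buf, len, offset);
}
```

Both helpers return the number of bytes read or -1 on error, so a caller could swap one for the other to compare them on the same file.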

I totally agree with the quote above ("show up numbers first!"), but 
I think I have shown numbers ;)

I'd love to get the 9.4 million IOPS right through MD and XFS up to PG 
(yeah, I know, PG does 8kB, but it'll be similar).

Cheers,
/Tobias

PS:
This isn't academic, as we have experience (in prod) with a similarly 
designed box and PostgreSQL used as a data warehouse.

We are using an internal tool to parallelize via sessions and this box 
is completely CPU bound (same NVMes, 3TB RAM as the new one, but only 48 
cores and no HT).

Squeezing out CPU and improving CPU usage efficiency is hence very 
important for us.


