Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
From | Tobias Oberstein |
---|---|
Subject | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
Date | |
Msg-id | a55b21d1-7c99-2c66-d661-ef5288f29e30@gmail.com |
In reply to | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. (Andres Freund <andres@anarazel.de>) |
Responses |
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
(Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hi,

>>  pid | syscall                               | cnt     | cnt_per_sec
>> -----+---------------------------------------+---------+-------------
>>      | syscalls:sys_enter_lseek              | 4091584 |      136386
>>      | syscalls:sys_enter_newfstat           | 2054988 |       68500
>>      | syscalls:sys_enter_read               |  767990 |       25600
>>      | syscalls:sys_enter_close              |  503803 |       16793
>>      | syscalls:sys_enter_newstat            |  434080 |       14469
>>      | syscalls:sys_enter_open               |  380382 |       12679
>>
>> Note: there isn't a lot of load currently (this is from production).
>
> That doesn't really mean that much - sure it shows that lseek is
> frequent, but it doesn't tell you how much impact this has to the

Above is on a mostly idle system ("idle" for our loads) .. when things get hot, lseek calls can reach into the millions/sec.

Doing 5 million syscalls per sec comes with overhead no matter how lightweight the syscall is, doesn't it?

Using pread instead of lseek+read halves the syscalls.

I really don't understand what you are fighting here ..

> overall workload. For that'd you'd need a generic (i.e. not syscall
> tracepoint, but cpu cycle) perf profile, and look in the call graph (via
> perf report --children) how much of that is below the lseek syscall.

I see. I might find time to extend our helper function f_perf_syscalls.

>>>>> I'm much less against this change than Tom, but doing artificial syscall
>>>>> microbenchmark seems unlikely to make a big case for using it in
>>>>
>>>> This isn't a syscall benchmark, but FIO.
>>>
>>> There's not really a difference between those, when you use fio to
>>> benchmark seek vs pseek.
>>
>> Sorry, I don't understand what you are talking about.
>
> Fio as you appear to have used is a microbenchmark benchmarking
> individual syscalls.

I am benchmarking IOPS, and while doing so, it becomes apparent that at these scales it does matter _how_ IO is done.

The most efficient way is libaio. I get 9.7 million/sec IOPS with low CPU load. Using any synchronous IO engine is slower and produces higher load.

I do understand that switching to libaio isn't going to fly for PG (completely different approach). But doing pread instead of lseek+read seems simple enough. But then, I don't know about the PG codebase ..

Among the synchronous methods of doing IO, psync is much better than sync. pvsync, pvsync2 and pvsync2 + hipri (busy polling, no interrupts) are better, but the gain is smaller, and all of them are inferior to libaio.

>>> Glad to hear it.
>>
>> With 3TB RAM, huge pages is absolutely essential (otherwise, the system bogs
>> down in TLB etc overhead).
>
> I was one of the people working on adding hugepage support to pg, that's
> why I was glad ;)

Ahh;) Sorry, wasn't aware. This is really invaluable. Thanks for that!

Cheers,
/Tobias
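
For illustration, here is a minimal sketch of the two synchronous read paths compared in this message. It is not PostgreSQL's actual file access code. The helper names `read_block_seek` and `read_block_pread` are hypothetical; only `lseek`, `read` and `pread` are real POSIX calls. The first helper issues two syscalls per block, the second issues one and leaves the descriptor's file offset untouched.

```c
#define _XOPEN_SOURCE 700   /* make pread() visible under strict feature-test settings */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: position the file offset, then read.
 * This costs two syscalls (lseek + read) per block.
 */
ssize_t read_block_seek(int fd, void *buf, size_t len, off_t offset)
{
    assert(buf != NULL);
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;              /* errno is set by lseek */
    return read(fd, buf, len);
}

/*
 * Hypothetical helper: read at an explicit offset.
 * This costs one syscall (pread) per block and does not
 * change the file offset associated with fd.
 */
ssize_t read_block_pread(int fd, void *buf, size_t len, off_t offset)
{
    assert(buf != NULL);
    return pread(fd, buf, len, offset);
}
```

Both helpers return the number of bytes read, or -1 on error, so a caller could switch from one to the other without changing its own logic. Whether the saved syscall matters for a given workload is exactly what a CPU-cycle profile, as suggested in the quoted reply, would have to show.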