Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

From Andres Freund
Subject Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Date
Msg-id 20170124181121.pgk7kqfkq4dd3hpo@alap3.anarazel.de
In response to Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..  (Tobias Oberstein <tobias.oberstein@gmail.com>)
Responses Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..  (Tobias Oberstein <tobias.oberstein@gmail.com>)
List pgsql-hackers
Hi,

On 2017-01-24 18:57:47 +0100, Tobias Oberstein wrote:
> Am 24.01.2017 um 18:41 schrieb Andres Freund:
> > On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
> > > The syscall overhead is visible in production too .. I watched PG using perf
> > > live, and lseeks regularly appear at the top of the list.
> > 
> > Could you show such perf profiles? That'll help us.
> 
> oberstet@bvr-sql18:~$ psql -U postgres -d adr
> psql (9.5.4)
> Type "help" for help.
> 
> adr=# select * from svc_sqlbalancer.f_perf_syscalls();
> NOTICE:  starting Linux perf syscalls sampling - be patient, this can take
> some time ..
> NOTICE:  sudo /usr/bin/perf stat -e "syscalls:sys_enter_*"      -x ";" -a
> sleep 30 2>&1
>  pid |                syscall                |   cnt   | cnt_per_sec
> -----+---------------------------------------+---------+-------------
>      | syscalls:sys_enter_lseek              | 4091584 |      136386
>      | syscalls:sys_enter_newfstat           | 2054988 |       68500
>      | syscalls:sys_enter_read               |  767990 |       25600
>      | syscalls:sys_enter_close              |  503803 |       16793
>      | syscalls:sys_enter_newstat            |  434080 |       14469
>      | syscalls:sys_enter_open               |  380382 |       12679
>      | syscalls:sys_enter_mmap               |  301491 |       10050
>      | syscalls:sys_enter_munmap             |  182313 |        6077
>      | syscalls:sys_enter_getdents           |  162443 |        5415
>      | syscalls:sys_enter_rt_sigaction       |  158947 |        5298
>      | syscalls:sys_enter_openat             |   85325 |        2844
>      | syscalls:sys_enter_readlink           |   77439 |        2581
>      | syscalls:sys_enter_rt_sigprocmask     |   60929 |        2031
>      | syscalls:sys_enter_mprotect           |   58372 |        1946
>      | syscalls:sys_enter_futex              |   49726 |        1658
>      | syscalls:sys_enter_access             |   40845 |        1362
>      | syscalls:sys_enter_write              |   39513 |        1317
>      | syscalls:sys_enter_brk                |   33656 |        1122
>      | syscalls:sys_enter_epoll_wait         |   23776 |         793
>      | syscalls:sys_enter_ioctl              |   19764 |         659
>      | syscalls:sys_enter_wait4              |   17371 |         579
>      | syscalls:sys_enter_newlstat           |   13008 |         434
>      | syscalls:sys_enter_exit_group         |   10135 |         338
>      | syscalls:sys_enter_recvfrom           |    8595 |         286
>      | syscalls:sys_enter_sendto             |    8448 |         282
>      | syscalls:sys_enter_poll               |    7200 |         240
>      | syscalls:sys_enter_lgetxattr          |    6477 |         216
>      | syscalls:sys_enter_dup2               |    5790 |         193
> 
> <snip>
> 
> Note: there isn't a lot of load currently (this is from production).

That doesn't really mean that much - sure, it shows that lseek is
frequent, but it doesn't tell you how much impact this has on the
overall workload.  For that you'd need a generic (i.e. not syscall
tracepoint, but CPU cycle) perf profile, and look in the call graph (via
perf report --children) at how much of that is below the lseek syscall.
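
To illustrate why a call count alone says little about impact, here is a minimal Python sketch. The counts and the 30-second window are copied from the quoted output. The per-call cost is a made-up assumption, used only to show the arithmetic; the real cost would have to come from the cycle profile described above.

```python
# Counts copied from the quoted perf stat output (30-second, system-wide sample).
SAMPLE_SECONDS = 30
counts = {
    "lseek": 4091584,
    "newfstat": 2054988,
    "read": 767990,
}

# ASSUMPTION: a made-up per-call cost in microseconds, only to show the arithmetic.
# The real figure has to come from a CPU cycle profile, not from this guess.
ASSUMED_COST_US = 0.5


def rate_per_sec(count, seconds=SAMPLE_SECONDS):
    """Calls per second, rounded like the cnt_per_sec column."""
    return round(count / seconds)


def cpu_seconds_per_second(count, cost_us=ASSUMED_COST_US, seconds=SAMPLE_SECONDS):
    """CPU time used per wall-clock second under the assumed per-call cost."""
    return count * cost_us / 1_000_000 / seconds


for name, count in counts.items():
    print(f"{name:10s} {rate_per_sec(count):8d}/s  "
          f"~{cpu_seconds_per_second(count):.3f} CPU-s/s (assumed cost)")
```

The same count gives very different CPU shares depending on the per-call cost you plug in, which is why the tracepoint counts need to be paired with a cycle profile.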


> > > > I'm much less against this change than Tom, but doing artificial syscall
> > > > microbenchmark seems unlikely to make a big case for using it in
> > > 
> > > This isn't a syscall benchmark, but FIO.
> > 
> > There's not really a difference between those, when you use fio to
> > benchmark seek vs pseek.
> 
> Sorry, I don't understand what you are talking about.

Fio, as you appear to have used it, is a microbenchmark benchmarking
individual syscalls.
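
For reference, a microbenchmark of this kind can be sketched in a few lines of Python. This is not the fio job used in the thread. The file size, iteration count and helper names are hypothetical, and it times only the syscalls themselves (plus interpreter overhead) against data that will sit in the page cache.

```python
import os
import tempfile
import time

BLOCK = 8192  # PostgreSQL's default page size; the file layout here is made up.


def read_with_lseek(fd, offset, size=BLOCK):
    """Two syscalls: position the descriptor, then read."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, size)


def read_with_pread(fd, offset, size=BLOCK):
    """One syscall: read at an explicit offset, file position untouched."""
    return os.pread(fd, size, offset)


def time_reader(reader, fd, offsets):
    """Wall-clock time to run `reader` once per offset."""
    start = time.perf_counter()
    for off in offsets:
        reader(fd, off)
    return time.perf_counter() - start


with tempfile.TemporaryFile() as f:
    f.write(os.urandom(BLOCK * 64))
    f.flush()
    fd = f.fileno()
    offsets = [(i % 64) * BLOCK for i in range(20000)]
    t_seek = time_reader(read_with_lseek, fd, offsets)
    t_pread = time_reader(read_with_pread, fd, offsets)
    same = read_with_lseek(fd, BLOCK) == read_with_pread(fd, BLOCK)

print(f"lseek+read: {t_seek:.4f}s  pread: {t_pread:.4f}s  same data: {same}")
```

Whatever difference this prints covers only the syscall path; it leaves out the rest of what a database does around each read.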


> > > > postgres, where it's part of vastly more expensive operations (like
> > > > actually reading data afterwards, exclusive locks, ...).
> > > 
> > > PG is very CPU hungry, yes.
> > 
> > Indeed - working on it ;)
> > 
> > 
> > > But there are quite some system related effects
> > > too .. eg we've managed to get down the system load with huge pages (big
> > > improvement).
> > 
> > Glad to hear it.
> 
> With 3TB RAM, huge pages are absolutely essential (otherwise, the system bogs
> down in TLB etc overhead).

I was one of the people working on adding hugepage support to pg, that's
why I was glad ;)


Regards,

Andres


