Re: Avoid stack frame setup in performance critical routines using tail calls

From Andres Freund
Subject Re: Avoid stack frame setup in performance critical routines using tail calls
Msg-id 20210720155723.dau4xqsnfq72uih5@alap3.anarazel.de
In response to Re: Avoid stack frame setup in performance critical routines using tail calls  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi,

On 2021-07-20 19:37:46 +1200, David Rowley wrote:
> On Tue, 20 Jul 2021 at 19:04, Andres Freund <andres@anarazel.de> wrote:
> > > * AllocateSetAlloc.txt
> > > * palloc.txt
> > > * percent.txt
> >
> > Huh, that's interesting. You have some control flow enforcement stuff
> > turned on (the endbr64). And it looks like it has a non zero cost (or
> > maybe it's just skid). Did you enable that intentionally? If not, what
> > compiler/version/distro is it? I think at least on GCC that's
> > -fcf-protection=...
>
> It's ubuntu 21.04 with gcc 10.3 (specifically gcc version 10.3.0
> (Ubuntu 10.3.0-1ubuntu1)
>
> I've attached the same results from compiling with clang 12
> (12.0.0-3ubuntu1~21.04.1)

It looks like the ubuntu folks have changed the default for CET to on.


andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2  -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:    f3 0f 1e fa              endbr64
   4:    b8 11 00 00 00           mov    $0x11,%eax
   9:    c3                       retq
andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -fcf-protection=none -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:    b8 11 00 00 00           mov    $0x11,%eax
   5:    c3                       retq
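
Besides disassembling, the compiler default can be checked from the preprocessor. The following is only a sketch, assuming the GCC-documented behaviour that `__CET__` is defined when -fcf-protection is active (bit 0 for branch, bit 1 for return); clang is assumed to follow the same convention. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Report the control-flow protection mode this translation unit was
 * compiled with. Assumption: the compiler defines __CET__ when
 * -fcf-protection is active, with bit 0 = indirect branch tracking
 * (the endbr64 instructions) and bit 1 = shadow stack.
 * Returns 0 when the macro is absent (protection off, or non-x86 target).
 */
int
cet_compile_mode(void)
{
#ifdef __CET__
    return __CET__;
#else
    return 0;
#endif
}

/* Map the __CET__ value to the matching -fcf-protection= argument. */
const char *
cet_mode_name(int mode)
{
    switch (mode)
    {
        case 0:
            return "none";
        case 1:
            return "branch";
        case 2:
            return "return";
        case 3:
            return "full";
        default:
            return "unknown";
    }
}

/* Print the mode in the same spelling as the compiler flag. */
void
report_cet_mode(FILE *out)
{
    int mode = cet_compile_mode();

    assert(out != NULL);
    fprintf(out, "-fcf-protection=%s (__CET__=%d)\n",
            cet_mode_name(mode), mode);
}
```

Calling report_cet_mode(stdout) from main() in a file built once with the distro defaults and once with -fcf-protection=none should show whether the default differs, without needing objdump.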


Independent of this patch, it might be worth running a benchmark with
the default options, and one with -fcf-protection=none. None of my
machines support it...

$ cpuid -1|grep CET
      CET_SS: CET shadow stack                 = false
      CET_IBT: CET indirect branch tracking    = false
         XCR0 supported: CET_U state          = false
         XCR0 supported: CET_S state          = false
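
Where the cpuid tool isn't installed, the first two lines of that output can be reproduced with a few lines of C. This is a rough sketch, assuming GCC or clang (for <cpuid.h>) and the CPUID leaf 7 / subleaf 0 layout where ECX bit 7 is CET_SS and EDX bit 20 is CET_IBT. The function name and the returned bitmask are invented for the example. It only reports what the CPU advertises, not whether the kernel enables it.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Returns a bitmask of CET features advertised by the CPU:
 *   bit 0 = CET_SS  (shadow stack)
 *   bit 1 = CET_IBT (indirect branch tracking)
 * Returns 0 on non-x86 builds or when CPUID leaf 7 is unavailable.
 */
int
cpu_cet_features(void)
{
    int features = 0;

#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count returns 0 if the requested leaf is unsupported */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    {
        if (ecx & (1u << 7))
            features |= 1;      /* CET_SS */
        if (edx & (1u << 20))
            features |= 2;      /* CET_IBT */
    }
#endif

    /* only the two bits defined above may be set */
    assert((features & ~3) == 0);
    return features;
}

/* Print the result in roughly the same shape as the cpuid tool. */
void
print_cpu_cet_features(FILE *out)
{
    int features = cpu_cet_features();

    fprintf(out, "CET_SS:  %s\n", (features & 1) ? "true" : "false");
    fprintf(out, "CET_IBT: %s\n", (features & 2) ? "true" : "false");
}
```

On a machine like the ones above, print_cpu_cet_features(stdout) would be expected to print false for both.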

Here it adds about 40kB of .text, but I can't measure the CET
overhead...

Greetings,

Andres Freund


