Re: A few more opportunities to use TupleTableSlotOps fields

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: A few more opportunities to use TupleTableSlotOps fields
Дата
Msg-id 20190521163440.vo6yjyy7xeqwtlbx@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: A few more opportunities to use TupleTableSlotOps fields  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2019-05-21 12:16:22 -0400, Robert Haas wrote:
> On Tue, May 21, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote:
> > Do you refer to the fact that for example the address of
> >
> >         tts_virtual_clear(dstslot);
> >
> > is immediately available in the text section while in this case
> >
> >         dstslot->tts_ops->clear(dstslot);
> >
> > the CPU first needs to fetch the address of tts_ops and also that of the
> > ->clear function?
> 
> Yes.  And since tts_virtual_clear() is marked static, it seems like it
> might even possible for it to inline that function at compile time.

I sure hope so. And I did verify that at some point. We, for example,
don't want changes to slot->tts_flags tts_virtual_clear does to be
actually written to memory, if it's called from a callsite that knows
it's a virtual slot.

I just checked, and for me tts_virtual_copyslot() inlines all of
tts_virtual_clear(), and eliminates most of its contents except for the
pfree() branch. All the rest of the state changes are overwritten by
tts_virtual_copyslot() anyway.


> > I admit I didn't think about this problem. Nevertheless I imagine that due to
> > constness of the variables like TTSOpsVirtual (and due to several other const
> > declarations) the compiler might be able to compute the address of the
> > tts_ops->clear() expression. Thus the only extra work for the CPU would be to
> > fetch tts_ops from dstslot, but that should not be a big deal because other
> > fields of dstslot are accessed by the surrounding code (so all of them might
> > be available in the CPU cache).
> 
> I think the issue is pipeline stalls.  If the compiler can see a
> direct call coming up, it can start fetching the instructions from the
> target address.  If it sees an indirect call, that's probably harder
> to do.

Some CPUs can do so, but it'll often be more expensive / have a higher
chance of misspeculating (rather than the 0 chance in case of a straight
line code.


> But really, I'm not an expect on this area.  I would write the code as
> Andres did on the general principle of making things as easy for the
> compiler and CPU as possible, but I do not know how much it really
> matters.  Andres probably does know...

I think the inlining bit is more crucial, but that having as few
indirect calls as possible matters here. It wasn't that easy to get the
slot virtualization to not regress performance meaningfully.

If anything, I really want to go the *opposite* direction, i.e. remove
*more* indirect calls from within execTuples.c, and get the compiler to
realize at callsites external to execTuples.c that it can cache tts_ops
in more places.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: New EXPLAIN option: ALL
Следующее
От: Andres Freund
Дата:
Сообщение: Re: A few more opportunities to use TupleTableSlotOps fields