Re: track needed attributes in plan nodes for executor use

From Amit Langote
Subject Re: track needed attributes in plan nodes for executor use
Msg-id CA+HiwqFpyzhpfNjqu1kYRdfTdts29gf6izi3ui1VdiH7r_t9Bg@mail.gmail.com
In reply to Re: track needed attributes in plan nodes for executor use  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

Thanks for the thoughts, Tom.

On Mon, Jul 14, 2025 at 11:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > Not quite -- the optimization doesn’t require changes to the tuple
> > header or representation. The existing deforming code already stops
> > once all requested attributes are filled, using tts_nvalid to track
> > that. What I’m proposing is to additionally allow the slot to skip
> > ahead to the first needed attribute, rather than always starting
> > deformation from attno 0. That lets us avoid alignment/null checks for
> > preceding fixed-width attributes that are guaranteed to be unused.
>
> I'm quite skeptical about this being a net win.  You could only skip
> deformation for attributes that are both fixed-width and
> guaranteed-not-null.  Having a lot of those at the start may be true
> in our system catalogs (because we have other reasons to lay them out
> that way) but I doubt it occurs often in user tables.  So I'm afraid
> that this would eat more in planning time than it'd usually save in
> practice.

That’s fair, and I agree that a fixed-not-null prefix is not a common
pattern across all user schemas, and our handling of dropped columns
only makes that less likely. Still, I think it’s worth exploring this
optimization in the context of OLAP-style workloads, where the
executor processes large volumes of tuples and per-tuple CPU
efficiency can matter. In practice, users often copy operational data
into separate OLAP tables to gain performance, designing those tables
with specific layouts in mind (for example, wide tables with
fixed-width keys near the front followed by varlena columns). There is
a good deal of public guidance -- including talks, blog posts, and
vendor materials -- that promotes that pattern. Users adopting it, and
even those promoting it, might not realize that tuple deforming
overhead remains a bottleneck despite their schema work.  But we have
seen that it can be, and I think we now have a reasonably clean way to
mitigate that.

The example I showed, with 15 fixed-width columns followed by varlena
ones, was meant to demonstrate that deformation cost is mechanically
avoidable in some cases, not to suggest that this exact schema is
common. For instance, in that example, ExecInterpExpr() can account
for 70% of runtime in perf profiles of a backend running SELECT
sum(col_10) FROM foo WHERE col_1 = $1, most of which is spent in
slot_getsomeattrs_int() (62%) -- on unpatched HEAD, that is.  With the
PoC patch applied, total time in ExecInterpExpr() drops to 36%, and
slot_getsomeattrs_int() accounts for only 18%.
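
To make the skip-ahead idea concrete, here is a minimal standalone
sketch. It is not the PoC patch and not PostgreSQL's actual deforming
code: ToyAttr, toy_prefix_offset() and toy_deform() are hypothetical
names, and the toy model handles only 4- and 8-byte fixed-width values
with no null bitmap. It shows that when every attribute before the
first needed one is fixed-width and NOT NULL, its byte offset can be
computed from column metadata alone, so deforming can start there
instead of at attribute 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor (not a PostgreSQL struct). */
typedef struct ToyAttr
{
    int    len;      /* width in bytes, or -1 for variable-length */
    size_t align;    /* required alignment, must be a power of two */
    bool   notnull;  /* column is declared NOT NULL */
} ToyAttr;

/* Round 'off' up to the next multiple of 'align' (a power of two). */
static size_t
toy_align(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/*
 * Return the byte offset at which the fixed-width, NOT NULL prefix
 * before attribute 'first_needed' ends, using metadata only.
 * Return -1 if any earlier attribute is nullable or variable-length,
 * in which case no skipping is possible in this toy model.
 */
static long
toy_prefix_offset(const ToyAttr *attrs, int first_needed)
{
    size_t off = 0;

    for (int i = 0; i < first_needed; i++)
    {
        if (attrs[i].len < 0 || !attrs[i].notnull)
            return -1;
        off = toy_align(off, attrs[i].align) + (size_t) attrs[i].len;
    }
    return (long) off;
}

/*
 * Deform attributes start..last from 'data', beginning at byte offset
 * 'off'.  Toy assumption: every attribute in the range is a non-null
 * 4- or 8-byte integer.
 */
static void
toy_deform(const ToyAttr *attrs, const unsigned char *data,
           size_t off, int start, int last, int64_t *values)
{
    for (int i = start; i <= last; i++)
    {
        assert(attrs[i].len == 4 || attrs[i].len == 8);
        off = toy_align(off, attrs[i].align);
        if (attrs[i].len == 4)
        {
            int32_t v;

            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        else
        {
            int64_t v;

            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        off += (size_t) attrs[i].len;
    }
}
```

A caller would compute toy_prefix_offset() once per scan and, if it is
not -1, pass it to toy_deform() for each tuple together with the first
and last needed attribute numbers. A single nullable or variable-length
column in the prefix makes the function return -1, which is the
limitation Tom points out above.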

> I'm also bothered by the assumption that the planner has full
> knowledge of which attributes will be used at run-time.  I don't
> believe that the plan tree contains every Var reference that will
> occur during execution.  Triggers, CHECK constraints, FK constraints,
> etc are all things that aren't in the plan tree.

Right, I agree that plan-time knowledge does not cover everything.
This optimization is not aimed at mechanisms like triggers or
constraints, which may access attributes outside the Plan tree. More
importantly, those mechanisms are not part of the hot executor loop I
am trying to optimize as mentioned above.

That said, computing the needed attribute set in the executor might
turn out to be more extensible in practice now that I think about it.
Once a TupleTableSlot has been populated during plan execution,
expressions that read from it -- including those outside the plan tree
-- can potentially benefit. For example, ModifyTable reuses the same
slot populated by its subplan when performing per-row operations like
CHECK constraint evaluation and trigger firing. Planner-side analysis
would miss such uses, but executor-side computation naturally covers
them.  So while my current goal is just to improve performance for
plan-node expression evaluation, executor-side analysis could
naturally extend the benefit to other deforming paths without extra
effort. In contrast, planner-side analysis is inherently limited to
the Plan tree.
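
As a rough illustration of the executor-side variant, the sketch below
unions the attribute sets reported by every consumer of a slot before
deciding where deforming may start. Everything in it is hypothetical:
ToyAttrSet is a plain 64-bit mask (bit i set means attribute i is
read), and the function names are made up for this example rather than
taken from the patch or from PostgreSQL.

```c
#include <stdint.h>

/* Hypothetical attribute set: bit i set means attribute i is read. */
typedef uint64_t ToyAttrSet;

/* Union the sets reported by all consumers of one slot. */
static ToyAttrSet
toy_union_needed(const ToyAttrSet *consumers, int nconsumers)
{
    ToyAttrSet result = 0;

    for (int i = 0; i < nconsumers; i++)
        result |= consumers[i];
    return result;
}

/* Lowest attribute number in the set, or -1 if the set is empty. */
static int
toy_first_needed(ToyAttrSet set)
{
    for (int i = 0; i < 64; i++)
    {
        if ((set >> i) & 1)
            return i;
    }
    return -1;
}

/* Highest attribute number in the set, or -1 if the set is empty. */
static int
toy_last_needed(ToyAttrSet set)
{
    for (int i = 63; i >= 0; i--)
    {
        if ((set >> i) & 1)
            return i;
    }
    return -1;
}
```

If the plan-node expressions read attributes 5 and 9 and a CHECK
constraint evaluated on the same slot reads attribute 3, the union
gives a first needed attribute of 3 rather than 5. Analysis limited to
the Plan tree would see only the first consumer and could skip an
attribute that is in fact read later.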

Thanks again for the feedback.

--
Thanks, Amit Langote


