Re: track needed attributes in plan nodes for executor use
From | Amit Langote |
---|---|
Subject | Re: track needed attributes in plan nodes for executor use |
Date | |
Msg-id | CA+HiwqFpyzhpfNjqu1kYRdfTdts29gf6izi3ui1VdiH7r_t9Bg@mail.gmail.com |
In reply to | Re: track needed attributes in plan nodes for executor use (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Thanks for the thoughts, Tom.

On Mon, Jul 14, 2025 at 11:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > Not quite -- the optimization doesn't require changes to the tuple
> > header or representation. The existing deforming code already stops
> > once all requested attributes are filled, using tts_nvalid to track
> > that. What I'm proposing is to additionally allow the slot to skip
> > ahead to the first needed attribute, rather than always starting
> > deformation from attno 0. That lets us avoid alignment/null checks
> > for preceding fixed-width attributes that are guaranteed to be
> > unused.
>
> I'm quite skeptical about this being a net win. You could only skip
> deformation for attributes that are both fixed-width and
> guaranteed-not-null. Having a lot of those at the start may be true
> in our system catalogs (because we have other reasons to lay them out
> that way) but I doubt it occurs often in user tables. So I'm afraid
> that this would eat more in planning time than it'd usually save in
> practice.

That's fair. I agree that a fixed-width, not-null prefix is not a common pattern across user schemas, and our handling of dropped columns only makes it less likely.

Still, I think it's worth exploring this optimization in the context of OLAP-style workloads, where the executor processes large volumes of tuples and per-tuple CPU efficiency can matter. In practice, users often copy operational data into separate OLAP tables to gain performance, designing those tables with specific layouts in mind (for example, wide tables with fixed-width keys near the front followed by varlena columns). There is a good deal of public guidance -- talks, blog posts, and vendor materials -- that promotes that pattern. Users adopting it, and even those promoting it, may not realize that tuple deforming overhead can remain a bottleneck despite that schema work. We have seen that it can be, and I think we now have a reasonably clean way to mitigate it.

The example I showed, with 15 fixed-width columns followed by varlena ones, was meant to demonstrate that the deforming cost is mechanically avoidable in some cases, not because we expect that exact schema to be common. In that example, on unpatched HEAD, ExecInterpExpr() accounts for 70% of runtime in perf profiles of a backend running SELECT sum(col_10) FROM foo WHERE col_1 = $1, most of it spent in slot_getsomeattrs_int() (62%). With the PoC patch applied, total time in ExecInterpExpr() drops to 36%, and slot_getsomeattrs_int() accounts for only 18%.
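To make that concrete, here is a rough standalone sketch of the skip-ahead computation; AttrDesc and precomputed_start_offset are names made up for this example, not what the PoC patch uses in the slot-deforming code:

```c
/*
 * Standalone sketch of the skip-ahead idea, not the PoC patch itself.
 * AttrDesc and precomputed_start_offset are made-up names.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct AttrDesc
{
    int         len;            /* byte length, -1 for varlena */
    bool        notnull;        /* declared NOT NULL? */
} AttrDesc;

/*
 * If attributes 0 .. first_needed-1 are all fixed-width and NOT NULL, their
 * total size is a constant, so deforming can begin at that byte offset
 * instead of walking from attribute 0.  Returns -1 when the prefix does not
 * qualify and the caller must deform from the start as today.  (Alignment
 * padding is ignored here to keep the sketch short.)
 */
static long
precomputed_start_offset(const AttrDesc *attrs, int first_needed)
{
    long        off = 0;

    for (int i = 0; i < first_needed; i++)
    {
        if (attrs[i].len < 0 || !attrs[i].notnull)
            return -1;          /* varlena or nullable: offset not constant */
        off += attrs[i].len;
    }
    return off;
}

int
main(void)
{
    AttrDesc    attrs[16];

    /* 15 fixed-width NOT NULL columns followed by a varlena one */
    for (int i = 0; i < 15; i++)
        attrs[i] = (AttrDesc) {.len = 8, .notnull = true};
    attrs[15] = (AttrDesc) {.len = -1, .notnull = false};

    /* suppose the first attribute an expression needs is attno 9 (0-based) */
    long        off = precomputed_start_offset(attrs, 9);

    if (off >= 0)
        printf("deforming can start at byte offset %ld\n", off);
    else
        printf("fall back to deforming from attno 0\n");
    return 0;
}
```

The point is only that when every attribute before the first needed one is fixed-width and NOT NULL, the starting offset is a constant that can be computed once rather than rediscovered for every tuple.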
> I'm also bothered by the assumption that the planner has full
> knowledge of which attributes will be used at run-time. I don't
> believe that the plan tree contains every Var reference that will
> occur during execution. Triggers, CHECK constraints, FK constraints,
> etc are all things that aren't in the plan tree.

Right, I agree that plan-time knowledge does not cover everything. This optimization is not aimed at mechanisms like triggers or constraints, which may access attributes outside the plan tree; more importantly, those mechanisms are not part of the hot executor loop I am trying to optimize, as mentioned above.

That said, now that I think about it, computing the needed attribute set in the executor might turn out to be more extensible in practice. Once a TupleTableSlot has been populated during plan execution, expressions that read from it -- including those outside the plan tree -- can potentially benefit. For example, ModifyTable reuses the slot populated by its subplan when performing per-row operations like CHECK constraint evaluation and trigger firing. Planner-side analysis would miss such uses, but executor-side computation naturally covers them. So while my current goal is just to improve performance for plan-node expression evaluation, executor-side analysis could extend the benefit to other deforming paths without extra effort (a rough sketch of what I mean is below my sig). In contrast, planner-side analysis is inherently limited to the Plan tree.

Thanks again for the feedback.

--
Thanks, Amit Langote
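PS: a rough standalone sketch of the kind of per-slot bookkeeping I have in mind for the executor-side variant; NeededAttrs and record_needed_attrs are made-up names for illustration, not from the PoC patch:

```c
/*
 * Sketch only: a per-slot summary of the attributes that compiled
 * expressions actually read.  If every expression compiled against a slot
 * -- plan quals and targetlists, but also CHECK constraints and trigger
 * WHEN clauses that reuse the subplan's slot -- folds its referenced attnos
 * into the same summary, deforming of that slot can skip ahead regardless
 * of where the expression came from.
 */
#include <limits.h>
#include <stdio.h>

typedef struct NeededAttrs
{
    int         first_needed;   /* smallest referenced attno, 1-based */
    int         last_needed;    /* largest referenced attno, 1-based */
} NeededAttrs;

static void
record_needed_attrs(NeededAttrs *na, const int *attnos, int nattnos)
{
    for (int i = 0; i < nattnos; i++)
    {
        if (attnos[i] < na->first_needed)
            na->first_needed = attnos[i];
        if (attnos[i] > na->last_needed)
            na->last_needed = attnos[i];
    }
}

int
main(void)
{
    NeededAttrs na = {INT_MAX, 0};
    int         qual_attnos[] = {10};   /* a qual referencing col_10 */
    int         check_attnos[] = {12};  /* a hypothetical CHECK on col_12 */

    record_needed_attrs(&na, qual_attnos, 1);
    record_needed_attrs(&na, check_attnos, 1);

    printf("deform attributes %d..%d only\n", na.first_needed, na.last_needed);
    return 0;
}
```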