Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators

From        | James Hunter
Subject     | Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
Date        |
Msg-id      | CAJVSvF6ckAvcFkCT-cWAqYpVn2MM5zOEiTT-ubBB4mjFfCkAWQ@mail.gmail.com
In reply to | Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators (James Hunter <james.hunter.pg@gmail.com>)
Responses   | Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
List        | pgsql-hackers
On Mon, Feb 24, 2025 at 9:55 PM James Hunter <james.hunter.pg@gmail.com> wrote:
>
> On Mon, Feb 24, 2025 at 6:54 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Mon, 2025-02-24 at 12:46 -0800, James Hunter wrote:
> > > Attached please find the patch set I mentioned, above, in [1]. It
> > > consists of 4 patches that serve as the building blocks for and a
> > > prototype of the "query_work_mem" GUC I proposed:
> >
> > I didn't look at the details yet. But from:
> >
> > https://www.postgresql.org/message-id/CAJVSvF7x_DLj7-JrXvMB4_j%2BjzuvjG_7iXNjx5KmLBTXHPNdGA%40mail.gmail.com
> >
> > I expected something much smaller in scope, where we just add a
> > "plan_work_mem" field to the Plan struct, copy the work_mem global GUC
> > to that field when we construct a Plan node, and then reference the
> > plan_work_mem instead of the GUC directly.

Attaching a new refactoring, which splits the code changes into patches by functionality. This refactoring yields 5 patches, each of which is relatively localized. I hope the result will be more focused and more feasible to review.

* Patch 1: modifies file setrefs.c, to track all (regular) SubPlan objects that occur inside Plan node (qual) expressions, on a new Plan.subPlan list (parallel to the existing Plan.initPlan list, which is for SubPlans that have been turned into init plans). [Patch 1 has no visible side effects, since it just populates a list on the Plan object.]

* Patch 2: copies the work_mem [* hash_mem_multiplier] GUC(s) to a new Plan field, Plan.workmem_limit, and modifies existing exec nodes to read the limit from this field instead of the GUCs. Adds a new file, "execWorkmem.c", which does the GUC copying. [Patch 2 has no visible side effects, since it just refactors code, to store the GUCs on a field and then read that field instead of the GUCs.]
* Patch 3: stores the optimizer's estimate of how much working memory a given Path / Plan node will use, on the Path / Plan, in a new field, "workmem". (I used "workmem" for the estimate, vs. "workmem_limit" in Patch 2 for the limit, to parallel the existing "rows" and "cost" estimates.) This involves a significant amount of code in costsize.c and createplan.c, because sometimes the estimate is not readily available: while Patch 2 just reads workmem_limit from a GUC, Patch 3 has to estimate the actual workmem, basically by multiplying (width * rows). But not all Paths / Plans cost the possibility of spilling, so sometimes I have to copy this formula from the corresponding exec node, etc. The logical changes in Patch 3 are simple, but the physical LoC is larger. [Patch 3 has no visible side effects, since it just stores an estimate on the Plan object.]

* Patch 4: modifies file explain.c to implement a "work_mem on" option for the EXPLAIN command, and adds a unit test showing that the option works as expected. [Patch 4 is pure visible side effect -- all it does is add a new option to display workmem stats to the customer. It doesn't change any existing behavior; it just adds a new EXPLAIN option.]

* Patch 5: adds a sample extension / hook that shows how Patches 2 and 3 can be used -- without much effort! -- to implement a per-query working-memory limit, one that gives more working memory to exec nodes that (are estimated to) need it, while taking working memory away, if necessary, from exec nodes that (we estimate) don't need it.

The refactored patch set should be more feasible to review, since each patch is now localized to a single piece of functionality. Note that Patch 5 isn't essential to merge into core PostgreSQL, since it's mostly a proof of concept for how a "work_mem.query_work_mem" GUC could be implemented.
But Patches 2 and 3 are needed, since they expose the limit and the estimate, on the Plan, on which Patch 5 (or any similar working-memory extension) relies.

Thanks again,
James Hunter