Re: [HACKERS] Small improvement to parallel query docs
From | David Rowley |
---|---|
Subject | Re: [HACKERS] Small improvement to parallel query docs |
Date | |
Msg-id | CAKJS1f_1=kJGYR-VOAiMiS=zwWLT=wr8t8X0hiQ4NYSgG37Nhg@mail.gmail.com |
In response to | Re: [HACKERS] Small improvement to parallel query docs (Brad DeJong <Brad.Dejong@infor.com>) |
Responses | Re: [HACKERS] Small improvement to parallel query docs (Brad DeJong <Brad.Dejong@infor.com>) |
List | pgsql-hackers |
On 14 February 2017 at 10:10, Brad DeJong <Brad.Dejong@infor.com> wrote:
> Robert Haas wrote:
>
>> +    <literal>COUNT(*)</>, each worker must compute subtotals which later must
>> +    be combined to produce an overall total in order to produce the final
>> +    answer.  If the query involves a <literal>GROUP BY</> clause,
>> +    separate subtotals must be computed for each group seen by each parallel
>> +    worker. Each of these subtotals must then be combined into an overall
>> +    total for each group once the parallel aggregate portion of the plan is
>> +    complete.  This means that queries which produce a low number of groups
>> +    relative to the number of input rows are often far more attractive to the
>> +    query planner, whereas queries which don't collect many rows into each
>> +    group are less attractive, due to the overhead of having to combine the
>> +    subtotals into totals, of which cannot run in parallel.
>
>> I don't think "of which cannot run in parallel" is good grammar.  I'm
>> somewhat unsure whether the rest is an improvement or not.  Other opinions?
>
> Does this read any more clearly?
>
> +    <literal>COUNT(*)</>, each worker must compute subtotals which are later
> +    combined in order to produce an overall total for the final answer.  If
> +    the query involves a <literal>GROUP BY</> clause, separate subtotals
> +    must be computed for each group seen by each parallel worker.  After the
> +    parallel aggregate portion of the plan is complete, there is a serial step
> +    where the group subtotals from all of the parallel workers are combined
> +    into an overall total for each group.  Because of the overhead of combining
> +    the subtotals into totals, plans which produce few groups relative to the
> +    number of input rows are often more attractive to the query planner
> +    than plans which produce many groups relative to the number of input rows.
Actually, looking over this again, I think it's getting into too much detail that is already described in the next paragraph (which I think is very clear).

I propose we just remove the whole paragraph, and mention the planning and estimated number of groups in another new paragraph.

I've attached a patch to this effect, which also removes the text about why we don't support Merge Join. I felt something needed to be written in its place, so I mentioned that identical hash tables are created in each worker. This is perhaps not required, but the paragraph seemed a bit empty without it.

I also noticed a mistake: "based on a column taken from the inner table". I assume this "inner" should be "outer", since it surely must be talking about a parameterised index scan, in which case the parameter comes from the outer side, not the inner.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
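For readers following the thread, the subtotal-and-combine scheme discussed in the quoted text can be sketched in a few lines of Python. This is only an illustration of the idea, not how PostgreSQL implements it: the function names `partial_count` and `combine`, the three-way split of rows, and the group keys are all made up for the example.

```python
from collections import Counter

def partial_count(rows):
    """Per-worker step: one COUNT(*) subtotal for each group this worker sees."""
    return Counter(rows)

def combine(subtotals):
    """Serial step: merge every worker's subtotals into one total per group."""
    totals = Counter()
    for sub in subtotals:
        totals.update(sub)  # adds counts for matching group keys
    return totals

# Hypothetical input: each inner list is the group key of the rows one worker scanned.
worker_rows = [
    ["a", "b", "a"],
    ["b", "b", "c"],
    ["a", "c", "c"],
]

subtotals = [partial_count(rows) for rows in worker_rows]
totals = combine(subtotals)

# Number of subtotals the serial step has to process: one per (worker, group) pair.
combine_work = sum(len(sub) for sub in subtotals)
```

The serial step handles one subtotal per group per worker, so its cost grows with the number of groups rather than the number of input rows. That is the reason given in the quoted text for why plans with few groups relative to the input rows look more attractive to the planner.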
Attachments