Re: make Gather node projection-capable
From | Robert Haas |
---|---|
Subject | Re: make Gather node projection-capable |
Date | |
Msg-id | CA+Tgmobet41kbVZ9pPO+q7cjvw9PpsgiL2jTaY_GFJ8JQJETQQ@mail.gmail.com |
In response to | Re: make Gather node projection-capable (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |

On Sun, Oct 25, 2015 at 11:59 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 22 October 2015 at 16:01, Robert Haas <robertmhaas@gmail.com> wrote:
>> If we make Gather projection-capable,
>> we can just end up with Gather->PartialSeqScan.
>
> Is there a reason not to do projection in the Gather node? I don't see one.

I don't see one either. There may be some work that needs to be done
to get the projection to happen in the Gather node in all of the cases
where we want it to happen in the Gather node, but that's not an
argument against having the capability.

>> > That said, I don't understand Tom's comment either. Surely the planner
>> > is going to choose to do the projection in the innermost node possible,
>> > so that the children nodes are going to do the projections most of the
>> > time. But if for whatever reason this fails to happen, wouldn't it make
>> > more sense to do it at Gather than having to put a Result on top?
>>
>> The planner doesn't seem to choose to do projection in the innermost
>> node possible. The final tlist only gets projected at the top of the
>> join tree. Beneath that, it seems like we project in order to avoid
>> carrying Vars through nodes where that would be a needless expense,
>> but that's just dropping columns, not computing anything. That having
>> been said, I don't think that takes anything away from your chain of
>> reasoning here, and I agree with your conclusion. There seems to be
>> little reason to force a Result node atop a Gather node when we don't
>> do that for most other node types.
>
> Presumably this is a performance issue then? If we are calculating something
> *after* a join which increases rows then the query will be slower than need
> be.

I don't think there will be a performance issue in most cases because
in most cases the node immediately beneath the Gather node will be
able to do projection, which in most cases is in fact better, because
then the work gets done in the workers. However, there may be some
cases where it is useful. After having mulled it over, I think it's
likely that the reason why we didn't introduce a separate node for
projection is that you generally want to project to remove unnecessary
columns at the earliest stage that doesn't lose performance. So if we
didn't have projection capabilities built into the individual nodes,
then you'd end up with things like Aggregate -> Project -> Join ->
Project -> Scan, which would start to get silly, and likely
inefficient.

> I agree the rule should be to project as early as possible.

Cool. I'm not sure Tom was really disagreeing with the idea of making
Gather projection-capable ... it seems like he may have just been
saying that there wasn't as much of a rule as I was alleging. Which
is fine: we can decide what is best here, and I still think this is
it.

Barring further objections, I'm going to commit this, because (1) the
status quo is definitely weird because Gather is abusing the
projection stuff to come up with an extra slot, so doing nothing seems
unappealing and (2) I need to make other changes that touch the same
areas of the code, and I want to get this stuff done quickly so that
we get a user-visible feature people can test without writing C code
in the near future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company