Discussion: Size of Path nodes
So over in my private branch where I've been fooling around with upper planner pathification, I've been sweating bullets to avoid enlarging the size of struct Path, because I was aware that common path types like IndexPath and AppendPath were already a power-of-2 size (at least on 64-bit machines), meaning that palloc'ing them would cost twice as much space if I added any fields.

When I got around to merging up to HEAD, I found this in commit f0661c4e8:

@@ -753,6 +753,7 @@ typedef struct Path

     RelOptInfo *parent;         /* the relation this path can build */
     ParamPathInfo *param_info;  /* parameterization info, or NULL if none */
+    bool        parallel_aware; /* engage parallel-aware logic? */

     /* estimated size/costs for path (see costsize.c for more info) */
     double      rows;           /* estimated number of result tuples */

which means Robert has already blown the planner's space consumption out by close to a factor of 2, and I should stop worrying. Or else he should start worrying. Do we really need this field? Did anyone pay any attention to what happened to planner space consumption and performance when the parallel-scan patch went in? Or am I just too conditioned by working with last-century hardware, and I should stop caring about how large these nodes are?

While I'm bitching about this: a field added to a struct as fundamental as Path ought to have a pretty well-defined meaning, and this does not as far as I can tell. There was certainly no documentation worthy of the name added by the above commit. What is the semantic difference between a Path with this flag set and the identical Path without? Likewise for struct Plan?

			regards, tom lane
On 2015-12-04 12:50:09 -0500, Tom Lane wrote:
> So over in my private branch where I've been fooling around with upper
> planner pathification, I've been sweating bullets to avoid enlarging the
> size of struct Path, because I was aware that common path types like
> IndexPath and AppendPath were already a power-of-2 size (at least on
> 64-bit machines), meaning that palloc'ing them would cost twice as much
> space if I added any fields.
>
> When I got around to merging up to HEAD, I found this in commit f0661c4e8:
>
> @@ -753,6 +753,7 @@ typedef struct Path
>
>      RelOptInfo *parent;         /* the relation this path can build */
>      ParamPathInfo *param_info;  /* parameterization info, or NULL if none */
> +    bool        parallel_aware; /* engage parallel-aware logic? */
>
>      /* estimated size/costs for path (see costsize.c for more info) */
>      double      rows;           /* estimated number of result tuples */
>
> which means Robert has already blown the planner's space consumption out
> by close to a factor of 2, and I should stop worrying.

FWIW, for me it's still <= 64 bytes:

struct Path {
        NodeTag            type;             /*     0     4 */
        NodeTag            pathtype;         /*     4     4 */
        RelOptInfo *       parent;           /*     8     8 */
        ParamPathInfo *    param_info;       /*    16     8 */
        bool               parallel_aware;   /*    24     1 */

        /* XXX 7 bytes hole, try to pack */

        double             rows;             /*    32     8 */
        Cost               startup_cost;     /*    40     8 */
        Cost               total_cost;       /*    48     8 */
        List *             pathkeys;         /*    56     8 */

        /* --- cacheline 1 boundary (64 bytes) --- */

        /* size: 64, cachelines: 1, members: 9 */
        /* sum members: 57, holes: 1, sum holes: 7 */
};

> While I'm bitching about this: a field added to a struct as fundamental
> as Path ought to have a pretty well-defined meaning, and this does not
> as far as I can tell. There was certainly no documentation worthy of
> the name added by the above commit. What is the semantic difference
> between a Path with this flag set and the identical Path without?
> Likewise for struct Plan?

That part is obviously independently true, regardless of the size.

Andres
Andres Freund <andres@anarazel.de> writes:
> On 2015-12-04 12:50:09 -0500, Tom Lane wrote:
>> which means Robert has already blown the planner's space consumption out
>> by close to a factor of 2, and I should stop worrying.

> FWIW, for me it's still <= 64 bytes:

No, it's not bare Path I'm worried about, it's IndexPath, which went from 128 bytes in previous versions to 136 in HEAD. Likewise for BitmapHeapPath, TidPath, ForeignPath, AppendPath, ResultPath, MaterialPath, and possibly others.

			regards, tom lane
On Fri, Dec 4, 2015 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So over in my private branch where I've been fooling around with upper
> planner pathification, I've been sweating bullets to avoid enlarging the
> size of struct Path, because I was aware that common path types like
> IndexPath and AppendPath were already a power-of-2 size (at least on
> 64-bit machines), meaning that palloc'ing them would cost twice as much
> space if I added any fields.
>
> When I got around to merging up to HEAD, I found this in commit f0661c4e8:
>
> @@ -753,6 +753,7 @@ typedef struct Path
>
>      RelOptInfo *parent;         /* the relation this path can build */
>      ParamPathInfo *param_info;  /* parameterization info, or NULL if none */
> +    bool        parallel_aware; /* engage parallel-aware logic? */
>
>      /* estimated size/costs for path (see costsize.c for more info) */
>      double      rows;           /* estimated number of result tuples */
>
> which means Robert has already blown the planner's space consumption out
> by close to a factor of 2, and I should stop worrying. Or else he should
> start worrying. Do we really need this field? Did anyone pay any
> attention to what happened to planner space consumption and performance
> when the parallel-scan patch went in? Or am I just too conditioned by
> working with last-century hardware, and I should stop caring about how
> large these nodes are?

If it's really true that the extra byte I added there has doubled planner memory use, then that's definitely cause for concern. However, I am skeptical that's really what has happened here. Not every path has crossed a power-of-two threshold, and paths are not the only things the planner allocates. What's a reasonable way to assess the effect of this on planner memory use in general?

I really HOPE that this isn't a serious problem. The pending patch for parallel joins adds another int right after that bool, and I doubt rather deeply that'll be the end of it.
I'm skeptical that int should be a double, and that maybe there should be a few more things.... I don't expect to need tons of space here, but the current code is pretty stupid about estimating the number of workers that are needed for a parallel plan, and at some point we're going to want to make that less stupid, and that's probably going to require some more bookkeeping. I don't know exactly how that's all going to work yet, but it's pretty hard to make the planner generate parallel query paths for a wide variety of plan types if none of those plan types carry any information specific to parallel query.

> While I'm bitching about this: a field added to a struct as fundamental
> as Path ought to have a pretty well-defined meaning, and this does not
> as far as I can tell. There was certainly no documentation worthy of
> the name added by the above commit. What is the semantic difference
> between a Path with this flag set and the identical Path without?
> Likewise for struct Plan?

/me crawls into a hole.

Well, funny story about that. The parallel sequential scan code forgot to make that flag actually do what it was budgeted to do, but it turns out not to matter as long as the only thing that can be done in parallel is a sequential scan. So there's no live bug right now, and the parallel join patch adds the missing code to make that flag actually meaningful. So I'm not very surprised that you found the current state of things stupid. The reason it's needed is because the parallel join patch generates plans like this:

Gather
  -> Hash Join
       -> Parallel Seq Scan
       -> Hash
            -> Seq Scan

The idea of this plan is that each worker builds a hash table on the inner rel, and then reads a subset of the outer rel and performs the join for the subset of rows which it reads. Then the Gather brings all the worker results together in the leader. Well, when I initially coded this, it wasn't working properly, and I couldn't for the life of me figure out why.
Eventually, after hours of debugging, I realized that while I'd added this parallel_aware flag with the idea of distinguishing the outer seq scan (which needs to be split between the workers) from the inner seq scan (which needs to be executed to completion by every worker), there was nothing in the code which would actually cause it to work like that. The inner seq scan was behaving as if it were parallel - that is, it returned only a subset of the rows in each worker - even though the parallel_aware flag was not set. I had thought that there was some logic in execParallel.c or, uh, somewhere, which would give that flag meaning but it turns out that, in master, there isn't. So the parallel join patch, which is otherwise just an optimizer patch, also adds the missing bits to execParallel.c. Maybe I should have committed that separately as a bug fix.

To try to clarify a little further, execParallel.c contains three functions that make passes over the PlanState tree. The first one is made before creating the parallel DSM, and its purpose is to estimate the amount of space that each plan node will require in the DSM (it can overestimate, but it can't underestimate). The second one is made after creating the parallel DSM, and gives plan nodes a chance to initialize the space they estimated they would use (or some lesser amount). The third one is made in each worker, and gives plan nodes a chance to fish out a pointer to the space previously initialized in the master. All of this processing is of course skipped for plan nodes that have no parallel smarts. But the parallel-aware flag is supposed to allow these steps to be skipped even for a node that does have parallel smarts, so that those smarts can in effect be shut off when they're not desired.

Of course, it would be possible to rejigger things to avoid having this flag in every path. Currently, SeqScans are the only plan type that is ever parallel-aware, so we could put the flag in only that path type.
However, I expect to want to expand the set of parallel-aware paths pretty soon. In particular, I want to add at least AppendPath and IndexPath to the list. So I fear that this approach will just be kicking the can down the road, and maybe not all that far down the road.

I'm pretty skeptical that we can continue to add planner features without making the data structures any bigger. It looks to me like I'm ending up touching all the same places that the parameterized path stuff touched - that needed a few extra bytes in a whole bunch of data structures, and parallelism wants a few bytes in many of those very same data structures for pretty similar reasons, and from the sound of it, the upper planner path-ification also wants a few bytes in all of those places (for tlists, maybe?). I think that if we make a rule that IndexPath can never again get bigger, we are going to have a very difficult time continuing to improve the planner.

One thing to keep in mind is that most modern allocators do not do power-of-two allocation, at least AFAICT. They carve up super-blocks into a bunch of equal-sized chunks. So instead of falling over a power-of-two boundary where space suddenly doubles, they instead just move into a larger size class, and those size classes are closely-enough spaced that this isn't such a big deal. We're stuck using palloc, and I can't help noticing that the last time aset.c had a commit touching a three-digit number of lines was in July of 2000. It might be time to consider other alternatives there. I posted a new allocator on -hackers a while back that didn't get a real warm reception, but it was significantly more memory-efficient on memory contexts that contain many allocations than what we're doing today. It may well be that what I did there wasn't a good plan for any number of reasons, but the principle is worth thinking about.
Of course, no allocator can avoid the problem that large allocations will need to span cache lines, but it CAN avoid doubling memory consumption when you cross a power-of-two boundary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/4/15 11:50 AM, Tom Lane wrote:
> which means Robert has already blown the planner's space consumption out
> by close to a factor of 2, and I should stop worrying. Or else he should
> start worrying. Do we really need this field? Did anyone pay any
> attention to what happened to planner space consumption and performance
> when the parallel-scan patch went in? Or am I just too conditioned by
> working with last-century hardware, and I should stop caring about how
> large these nodes are?

I suspect Cachegrind[1] would answer a lot of these questions (though I've never actually used it). I can't get postgres to run under valgrind on my laptop, but maybe someone that's been successful at valgrind can try cachegrind (it's just another mode of valgrind).

[1] http://valgrind.org/docs/manual/cg-manual.html
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Fri, Dec 4, 2015 at 2:44 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I suspect Cachegrind[1] would answer a lot of these questions (though I've
> never actually used it). I can't get postgres to run under valgrind on my
> laptop, but maybe someone that's been successful at valgrind can try
> cachegrind (It's just another mode of valgrind).

I've used Cachegrind, and think it's pretty good. You still need a test case that exercises what you're interested in, though. Distributed costs are really hard to quantify. Sometimes that's because they don't exist, and sometimes it's because they can only be quantified as part of a value judgement.

As frustrated as I've sometimes been with those discussions, I do recognize that there has to be a middle ground, and that the emphasis on distributed costs has as much to do with fairness for every contributor as anything else. I would have appreciated some attempt to have quantified the overhead here, but would not have insisted on Robert being as thorough as he conceivably could have been.

-- 
Peter Geoghegan
On 12/4/15 5:14 PM, Peter Geoghegan wrote:
> On Fri, Dec 4, 2015 at 2:44 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> I suspect Cachegrind[1] would answer a lot of these questions (though I've
>> never actually used it). I can't get postgres to run under valgrind on my
>> laptop, but maybe someone that's been successful at valgrind can try
>> cachegrind (It's just another mode of valgrind).

> I've used Cachegrind, and think it's pretty good. You still need a
> test case that exercises what you're interested in, though.
> Distributed costs are really hard to quantify. Sometimes that's
> because they don't exist, and sometimes it's because they can only be
> quantified as part of a value judgement.

If we had a good way to run cachegrind (and maybe if it was run automatically somewhere) then at least we'd know what effect a patch had on things. (For those not familiar, there is a tool for diffing two cachegrind runs.) Knowing is half the battle and all that.

Another interesting possibility would be a standardized perf test. [1] makes an argument for that. Maybe a useful way to set stuff like this up would be to create support for things to run in travis-ci. Time-based tools presumably would be useless, but something doing analysis like cachegrind would probably be OK (though I think they do cap test runs to an hour or something).

[1] https://news.ycombinator.com/item?id=8426302
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Fri, Dec 4, 2015 at 6:14 PM, Peter Geoghegan <pg@heroku.com> wrote:
> As frustrated as I've sometimes been with those discussions, I do
> recognize that there has to be a middle ground, and that the emphasis
> on distributed costs has as much to do with fairness for every
> contributor as anything else. I would have appreciated some attempt to
> have quantified the overhead here, but would not have insisted on
> Robert being as thorough as he conceivably could have been.

Honestly, I really never thought of the issue of whether this would push some structure over a power-of-two size. It's not like either Path or IndexPath have comments saying "for the love of heaven, don't make this bigger". This is in contrast to, say, PGXACT, where there is such a comment, because I knew it was an issue and I added a comment explaining that issue. Now maybe I should have foreseen the objection anyway, but, you know, it's hard to foresee everything somebody might object to.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Dec 5, 2015 at 2:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Dec 4, 2015 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > So over in my private branch where I've been fooling around with upper
> > planner pathification, I've been sweating bullets to avoid enlarging the
> > size of struct Path, because I was aware that common path types like
> > IndexPath and AppendPath were already a power-of-2 size (at least on
> > 64-bit machines), meaning that palloc'ing them would cost twice as much
> > space if I added any fields.
> >
> > When I got around to merging up to HEAD, I found this in commit f0661c4e8:
> >
>
> > While I'm bitching about this: a field added to a struct as fundamental
> > as Path ought to have a pretty well-defined meaning, and this does not
> > as far as I can tell. There was certainly no documentation worthy of
> > the name added by the above commit. What is the semantic difference
> > between a Path with this flag set and the identical Path without?
> > Likewise for struct Plan?
>
> /me crawls into a hole.
>
> Well, funny story about that. The parallel sequential scan code
> forgot to make that flag actually do what it was budgeted to do, but
> it turns out not to matter as long as the only thing that can be done
> in parallel is a sequential scan. So there's no live bug right now,
> and the parallel join patch adds the missing code to make that flag
> actually meaningful. So I'm not very surprised that you found the
> current state of things stupid. The reason it's needed is because the
> parallel join patch generates plans like this:
>
> Gather
> -> Hash Join
> -> Parallel Seq Scan
> -> Hash
> -> Seq Scan
>
> The idea of this plan is that each worker builds a hash table on the
> inner rel, and then reads a subset of the outer rel and performs the
> join for the subset of rows which it reads. Then the Gather brings
> all the worker results together in the leader. Well, when I initially
> coded this, it wasn't working properly, and I couldn't for the life of
> me figure out why.
>
> Eventually, after hours of debugging, I realized that while I'd added
> this parallel_aware flag with the idea of distinguishing between the
> outer seq scan (which needs to be split between the workers) from the
> inner seq scan (which needs to be executed to completion by every
> worker) there was nothing in the code which would actually cause it to
> work like that. The inner seq scan was behaving as if it were
> parallel - that is, it returned only a subset of the rows in each
> worker - even though the parallel_aware flag was not set. I had
> thought that there was some logic in execParallel.c or, uh, somewhere,
> which would give that flag meaning but it turns out that, in master,
> there isn't. So the parallel join patch, which is otherwise just an
> optimizer patch, also adds the missing bits to execParallel.c. Maybe
> I should have committed that separately as a bug fix.
>
> To try to clarify a little further, execParallel.c contains three
> functions that make passes over the PlanState tree. The first one is
> made before creating the parallel DSM, and its purpose is to estimate
> the amount of space that each plan node will require in the DSM (it
> can overestimate, but it can't underestimate). The second one is made
> after creating the parallel DSM, and gives plan nodes a chance to
> initialize the space they estimated they would use (or some lesser
> amount). The third one is made in each worker, and gives plan nodes a
> chance to fish out a pointer to the space previously initialized in
> the master. All of this processing is of course skipped for plan
> nodes that have no parallel smarts. But the parallel-aware flag is
> supposed to allow these steps to be skipped even for a node that does
> have parallel smarts, so that those smarts can in effect be shut off
> when they're not desired.
>
>
> Of course, it would be possible to rejigger things to avoid having
> this flag in every path. Currently, SeqScans are the only plan type
> that is ever parallel-aware, so we could put the flag in only that
> path type. However, I expect to want to expand the set of
> parallel-aware paths pretty soon. In particular, I want to add at
> least AppendPath and IndexPath to the list.
To add to whatever has been said above, the intention of adding that flag was also to avoid adding new nodes for parallelism: basically, modify the existing nodes (like SeqScan) to take care of whatever is needed for parallel execution. Now fortunately, not much is needed for Seq Scan (except for the explain part and distinguishing backward scans), but I think that will not be the case going forward when we need to make other nodes like Append parallel-aware. The need and idea for this flag have been proposed and discussed on the parallel seq scan thread [1].
On Fri, Dec 4, 2015 at 4:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> If it's really true that the extra byte I added there has doubled
> planner memory use, then that's definitely cause for concern.
> However, I am skeptical that's really what has happened here. Not
> every path has crossed a power-of-two threshold, and paths are not the
> only things the planner allocates. What's a reasonable way to assess
> the effect of this on planner memory use in general?

So I did a really crude test of this. I put MemoryContextStats(MessageContext) - which seems to be where the planner garbage is going - at the end of the message-processing loop, and then ran the regression tests with and without parallel_aware in the Path structure. Then I ran a little grep and awk magic over the postmaster logs and compared the sizes of contexts. For reasons I haven't tracked down, the number of instrumentation lines I got with and without the flag differed. But the overall pattern seems pretty clear. In the results below, the "without" number is the number of times MessageContext had allocated the specified amount of storage space without the parallel_aware flag; the "with" number is the number of times it had allocated the specified amount of storage with the parallel_aware flag.
   size     without    with
   8192        7589    7605
  16384        6074    6078
  16448           1       1
  24576          26      27
  24640          75      68
  26448           3       3
  32768        1747    1760
  36512           0       1
  42832           1       1
  57344           7       9
  57520         151     152
  65536        1319    1349
  66448           1       1
  73728           4       5
  73792           1       1
  73904           2       2
 116448           4       4
 122880           4       4
 131072         631     638
 139264          12      12
 216512           4       4
 262144         496     504
 270336           5       5
 316448           1       1
 516448           2       2
 524288          73      74
 532480           1       1
 816512           1       1
1048576          19      19
1216448           1       0
2097152           4       5

queries_with=18337 queries_without=18259
total_with=612886960 total_without=605744096

What I think this is showing is that making Path bigger occasionally pushes palloc over a boundary so that it allocates another chunk, but most of the time it doesn't. Also, it suggests to me that if we're concerned about keeping memory utilization tight on these kinds of queries, we could think about changing palloc's allocation pattern. For example, if we did 8k, 16k, 32k, 64k, 64k, 64k, 64k, 128k, 128k, 128k, 128k, 256k, 256k, 256k, 256k ... a lot of these queries would consume less memory.

If there's a strong feeling that I should find a way to make this testing more rigorous, I'm willing to do so, but I suspect that we're not going to find anything very exciting here. A more likely angle of investigation here is to try to figure out what a worst case for enlarging the Path structure might look like, and test that. I don't have a brilliant idea there right at the moment, but I'll mull it over.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Amit Kapila <amit.kapila16@gmail.com> writes:
> To add to whatever has been said above, intention of adding that flag
> was also to avoid adding new nodes for parallelism. Basically modify
> the existing nodes (like SeqScan) to take care of whatever is needed
> for parallel execution.

TBH, I would say that that's a damn-fool idea. I think you should instead create a separate ParallelSeqScan path type and plan type, and the same for every other thing that becomes parallel-aware. The planner does not normally[1] use the same path type to represent two fundamentally different execution plans with enormously different cost estimates, but that is the direction you want to push in for parallel query. I think it will lead to a mess: lots of unreadable code that has to do things in a way unlike the code around it, and lots of bugs-of-omission in places that should have distinguished seq and parallel cases but didn't.

			regards, tom lane

[1] Yes, I'm aware that UniquePath is an exception. It has reasons to live though, in particular that if it were multiple path types there would still need to be places that understood the common property of those path types all delivering unique results. I do not see any corresponding benefit of fuzzing the distinction between SeqScan and ParallelSeqScan.
On Sat, Dec 5, 2015 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> To add to whatever has been said above, intention of adding that flag
>> was also to avoid adding new nodes for parallelism. Basically modify
>> the existing nodes (like SeqScan) to take care of whatever is needed
>> for parallel execution.
>
> TBH, I would say that that's a damn-fool idea. I think you should instead
> create a separate ParallelSeqScan path type and plan type, and the same
> for every other thing that becomes parallel-aware. The planner does not
> normally[1] use the same path type to represent two fundamentally different
> execution plans with enormously different cost estimates, but that is the
> direction you want to push in for parallel query. I think it will lead to
> a mess: lots of unreadable code that has to do things in a way unlike the
> code around it, and lots of bugs-of-omission in places that should have
> distinguished seq and parallel cases but didn't.

Maybe. But if we go down that path, we're eventually going to have ParallelSeqScan, ParallelIndexScan, ParallelBitmapHeapScan, ParallelForeignScan, ParallelAppend, ParallelHashJoin, and probably a bunch of others. That could lead to a lot of code duplication. Even for ParallelSeqScan:

 src/backend/executor/nodeSeqscan.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 103 insertions(+), 33 deletions(-)

Now that file has 344 lines today, but only 33 of those lines are actual changes to pre-existing code, and most of those are mechanical. So the effect of making that a whole separate node type would have been to copy about 200 lines of code and comments. That didn't seem like a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Dec 5, 2015 at 5:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The planner does not
> normally[1] use the same path type to represent two fundamentally different
> execution plans with enormously different cost estimates

Eh, this is a typical object modelling dilemma.  There are lots of
different path types, many of which share a lot of properties, and
depending on which way you look at things different sets of them seem
to be the same thing.  BitmapScan is very like a SeqScan, and
MergeAppend is like Append, yet those got different node types; but
nodes with Filters attached didn't become FilteredSeqScan,
FilteredIndexScan, etc.  I'm not sure which of those cases Parallel is
more like, and it may be more convenient for the planner one way and
for other parts the other.

-- 
greg
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Dec 5, 2015 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Amit Kapila <amit.kapila16@gmail.com> writes:
>>> To add to whatever has been said above, intention of adding that flag
>>> was also to avoid adding new nodes for parallelism. Basically modify
>>> the existing nodes (like SeqScan) to take care of whatever is needed
>>> for parallel execution.

>> TBH, I would say that that's a damn-fool idea.  I think you should instead
>> create a separate ParallelSeqScan path type and plan type, and the same
>> for every other thing that becomes parallel-aware.

> Maybe.  But if we go down that path, we're eventually going to have
> ParallelSeqScan, ParallelIndexScan, ParallelBitmapHeapScan,
> ParallelForeignScan, ParallelAppend, ParallelHashJoin, and probably a
> bunch of others.  That could lead to a lot of code duplication.  Even
> for ParallelSeqScan:

>  src/backend/executor/nodeSeqscan.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 103 insertions(+), 33 deletions(-)

> Now that file has 344 lines today, but only 33 of those lines are
> actual changes to pre-existing code, and most of those are mechanical.
> So the effect of making that a whole separate node type would have
> been to copy about 200 lines of code and comments.  That didn't seem
> like a good idea.

Meh.  Size of first patch is a pretty bad guide to whether a particular
structural decision is going to be a good idea in the long run.  Do you
need some examples of patches we've rejected because they were done in a
way that minimized the size of the patch rather than a way that made sense
from a system structural viewpoint?

I would also point out that if we'd followed this approach in the past,
nodeIndexscan.c, nodeBitmapIndexscan.c and nodeIndexonlyscan.c would be
one file, and it would be nigh unintelligible.  Those files nonetheless
manage to share code where it makes sense for them to.

In any case, even if there's an argument for keeping the two cases
together in the executor, I remain of the opinion that you'd be better
off with them being distinct Path types in the planner.

			regards, tom lane
On Sat, Dec 5, 2015 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Meh.  Size of first patch is a pretty bad guide to whether a particular
> structural decision is going to be a good idea in the long run.  Do you
> need some examples of patches we've rejected because they were done in a
> way that minimized the size of the patch rather than a way that made sense
> from a system structural viewpoint?

You can provide some examples if you like, but I thought about the
issue of overall system structure and settled on this design after
coming to the conclusion that this design was better from that
viewpoint.  If it becomes clear that I'm wrong, fine: we can change
it.  I think I've done a pretty good job - I'm going to go out on a
limb and say an excellent job - modularizing the parallelism code so
that, if it turns out I've picked the wrong design for some part of
it, we can replace just that part without having a huge impact on all
the rest of it.  But at this point I am not by any means convinced
that I've got it wrong.  I've been working on this for several years
and have thought about it quite deeply, discussed it extensively on
and off the mailing list with anyone and everyone who was willing to
participate in that discussion, and read what relevant literature I
could find.  That doesn't guarantee that I've got it right, but if
your premise here is that I haven't thought about system structure,
I'm going to say that premise is wrong.  I've thought about it a lot.

> I would also point out that if we'd followed this approach in the past,
> nodeIndexscan.c, nodeBitmapIndexscan.c and nodeIndexonlyscan.c would be
> one file, and it would be nigh unintelligible.  Those files nonetheless
> manage to share code where it makes sense for them to.

True, but I don't think that's a good analogy.  I think the best
analogy to parallelism is parameterized plans.
And we DON'T have separate nodes for parameterized index scans and
unparameterized index scans, nor do we have separate path nodes.  We
have something called an index scan, it can be parameterized or
unparameterized, and we keep track of that distinction during both
planning and execution.  So here.  For example, in the case of
nodeSeqscan.c, almost two-thirds of the new lines of code are to add
three additional methods that aren't called in parallel aware mode,
and about half of the rest are due to a SeqScanState now having one
integer field that's not part of ScanState, which require mechanical
changes in a bunch of places ("ps" becomes "ss.ps").  The rest amounts
to 15-20 lines of real code change.

It may be useful context - although you will already know this, if you
have been reading the relevant threads - to know that up until a
couple of months ago I actually had this exactly the way you are
describing, with parallel sequential scan as a separate node type,
mostly because I know that you've leaned in the direction of making
things separate node types whenever there's any doubt about the
matter.  The main reason I changed my mind, aside from the massive
code duplication that seemed to be resulting, was the realization that
every scan type needs a parallel-aware mode, and that the
parallel-aware mode needs to degenerate back to exactly the same thing
as the non-parallel-aware case when no workers are available.

It makes no sense to me to say that we should double the number of
executor nodes, or even the number of path nodes, just because we have
parallelism.  I'm sure we'd never actually double it, as there are
some plan nodes like Result for which I can't foresee a parallel-aware
version.  But we'd add a heck of a lot of them, and in every case the
parallel-aware version would be a strict superset of what the
non-parallel-aware version can do.

Consider Append, for example.  A regular Append runs all of the child
plans one after another.
A parallel-aware Append, like any other parallel-aware node, will be
running in multiple copies, one per worker.  Well, what it should do
in order to minimize contention is spread the workers out among the
subplans, rather than throwing all the workers at the first subplan,
then when that gets done throwing them all at the second subplan, and
so forth.  See the parallel append thread for further discussion of
these issues.

Now, if you are in a parallel plan, you always want this behavior:
there is never any reason (at least that I can see currently) to
prefer the regular Append behavior (though the architecture I've
chosen could support that if it turns out we need it).  And if you are
not in a parallel plan, the behavior of the parallel-aware version
degenerates to exactly what our current Append node does anyway.  So
if you separate that into two nodes, a regular Append and a Parallel
Append, it's just stupid.  You just have one node that is a dumber
version of another node, and you have code somewhere to distinguish
between them when you could just as well always use the more capable
one.

I believe this kind of pattern is quite general.  Most - though maybe
not all - parallel-aware nodes are going to do basically the same
thing that they do in a regular computation, with a little bit of
extra smarts specific to parallelism.  If we end up with
parallel-aware nodes that do things noticeably different from what the
non-parallel-aware versions do, well, your bias in this matter is
certainly known to me and I will respect it.  But I can't stomach the
idea that as soon as we add even one line of code to cater to
parallelism in a node, we have to duplicate that node and all of its
supporting infrastructure.  I really doubt you would be happy with
that design in the long term either, or even the medium term.

We have drifted a bit from the subject of this thread.
Maybe that is OK, but we should make sure to finish dealing with the
original issue you raised.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
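The worker-spreading behavior described above for a parallel-aware Append might look, in toy form, like this (a sketch of the scheduling idea only, not code from the parallel append patch; the fixed-size counter array stands in for state that would really live in dynamic shared memory):

```c
#include <assert.h>

#define NSUBPLANS 4

/* Count of workers currently executing each subplan; in a real
 * implementation this would live in shared memory and be updated
 * under a lock or with atomics. */
static int workers_on[NSUBPLANS];

/* Each worker (or the leader) arriving at the Append picks the subplan
 * with the fewest workers already on it, spreading workers out to
 * minimize contention.  With no workers, the leader alone simply walks
 * the subplans in order, 0..NSUBPLANS-1, degenerating to the behavior
 * of a plain Append. */
static int
choose_subplan(void)
{
    int best = 0;

    for (int i = 1; i < NSUBPLANS; i++)
        if (workers_on[i] < workers_on[best])
            best = i;
    workers_on[best]++;         /* claim it; decremented when finished */
    return best;
}
```

Successive calls with all counts starting at zero claim subplans 0, 1, 2, 3 and then wrap back to 0, which is the "spread out rather than pile on" property being argued for.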
On Sat, Dec 5, 2015 at 9:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> here.  For example, in the case of nodeSeqscan.c, almost two-thirds of
> the new lines of code are to add three additional methods that aren't
> called in parallel aware mode, and about half of the rest are due to a
> SeqScanState now having one integer field that's not part of
> ScanState, which require mechanical changes in a bunch of places ("ps"
> becomes "ss.ps").  The rest amounts to 15-20 lines of real code
> change.

Oops, bad proofreading.  It should say "are ONLY called in parallel
mode", not "aren't called in parallel aware mode".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
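The mechanical change quoted above ("ps" becomes "ss.ps") comes from struct embedding; a rough sketch of what that looks like (hypothetical, cut-down field names rather than the real executor headers):

```c
#include <assert.h>

/* Miniature stand-ins for the executor state nodes. */
typedef struct PlanState
{
    int plan_node_id;           /* stand-in for the common per-node state */
} PlanState;

typedef struct ScanState
{
    PlanState ps;               /* common scan-node state */
} ScanState;

/* Previously code could reach the PlanState as node->ps.  Embedding
 * ScanState as a named field, to make room for one parallel-only
 * integer, turns that same access into node->ss.ps everywhere - a
 * purely mechanical change. */
typedef struct SeqScanState
{
    ScanState ss;               /* embedded base state */
    int       pscan_len;        /* hypothetical parallel-only field */
} SeqScanState;
```

Since `ss` is the first member, a pointer to a `SeqScanState` still points at its `ScanState`, so code that treats the node generically keeps working; only the field paths in nodeSeqscan.c itself had to change.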