Re: Size of Path nodes

From: Robert Haas
Subject: Re: Size of Path nodes
Date:
Msg-id: CA+TgmobamTriZkaDzeRS6eza7sqpt8dUUGgKL0RayPifbRUBwg@mail.gmail.com
In reply to: Size of Path nodes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Size of Path nodes  (Amit Kapila <amit.kapila16@gmail.com>)
           Re: Size of Path nodes  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Dec 4, 2015 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So over in my private branch where I've been fooling around with upper
> planner pathification, I've been sweating bullets to avoid enlarging the
> size of struct Path, because I was aware that common path types like
> IndexPath and AppendPath were already a power-of-2 size (at least on
> 64-bit machines), meaning that palloc'ing them would cost twice as much
> space if I added any fields.
>
> When I got around to merging up to HEAD, I found this in commit f0661c4e8:
>
> @@ -753,6 +753,7 @@ typedef struct Path
>
>      RelOptInfo *parent;            /* the relation this path can build */
>      ParamPathInfo *param_info;     /* parameterization info, or NULL if none */
> +    bool        parallel_aware;    /* engage parallel-aware logic? */
>
>      /* estimated size/costs for path (see costsize.c for more info) */
>      double        rows;            /* estimated number of result tuples */
>
> which means Robert has already blown the planner's space consumption out
> by close to a factor of 2, and I should stop worrying.  Or else he should
> start worrying.  Do we really need this field?  Did anyone pay any
> attention to what happened to planner space consumption and performance
> when the parallel-scan patch went in?  Or am I just too conditioned by
> working with last-century hardware, and I should stop caring about how
> large these nodes are?

If it's really true that the extra byte I added there has doubled
planner memory use, then that's definitely cause for concern.
However, I am skeptical that's really what has happened here.  Not
every path has crossed a power-of-two threshold, and paths are not the
only things the planner allocates.  What's a reasonable way to assess
the effect of this on planner memory use in general?
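
One rough way to assess it, sketched here as an idea rather than
something that's been tried: instrument the planner locally and dump the
relevant memory context before and after planning a representative
query, then compare the totals between builds with and without the extra
field.  The hypothetical helper below (report_planner_memory is not an
existing function) relies on MemoryContextStats(), which prints
per-context totals to stderr:

/*
 * Hypothetical helper, not in the tree: dump the current memory context
 * so that planner runs with and without the extra Path field can be
 * compared.  MemoryContextStats() prints per-context totals to stderr.
 */
#include "postgres.h"

#include "utils/memutils.h"

static void
report_planner_memory(const char *label)
{
    elog(LOG, "planner memory snapshot: %s", label);
    MemoryContextStats(CurrentMemoryContext);
}

/*
 * e.g. call report_planner_memory("before") / ("after") around the body
 * of standard_planner() in src/backend/optimizer/plan/planner.c.
 */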

I really HOPE that this isn't a serious problem.  The pending patch to
allow parallel joins adds another int right after that bool, and I
rather doubt that'll be the end of it.  I suspect that int may need to
become a double, and that maybe there should be a few more things
besides.  I don't expect to need tons of space here, but the current
code is pretty stupid about estimating the number of workers that are
needed for a parallel plan, and at some point we're going to want to
make that less stupid, and that's probably going to require some more
bookkeeping.  I don't know exactly how that's all going to work yet,
but it's pretty hard to make the planner generate parallel query paths
for a wide variety of plan types if none of those plan types carry any
information specific to parallel query.
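
For illustration only, here is roughly how the top of struct Path would
look with the flag from the quoted diff plus such an int; the field name
parallel_degree is a guess at how that bookkeeping might be spelled, not
the committed definition:

typedef struct Path
{
    NodeTag     type;
    NodeTag     pathtype;          /* tag identifying scan/join method */

    RelOptInfo *parent;            /* the relation this path can build */
    ParamPathInfo *param_info;     /* parameterization info, or NULL if none */
    bool        parallel_aware;    /* engage parallel-aware logic? */
    int         parallel_degree;   /* illustrative: planned number of workers */

    /* estimated size/costs for path (see costsize.c for more info) */
    double      rows;              /* estimated number of result tuples */
    Cost        startup_cost;      /* cost expended before fetching any tuples */
    Cost        total_cost;        /* total cost (assuming all tuples fetched) */

    List       *pathkeys;          /* sort ordering of path's output */
} Path;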

> While I'm bitching about this: a field added to a struct as fundamental
> as Path ought to have a pretty well-defined meaning, and this does not
> as far as I can tell.  There was certainly no documentation worthy of
> the name added by the above commit.  What is the semantic difference
> between a Path with this flag set and the identical Path without?
> Likewise for struct Plan?

/me crawls into a hole.

Well, funny story about that.  The parallel sequential scan code
forgot to make that flag actually do what it was intended to do, but
it turns out not to matter as long as the only thing that can be done
in parallel is a sequential scan.  So there's no live bug right now,
and the parallel join patch adds the missing code to make that flag
actually meaningful.  So I'm not very surprised that you found the
current state of things confusing.  The reason it's needed is that the
parallel join patch generates plans like this:

Gather
-> Hash Join
   -> Parallel Seq Scan
   -> Hash
      -> Seq Scan

The idea of this plan is that each worker builds a hash table on the
inner rel, and then reads a subset of the outer rel and performs the
join for the subset of rows which it reads.  Then the Gather brings
all the worker results together in the leader.  Well, when I initially
coded this, it wasn't working properly, and I couldn't for the life of
me figure out why.

Eventually, after hours of debugging, I realized that although I'd
added this parallel_aware flag with the idea of distinguishing the
outer seq scan (which needs to be split across the workers) from the
inner seq scan (which needs to be executed to completion by every
worker), there was nothing in the code that would actually make it
work like that.  The inner seq scan was behaving as if it were
parallel - that is, it returned only a subset of the rows in each
worker - even though the parallel_aware flag was not set.  I had
thought that there was some logic in execParallel.c or, uh, somewhere,
which would give that flag meaning, but it turns out that, in master,
there isn't.  So the parallel join patch, which is otherwise just an
optimizer patch, also adds the missing bits to execParallel.c.  Maybe
I should have committed that separately as a bug fix.

To try to clarify a little further, execParallel.c contains three
functions that make passes over the PlanState tree.  The first one is
made before creating the parallel DSM, and its purpose is to estimate
the amount of space that each plan node will require in the DSM (it
can overestimate, but it can't underestimate).  The second one is made
after creating the parallel DSM, and gives plan nodes a chance to
initialize the space they estimated they would use (or some lesser
amount).  The third one is made in each worker, and gives plan nodes a
chance to fish out a pointer to the space previously initialized in
the master.  All of this processing is of course skipped for plan
nodes that have no parallel smarts.  But the parallel-aware flag is
supposed to allow these steps to be skipped even for a node that does
have parallel smarts, so that those smarts can in effect be shut off
when they're not desired.
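
Condensed into code, the first of those passes looks roughly like the
sketch below.  This is a simplified approximation of execParallel.c with
the parallel_aware check the parallel join patch adds, not a verbatim
copy; the second and third passes follow the same recursive pattern with
the node's initialize-DSM and initialize-worker hooks in place of the
estimator:

/*
 * Simplified sketch of the estimate pass; not verbatim execParallel.c.
 * The walk visits every PlanState node, but the per-node DSM estimation
 * is gated on the planner having marked the node parallel_aware.
 */
static bool
ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
{
    if (planstate == NULL)
        return false;

    e->nnodes++;                /* every node is counted regardless */

    if (planstate->plan->parallel_aware)
    {
        switch (nodeTag(planstate))
        {
            case T_SeqScanState:
                /* reserve DSM space for the shared heap scan state */
                ExecSeqScanEstimate((SeqScanState *) planstate, e->pcxt);
                break;
            default:
                /* nodes without parallel smarts have nothing to reserve */
                break;
        }
    }

    return planstate_tree_walker(planstate, ExecParallelEstimate, e);
}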

Of course, it would be possible to rejigger things to avoid having
this flag in every path.  Currently, SeqScans are the only plan type
that is ever parallel-aware, so we could put the flag in only that
path type.  However, I expect to want to expand the set of
parallel-aware paths pretty soon.  In particular, I want to add at
least AppendPath and IndexPath to the list.  So I fear that this
approach will just be kicking the can down the road, and maybe not all
that far down the road.  I'm pretty skeptical that we can continue to
add planner features without making the data structures any bigger.
It looks to me like I'm ending up touching all the same places that
the parameterized path stuff touched - that needed a few extra bytes
in a whole bunch of data structures, and parallelism wants a few bytes
in many of those very same data structures for pretty similar reasons,
and from the sound of it, the upper planner path-ification also wants
a few bytes in all of those places (for tlists, maybe?).  I think that
if we make a rule that IndexPath can never again get bigger, we are
going to have a very difficult time continuing to improve the planner.

One thing to keep in mind is that most modern allocators do not do
power-of-two allocation, at least AFAICT.  They carve up super-blocks
into a bunch of equal-sized chunks.  So instead of falling over a
power-of-two boundary where space suddenly doubles, an allocation just
moves into the next larger size class, and those size classes are
closely enough spaced that this isn't such a big deal.  We're stuck
using palloc, and I can't help noticing that the last time aset.c had
a commit touching a three-digit number of lines was in July of 2000.
It might be time to consider other alternatives there.  I posted a new
allocator on -hackers a while back that didn't get a very warm
reception, but it was significantly more memory-efficient than what
we're doing today on memory contexts that contain many allocations.
It may well be that what I did there wasn't a good plan for any number
of reasons, but the principle is worth thinking about.  Of course, no
allocator can avoid the problem that large allocations will need to
span cache lines, but it CAN avoid doubling memory consumption when
you cross a power-of-two boundary.
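
To put toy numbers on that difference (this is a standalone model for
illustration, not PostgreSQL code): under power-of-two rounding a
65-byte request costs 128 bytes, while with size classes spaced at
quarters of each power of two, roughly the way some modern allocators
behave, the same request lands in an 80-byte class.  The little program
below prints the two roundings side by side:

/*
 * Toy comparison: chunk sizes under aset.c-style power-of-two rounding
 * versus size classes spaced at quarters of each power of two.
 */
#include <stdio.h>
#include <stddef.h>

/* Round a request up to the next power of two, minimum 8 bytes. */
static size_t
pow2_chunk(size_t request)
{
    size_t chunk = 8;

    while (chunk < request)
        chunk <<= 1;
    return chunk;
}

/* Round a request up to the next size class, classes spaced base/4 apart. */
static size_t
size_class_chunk(size_t request)
{
    size_t base = 8;
    size_t step;
    size_t chunk;

    while ((base << 1) < request)
        base <<= 1;
    step = (base / 4 > 0) ? base / 4 : 1;
    chunk = base;
    while (chunk < request)
        chunk += step;
    return chunk;
}

int
main(void)
{
    size_t requests[] = {64, 65, 72, 96, 128, 129, 160, 161};
    int i;

    for (i = 0; i < (int) (sizeof(requests) / sizeof(requests[0])); i++)
        printf("request %4zu -> pow2 %4zu, size-class %4zu\n",
               requests[i], pow2_chunk(requests[i]),
               size_class_chunk(requests[i]));
    return 0;
}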

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


