Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Поиск
Список
Период
Сортировка
От James Coleman
Тема Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Дата
Msg-id CAAaqYe-i7D+D+oih1AZ5QyyAMSWKjzNT7+eBz-RMr9i18zG96g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Tue, Jun 25, 2019 at 4:32 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Tue, Jun 25, 2019 at 12:13:01PM -0700, Peter Geoghegan wrote:
> >On Tue, Jun 25, 2019 at 11:03 AM James Coleman <jtc331@gmail.com> wrote:
> >> No, I haven't confirmed that it's called less frequently, and I'd be
> >> extremely surprised if it were given the diff doesn't suggest any
> >> changes to that at all.
> >
> >I must have misunderstood, then. I thought that you were suggesting
> >that that might have happened.
> >
> >> If you think it's important enough to do so, I can instrument it to
> >> confirm, but I was mostly wanting to know if there were any other
> >> plausible explanations, and I think you've provided one: there *are*
> >> changes in the patch to memory contexts in tuplesort.c, so if memory
> >> fragmentation is a real concern this patch could definitely notice
> >> changes in that regard.
> >
> >Sounds like it's probably fragmentation. That's generally hard to measure.
> >
>
> I'm not sure I'm really conviced this explains the difference, because
> the changes in tuplesort.c are actually fairly small - we do split the
> tuplesort context into two, but vast majority of the stuff is allocated
> in one of the contexts (essentially just the tuplesort state gets moved
> to a new context). I wouldn't expect this to have such strong impact on
> locality/fragmentation.

OTOH it is as you noted heavily dependent on data...so it's hard to
say if it's a real win or not.

> But maybe it does - in that case it seems it might be worthwile to do it
> separately, irrespectedly of the incremental sort patch. I wonder if
> perf would show that as cache hits/misses, or something?
>
> It shouldn't be that difficult to separate this change into a separate
> patch, and benchmark it on it's own, though.

I don't know enough about perf to say, but unless this ends up being a
sticking point for the patch I'll probably avoid it for now because
there are too many other things to worry about in the patch.

> FWIW while looking at the tuplesort.c changes, I've noticed some
> inaccurate comments in tuplesort_free. Firstly, the top-level comment
> says:
>
> /*
>  * tuplesort_free
>  *
>  *      Internal routine for freeing resources of tuplesort.
>  */
>
> without mentioning which resources it actually releases, so it kinda
> suggests it releases everything. But that's not true - AFAICS it only
> releases the per-sort resources. IMO this is a poor function name, and
> people will easily keep resources longer than they think - we should
> rename it to something like tuplesort_free_batch().
>
> And then at the end tuplesort_free() does this:
>
>     /*
>      * Free the per-sort memory context, thereby releasing all working memory,
>      * including the Tuplesortstate struct itself.
>      */
>     MemoryContextReset(state->sortcontext);
>
> But that's clearly not true, because the tuplesortstate is allocated in
> the maincontext, not sortcontext.
>
> In general, the comments seem to be a bit confused by what 'sort' means.
> Sometimes it means the whole sort operation, sometimes it means one of
> the batches, etc.  And the fact that the per-batch context is called
> sortcontext does not really improve the situation.

There are also quite a few misleading or out of date comments in
nodeIncrementalSort.c as well. I'm currently working on the hybrid
approach I mentioned earlier, but once the patch proper looks like
we're coming close to addressing the performance concerns/costing I'll
look at doing a pass through the comments to clean them up.

Unrelated: if you or someone else you know that's more familiar with
the parallel code, I'd be interested in their looking at the patch at
some point, because I have a suspicion it might not be operating in
parallel ever (either that or I don't know how to trigger it), but I'm
not really familiar with that stuff at all currently. :)

James Coleman



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Don't allocate IndexAmRoutine dynamically?