Re: Parallel tuplesort (for parallel B-Tree index creation)
От | Claudio Freire |
---|---|
Тема | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Дата | |
Msg-id | CAGTBQpY9esgo4+qC=v3coMd_-uG1VCGQ27h5aLh+L6Uq0GfvfQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@heroku.com>) |
Ответы |
Re: Parallel tuplesort (for parallel B-Tree index creation)
(Peter Geoghegan <pg@heroku.com>)
|
Список | pgsql-hackers |
On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire <klaussfreire@gmail.com> wrote: >> I noticed, but here n = state->memtupcount >> >> + Assert(memtuples[0].tupindex == newtup->tupindex); >> + >> + CHECK_FOR_INTERRUPTS(); >> + >> + n = state->memtupcount; /* n is heap's size, >> including old root */ >> + imin = 0; /* >> start with caller's "hole" in root */ >> + i = imin; > > I'm fine with using "n" in the later assertion you mentioned, if > that's clearer to you. memtupcount is broken out as "n" simply because > that's less verbose, in a place where that makes things far clearer. > >> In fact, the assert on the patch would allow writing memtuples outside >> the heap, as in calling tuplesort_heap_root_displace if >> memtupcount==0, but I don't think that should be legal (memtuples[0] >> == memtuples[imin] would be outside the heap). > > You have to have a valid heap (i.e. there must be at least one > element) to call tuplesort_heap_root_displace(), and it doesn't > directly compact the heap, so it must remain valid on return. The > assertion exists to make sure that everything is okay with a > one-element heap, a case which is quite possible. More than using "n" or "memtupcount" what I'm saying is to assert that memtuples[imin] is inside the heap, which would catch the same errors the original assert would, and more. Assert(imin < state->memtupcount) If you prefer. The original asserts allows any value of imin for memtupcount>1, and that's my main concern. It shouldn't. On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan <pg@heroku.com> wrote: >> Sure, that's a weird enough case (that assert up there already reads >> memtuples[0] which would be equally illegal if memtupcount==0), but it >> goes on to show that the assert expression just seems odd for its >> intent. >> >> BTW, I know it's not the scope of the patch, but shouldn't >> root_displace be usable on the TSS_BOUNDED phase? > > I don't think it should be, no. With a top-n heap sort, the > expectation is that after a little while, we can immediately determine > that most tuples do not belong in the heap (this will require more > than one comparison per tuple when the tuple that may be entered into > the heap will in fact go in the heap, which should be fairly rare > after a time). That's why that general strategy can be so much faster, > of course. I wasn't proposing getting rid of that optimization, but just replacing the siftup+insert step with root_displace... > Note that that heap is "reversed" -- the sort order is inverted, so > that we can use a minheap. The top of the heap is the most marginal > tuple in the top-n heap so far, and so is the next to be removed from > consideration entirely (not the next to be returned to caller, when > merging). ...but I didn't pause to consider that point. It still looks like a valid optimization, instead rearranging the heap twice (siftup + insert), do it once (replace + relocate). However, I agree that it's not worth the risk conflating the two optimizations. That one can be done later as a separate patch.
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Peter GeogheganДата:
Сообщение: Re: Parallel tuplesort (for parallel B-Tree index creation)
Следующее
От: Peter GeogheganДата:
Сообщение: Re: Parallel tuplesort (for parallel B-Tree index creation)