Re: Shrinking tuplesort.c's SortTuple struct (Was: More ideas for speeding up sorting)

From: Peter Geoghegan
Subject: Re: Shrinking tuplesort.c's SortTuple struct (Was: More ideas for speeding up sorting)
Date:
Msg-id: CAH2-WznjUiA4aD19OJ1rfiUjZsp6qViWTbxtgr=Ah3FCEr1AjA@mail.gmail.com
In reply to: Re: Shrinking tuplesort.c's SortTuple struct (Was: More ideas for speeding up sorting)  (Heikki Linnakangas <hlinnaka@iki.fi>)
Lists: pgsql-hackers
On Sat, Aug 10, 2019 at 1:20 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Hmm. Wouldn't it be more straightforward to have the extra tupindex
> field at the end of the struct?

> The initial sorting phase would deal with SortTuples, and the merge
> phase would deal with MergeTuples. The same comparison routines work
> with both.

Maybe, but then you would have to use MergeTuples in the
tuplesort_heap* routines, which are not just used when merging
external sort runs. You'd probably incur a penalty for top-N heap
sorts too. Now, that could still be worth it, but it's something to
consider.
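
For concreteness, here is a minimal sketch of the layout I understand
you to be describing (field names are assumed, loosely following the
current SortTuple in tuplesort.c; this is not a proposed patch):

#include "postgres.h"

/*
 * SortTuple loses tupindex entirely, while the merge phase works with
 * MergeTuple, which embeds a SortTuple as its first member so that
 * comparison routines written against SortTuple remain valid when
 * applied to a MergeTuple.
 */
typedef struct SortTuple
{
    void       *tuple;          /* the tuple proper */
    Datum       datum1;         /* value of first key column */
    bool        isnull1;        /* is first key column NULL? */
} SortTuple;

typedef struct MergeTuple
{
    SortTuple   stup;           /* must be the first member */
    int         tupindex;       /* source run/tape, only needed when merging */
} MergeTuple;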

> If you separate the NULLs from non-NULLs in a separate array, as was
> discussed back in 2016, instead of stealing a bit, you can squeeze some
> instructions out of the comparison routines, which might give some extra
> speedup.

That might work well, but partitioning the memtuples array isn't
trivial. Routines like grow_memtuples() still need to work, and that
seems like it would be tricky. So again, this may well be a better way
to do it, but that isn't obvious.
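
To illustrate the kind of instruction savings involved, assuming
NULL-keyed tuples really were segregated into their own array, the hot
comparator for the non-NULL partition could drop the isnull1 branches
entirely (a sketch only, written as if it lived in tuplesort.c; it is
not code from the 2016 thread):

#include "postgres.h"
#include "utils/sortsupport.h"

/*
 * Compare the leading (possibly abbreviated) keys of two tuples that
 * are both known to be non-NULL.  No isnull1 tests, and no pointer-bit
 * masking, are needed on this path.
 */
static inline int
cmp_non_null_datum1(const SortTuple *a, const SortTuple *b, SortSupport ssup)
{
    return ApplySortComparator(a->datum1, false, b->datum1, false, ssup);
}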

> > But in cases that users really care about, such as REINDEX,
> > the difference is in the noise. ISTM that this is simply not worth the
> > trouble at this time. These days, external sorts are often slightly
> > faster than internal sorts in practice, due to the fact that we can do
> > an on-the-fly merge with external sorts, so we could easily hurt
> > performance by making more memory available!
>
> Yeah, that's a bit sad.

I think that this is likely to be the problem with any combination of
enhancements that remove fields from the SortTuple struct to get it
down to 16 bytes: making SortTuples only 16 bytes just isn't that
compelling.

> That makes me think: even when everything fits in memory, it might make
> sense to divide the input into a few batches, qsort them individually,
> and do an on-the-fly merge of the batches. I guess I'm essentially
> suggesting that we should use merge instead of quicksort for the
> in-memory case, too.

That might make sense. The AlphaSort paper [1] recommends using
quicksort on CPU-cache-sized chunks, and merging the chunks together
as they're written out as a single on-disk run. The AlphaSort paper is
probably the first place where the abbreviated keys technique was
described, and it had a lot of good ideas.
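
A toy version of the chunked first pass, with plain ints standing in
for SortTuples and an arbitrary chunk size (nothing here is taken from
the paper or from tuplesort.c):

#include <stdlib.h>

#define CHUNK_ELEMS 4096        /* assumed cache-friendly chunk size */

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

static void
sort_in_cache_sized_chunks(int *elems, size_t nelems)
{
    for (size_t off = 0; off < nelems; off += CHUNK_ELEMS)
    {
        size_t      n = nelems - off;

        if (n > CHUNK_ELEMS)
            n = CHUNK_ELEMS;
        qsort(elems + off, n, sizeof(int), cmp_int);
    }
    /* the sorted chunks would then be merged as the run is written out */
}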

> If we had the concept of in-memory batches, you could merge together
> in-memory and external batches. That might be handy. For example, when
> doing an external sort, instead of flushing the last run to disk before
> you start merging, you could keep it in memory. That might be
> significant in the cases where the input is only slightly too big to fit
> in memory.

The patch that I wrote to make tuplesort.c use quicksort in preference
to replacement selection sort for generating initial runs started out
with an implementation of something that I called "quicksort with
spillover". The idea was that you could spill only a few tuples to
disk when you almost had enough workMem, and then merge the on-disk
run with the larger, quicksorted in-memory run. It worked alright, but
it felt more important to make external sorts use quicksort in
general. Robert Haas really hated it at the time, because it relied on
various magic numbers, based on heuristics.
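
The output step common to both ideas is just an on-the-fly two-way
merge of a sorted in-memory array with a single sorted spill run. A
self-contained toy version, with a second array standing in for the
on-disk run (illustrative names, not the patch's code):

#include <stddef.h>

/* merge two sorted int arrays into out; returns the number written */
static size_t
merge_mem_and_spill(const int *mem, size_t nmem,
                    const int *spill, size_t nspill,
                    int *out)
{
    size_t      i = 0,
                j = 0,
                k = 0;

    while (i < nmem && j < nspill)
        out[k++] = (mem[i] <= spill[j]) ? mem[i++] : spill[j++];
    while (i < nmem)
        out[k++] = mem[i++];
    while (j < nspill)
        out[k++] = spill[j++];
    return k;
}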

The easiest and least controversial way to make internal sorting
faster may be to update our quicksort algorithm to use the same
implementation that was added to Java 7 [2]. It uses all of the same
tricks as our existing Bentley & McIlroy implementation, but is
more cache efficient. It's considered the successor to B&M, and had
input from Bentley himself. It is provably faster than B&M for a wide
variety of inputs, at least on modern hardware.
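
For anyone who hasn't looked at it, a bare-bones sketch of the
dual-pivot partitioning scheme on an int array follows (the real Java 7
implementation adds insertion-sort cutoffs, five-candidate pivot
selection, and smarter handling of equal keys, all omitted here):

static void
swap_int(int *a, int *b)
{
    int         t = *a;

    *a = *b;
    *b = t;
}

static void
dual_pivot_qsort(int *a, int lo, int hi)
{
    if (lo >= hi)
        return;

    /* choose the two pivots from the ends, ensuring p <= q */
    if (a[lo] > a[hi])
        swap_int(&a[lo], &a[hi]);

    int         p = a[lo];
    int         q = a[hi];
    int         l = lo + 1;     /* a[lo+1 .. l-1] < p */
    int         g = hi - 1;     /* a[g+1 .. hi-1] > q */
    int         k = l;          /* a[l .. k-1] is in [p, q] */

    while (k <= g)
    {
        if (a[k] < p)
            swap_int(&a[k], &a[l++]);
        else if (a[k] > q)
        {
            while (a[g] > q && k < g)
                g--;
            swap_int(&a[k], &a[g--]);
            if (a[k] < p)
                swap_int(&a[k], &a[l++]);
        }
        k++;
    }

    /* move the pivots into their final positions */
    swap_int(&a[lo], &a[--l]);
    swap_int(&a[hi], &a[++g]);

    dual_pivot_qsort(a, lo, l - 1);     /* elements < p */
    dual_pivot_qsort(a, l + 1, g - 1);  /* elements in [p, q] */
    dual_pivot_qsort(a, g + 1, hi);     /* elements > q */
}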

[1] http://www.vldb.org/journal/VLDBJ4/P603.pdf
[2] https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf
-- 
Peter Geoghegan


