Tuplesort merge pre-reading

From: Heikki Linnakangas
Subject: Tuplesort merge pre-reading
Date:
Msg-id: f298f77a-cf06-e70c-d5a4-a20b472b4627@iki.fi
Responses: Re: Tuplesort merge pre-reading  (Peter Geoghegan <pg@heroku.com>)
Re: [HACKERS] Tuplesort merge pre-reading  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
While reviewing Peter's latest round of sorting patches, and trying to
understand the new "batch allocation" mechanism, I started to wonder how
useful the pre-reading in the merge stage is in the first place.

I'm talking about the code that reads a bunch of tuples from each tape,
loading them into the memtuples array. That code was added by Tom Lane,
back in 1999:

commit cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Oct 30 17:27:15 1999 +0000

     Further performance improvements in sorting: reduce number of
comparisons
     during initial run formation by keeping both current run and next-run
     tuples in the same heap (yup, Knuth is smarter than I am).  And, during
     merge passes, make use of available sort memory to load multiple tuples
     from any one input 'tape' at a time, thereby improving locality of
     access to the temp file.
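The "keep both runs in the same heap" trick that commit message describes is replacement selection. As a rough illustration of the general algorithm (a toy Python sketch, not PostgreSQL's actual SortTuple/heap code), run formation can look like this:

```python
import heapq
from itertools import islice

def replacement_selection(records, memory_size):
    """Form sorted runs with replacement selection.  Current-run and
    next-run tuples live in the *same* heap, ordered by (run, value):
    an incoming value smaller than the one just written cannot extend
    the current run, so it is tagged with the next run number."""
    it = iter(records)
    heap = [(0, v) for v in islice(it, memory_size)]
    heapq.heapify(heap)
    runs, current, cur_run = [], [], 0

    def emit(run, val):
        nonlocal current, cur_run
        if run != cur_run:           # heap has switched to the next run
            runs.append(current)
            current, cur_run = [], run
        current.append(val)

    for incoming in it:
        run, val = heapq.heappop(heap)
        emit(run, val)
        # incoming joins the current run only if it keeps the run sorted
        heapq.heappush(heap, (run if incoming >= val else run + 1, incoming))
    while heap:                      # drain whatever is left in memory
        emit(*heapq.heappop(heap))
    if current:
        runs.append(current)
    return runs
```

Because each emitted run can absorb later input that still fits, runs come out longer than the in-memory workspace, which is why tagging tuples by run number avoids a second heap and the extra comparisons the commit message mentions.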

So apparently there was a benefit back then, but is it still worthwhile?
The LogicalTape code buffers one block at a time anyway, so how much gain
are we getting from parsing the tuples into SortTuple format in batches?
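For reference, the merge itself only ever needs the current front tuple from each tape. A toy Python sketch of such a one-tuple-per-tape k-way merge (an illustration of the scheme, not the actual tuplesort.c code, which works on SortTuples read from LogicalTapes):

```python
import heapq

def merge_tapes(tapes):
    """K-way merge holding only the front tuple of each tape in memory.
    When the winner is popped, only the tape it came from is refilled."""
    iters = [iter(t) for t in tapes]
    heap = []
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, i))
    heapq.heapify(heap)
    out = []
    while heap:
        val, i = heapq.heappop(heap)
        out.append(val)
        nxt = next(iters[i], None)   # refill only the consumed tape
        if nxt is not None:
            heapq.heappush(heap, (nxt, i))
    return out
```

With this scheme, memory per tape is one tuple plus the underlying block buffer, rather than a pre-read batch.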

I wrote a quick patch to test that, attached. It seems to improve
performance, at least in this small test case:

create table lotsofints(i integer);
insert into lotsofints select random() * 1000000000.0 from
generate_series(1, 10000000);
vacuum freeze;

select count(*) FROM (select * from lotsofints order by i) t;

On my laptop, with default work_mem=4MB, that select takes 7.8 s on
unpatched master, and 6.2 s with the attached patch.

So, at least in some cases, the pre-loading hurts. I think we should get
rid of it. This patch probably needs some polishing: I replaced the
batch allocations with a simpler scheme with a buffer to hold just a
single tuple for each tape, and that might need some more work to allow
downsizing those buffers if you have a few very large tuples in an
otherwise narrow table. And perhaps we should free and reallocate a
smaller memtuples array for the merging, now that we're not making use
of the whole of it. And perhaps we should teach LogicalTape to use
larger buffers, if we can't rely on the OS to do the buffering for us.
But overall, this seems to make the code both simpler and faster.

Am I missing something?

- Heikki


Attachments
