Re: Tuplesort merge pre-reading

From: Peter Geoghegan
Subject: Re: Tuplesort merge pre-reading
Msg-id: CAM3SWZRtiAOMsou4C9EMASWcCE2WgaCEq--eWvw0QkT=Jv4kgw@mail.gmail.com
In reply to: Re: Tuplesort merge pre-reading (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers
On Thu, Sep 15, 2016 at 1:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I still don't get why you're doing all of this within mergeruns() (the
>> beginning of when we start merging -- we merge all quicksorted runs),
>> rather than within beginmerge() (the beginning of one particular merge
>> pass, of which there are potentially more than one). As runs are
>> merged in a non-final merge pass, fewer tapes will remain active for
>> the next merge pass. It doesn't do to do all that up-front when we
>> have multiple merge passes, which will happen from time to time.
>
>
> Now that the pre-reading is done in logtape.c, it doesn't stop at a run
> boundary. For example, when we read the last 1 MB of the first run on a
> tape, and we're using a 10 MB read buffer, we will merrily also read the
> first 9 MB from the next run. You cannot un-read that data, even if the tape
> is inactive in the next merge pass.

I'm not sure that I like that approach. At the very least, it seems to
not be a good fit with the existing structure of things. I need to
think about it some more, and study how that plays out in practice.

> BTW, does a 1-way merge make any sense?

Not really, no, but it's something that I've seen plenty of times.

This happens when runs are distributed such that mergeonerun() finds
only one real run across all active tapes, with every other active
tape having only dummy runs. In general, dummy runs are "logically
merged" (decremented) in preference to any real runs on the same tape
(that's the reason they exist), so you end up with what we call a
"1-way merge" when only one active tape has a real run. You never see
anything like "0-way merge" in trace_sort output, though, because that
case is optimized to be almost a no-op.

It's almost a no-op because, when it happens, mergeruns() itself
directly decrements the number of dummy runs once for each active
tape, completing that "logical merge" with only that simple change in
metadata (that is, the "merge" completes just by decrementing dummy
run counts, with no actual call to mergeonerun()).

We could optimize away "1-way merge" cases, perhaps, so that tuples
don't have to be spilled out one at a time (there could instead be
just some localized change to metadata, a bit like the all-dummy
case). That doesn't seem worth bothering with, especially with this
new approach of yours. I prefer to avoid special cases like that.

-- 
Peter Geoghegan


