Re: Merge algorithms for large numbers of "tapes"

From Dann Corbit
Subject Re: Merge algorithms for large numbers of "tapes"
Date
Msg-id D425483C2C5C9F49B5B7A41F8944154757D5F3@postal.corporate.connx.com
In response to Merge algorithms for large numbers of "tapes"  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Merge algorithms for large numbers of "tapes"  ("Luke Lonergan" <llonergan@greenplum.com>)
List pgsql-hackers
> -----Original Message-----
> From: Stephen Frost [mailto:sfrost@snowman.net]
> Sent: Thursday, March 09, 2006 3:49 PM
> To: Tom Lane
> Cc: Luke Lonergan; Jim C. Nasby; Greg Stark; Dann Corbit; Simon Riggs;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
>
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > "Luke Lonergan" <llonergan@greenplum.com> writes:
> > > I would only suggest that we replace the existing algorithm with one
> > > that will work regardless of (reasonable) memory requirements.
> > > Perhaps we can agree that at least 1MB of RAM for external sorting
> > > will always be available and proceed from there?
> >
> > If you can sort indefinitely large amounts of data with 1MB work_mem,
> > go for it.
>
> It seems you two are talking past each other and I'm at least slightly
> confused.  So, I'd like to ask for a bit of clarification and perhaps
> that will help everyone.
>
> #1: I'm as much a fan of eliminating unnecessary code as anyone
> #2: There have been claims of two-pass improving things 400%
> #3: Supposedly two-pass requires on the order of sqrt(total) memory

A two-pass sort does not require sqrt(total) memory.  That figure is
clearly wrong.

A two-pass sort creates a number of subfiles proportional to:

subfile_count = original_stream_size / sort_memory_buffer_size

The merge pass then requires (record_size * subfile_count) bytes of memory.
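
As a concrete sketch of those two formulas (hypothetical helpers, not
PostgreSQL code; all names are made up and sizes are in bytes):

    #include <stdint.h>

    /* subfile_count: how many sorted runs the first pass writes. */
    static uint64_t
    subfile_count(uint64_t stream_size, uint64_t sort_buffer_size)
    {
        /* Round up: a partial final run still needs its own subfile. */
        return (stream_size + sort_buffer_size - 1) / sort_buffer_size;
    }

    /* merge_pass_memory: one in-memory record per subfile during the merge. */
    static uint64_t
    merge_pass_memory(uint64_t n_subfiles, uint64_t record_size)
    {
        return n_subfiles * record_size;
    }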

Example:
You have a 7 gigabyte table to sort and a 100 MB sort buffer.
The number of subfiles will be:
7,000,000,000 / 100,000,000 = 70 files

Suppose that a record is 2 KB (2,048 bytes) wide.

The merge pass requires 70 * 2,048 = 143,360 bytes of RAM.

Suppose that a record is 65,535 bytes wide.

The merge pass requires 70 * 65,535 = 4,587,450 bytes of RAM.
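
To see why only one record per subfile is needed, here is a minimal
sketch of the merge loop.  This is illustrative C, not PostgreSQL's
tuplesort code; it assumes fixed-width records that compare bytewise,
and every name in it is hypothetical:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define RECORD_SIZE 2048    /* assumed fixed-width records */

    /* Merge n_subfiles sorted runs into 'out'.  The only per-run state
     * is one record buffer each, so the pass uses about
     * n_subfiles * RECORD_SIZE bytes no matter how large the input was.
     * A real implementation would pick the minimum with a heap rather
     * than a linear scan. */
    static void
    merge_subfiles(FILE **runs, int n_subfiles, FILE *out)
    {
        char *bufs = malloc((size_t) n_subfiles * RECORD_SIZE);
        int  *live = malloc((size_t) n_subfiles * sizeof(int));
        int   remaining = 0;

        /* Prime each run's single record buffer. */
        for (int i = 0; i < n_subfiles; i++)
        {
            live[i] = fread(bufs + (size_t) i * RECORD_SIZE,
                            RECORD_SIZE, 1, runs[i]) == 1;
            remaining += live[i];
        }

        while (remaining > 0)
        {
            int best = -1;

            /* Find the smallest live record. */
            for (int i = 0; i < n_subfiles; i++)
                if (live[i] && (best < 0 ||
                    memcmp(bufs + (size_t) i * RECORD_SIZE,
                           bufs + (size_t) best * RECORD_SIZE,
                           RECORD_SIZE) < 0))
                    best = i;

            fwrite(bufs + (size_t) best * RECORD_SIZE, RECORD_SIZE, 1, out);

            /* Refill the winner's buffer; retire the run at EOF. */
            if (fread(bufs + (size_t) best * RECORD_SIZE,
                      RECORD_SIZE, 1, runs[best]) != 1)
            {
                live[best] = 0;
                remaining--;
            }
        }

        free(bufs);
        free(live);
    }

The memory profile is the point: the merge footprint grows with the
number of subfiles, not with the total volume of data being sorted.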

> #4: We have planner statistics to estimate size of total
> #5: We have a work_mem limitation for a reason
>
> So, if we get a huge performance increase, what's wrong with:
> if [ sqrt(est(total)) <= work_mem ]; then
>   two-pass-sort();
> else
>   tape-sort();
> fi
>
> ?
>
> If the performance isn't much different and tape-sort can do it with
> less memory then I don't really see any point in removing it.
>
> If the intent is to remove it and then ask for the default work_mem to
> be increased- I doubt going about it this way would work very well. :)
>
>     Thanks,
>
>         Stephen
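
For what it's worth, Stephen's pseudocode above translates to something
like the following C; two_pass_sort, tape_sort, and est_total_bytes are
placeholders, and the sqrt threshold is his suggestion, not something
derived here:

    #include <math.h>

    /* Placeholder strategies; in a real system these would be the
     * two-pass external sort and the existing polyphase tape sort. */
    static void two_pass_sort(void) { /* ... */ }
    static void tape_sort(void)     { /* ... */ }

    static void
    choose_sort_strategy(double est_total_bytes, double work_mem_bytes)
    {
        /* Use two-pass only when sqrt(estimated total) fits in
         * work_mem; otherwise keep the existing tape sort. */
        if (sqrt(est_total_bytes) <= work_mem_bytes)
            two_pass_sort();
        else
            tape_sort();
    }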

