Thread: Re: [PERFORM] Releasing memory during External sorting?

Re: [PERFORM] Releasing memory during External sorting?

From
Tom Lane
Date:
Ron Peacetree <rjpeace@earthlink.net> writes:
> 2= No optimal external sorting algorithm should use more than 2 passes.
> 3= Optimal external sorting algorithms should use 1 pass if at all possible.

A comparison-based sort must use at least N log N operations, so it
would appear to me that if you haven't got approximately log N passes
then your algorithm doesn't work.
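
For reference, the N log N figure is the standard decision-tree bound, written out here as a short derivation rather than anything specific to this thread. A comparison sort has to distinguish all N! input orderings, and each comparison yields at most one bit, so the worst case needs at least log2(N!) comparisons:

```latex
\log_2(N!) \;=\; \sum_{i=1}^{N} \log_2 i
\;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
\;\ge\; \frac{N}{2}\,\log_2\frac{N}{2}
\;=\; \Omega(N \log N)
```

Note that this bounds comparisons, not disk passes; how the two relate depends on how many comparisons a single pass can carry out.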

            regards, tom lane

Re: [PERFORM] Releasing memory during External sorting?

From
Tom Lane
Date:
Mark Lewis <mark.lewis@mir3.com> writes:
> operations != passes.  If you were clever, you could probably write a
> modified bubble-sort algorithm that only made 2 passes.  A pass is a
> disk scan, operations are then performed (hopefully in memory) on what
> you read from the disk.  So there's no theoretical log N lower-bound on
> the number of disk passes.

Given infinite memory that might be true, but I don't think I believe it
for limited memory.  If you have room for K tuples in memory then it's
impossible to perform more than K*N useful comparisons per pass (ie, as
each tuple comes off the disk you can compare it to all the ones
currently in memory; anything more is certainly redundant work).  So if
K < logN it's clearly not gonna work.
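
The counting argument above can be turned into a rough calculation. The following is only a sketch under the stated assumption that a pass can do at most K*N useful comparisons; the function names `comparisons_needed` and `min_passes` are made up for illustration and do not come from PostgreSQL or from this thread.

```python
import math


def comparisons_needed(n):
    """Decision-tree lower bound log2(n!) on worst-case comparisons."""
    # lgamma(n + 1) is ln(n!), computed without forming the huge factorial.
    return math.lgamma(n + 1) / math.log(2)


def min_passes(n, k):
    """Lower bound on passes if each pass allows at most k * n comparisons.

    Assumption (from the argument above): with room for k tuples in
    memory, each of the n tuples read in a pass is compared against at
    most k resident tuples.
    """
    if k < 1:
        raise ValueError("need room for at least one tuple in memory")
    if n < 2:
        return 0
    return math.ceil(comparisons_needed(n) / (k * n))
```

For example, `min_passes(10**6, 1)` comes out well above 2, while `min_passes(10**6, 100)` is 1: the bound only bites when K is small relative to log N, which matches the "K < logN" remark.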

It's possible that you could design an algorithm that works in a fixed
number of passes if you are allowed to assume you can hold O(log N)
tuples in memory --- and in practice that would probably work fine,
if the constant factor implied by the O() isn't too big.  But it's not
really solving the general external-sort problem.

            regards, tom lane

Re: [PERFORM] Releasing memory during External sorting?

From
Mark Lewis
Date:
operations != passes.  If you were clever, you could probably write a
modified bubble-sort algorithm that only made 2 passes.  A pass is a
disk scan, operations are then performed (hopefully in memory) on what
you read from the disk.  So there's no theoretical log N lower-bound on
the number of disk passes.
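
To make the pass/operation distinction concrete, here is a minimal sketch of a textbook external merge sort that counts passes. It is not the bubble-sort variant mentioned above and not PostgreSQL's tuplesort; lists stand in for disk files, and `external_sort`, `mem_size` and `fan_in` are hypothetical names chosen for the example.

```python
import heapq


def external_sort(records, mem_size, fan_in):
    """Sort records; return (sorted_list, passes over the data).

    mem_size: how many records fit in memory during run formation.
    fan_in:   how many runs can be merged at once (assumed >= 2).
    """
    if mem_size < 1 or fan_in < 2:
        raise ValueError("need mem_size >= 1 and fan_in >= 2")

    # Pass 1: read memory-sized chunks, sort each in memory, emit a run.
    runs = [sorted(records[i:i + mem_size])
            for i in range(0, len(records), mem_size)]
    passes = 1

    # Each later pass reads every run once and merges groups of up to
    # fan_in runs into longer runs, until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1

    return (runs[0] if runs else []), passes
```

With 1000 records, `mem_size=100` and `fan_in=10` finish in 2 passes, while `mem_size=10` and `fan_in=10` need 3. The pass count grows with the logarithm, base `fan_in`, of the number of initial runs, so how few passes are achievable depends on how much memory is available relative to N.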

Not that I have anything else useful to add to this discussion, just a
tidbit I remembered from my CS classes back in college :)

-- Mark

On Fri, 2005-09-23 at 13:17 -0400, Tom Lane wrote:
> Ron Peacetree <rjpeace@earthlink.net> writes:
> > 2= No optimal external sorting algorithm should use more than 2 passes.
> > 3= Optimal external sorting algorithms should use 1 pass if at all possible.
>
> A comparison-based sort must use at least N log N operations, so it
> would appear to me that if you haven't got approximately log N passes
> then your algorithm doesn't work.
>
>             regards, tom lane