Обсуждение: Question about maxTapes & selectnewtape & dumptuples
Hi, merge sorts requires all the tuples in each input are pre-sorted (1), and in tuplesort.c, when the workmem is full, we dumptuples into destTape. (2). it looks to me that we need a unlimited number of Tapes if the number of input tuples is big enough. However we set a maxTapes in 'inittapes' and we may use a pre-used tapes for storing new tuples in 'selectnewtape'. Wouldn't this break the prerequisite (1)? selectnewtape(Tuplesortstate *state) { if (state->nOutputTapes < state->maxTapes) {..} else { /* * We have reached the max number of tapes. Append to an existing * tape. */ state->destTape = state->outputTapes[state->nOutputRuns % state->nOutputTapes]; state->nOutputRuns++; } } for example, at the first use of outputTapes[x], it stores (1, 3, 5, 7), and later (2, 4, 6, 8) are put into it. so the overall of (1, 3, 5, 7, 2, 4, 6, 8) are not sorted? Where did I go wrong? -- Best Regards Andy Fan
On 30/06/2024 12:48, Andy Fan wrote: > merge sorts requires all the tuples in each input are pre-sorted (1), > and in tuplesort.c, when the workmem is full, we dumptuples into > destTape. (2). it looks to me that we need a unlimited number of Tapes > if the number of input tuples is big enough. However we set a maxTapes > in 'inittapes' and we may use a pre-used tapes for storing new tuples in > 'selectnewtape'. Wouldn't this break the prerequisite (1)? > > selectnewtape(Tuplesortstate *state) > { > if (state->nOutputTapes < state->maxTapes) > {..} > else > { > /* > * We have reached the max number of tapes. Append to an existing > * tape. > */ > state->destTape = state->outputTapes[state->nOutputRuns % state->nOutputTapes]; > state->nOutputRuns++; > } > } > > > for example, at the first use of outputTapes[x], it stores (1, 3, 5, 7), > and later (2, 4, 6, 8) are put into it. so the overall of (1, 3, 5, 7, > 2, 4, 6, 8) are not sorted? Where did I go wrong? There's a distinction between "runs" and "tapes". Each "run" is sorted, but there can be multiple runs on a tape. In that case, multiple merge passes are needed. Note the call to markrunend() between the runs. In your example, the tape would look like (1, 3, 5, 7, <end marker>, 2, 4, 6, 8). -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On 30/06/2024 12:48, Andy Fan wrote: >> for example, at the first use of outputTapes[x], it stores (1, 3, 5, >> 7), >> and later (2, 4, 6, 8) are put into it. so the overall of (1, 3, 5, 7, >> 2, 4, 6, 8) are not sorted? Where did I go wrong? > > There's a distinction between "runs" and "tapes". Each "run" is sorted, > but there can be multiple runs on a tape. In that case, multiple merge > passes are needed. Note the call to markrunend() between the runs. In > your example, the tape would look like (1, 3, 5, 7, <end marker>, 2, 4, > 6, 8). I see, Thank you for your answer! -- Best Regards Andy Fan