Обсуждение: Memory leak in incremental sort re-scan
ExecIncrementalSort() calls tuplesort_begin_common(), which creates the "TupleSort main" and "TupleSort sort" memory contexts, and ExecEndIncrementalSort() calls tuplesort_end(), which destroys them. But ExecReScanIncrementalSort() only resets the memory contexts. Since the next call to ExecIncrementalSort() will create them again, we end up leaking these contexts for every re-scan. Here is a reproducer with the regression test database: SET enable_sort = off; SET enable_hashjoin = off; SET enable_mergejoin = off; SET enable_material = off; SELECT t.unique2, t2.r FROM tenk1 AS t JOIN (SELECT unique1, row_number() OVER (ORDER BY hundred, thousand) AS r FROM tenk1 OFFSET 0) AS t2 ON t.unique1 + 0 = t2.unique1 WHERE t.unique1 < 1000; The execution plan: Nested Loop Join Filter: ((t.unique1 + 0) = tenk1.unique1) -> Bitmap Heap Scan on tenk1 t Recheck Cond: (unique1 < 1000) -> Bitmap Index Scan on tenk1_unique1 Index Cond: (unique1 < 1000) -> WindowAgg -> Incremental Sort Sort Key: tenk1.hundred, tenk1.thousand Presorted Key: tenk1.hundred -> Index Scan using tenk1_hundred on tenk1 A memory context dump at the end of the execution looks like this: ExecutorState: 262144 total in 6 blocks; 74136 free (29 chunks); 188008 used TupleSort main: 32832 total in 2 blocks; 7320 free (0 chunks); 25512 used TupleSort sort: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used Caller tuples: 8192 total in 1 blocks (0 chunks); 7984 free (0 chunks); 208 used TupleSort main: 32832 total in 2 blocks; 7256 free (0 chunks); 25576 used TupleSort sort: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used Caller tuples: 8192 total in 1 blocks (0 chunks); 7984 free (0 chunks); 208 used TupleSort main: 32832 total in 2 blocks; 7320 free (0 chunks); 25512 used TupleSort sort: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used Caller tuples: 8192 total in 1 blocks (0 chunks); 7984 free (0 chunks); 208 used [many more] 1903 more child contexts containing 93452928 total in 7597 blocks; 44073240 free (0 chunks); 49379688 used The following patch fixes the problem for me: --- a/src/backend/executor/nodeIncrementalSort.c +++ b/src/backend/executor/nodeIncrementalSort.c @@ -1145,21 +1145,16 @@ ExecReScanIncrementalSort(IncrementalSortState *node) node->execution_status = INCSORT_LOADFULLSORT; /* - * If we've set up either of the sort states yet, we need to reset them. - * We could end them and null out the pointers, but there's no reason to - * repay the setup cost, and because ExecIncrementalSort guards presorted - * column functions by checking to see if the full sort state has been - * initialized yet, setting the sort states to null here might actually - * cause a leak. + * Release tuplesort resources. */ if (node->fullsort_state != NULL) { - tuplesort_reset(node->fullsort_state); + tuplesort_end(node->fullsort_state); node->fullsort_state = NULL; } if (node->prefixsort_state != NULL) { - tuplesort_reset(node->prefixsort_state); + tuplesort_end(node->prefixsort_state); node->prefixsort_state = NULL; } The original comment hints that this might mot be the correct thing to do... Yours, Laurenz Albe
Hi, On 6/15/23 13:48, Laurenz Albe wrote: > ExecIncrementalSort() calls tuplesort_begin_common(), which creates the "TupleSort main" > and "TupleSort sort" memory contexts, and ExecEndIncrementalSort() calls tuplesort_end(), > which destroys them. > But ExecReScanIncrementalSort() only resets the memory contexts. Since the next call to > ExecIncrementalSort() will create them again, we end up leaking these contexts for every > re-scan. > > Here is a reproducer with the regression test database: > > SET enable_sort = off; > SET enable_hashjoin = off; > SET enable_mergejoin = off; > SET enable_material = off; > > SELECT t.unique2, t2.r > FROM tenk1 AS t > JOIN (SELECT unique1, > row_number() OVER (ORDER BY hundred, thousand) AS r > FROM tenk1 > OFFSET 0) AS t2 > ON t.unique1 + 0 = t2.unique1 > WHERE t.unique1 < 1000; > > The execution plan: > > Nested Loop > Join Filter: ((t.unique1 + 0) = tenk1.unique1) > -> Bitmap Heap Scan on tenk1 t > Recheck Cond: (unique1 < 1000) > -> Bitmap Index Scan on tenk1_unique1 > Index Cond: (unique1 < 1000) > -> WindowAgg > -> Incremental Sort > Sort Key: tenk1.hundred, tenk1.thousand > Presorted Key: tenk1.hundred > -> Index Scan using tenk1_hundred on tenk1 > > > A memory context dump at the end of the execution looks like this: > > ExecutorState: 262144 total in 6 blocks; 74136 free (29 chunks); 188008 used > TupleSort main: 32832 total in 2 blocks; 7320 free (0 chunks); 25512 used > TupleSort sort: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used > Caller tuples: 8192 total in 1 blocks (0 chunks); 7984 free (0 chunks); 208 used > TupleSort main: 32832 total in 2 blocks; 7256 free (0 chunks); 25576 used > TupleSort sort: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used > Caller tuples: 8192 total in 1 blocks (0 chunks); 7984 free (0 chunks); 208 used > TupleSort main: 32832 total in 2 blocks; 7320 free (0 chunks); 25512 used > TupleSort sort: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used > Caller tuples: 8192 total in 1 blocks (0 chunks); 7984 free (0 chunks); 208 used > [many more] > 1903 more child contexts containing 93452928 total in 7597 blocks; 44073240 free (0 chunks); 49379688 used > > > The following patch fixes the problem for me: > > --- a/src/backend/executor/nodeIncrementalSort.c > +++ b/src/backend/executor/nodeIncrementalSort.c > @@ -1145,21 +1145,16 @@ ExecReScanIncrementalSort(IncrementalSortState *node) > node->execution_status = INCSORT_LOADFULLSORT; > > /* > - * If we've set up either of the sort states yet, we need to reset them. > - * We could end them and null out the pointers, but there's no reason to > - * repay the setup cost, and because ExecIncrementalSort guards presorted > - * column functions by checking to see if the full sort state has been > - * initialized yet, setting the sort states to null here might actually > - * cause a leak. > + * Release tuplesort resources. > */ > if (node->fullsort_state != NULL) > { > - tuplesort_reset(node->fullsort_state); > + tuplesort_end(node->fullsort_state); > node->fullsort_state = NULL; > } > if (node->prefixsort_state != NULL) > { > - tuplesort_reset(node->prefixsort_state); > + tuplesort_end(node->prefixsort_state); > node->prefixsort_state = NULL; > } > > > The original comment hints that this might mot be the correct thing to do... > I think it's correct, but I need to look at the code more closely - it's been a while. The code is a bit silly, as it resets the tuplesort and then throws away all the pointers - so what could the _end() break? AFAICS the comment says that we can't just do tuplesort_reset and keep the pointers, because some other code depends on them being NULL. In hindsight, that's a bit awkward - it'd probably be better to have a separate flag, which would allow us to just reset the tuplesort. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 6/15/23 13:48, Laurenz Albe wrote: >> ExecIncrementalSort() calls tuplesort_begin_common(), which creates the "TupleSort main" >> and "TupleSort sort" memory contexts, and ExecEndIncrementalSort() calls tuplesort_end(), >> which destroys them. >> But ExecReScanIncrementalSort() only resets the memory contexts. > I think it's correct, but I need to look at the code more closely - it's > been a while. The code is a bit silly, as it resets the tuplesort and > then throws away all the pointers - so what could the _end() break? The report at [1] seems to be the same issue of ExecReScanIncrementalSort leaking memory. I applied Laurenz's fix, and that greatly reduces the speed of leak but doesn't remove the problem entirely. It looks like the remaining issue is that the data computed by preparePresortedCols() is recomputed each time we rescan the node. This seems entirely gratuitous, because there's nothing in that that could change across rescans. I see zero leakage in that example after applying the attached quick hack. (It might be better to make the check in the caller, or to just move the call to ExecInitIncrementalSort.) regards, tom lane [1] https://www.postgresql.org/message-id/db03c582-086d-e7cd-d4a1-3bc722f81765%40inf.ethz.ch diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c index 34257ce34b..655b1c30e1 100644 --- a/src/backend/executor/nodeIncrementalSort.c +++ b/src/backend/executor/nodeIncrementalSort.c @@ -166,6 +166,9 @@ preparePresortedCols(IncrementalSortState *node) { IncrementalSort *plannode = castNode(IncrementalSort, node->ss.ps.plan); + if (node->presorted_keys) + return; + node->presorted_keys = (PresortedKeyData *) palloc(plannode->nPresortedCols * sizeof(PresortedKeyData)); @@ -1140,7 +1143,6 @@ ExecReScanIncrementalSort(IncrementalSortState *node) node->outerNodeDone = false; node->n_fullsort_remaining = 0; node->bound_Done = 0; - node->presorted_keys = NULL; node->execution_status = INCSORT_LOADFULLSORT; @@ -1154,12 +1156,12 @@ ExecReScanIncrementalSort(IncrementalSortState *node) */ if (node->fullsort_state != NULL) { - tuplesort_reset(node->fullsort_state); + tuplesort_end(node->fullsort_state); node->fullsort_state = NULL; } if (node->prefixsort_state != NULL) { - tuplesort_reset(node->prefixsort_state); + tuplesort_end(node->prefixsort_state); node->prefixsort_state = NULL; }
On 6/15/23 22:11, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> On 6/15/23 13:48, Laurenz Albe wrote: >>> ExecIncrementalSort() calls tuplesort_begin_common(), which creates the "TupleSort main" >>> and "TupleSort sort" memory contexts, and ExecEndIncrementalSort() calls tuplesort_end(), >>> which destroys them. >>> But ExecReScanIncrementalSort() only resets the memory contexts. > >> I think it's correct, but I need to look at the code more closely - it's >> been a while. The code is a bit silly, as it resets the tuplesort and >> then throws away all the pointers - so what could the _end() break? > > The report at [1] seems to be the same issue of ExecReScanIncrementalSort > leaking memory. Funny how these reports often come in pairs ... > I applied Laurenz's fix, and that greatly reduces the > speed of leak but doesn't remove the problem entirely. It looks like > the remaining issue is that the data computed by preparePresortedCols() is > recomputed each time we rescan the node. This seems entirely gratuitous, > because there's nothing in that that could change across rescans. Yeah, I was wondering about that too when I skimmed over that code earlier today. > I see zero leakage in that example after applying the attached quick > hack. (It might be better to make the check in the caller, or to just > move the call to ExecInitIncrementalSort.) > Thanks for looking. Are you planning to work on this and push the fix, or do you want me to finish this up? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 6/15/23 22:11, Tom Lane wrote: >> I see zero leakage in that example after applying the attached quick >> hack. (It might be better to make the check in the caller, or to just >> move the call to ExecInitIncrementalSort.) > Thanks for looking. Are you planning to work on this and push the fix, > or do you want me to finish this up? I'm happy to let you take it -- got lots of other stuff on my plate. regards, tom lane
On 6/15/23 22:36, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> On 6/15/23 22:11, Tom Lane wrote: >>> I see zero leakage in that example after applying the attached quick >>> hack. (It might be better to make the check in the caller, or to just >>> move the call to ExecInitIncrementalSort.) > >> Thanks for looking. Are you planning to work on this and push the fix, >> or do you want me to finish this up? > > I'm happy to let you take it -- got lots of other stuff on my plate. > OK, will do. -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 15, 2023 at 6:35 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > On 6/15/23 22:36, Tom Lane wrote: > > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > >> On 6/15/23 22:11, Tom Lane wrote: > >>> I see zero leakage in that example after applying the attached quick > >>> hack. (It might be better to make the check in the caller, or to just > >>> move the call to ExecInitIncrementalSort.) > > > >> Thanks for looking. Are you planning to work on this and push the fix, > >> or do you want me to finish this up? > > > > I'm happy to let you take it -- got lots of other stuff on my plate. > > > > OK, will do. I think the attached is enough to fix it -- rather than nulling out the sort states in rescan, we can reset them (as the comment says), but not set them to null (we also have the same mistake with presorted_keys). That avoids unnecessary recreation of the sort states, but it also fixes the problem Tom noted as well: the call to preparePresortedCols() is already guarded by a test on fullsort_state being NULL, so with this change we also won't unnecessarily redo that work. Regards, James Coleman
Вложения
On Fri, 2023-06-16 at 00:34 +0200, Tomas Vondra wrote: > On 6/15/23 22:36, Tom Lane wrote: > > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > > > On 6/15/23 22:11, Tom Lane wrote: > > > > I see zero leakage in that example after applying the attached quick > > > > hack. (It might be better to make the check in the caller, or to just > > > > move the call to ExecInitIncrementalSort.) > > > > > Thanks for looking. Are you planning to work on this and push the fix, > > > or do you want me to finish this up? > > > > I'm happy to let you take it -- got lots of other stuff on my plate. > > OK, will do. It would be cool if we could get that into the next minor release in August. Yours, Laurenz Albe
On 6/29/23 13:49, Laurenz Albe wrote: > On Fri, 2023-06-16 at 00:34 +0200, Tomas Vondra wrote: >> On 6/15/23 22:36, Tom Lane wrote: >>> Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >>>> On 6/15/23 22:11, Tom Lane wrote: >>>>> I see zero leakage in that example after applying the attached quick >>>>> hack. (It might be better to make the check in the caller, or to just >>>>> move the call to ExecInitIncrementalSort.) >>> >>>> Thanks for looking. Are you planning to work on this and push the fix, >>>> or do you want me to finish this up? >>> >>> I'm happy to let you take it -- got lots of other stuff on my plate. >> >> OK, will do. > > It would be cool if we could get that into the next minor release in August. > FWIW I've pushed the fix prepared by James a couple days ago. Thanks for the report! regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, 2023-07-02 at 20:13 +0200, Tomas Vondra wrote: > FWIW I've pushed the fix prepared by James a couple days ago. Thanks for > the report! Thanks, and sorry for being pushy. Yours, Laurenz Albe