Discussion: Unlimited memory consumption with long-lived connection
If I execute the attached python script against a postgresql 15.2 (Ubuntu 15.2-1.pgdg22.10+1) server, with the default configuration (e.g. shared_buffers = 128M), then the server memory usage goes up and up, apparently endlessly. After about 10 minutes (on my laptop) pg_top shows RES memory usage for the back-end handling the connection as greater than 1 gigabyte, which seems far too high given the server configuration. The script just performs the same SELECT endlessly in a loop. The memory is released when the script is killed.

Platform: Ubuntu 22.10; Linux version 5.19.0-31-generic; x86-64.

PS: The testcase was reduced from a script that kept a connection open for a long time in order to LISTEN, and would execute a query using the same connection every time there was a notification on the channel. It consumed ever more memory, to the point of crashing the postgresql server. Changing the script to perform each query using a new short-lived connection was an effective workaround.
Attachments
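The reporter's attached script is not reproduced in the archive. A minimal sketch of the pattern described above, assuming psycopg2 as the driver; the DSN and the query text are placeholders, not taken from the original script:

```python
# Sketch of the reported reproducer: a single long-lived connection
# running the same SELECT in a tight loop. With JIT inlining enabled,
# the backend serving this connection was reported to grow without bound.

QUERY = "SELECT count(*) FROM generate_series(1, 100000)"  # placeholder query

def run(dsn="dbname=test"):  # placeholder DSN
    import psycopg2  # third-party driver; assumed available

    conn = psycopg2.connect(dsn)   # one connection, kept open forever
    cur = conn.cursor()
    while True:                    # same query, endlessly
        cur.execute(QUERY)
        cur.fetchall()

if __name__ == "__main__":
    run()
```

Watching the backend's RES column in pg_top (or top) while this runs is how the growth was observed.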
On 21/02/2023 14:57, Duncan Sands wrote:
> If I execute the attached python script against a postgresql 15.2 (Ubuntu
> 15.2-1.pgdg22.10+1) server, with the default configuration (eg shared_buffers =
> 128M), then the server memory usage goes up and up, apparently endlessly. After
> about 10 minutes (on my laptop) pg_top shows RES memory usage for the back-end
> handling the connection as greater than 1 gigabyte, which seems far too high
> given the server configuration. The script just performs the same SELECT
> endlessly in a loop. The memory is released when the script is killed.
>
> Platform: Ubuntu 22.10; Linux version 5.19.0-31-generic; x86-64.
>
> PS: The testcase was reduced from a script that kept a connection open for a
> long time in order to LISTEN, and would execute a query using the same
> connection every time there was a notification on the channel. It consumed ever
> more memory to the point of crashing the postgresql server. Changing the script
> to perform the query using a new short-lived connection was an effective workaround.

I can reproduce this on my laptop. When I set 'jit=off', or disable JIT inlining with 'jit_inline_above_cost = -1', the leak stops, or at least becomes small enough that I no longer readily see it with 'top'. So it seems to be related to LLVM inlining. I'm using LLVM and clang 14.

To track down the leak, I started the postgres server under Valgrind and let the script run for 40 iterations. Valgrind did report some leaks from LLVM inline functions (attached), but I'm not very familiar with this code, so I'm not sure which ones might be genuine leaks or how to fix them.

- Heikki
Attachments
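The two settings Heikki mentions can also be applied per session rather than in postgresql.conf, which is convenient for confirming the diagnosis from an application. A sketch; the helper functions and the assumption of a psycopg2-style connection object are illustrative:

```python
# Session-level forms of the settings Heikki used to isolate the leak:
# turning JIT off entirely, or leaving JIT on but disabling the LLVM
# inlining step the leak appears tied to.

JIT_OFF = "SET jit = off"
NO_INLINING = "SET jit_inline_above_cost = -1"

def disable_jit(conn):
    # Disables JIT compilation altogether for this session.
    with conn.cursor() as cur:
        cur.execute(JIT_OFF)

def disable_jit_inlining(conn):
    # Keeps JIT enabled but skips inlining, the narrower mitigation.
    with conn.cursor() as cur:
        cur.execute(NO_INLINING)
```

If the backend's memory growth stops with either setting, the leak is likely the JIT-inlining one discussed in this thread.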
On Tue, Feb 21, 2023 at 04:32:02PM +0200, Heikki Linnakangas wrote:
> On 21/02/2023 14:57, Duncan Sands wrote:
> > If I execute the attached python script against a postgresql 15.2 (Ubuntu
> > 15.2-1.pgdg22.10+1) server, with the default configuration (eg shared_buffers =
> > 128M), then the server memory usage goes up and up, apparently endlessly. After
> > about 10 minutes (on my laptop) pg_top shows RES memory usage for the back-end
> > handling the connection as greater than 1 gigabyte, which seems far too high
> > given the server configuration. The script just performs the same SELECT
> > endlessly in a loop. The memory is released when the script is killed.
> >
> > Platform: Ubuntu 22.10; Linux version 5.19.0-31-generic; x86-64.
> >
> > PS: The testcase was reduced from a script that kept a connection open for a
> > long time in order to LISTEN, and would execute a query using the same
> > connection every time there was a notification on the channel. It consumed ever
> > more memory to the point of crashing the postgresql server. Changing the script
> > to perform the query using a new short-lived connection was an effective workaround.

It sounds like the same as the issue here:
https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items

There are patches proposed here, which fixed the issue for me. But I've always been suspicious that there may be a 2nd, undiagnosed issue lurking behind this one...
https://www.postgresql.org/message-id/20210417021602.7dilihkdc7oblrf7%40alap3.anarazel.de
Hi Justin,

>>> PS: The testcase was reduced from a script that kept a connection open for a
>>> long time in order to LISTEN, and would execute a query using the same
>>> connection every time there was a notification on the channel. It consumed ever
>>> more memory to the point of crashing the postgresql server. Changing the script
>>> to perform the query using a new short-lived connection was an effective workaround.
>
> It sounds like the same as the issue here:
> https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items

I agree that it is likely the same issue.

> There are patches proposed here, which fixed the issue for me.
> But I've always been suspicious that there may be a 2nd, undiagnosed
> issue lurking behind this one...
> https://www.postgresql.org/message-id/20210417021602.7dilihkdc7oblrf7%40alap3.anarazel.de

Thanks. Since those patches are just a workaround, and I already worked around it, I don't plan to give them a whirl. On the other hand, if Andres ever gets round to implementing approach (1) from https://www.postgresql.org/message-id/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de then I'd be happy to give it a whirl. I'm rusty now but I used to be quite expert in LLVM, so I might even be able to do some patch review.

Best wishes, Duncan.
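The workaround referred to above, keeping the long-lived connection only for LISTEN and opening a fresh short-lived connection for each query, might look like the following. The channel name, query, and psycopg2 usage are illustrative, not from the original script:

```python
# Workaround sketch: any JIT-related memory in the query-running backend
# is released when its short-lived connection closes, while the LISTEN
# connection never runs the query itself.
import select

CHANNEL = "events"    # placeholder channel name
QUERY = "SELECT 1"    # placeholder query

def handle_notifications(dsn):
    import psycopg2  # third-party driver; assumed available

    listen_conn = psycopg2.connect(dsn)
    listen_conn.autocommit = True  # LISTEN must take effect immediately
    with listen_conn.cursor() as cur:
        cur.execute(f"LISTEN {CHANNEL}")

    while True:
        # Block until the socket is readable, then collect notifications.
        select.select([listen_conn], [], [])
        listen_conn.poll()
        while listen_conn.notifies:
            listen_conn.notifies.pop(0)
            # Fresh connection per query, closed right after.
            work = psycopg2.connect(dsn)
            try:
                with work.cursor() as cur:
                    cur.execute(QUERY)
                    cur.fetchall()
            finally:
                work.close()
```

The cost is one connection setup per notification; for a low-traffic channel that is usually cheaper than an unbounded backend.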
Hi,

On 2023-02-22 08:53:41 -0600, Justin Pryzby wrote:
> On Tue, Feb 21, 2023 at 04:32:02PM +0200, Heikki Linnakangas wrote:
> > On 21/02/2023 14:57, Duncan Sands wrote:
> > > If I execute the attached python script against a postgresql 15.2 (Ubuntu
> > > 15.2-1.pgdg22.10+1) server, with the default configuration (eg shared_buffers =
> > > 128M), then the server memory usage goes up and up, apparently endlessly. After
> > > about 10 minutes (on my laptop) pg_top shows RES memory usage for the back-end
> > > handling the connection as greater than 1 gigabyte, which seems far too high
> > > given the server configuration. The script just performs the same SELECT
> > > endlessly in a loop. The memory is released when the script is killed.
> > >
> > > Platform: Ubuntu 22.10; Linux version 5.19.0-31-generic; x86-64.
> > >
> > > PS: The testcase was reduced from a script that kept a connection open for a
> > > long time in order to LISTEN, and would execute a query using the same
> > > connection every time there was a notification on the channel. It consumed ever
> > > more memory to the point of crashing the postgresql server. Changing the script
> > > to perform the query using a new short-lived connection was an effective workaround.
>
> It sounds like the same as the issue here:
> https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
> There are patches proposed here, which fixed the issue for me.
> But I've always been suspicious that there may be a 2nd, undiagnosed
> issue lurking behind this one...

Any more details?

> https://www.postgresql.org/message-id/20210417021602.7dilihkdc7oblrf7%40alap3.anarazel.de

A slightly edited / rebased version is at https://postgr.es/m/20221101055132.pjjsvlkeo4stbjkq%40awork3.anarazel.de

I'd feel a lot better applying the patch if there'd be a bit more review. If none is forthcoming, I'm somewhat inclined to just apply it to HEAD and later decide whether we'd want to backpatch.

Greetings,

Andres Freund
On Wed, Feb 22, 2023 at 10:28:11AM -0800, Andres Freund wrote:
> On 2023-02-22 08:53:41 -0600, Justin Pryzby wrote:
> > There are patches proposed here, which fixed the issue for me.
> > But I've always been suspicious that there may be a 2nd, undiagnosed
> > issue lurking behind this one...
>
> Any more details?

No. My suspicion is because the process is killed by OOM roughly every 24h, and the leak didn't seem to be big enough to cause that. Maybe the leak is nonlinear or not completely deterministic, and something sometimes exacerbates the original issue; or maybe there's only one issue that's 90% but not 100% understood; or maybe there's a 2nd issue that's latent once the fix is applied for the known issue.

There's also this report:
https://www.postgresql.org/message-id/1605804350439-0.post%40n3.nabble.com
...which offers no details or response, and could easily be a leak in something other than JIT, or other than postgres, or not a leak at all.

-- Justin
On Wed, Feb 22, 2023 at 10:28:11AM -0800, Andres Freund wrote:
> I'd feel a lot better applying the patch if there'd be a bit more review. If
> none is forthcoming, I'm somewhat inclined to just apply it to HEAD and later
> decide whether we'd want to backpatch.

Previously I ran the patches in production. Not sure what else I could do, but it seems important to do something.

-- Justin