Discussion: Make PQgetResult() not return NULL on out-of-memory error
Hi,

Currently, PQgetResult() returns NULL not only when no results remain for
a sent query, but also when an out-of-memory error occurs, except when
PGconn itself is NULL. As a result, users cannot distinguish between query
completion and an out-of-memory error when PQgetResult() returns NULL.

The result returned by PQgetResult() is generated by either pqPipelineProcessQueue()
or getCopyResult(). While pqPipelineProcessQueue() never returns NULL, even in the
case of an out-of-memory error, getCopyResult() may return NULL.
Therefore, I propose modifying getCopyResult() so that it never returns NULL, but
instead returns OOM_result, as pqPipelineProcessQueue() does.

I've attached a patch for this.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
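[For illustration, a minimal caller-side sketch (not part of the attached patch) of the ambiguity described above: the conventional "call PQgetResult() until NULL" loop cannot tell query completion apart from a client-side allocation failure. The connection string and query below are placeholders.]

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");	/* placeholder conninfo */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	if (PQsendQuery(conn, "SELECT 1"))	/* placeholder query */
	{
		/* Conventional drain loop: read results until NULL. */
		while ((res = PQgetResult(conn)) != NULL)
		{
			if (PQresultStatus(res) == PGRES_FATAL_ERROR)
				fprintf(stderr, "error: %s", PQresultErrorMessage(res));
			PQclear(res);
		}

		/*
		 * With current libpq, reaching this point can mean either "all
		 * results consumed" or "a PGresult could not be allocated"; the
		 * proposal above makes the latter surface as a PGRES_FATAL_ERROR
		 * result instead of NULL.
		 */
	}

	PQfinish(conn);
	return 0;
}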
> On Nov 11, 2025, at 01:07, Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> Hi,
>
> Currently, PQgetResult() returns NULL not only when no results remain for
> a sent query, but also when an out-of-memory error occurs, except when
> PGconn itself is NULL. As a result, users cannot distinguish between query
> completion and an out-of-memory error when PQgetResult() returns NULL.
>
> The result returned by PQgetResult() is generated by either pqPipelineProcessQueue()
> or getCopyResult(). While pqPipelineProcessQueue() never returns NULL, even in the
> case of an out-of-memory error, getCopyResult() may return NULL.
> Therefore, I propose modifying getCopyResult() so that it never returns NULL, but
> instead returns OOM_result, as pqPipelineProcessQueue() does.
>
> I've attached a patch for this.
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata <nagata@sraoss.co.jp>
> <0001-Make-PQgetResult-not-return-NULL-on-out-of-memory-er.patch>

This is a front-end OOM. As it changes the behavior of PQgetResult(), I am just
worried about whether callers can handle the behavior change.

For example, pgbench:

* When pgbench calls readCommandResponse()
* If OOM happens, PQgetResult() returns OOM_result, whose resultStatus is PGRES_FATAL_ERROR
* readCommandResponse() will goto the error tag, then discardAvailableResults() will be called
* discardAvailableResults() only returns when res is NULL, so would discardAvailableResults() fall into an infinite loop?

Otherwise the patch looks good to me. I agree that it is a good idea to
distinguish a client OOM error from a server-side error.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Tue, Nov 11, 2025 at 12:12 PM Chao Li <li.evan.chao@gmail.com> wrote:
> For example, pgbench:
>
> * When pgbench calls readCommandResponse()
> * If OOM happens, PQgetResult() returns OOM_result, whose resultStatus is PGRES_FATAL_ERROR
> * readCommandResponse() will goto the error tag, then discardAvailableResults() will be called
> * discardAvailableResults() only returns when res is NULL, so would discardAvailableResults() fall into an infinite loop?

Yes, that's a valid concern. If PQgetResult() keeps returning OOM_result due to
an out-of-memory error, some functions (e.g., discardAvailableResults()
in pgbench.c or PQexecFinish() in fe-exec.c) that expect PQgetResult() to
eventually return NULL could end up in an infinite loop.

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
For example, one possible solution would be to add PGRES_OOM to
ExecStatusType, allowing callers to detect out-of-memory errors with
PQresultStatus() == PGRES_OOM.

Currently (even with the patch), OOM_result is returned only when
PQmakeEmptyPGresult() fails to allocate memory. But I guess there might be
other allocation failure paths. If we explicitly expose OOM_result, we may
also need to update PQgetResult() to return it in those other cases.
I'm not sure if that's the right direction, though... Probably we need to
look into the original commit that introduced OOM_result and its related
discussion.

Regards,

--
Fujii Masao
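[To make the concern concrete, here is a simplified drain loop in the spirit of pgbench's discardAvailableResults(); this is an illustration, not the actual pgbench code. If PQgetResult() kept handing back OOM_result instead of eventually returning NULL, the bare "until NULL" loop would spin forever; one defensive guard an application could add is to also stop once the connection is reported bad.]

#include <stddef.h>
#include <libpq-fe.h>

/*
 * Simplified sketch of a discardAvailableResults()-style loop; not the real
 * pgbench code.  In the naive version the only exit is res == NULL, which is
 * exactly why a persistently returned OOM_result could loop forever.
 */
void
discard_results(PGconn *conn)
{
	for (;;)
	{
		PGresult   *res = PQgetResult(conn);

		if (res == NULL)
			break;				/* all results consumed (or, today, OOM) */

		PQclear(res);

		/* Defensive guard: stop polling a connection that has gone bad. */
		if (PQstatus(conn) == CONNECTION_BAD)
			break;
	}
}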
Fujii Masao <masao.fujii@gmail.com> writes:
> To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
> and OOM. Functions that loop until PQgetResult() returns NULL should continue
> if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
Not sure about that. We might or might not be able to make progress
if the caller keeps calling PQgetResult. But if it stops after an
OOM report then we might as well just close the connection, because
there is no hope of getting back in sync.
I'm inclined to think that it's correct that we should return
OOM_result not NULL if we couldn't make a PGresult, but further
analysis is needed to make sure that libpq can make progress
afterwards. I don't think we should expect applications to
involve themselves in that recovery logic, because they won't,
and most certainly won't test it carefully.
The whole project might be doomed to failure really. I think
that one very likely failure if we are approaching OOM is that
we can't enlarge libpq's input buffer enough to accept all of
a (long) incoming server message. What hope have we got of
getting out of that situation?
regards, tom lane
On Tue, Nov 11, 2025 at 2:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Tue, Nov 11, 2025 at 12:12 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > For example, pgbench:
> >
> > * When pgbench calls readCommandResponse()
> > * If OOM happens, PQgetResult() returns OOM_result, whose resultStatus is PGRES_FATAL_ERROR
> > * readCommandResponse() will goto the error tag, then discardAvailableResults() will be called
> > * discardAvailableResults() only returns when res is NULL, so would discardAvailableResults() fall into an infinite loop?
>
> Yes, that's a valid concern. If PQgetResult() keeps returning OOM_result due to
> an out-of-memory error, some functions (e.g., discardAvailableResults()
> in pgbench.c or PQexecFinish() in fe-exec.c) that expect PQgetResult() to
> eventually return NULL could end up in an infinite loop.
>
> To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
> and OOM. Functions that loop until PQgetResult() returns NULL should continue
> if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
> For example, one possible solution would be to add PGRES_OOM to
> ExecStatusType, allowing callers to detect out-of-memory errors with
> PQresultStatus() == PGRES_OOM.

This approach might not be good. Many applications currently would expect
PQgetResult() to return PGRES_FATAL_ERROR for any fatal error, including
out-of-memory. Introducing a new result status like PGRES_OOM could break
those applications, since they would need to be updated to handle this new
status explicitly.

Regards,

--
Fujii Masao
On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Fujii Masao <masao.fujii@gmail.com> writes:
> > To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
> > and OOM. Functions that loop until PQgetResult() returns NULL should continue
> > if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
>
> Not sure about that. We might or might not be able to make progress
> if the caller keeps calling PQgetResult. But if it stops after an
> OOM report then we might as well just close the connection, because
> there is no hope of getting back in sync.
>
> I'm inclined to think that it's correct that we should return
> OOM_result not NULL if we couldn't make a PGresult, but further
> analysis is needed to make sure that libpq can make progress
> afterwards. I don't think we should expect applications to
> involve themselves in that recovery logic, because they won't,
> and most certainly won't test it carefully.
>
> The whole project might be doomed to failure really. I think
> that one very likely failure if we are approaching OOM is that
> we can't enlarge libpq's input buffer enough to accept all of
> a (long) incoming server message. What hope have we got of
> getting out of that situation?

When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
even in OOM case, so functions looping until NULL won't end up in an
infinite loop.
Of course, issuing another query on the same connection afterward could be
problematic, though.

I'm still not sure this behavior is correct, but I'm just wondering if
asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.

Regards,

--
Fujii Masao
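[For clarity, a small caller-side sketch of the sequence described above. The assumption is taken from this thread, not from documented guarantees: the failing PQgetResult() reports PGRES_FATAL_ERROR once and the next call returns NULL, so the loop terminates.]

#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

/*
 * Caller-side view of the buffer-enlargement failure path discussed above.
 * Assumption (from this thread, not documented behavior): the first failing
 * PQgetResult() yields a PGRES_FATAL_ERROR result, and the following call
 * returns NULL because libpq has gone idle, so this loop does terminate.
 */
bool
drain_after_query(PGconn *conn)
{
	bool		ok = true;
	PGresult   *res;

	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_FATAL_ERROR)
		{
			ok = false;
			fprintf(stderr, "fatal: %s", PQresultErrorMessage(res));
		}
		PQclear(res);
	}

	/*
	 * The loop ended even in the failure case, but as noted above, issuing
	 * another query on this connection afterward may be problematic.
	 */
	return ok;
}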
On Tue, 11 Nov 2025 20:16:11 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

> On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Fujii Masao <masao.fujii@gmail.com> writes:
> > > To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
> > > and OOM. Functions that loop until PQgetResult() returns NULL should continue
> > > if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
> >
> > Not sure about that. We might or might not be able to make progress
> > if the caller keeps calling PQgetResult. But if it stops after an
> > OOM report then we might as well just close the connection, because
> > there is no hope of getting back in sync.
> >
> > I'm inclined to think that it's correct that we should return
> > OOM_result not NULL if we couldn't make a PGresult, but further
> > analysis is needed to make sure that libpq can make progress
> > afterwards. I don't think we should expect applications to
> > involve themselves in that recovery logic, because they won't,
> > and most certainly won't test it carefully.
> >
> > The whole project might be doomed to failure really. I think
> > that one very likely failure if we are approaching OOM is that
> > we can't enlarge libpq's input buffer enough to accept all of
> > a (long) incoming server message. What hope have we got of
> > getting out of that situation?
>
> When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
> returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
> PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
> even in OOM case, so functions looping until NULL won't end up in an
> infinite loop.
> Of course, issuing another query on the same connection afterward could be
> problematic, though.
>
> I'm still not sure this behavior is correct, but I'm just wondering if
> asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.

I agree that PQgetResult() should return NULL after returning OOM_result, and
resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.

I also agree that continuing to use the same connection seems problematic,
especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.

Therefore, I wonder about closing the connection and resetting the status
when OOM_result is returned, by calling pqDropConnection() as handleFatalError() does.
In this case, the connection status should also be set to CONNECTION_BAD.

This would prevent applications from continuing to use a potentially problematic
connection, without providing a way to distinguish OOM from other errors.

What do you think?

I've attached an updated patch in this direction.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
On Wed, 12 Nov 2025 16:20:13 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Tue, 11 Nov 2025 20:16:11 +0900
> Fujii Masao <masao.fujii@gmail.com> wrote:
>
> > On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > Fujii Masao <masao.fujii@gmail.com> writes:
> > > > To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
> > > > and OOM. Functions that loop until PQgetResult() returns NULL should continue
> > > > if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
> > >
> > > Not sure about that. We might or might not be able to make progress
> > > if the caller keeps calling PQgetResult. But if it stops after an
> > > OOM report then we might as well just close the connection, because
> > > there is no hope of getting back in sync.
> > >
> > > I'm inclined to think that it's correct that we should return
> > > OOM_result not NULL if we couldn't make a PGresult, but further
> > > analysis is needed to make sure that libpq can make progress
> > > afterwards. I don't think we should expect applications to
> > > involve themselves in that recovery logic, because they won't,
> > > and most certainly won't test it carefully.
> > >
> > > The whole project might be doomed to failure really. I think
> > > that one very likely failure if we are approaching OOM is that
> > > we can't enlarge libpq's input buffer enough to accept all of
> > > a (long) incoming server message. What hope have we got of
> > > getting out of that situation?
> >
> > When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
> > returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
> > PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
> > even in OOM case, so functions looping until NULL won't end up in an
> > infinite loop.
> > Of course, issuing another query on the same connection afterward could be
> > problematic, though.
> >
> > I'm still not sure this behavior is correct, but I'm just wondering if
> > asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.
>
> I agree that PQgetResult() should return NULL after returning OOM_result, and
> resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.
>
> I also agree that continuing to use the same connection seems problematic,
> especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
> never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.
>
> Therefore, I wonder about closing the connection and resetting the status
> when OOM_result is returned, by calling pqDropConnection() as handleFatalError() does.
> In this case, the connection status should also be set to CONNECTION_BAD.
>
> This would prevent applications from continuing to use a potentially problematic
> connection, without providing a way to distinguish OOM from other errors.
>
> What do you think?
>
> I've attached an updated patch in this direction.

I'm sorry, the patch attached to my previous post was incorrect.
I've attached the correct one.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments
> On Nov 12, 2025, at 15:20, Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> On Tue, 11 Nov 2025 20:16:11 +0900
> Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Fujii Masao <masao.fujii@gmail.com> writes:
>>>> To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
>>>> and OOM. Functions that loop until PQgetResult() returns NULL should continue
>>>> if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
>>>
>>> Not sure about that. We might or might not be able to make progress
>>> if the caller keeps calling PQgetResult. But if it stops after an
>>> OOM report then we might as well just close the connection, because
>>> there is no hope of getting back in sync.
>>>
>>> I'm inclined to think that it's correct that we should return
>>> OOM_result not NULL if we couldn't make a PGresult, but further
>>> analysis is needed to make sure that libpq can make progress
>>> afterwards. I don't think we should expect applications to
>>> involve themselves in that recovery logic, because they won't,
>>> and most certainly won't test it carefully.
>>>
>>> The whole project might be doomed to failure really. I think
>>> that one very likely failure if we are approaching OOM is that
>>> we can't enlarge libpq's input buffer enough to accept all of
>>> a (long) incoming server message. What hope have we got of
>>> getting out of that situation?
>>
>> When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
>> returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
>> PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
>> even in OOM case, so functions looping until NULL won't end up in an
>> infinite loop.
>> Of course, issuing another query on the same connection afterward could be
>> problematic, though.
>>
>> I'm still not sure this behavior is correct, but I'm just wondering if
>> asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.
>
> I agree that PQgetResult() should return NULL after returning OOM_result, and
> resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.
>
> I also agree that continuing to use the same connection seems problematic,
> especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
> never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.
>
> Therefore, I wonder about closing the connection and resetting the status
> when OOM_result is returned, by calling pqDropConnection() as handleFatalError() does.
> In this case, the connection status should also be set to CONNECTION_BAD.
>
> This would prevent applications from continuing to use a potentially problematic
> connection, without providing a way to distinguish OOM from other errors.
>
> What do you think?
>
> I've attached an updated patch in this direction.
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata <nagata@sraoss.co.jp>
> <v2-0001-Make-PQgetResult-not-return-NULL-on-out-of-memory.patch>

Here, the OOM condition is a client-side problem that's generally beyond libpq's
control. When an out-of-memory error occurs, it usually isn't caused by a memory
leak inside libpq itself. After OOM, any operation that requires additional
allocation may fail, even something as simple as writing a log message. Closing
the current connection might release some memory, but that memory could be
immediately consumed by other processes, so further actions might still fail.
In such cases, simply aborting the process can be a reasonable and
straightforward strategy.

However, in many cases the libpq client is an application server, where forcibly
aborting the entire process is not acceptable. From that perspective, closing the
affected connection is a more appropriate response. The application should detect
the broken database connection and attempt to reconnect. If the OOM condition
persists and reconnection continues to fail, the application's own recovery and
failover logic should take over.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
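[As a concrete illustration of that recovery path, a minimal sketch of application-side reconnection, assuming the connection ends up marked CONNECTION_BAD (as the v2 patch proposes to do on a client-side OOM); the retry count and back-off are placeholders, not a recommendation.]

#include <stdbool.h>
#include <unistd.h>
#include <libpq-fe.h>

/*
 * Application-side recovery along the lines described above: if the
 * connection has been marked bad (an assumption based on the v2 patch's
 * proposed behavior), try to re-establish it and let higher-level failover
 * logic take over if that keeps failing.
 */
bool
ensure_connection(PGconn *conn)
{
	int			attempts;

	if (PQstatus(conn) == CONNECTION_OK)
		return true;

	for (attempts = 0; attempts < 3; attempts++)
	{
		PQreset(conn);			/* closes and re-opens with the same parameters */
		if (PQstatus(conn) == CONNECTION_OK)
			return true;
		sleep(1);				/* placeholder back-off before retrying */
	}

	/* Persistent failure: let the application's failover logic decide. */
	return false;
}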
> > Therefore, I wonder about closing the connection and resetting the status
> > when OOM_result is returned, by calling pqDropConnection() as handleFatalError() does.
> > In this case, the connection status should also be set to CONNECTION_BAD.

There are many code paths in libpq that can trigger an out-of-memory error.
However, this patch makes libpq drop the connection when OOM_result is returned,
which likely covers only a subset of those cases. So I'm not sure dropping
the connection only for OOM_result would meaningfully improve things.

Perhaps it would be more consistent to drop the connection for any internal
libpq error such as out-of-memory, though that might require invasive code
changes and could break existing applications, I'm afraid.

Regards,

--
Fujii Masao
On Wed, 12 Nov 2025 18:09:28 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

> > > Therefore, I wonder about closing the connection and resetting the status
> > > when OOM_result is returned, by calling pqDropConnection() as handleFatalError() does.
> > > In this case, the connection status should also be set to CONNECTION_BAD.
>
> There are many code paths in libpq that can trigger an out-of-memory error.
> However, this patch makes libpq drop the connection when OOM_result is returned,
> which likely covers only a subset of those cases. So I'm not sure dropping
> the connection only for OOM_result would meaningfully improve things.
>
> Perhaps it would be more consistent to drop the connection for any internal
> libpq error such as out-of-memory, though that might require invasive code
> changes and could break existing applications, I'm afraid.

I see that some internal errors are handled by calling handleFatalError()
(i.e. dropping the connection), but for some OOM errors in libpq/fe-protocol3.c,
functions just fail silently. Therefore, I'm not sure if dropping the connection
for any internal libpq error is a good idea.

If an OOM during the preparation of a result for PQgetResult() is not considered
a fatal error that should terminate the connection, then is it the application's
responsibility to determine whether the connection can continue to be used?
In that case, a way to detect OOM failures (e.g. PQresultIsOutOfMemory() or
something similar) should be provided.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
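[To show how such a predicate might be used, a hypothetical sketch: PQresultIsOutOfMemory() does not exist in libpq today and merely stands in for whatever detection mechanism this thread settles on.]

#include <stdbool.h>
#include <stddef.h>
#include <libpq-fe.h>

/* Hypothetical API, not part of libpq; declared here only for the sketch. */
extern bool PQresultIsOutOfMemory(const PGresult *res);

/*
 * With such a predicate, an "until NULL" drain loop could keep consuming
 * ordinary PGRES_FATAL_ERROR results but stop, and treat the connection as
 * unusable, as soon as a client-side OOM is reported.
 */
bool
drain_results(PGconn *conn)
{
	PGresult   *res;

	while ((res = PQgetResult(conn)) != NULL)
	{
		bool		oom = PQresultIsOutOfMemory(res);	/* hypothetical call */

		PQclear(res);

		if (oom)
			return false;		/* give up on this connection */
	}
	return true;
}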
On Wed, 12 Nov 2025 16:40:34 +0800
Chao Li <li.evan.chao@gmail.com> wrote:

>
> > On Nov 12, 2025, at 15:20, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> >
> > On Tue, 11 Nov 2025 20:16:11 +0900
> > Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> >> On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>
> >>> Fujii Masao <masao.fujii@gmail.com> writes:
> >>>> To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
> >>>> and OOM. Functions that loop until PQgetResult() returns NULL should continue
> >>>> if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
> >>>
> >>> Not sure about that. We might or might not be able to make progress
> >>> if the caller keeps calling PQgetResult. But if it stops after an
> >>> OOM report then we might as well just close the connection, because
> >>> there is no hope of getting back in sync.
> >>>
> >>> I'm inclined to think that it's correct that we should return
> >>> OOM_result not NULL if we couldn't make a PGresult, but further
> >>> analysis is needed to make sure that libpq can make progress
> >>> afterwards. I don't think we should expect applications to
> >>> involve themselves in that recovery logic, because they won't,
> >>> and most certainly won't test it carefully.
> >>>
> >>> The whole project might be doomed to failure really. I think
> >>> that one very likely failure if we are approaching OOM is that
> >>> we can't enlarge libpq's input buffer enough to accept all of
> >>> a (long) incoming server message. What hope have we got of
> >>> getting out of that situation?
> >>
> >> When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
> >> returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
> >> PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
> >> even in OOM case, so functions looping until NULL won't end up in an
> >> infinite loop.
> >> Of course, issuing another query on the same connection afterward could be
> >> problematic, though.
> >>
> >> I'm still not sure this behavior is correct, but I'm just wondering if
> >> asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.
> >
> > I agree that PQgetResult() should return NULL after returning OOM_result, and
> > resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.
> >
> > I also agree that continuing to use the same connection seems problematic,
> > especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
> > never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.
> >
> > Therefore, I wonder about closing the connection and resetting the status
> > when OOM_result is returned, by calling pqDropConnection() as handleFatalError() does.
> > In this case, the connection status should also be set to CONNECTION_BAD.
> >
> > This would prevent applications from continuing to use a potentially problematic
> > connection, without providing a way to distinguish OOM from other errors.
> >
> > What do you think?
> >
> > I've attached an updated patch in this direction.
> >
> > Regards,
> > Yugo Nagata
> >
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> > <v2-0001-Make-PQgetResult-not-return-NULL-on-out-of-memory.patch>
>
> Here, the OOM condition is a client-side problem that's generally beyond libpq's
> control. When an out-of-memory error occurs, it usually isn't caused by a memory
> leak inside libpq itself. After OOM, any operation that requires additional
> allocation may fail, even something as simple as writing a log message. Closing
> the current connection might release some memory, but that memory could be
> immediately consumed by other processes, so further actions might still fail.
> In such cases, simply aborting the process can be a reasonable and
> straightforward strategy.
>
> However, in many cases the libpq client is an application server, where forcibly
> aborting the entire process is not acceptable. From that perspective, closing the
> affected connection is a more appropriate response. The application should detect
> the broken database connection and attempt to reconnect. If the OOM condition
> persists and reconnection continues to fail, the application's own recovery and
> failover logic should take over.

What I meant is that closing the connection on OOM errors is intended to prevent
applications from continuing to use a potentially problematic connection, rather
than to release memory.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>