Обсуждение: [PATCH] Better Performance for PostgreSQL with large INSERTs
Sometimes, storing documents (eg. PDFs) in a database
is much easier than using a separate storage (like S3, NFS, etc.).

(Because of issues like backup integrity, availability,
service dependencies, access rights, encryption of data, etc..)

With this patch:

```diff
diff --git i/src/backend/libpq/pqcomm.c w/src/backend/libpq/pqcomm.c
index e517146..936b073 100644
--- i/src/backend/libpq/pqcomm.c
+++ w/src/backend/libpq/pqcomm.c
@@ -117,7 +117,8 @@ static List *sock_paths = NIL;
  */
 
 #define PQ_SEND_BUFFER_SIZE 8192
-#define PQ_RECV_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE 2097152
+
 
 static char *PqSendBuffer;
 static int	PqSendBufferSize;	/* Size send buffer */
```

ie. changing the network receive buffer size from 8kB to 2MB,
we got 7% better INSERT performance when storing BLOBs.

The 2MB value is just what we tried; 128kB or 256kB works as well.
The main point is to reduce the number of syscalls for receiving data
to about half of what it is with 8kB.

Thank you for your consideration!

Regards,

Phil
Hi!

On Mon, 15 Sept 2025 at 18:16, Philipp Marek <philipp@marek.priv.at> wrote:
>
> Sometimes, storing documents (eg. PDFs) in a database
> is much easier than using a separate storage (like S3, NFS, etc.).
>
> (Because of issues like backup integrity, availability,
> service dependencies, access rights, encryption of data, etc..)
>
> With this patch:
>
> diff --git i/src/backend/libpq/pqcomm.c w/src/backend/libpq/pqcomm.c
> index e517146..936b073 100644
> --- i/src/backend/libpq/pqcomm.c
> +++ w/src/backend/libpq/pqcomm.c
> @@ -117,7 +117,8 @@ static List *sock_paths = NIL;
>  */
>
> #define PQ_SEND_BUFFER_SIZE 8192
> -#define PQ_RECV_BUFFER_SIZE 8192
> +#define PQ_RECV_BUFFER_SIZE 2097152
> +

Changing this constant will result in an overwhelming increase of
memory consumption for instances that work with a large number of
connections (max_connections ~ 1e4) for zero benefit.

--
Best regards,
Kirill Reshke
Hi Kirill,
> Changing this constant will result in an overwhelming increase of
> memory consumption for instances that work with a large number of
> connections (
> max_connections ~ 1e4) for zero benefit.
No, it's not that bad.
1) With this being allocated in the bss segment,
the RAM will only actually be provided on _first use_ --
so processes that only ever work with small queries will see no
difference.
2) With temp_buffers' default of 8MB, work_mem using 4MB, etc.,
using an additional 256kB to 2MB of RAM _when needed_ is a good deal
for a ~7% performance improvement.
Here's the patch again, this time with a 128kB buffer size.

This gives us nearly the same gains (~7%) for the blob INSERTs,
and the additional memory usage (120kB) shouldn't really matter,
given the "temp_buffers" 8MB and "work_mem" 4MB defaults.

Making it configurable would give a much more complex patch --
so I suggest just using this fixed size.

Thanks!

```diff
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 25f739a6..72d092b2 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -117,7 +117,7 @@ static List *sock_paths = NIL;
  */
 
 #define PQ_SEND_BUFFER_SIZE 8192
-#define PQ_RECV_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE (128 * 1024)
 
 static char *PqSendBuffer;
 static int	PqSendBufferSize;	/* Size send buffer */
```
Hi,
7% is an interesting improvement for these cases. I see it as a nice feature.
What about adding a GUC variable so that PQ_RECV_BUFFER_SIZE can be
configured based on the user's needs?
-Filip-
On Tue, 30 Sep 2025 at 10:42, Philipp Marek <philipp@marek.priv.at> wrote:
> Here's the patch again, this time with a 128kB buffer size.
>
> This gives us nearly the same gains (~7%) for the blob INSERTs,
> and the additional memory usage (120kB) shouldn't really matter,
> given the "temp_buffers" 8MB and "work_mem" 4MB defaults.
>
> Making it configurable would give a much more complex patch --
> so I suggest just using this fixed size.
>
> Thanks!
>
> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> index 25f739a6..72d092b2 100644
> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
> @@ -117,7 +117,7 @@ static List *sock_paths = NIL;
>   */
>
>  #define PQ_SEND_BUFFER_SIZE 8192
> -#define PQ_RECV_BUFFER_SIZE 8192
> +#define PQ_RECV_BUFFER_SIZE (128 * 1024)
>
>  static char *PqSendBuffer;
>  static int	PqSendBufferSize;	/* Size send buffer */
Hi,
On 2025-09-30 10:42:00 +0200, Philipp Marek wrote:
> Here's the patch again, this time with a 128kB buffer size.
>
> This gives us nearly the same gains (~7%) for the blob INSERTs,
> and the additional memory usage (120kB) shouldn't really matter,
> with "temp_buffer"s 8MB and "work_mem" 4MB defaults.
>
> Making it configurable would give a much more complex patch --
> so I suggest just using this fixed size.
Have you tried to verify that this doesn't cause performance regressions in
other workloads? pq_recvbuf() has this code:
```c
	if (PqRecvPointer > 0)
	{
		if (PqRecvLength > PqRecvPointer)
		{
			/* still some unread data, left-justify it in the buffer */
			memmove(PqRecvBuffer, PqRecvBuffer + PqRecvPointer,
					PqRecvLength - PqRecvPointer);
			PqRecvLength -= PqRecvPointer;
			PqRecvPointer = 0;
		}
		else
			PqRecvLength = PqRecvPointer = 0;
	}
```
I do seem to recall that just increasing the buffer size substantially
led to more time being spent inside that memmove() (likely due to
exceeding the L1/L2 cache sizes).
Greetings,
Andres Freund
Hi Filip,

> 7% is an interesting improvement for these cases. I see it as a nice
> feature.
> What about adding a GUC variable to have the ability to configure
> PQ_RECV_BUFFER_SIZE based on the user's needs?

Well, currently this is just the array size as a preprocessor macro,
so the executable's .bss gets sized differently.

A GUC would need quite some more code, including dynamic allocation
_and reallocation_ -- and that's the complexity I wanted to avoid
for the first discussion.

If a GUC fits the usage patterns better, I'm okay with it!
Hi Andres,

> Have you tried to verify that this doesn't cause performance
> regressions in other workloads? pq_recvbuf() has this code:
> ...
>
> I do seem to recall that just increasing the buffer size substantially
> led to more time being spent inside that memmove() (likely due to
> exceeding L1/L2).

Do you have any pointers to discussions or other data about that?

My (quick) analysis was that clients that send one request,
wait for an answer, then send the next request wouldn't run that code,
as there's nothing behind the individual requests that could be moved.

But yes, Pipeline Mode[1] might/would be affected.

The interesting question is how much data userspace can copy before
that costs more than a userspace-kernel-userspace round trip.
(I guess that moving 64kB or 128kB should be quicker, especially
given the various CPU mitigations.)

As long as there are complete requests in the buffer, the memmove()
could be avoided; only the initial part of the first incomplete request
might need moving to the beginning.
That patch would be more than +-1 line again ;)

The documentation says

> Pipelining is less useful, and more complex,
> when a single pipeline contains multiple transactions
> (see Section 32.5.1.3).

Are there any benchmarks/usage statistics for pipeline mode?

Regards,

Phil

Ad 1: https://www.postgresql.org/docs/current/libpq-pipeline-mode.html
Hi,

On 2025-10-07 15:03:29 +0200, Philipp Marek wrote:
> > Have you tried to verify that this doesn't cause performance
> > regressions in other workloads? pq_recvbuf() has this code:
> > ...
> >
> > I do seem to recall that just increasing the buffer size substantially
> > led to more time being spent inside that memmove() (likely due to
> > exceeding L1/L2).
>
> Do you have any pointers to discussions or other data about that?
>
> My (quick) analysis was that clients that send one request,
> wait for an answer, then send the next request wouldn't run that code,
> as there's nothing behind the individual requests that could be moved.
>
> But yes, Pipeline Mode[1] might/would be affected.
>
> The interesting question is how much data userspace can copy before
> that costs more than a userspace-kernel-userspace round trip.
> (I guess that moving 64kB or 128kB should be quicker, especially
> given the various CPU mitigations.)

I unfortunately don't remember the details of where I saw it happening.
I suspect it'll also depend a lot on hardware and operating system
details (like the security mitigations you mention) when it matters.

> As long as there are complete requests in the buffer, the memmove()
> could be avoided; only the initial part of the first incomplete request
> might need moving to the beginning.

Right. I'd be inclined to say that ought to be addressed as part of
this patch; that way we can be pretty sure it's not going to cause
regressions.

> The documentation says
>
> > Pipelining is less useful, and more complex,
> > when a single pipeline contains multiple transactions
> > (see Section 32.5.1.3).
>
> Are there any benchmarks/usage statistics for pipeline mode?

You can write benchmarks for it using pgbench's pipeline support,
with a custom script.

Greetings,

Andres Freund