Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

From: Jonathon Nelson
Subject: Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
Msg-id: CACJqAM2=3Lrt3GgEdH6qps-ZDE6z40oyWuRCUa_DqBX4TLh++g@mail.gmail.com
In reply to: Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send (Greg Stark <stark@mit.edu>)
Responses: Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On Sun, Jan 8, 2017 at 11:36 AM, Greg Stark <stark@mit.edu> wrote:
> On 8 January 2017 at 17:26, Greg Stark <stark@mit.edu> wrote:
> > On 5 January 2017 at 19:01, Andres Freund <andres@anarazel.de> wrote:
> >> That's a bit odd - shouldn't the OS network stack take care of this in
> >> both cases?  I mean either is too big for TCP packets (including jumbo
> >> frames).  What type of OS and network is involved here?
> >
> > 2x may be plausible. The first 128kB goes out, then the rest queues up
> > until the first ack comes back. Then the next 128kB goes out again
> > without waiting... I think this is what Nagle is supposed to actually
> > address, but either it may be off by default these days or our usage
> > pattern may be defeating it in some way.
>
> Hm. That wasn't very clear.  And the more I think about it, it's not right.
>
> The first block of data -- one byte in the worst case, 128kB in our
> case -- gets put in the output buffers and, since there's nothing
> stopping it, it immediately gets sent out. Then all the subsequent data
> gets put in the output buffers but is held back by Nagle until there's
> a full packet of data buffered, the ack arrives, or the timeout
> expires, at which point the buffered data drains efficiently in full
> packets. Eventually it all drains away and the next 128kB arrives and
> is sent out immediately.
>
> So most packets are full size with the occasional 128kB packet thrown
> in whenever the buffer empties. And I think even when the 128kB packet
> is pending, Nagle only stops small packets, not full packets, and the
> window should allow more than one packet of data to be pending.
>
> So, uh, forget what I said. Nagle should be our friend here.
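The corrected description of Nagle's algorithm above boils down to one rule: a segment may go out immediately if it is full-size or if nothing previously sent is still unacknowledged; otherwise a small segment waits. The following is a simplified model of that rule (RFC 896), not kernel or PostgreSQL code. It assumes an MSS of 1448 bytes, that no ACKs arrive while the write is being segmented, and that the TCP window never limits sending; `nagle_can_send` and `split_write` are hypothetical names.

```python
# Simplified model of Nagle's algorithm (RFC 896). Assumptions, not taken
# from the thread: MSS of 1448 bytes, no ACKs arrive while the write is being
# segmented, and the TCP send window never limits transmission.

MSS = 1448  # assumed MSS: 1500-byte MTU minus IP/TCP headers and TCP timestamps


def nagle_can_send(segment_len: int, mss: int, unacked_bytes: int) -> bool:
    """Nagle's rule: send now if the segment is full-size or nothing is in flight."""
    return segment_len >= mss or unacked_bytes == 0


def split_write(nbytes: int, mss: int = MSS, unacked_bytes: int = 0):
    """Segment one application write and classify each segment.

    Returns (sent, held): sizes of segments sent immediately, and sizes held
    back until an ACK arrives or enough data accumulates to fill a segment.
    """
    sent, held = [], []
    offset = 0
    while offset < nbytes:
        seg = min(mss, nbytes - offset)
        if nagle_can_send(seg, mss, unacked_bytes):
            sent.append(seg)
            unacked_bytes += seg  # no ACKs arrive during the burst
        else:
            held.append(seg)
        offset += seg
    return sent, held


# One 128kB write on an idle connection: full segments go out at once and
# only the sub-MSS tail waits for an ACK.
sent, held = split_write(128 * 1024)
print(len(sent), held)  # prints: 90 [752]
```

This matches the point that Nagle only holds back small segments: full-size segments are never delayed by the rule itself. A socket can also opt out of Nagle entirely via the `TCP_NODELAY` socket option; PostgreSQL's server appears to set it on accepted TCP connections (see `StreamConnection()` in `pqcomm.c`), which would be worth confirming before attributing the effect to Nagle.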

[I have not done a rigorous analysis here, but...]

I *think* libpq is the culprit here.

walsender says "Hey, libpq - please send (up to) 128kB of data!" and doesn't "return" until it's "sent". Then it sends more. Regardless of the underlying cause (Nagle, TCP congestion control algorithms, umpteen different combinations of hardware and settings, etc.), in almost every test I saw an improvement (usually quite a large one). This was most easily observable on high bandwidth-delay-product links, but my time in the lab is somewhat limited.
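The send-then-wait pattern described above can be sketched as a chunking loop. This is a hypothetical model, loosely based on how `XLogSendPhysical()` in `walsender.c` caps each send at `MAX_SEND_SIZE` (16 × `XLOG_BLCKSZ`, i.e. 128kB with the default 8kB WAL page) and rounds the end of a non-final chunk down to a page boundary. The `max_send` parameter stands in for the proposed `max_wal_send` setting; `plan_sends` and `send_wal` are illustrative names, and a blocking socket is assumed, whereas the real walsender uses non-blocking I/O.

```python
# Hypothetical model of walsender chunking outgoing WAL. Each send covers at
# most max_send bytes and, unless it reaches the end of available WAL, ends
# on a WAL page boundary. max_send stands in for the proposed max_wal_send.

XLOG_BLCKSZ = 8192                # default WAL page size
MAX_SEND_SIZE = 16 * XLOG_BLCKSZ  # the formerly hard-coded 128kB


def plan_sends(start: int, send_rqst: int, max_send: int = MAX_SEND_SIZE):
    """Return the (start, end) WAL byte ranges that would be sent, in order."""
    if max_send < XLOG_BLCKSZ:
        raise ValueError("max_send must be at least one WAL page")
    chunks = []
    while start < send_rqst:
        end = start + max_send
        if end >= send_rqst:
            end = send_rqst           # last chunk: send everything available
        else:
            end -= end % XLOG_BLCKSZ  # round down to a page boundary
        chunks.append((start, end))
        start = end
    return chunks


def send_wal(sock, wal: bytes, max_send: int = MAX_SEND_SIZE) -> int:
    """Send wal chunk by chunk over a blocking socket.

    Each sendall() returns only once the kernel has accepted the whole chunk,
    so the next chunk is not handed over until the previous one is queued.
    Returns the number of send calls made.
    """
    chunks = plan_sends(0, len(wal), max_send)
    for lo, hi in chunks:
        sock.sendall(wal[lo:hi])
    return len(chunks)
```

For scale, an illustrative (not measured) 1 Gbit/s link with a 50 ms round-trip time has a bandwidth-delay product of 10^9 / 8 × 0.05 = 6.25 MB, roughly 48 times one 128kB chunk, which is why per-chunk waiting is most visible on such links.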

I calculated "performance" using the simplest measurement possible: how long it took for Y volume of data to be transferred, measured over a long-enough interval (typically 1800 seconds) for TCP windows to open up, etc.
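The measurement described above reduces to simple arithmetic: average throughput is bytes moved divided by elapsed time, and for a fixed volume Y the gain of one run over another is the ratio of their durations. A minimal sketch with hypothetical helper names; the numbers in the usage line are illustrative, not results from the thread.

```python
def throughput(bytes_transferred: int, elapsed_s: float) -> float:
    """Average throughput in bytes per second over the whole interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_transferred / elapsed_s


def relative_improvement(baseline_s: float, candidate_s: float) -> float:
    """Fractional throughput gain of candidate over baseline for the same Y bytes.

    Moving the same volume in less time means proportionally higher
    throughput, so gain = baseline_s / candidate_s - 1.
    """
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("durations must be positive")
    return baseline_s / candidate_s - 1.0


# Illustrative only: 3.6 GB moved in 1800 s is 2 MB/s.
print(throughput(3_600_000_000, 1800))  # prints: 2000000.0
```

Averaging over a long interval, as in the 1800-second runs, keeps TCP slow start and window growth from dominating the figure.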

--
Jon Nelson
Dyn / Principal Software Engineer
