Re: [External] : Re: BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send

From	Rony Kurniawan
Subject	Re: [External] : Re: BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send
Date	17 May 2021, 22:45:41
Msg-id	5329e8fe-7c90-0a69-af97-0a4928a70b29@oracle.com
In reply to	BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send (PG Bug reporting form <noreply@postgresql.org>)
List	pgsql-bugs

On 5/17/2021 11:54 AM, Andres Freund wrote:
> Hi,
>
> On 2021-05-17 11:19:31 -0700, Rony Kurniawan wrote:
>> The networks that I tested were gigabits and docker (local). With
>> TCP_NODELAY enabled, the only time small sends would be aggregated is by
>> auto corking in tcp/ip when there is network congestion. But as you can see
>> from the tcpdump output the messages are in individual packet therefore
>> there is no aggregation and no network congestion.
> I don't understand why "individual packages" implies that there can be
> no network congestion? Or are you just saying that in the specific
> period traced you didn't observe that?

PostgreSQL enables TCP_NODELAY (Nagle's algorithm is off), so it is up to
the kernel to aggregate those sends. With auto corking, that happens only
when the NIC still has outstanding packets in its tx queue, either because
of network congestion or because the NIC cannot keep up with the rate of
send() calls from the application.

On gigabit Ethernet, the amount of data produced by the logical
replication server is not enough to trigger auto corking or any other
aggregation, hence one packet per message. Aggregation can still happen
occasionally, though.
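
To make the two knobs above concrete, here is a minimal sketch, assuming
Linux and Python's standard socket module. It sets TCP_NODELAY on a fresh,
unconnected socket and reads the net.ipv4.tcp_autocorking sysctl. The helper
names enable_nodelay and read_autocorking are made up for illustration; this
is not PostgreSQL code.

```python
import socket
from pathlib import Path


def enable_nodelay(sock):
    """Turn Nagle's algorithm off and report whether the option is now set."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


def read_autocorking():
    """Return the tcp_autocorking sysctl as a string, or None if unavailable."""
    try:
        return Path("/proc/sys/net/ipv4/tcp_autocorking").read_text().strip()
    except OSError:
        # Not Linux, or the sysctl is not exposed in this environment.
        return None


# A throwaway socket, only used to show the option round-trip.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    nodelay_on = enable_nodelay(sock)

autocorking = read_autocorking()
print("TCP_NODELAY on:", nodelay_on)
print("tcp_autocorking:", autocorking)
```

With TCP_NODELAY set, any coalescing of small sends is left to kernel
mechanisms such as auto corking, which only engage when packets are already
queued for transmission.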

In my larger test case, which uses pgbench to insert 20 records per
transaction for 1 minute, I see some larger packets, but they are mostly
629 bytes.

> I just verified this with iperf - I see large packets with
> iperf -l 500 --nodelay -c $other_host
> but not
> iperf -b 10M -l 500 --nodelay -c $other_host
>
> I had to remember how to disable tcp segmentation offloading to see
> proper package sizes in the first case, without there were a lot of
> 65226 byte sized packets in the first case...
>
>> There is network overhead in both sender and receiver like tcp/ip header,
>> number of skb, ethernet tx/rx descriptors, and interrupts.
> Right.
>
>
>> Also syscall overhead in pg_recvlogical where for one insert in the
>> example requires 3 recv() calls to read BEGIN, INSERT, COMMIT messages
>> instead of one recv() to read all three messages when Nagle's is
>> enabled. This syscall overhead is the same in transaction case with
>> multiple changes where each change is one recv().
> I think the obvious and unproblematic improvement is to only send data
> to the socket if WalSndWriteData's last_write parameter is set, or if
> there's a certain amount of data in the socket. That'll only get rid of
> some of the overhead, since we'd still send things like transactions
> separately.
>
> Another improvement might be that WalSndWriteData() possibly shouldn't
> block even if pq_is_send_pending() and the pending amount isn't huge,
> iff !last_write. That way we'd end up doing syscalls sending more data
> at once.
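
The first suggestion above (write to the socket only when last_write is set
or enough data has accumulated) can be sketched roughly as follows. This is
a user-space illustration in Python, not the walsender code: the class
BatchingWriter, its flush_threshold default, and the placeholder payloads
are all hypothetical, and which messages would actually carry last_write is
an assumption made only for the example.

```python
class BatchingWriter:
    """Buffer messages; hand them to `send` only on last_write or when full."""

    def __init__(self, send, flush_threshold=8192):
        self._send = send                  # callable taking bytes, e.g. sock.sendall
        self._threshold = flush_threshold  # hypothetical size limit in bytes
        self._pending = bytearray()
        self.send_calls = 0

    def write(self, payload, last_write):
        """Queue one message; flush if it is the last one or the buffer is large."""
        self._pending += payload
        if last_write or len(self._pending) >= self._threshold:
            self.flush()

    def flush(self):
        """Send everything queued so far in a single call."""
        if self._pending:
            self._send(bytes(self._pending))
            self.send_calls += 1
            self._pending.clear()


# Placeholder payloads standing in for the BEGIN, INSERT and COMMIT messages;
# a list's append method stands in for the socket send.
sent = []
writer = BatchingWriter(sent.append)
writer.write(b"BEGIN", last_write=False)
writer.write(b"INSERT", last_write=False)
writer.write(b"COMMIT", last_write=True)
print(writer.send_calls, sent)
```

Under that assumption the three messages leave in one send instead of three,
so the receiver could read them with a single recv().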

Thank you for looking into this,

Rony
