Re: libpq compression

From: Konstantin Knizhnik
Subject: Re: libpq compression
Date:
Msg-id: 5cba6693-d1e2-f4f0-ba9f-f057d2d17b23@garret.ru
In response to: Re: libpq compression  (Daniil Zakhlystov <usernamedt@yandex-team.ru>)
List: pgsql-hackers

On 11.02.2021 16:09, Daniil Zakhlystov wrote:
> Hi!
>
>> On 09.02.2021 09:06, Konstantin Knizhnik wrote:
>>
>> Sorry, but my interpretation of your results is completely different:
>> permanent compression is faster than chunked compression (2m15 vs. 2m27)
>> and consumes less CPU (44 vs. 48 sec).
>> The RX data size is slightly larger (by 0.5Mb), but the TX size is smaller (by 5Mb).
>> So permanent compression is better from all points of view: it is
>> faster, consumes less CPU, and reduces network traffic!
>>
>> From my point of view, your results just prove my original opinion that
>> the ability to control compression on the fly and to use different
>> compression algorithms for TX/RX data
>> just complicates the implementation and gives no significant advantages.
> When I mentioned the lower CPU usage, I was referring to the pgbench test results in the attached
> google doc, where chunked compression demonstrated lower CPU usage compared to permanent compression.
>
> I ran another (slightly larger) pgbench test to demonstrate this:
>
> Pgbench test parameters:
>
> Data load
> pgbench -i -s 100
>
> Run configuration
> pgbench --builtin tpcb-like -t 1500 --jobs=64 --client=600
>
> Pgbench test results:
>
> No compression
> latency average = 247.793 ms
> tps = 2421.380067 (including connections establishing)
> tps = 2421.660942 (excluding connections establishing)
>
> real    6m11.818s
> user    1m0.620s
> sys     2m41.087s
> RX bytes diff, human: 703.9221M
> TX bytes diff, human: 772.2580M
>
> Chunked compression (compress only CopyData and DataRow messages)
> latency average = 249.123 ms
> tps = 2408.451724 (including connections establishing)
> tps = 2408.719578 (excluding connections establishing)
>
> real    6m13.819s
> user    1m18.800s
> sys     2m39.941s
> RX bytes diff, human: 707.3872M
> TX bytes diff, human: 772.1594M
>
> Permanent compression
> latency average = 250.312 ms
> tps = 2397.005945 (including connections establishing)
> tps = 2397.279338 (excluding connections establishing)
>
> real    6m15.657s
> user    1m54.281s
> sys     2m37.187s
> RX bytes diff, human: 610.6932M
> TX bytes diff, human: 513.2225M
>
>
> As you can see in the above results, user CPU time (1m18.800s vs. 1m54.281s) is significantly lower with
> chunked compression because it doesn’t try to compress all of the packets.
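
For reference, the RX/TX byte figures above look like network interface counter deltas. A minimal
sketch of reading such cumulative counters on Linux follows; this is an assumption on my side:
/proc/net/dev and the interface name "eth0" are placeholders, the test description does not say
how the diff was actually taken.

/* Print cumulative RX/TX byte counters for one interface; run it before
 * and after the benchmark and subtract to get the traffic diff. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/net/dev", "r");
    char line[512];

    if (f == NULL)
    {
        perror("/proc/net/dev");
        return 1;
    }
    while (fgets(line, sizeof(line), f) != NULL)
    {
        char name[32];
        unsigned long long rx, tx;

        /* Line format: iface: rx_bytes <7 other rx fields> tx_bytes ... */
        if (sscanf(line, " %31[^:]: %llu %*u %*u %*u %*u %*u %*u %*u %llu",
                   name, &rx, &tx) == 3 && strcmp(name, "eth0") == 0)
            printf("eth0 RX=%llu TX=%llu bytes\n", rx, tx);
    }
    fclose(f);
    return 0;
}
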
Well, but permanent compression provides some (though not large) reduction of traffic,
while with chunked compression network traffic is almost the same as with no compression,
yet it consumes more CPU.

pgbench queries are definitely not a case where compression should be
used: both requests and responses are too short for compression to be
efficient. So in this case compression should not be used at all.
From my point of view, "chunked compression" is not a good compromise
between the no-compression and permanent-compression cases;
rather, it combines the drawbacks of both approaches: it doesn't reduce
traffic but consumes more CPU.
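
To illustrate why short messages don't compress well, here is a minimal standalone sketch
(zlib is used here purely for illustration; it is not necessarily what the patch uses, and
the sample row is an invented payload of roughly pgbench-row size):

/* Compress a payload about the size of one pgbench result row and show
 * that the compression container overhead dominates on short inputs. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    const char row[] = "1|4856|100|-2743|";   /* ~17-byte sample row */
    unsigned char out[128];
    uLongf outlen = sizeof(out);

    if (compress2(out, &outlen, (const unsigned char *) row,
                  strlen(row), Z_DEFAULT_COMPRESSION) != Z_OK)
    {
        fprintf(stderr, "compress2 failed\n");
        return 1;
    }
    /* On inputs this short, the compressed size typically exceeds the
     * raw size, so compressing every message wastes CPU and bytes. */
    printf("raw %zu bytes -> compressed %lu bytes\n",
           strlen(row), (unsigned long) outlen);
    return 0;
}
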
>
> Here is the summary from my POV, according to these and the previous test results:
>
> 1. Permanent compression always brings the highest compression ratio
> 2. Permanent compression might not be worthwhile under loads other than COPY data / replication / BLOBs / JSON queries
> 3. Chunked compression allows compressing only the well-compressible messages, saving CPU cycles by not compressing
> the others
> 4. Chunked compression introduces some traffic overhead compared to permanent compression (1.2810G vs. 1.2761G TX data
> on pg_restore of the IMDB database dump, according to the results in my previous message)
> 5. From the protocol point of view, chunked compression seems a little bit more flexible (see the sketch after the
> quoted text below):
>   - we can inject some uncompressed messages at any time without the need to decompress/compress the compressed data
>   - we can potentially switch the compression algorithm at any time (though I think that this might be over-engineering)
>
> Given the summary above, I think it’s time to make a decision on which path we should take and make the final list of
> goals that need to be reached in this patch to make it committable.
>
> Thanks,
>
> Daniil Zakhlystov
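
To illustrate points 3 and 5 of the summary above, here is a hypothetical sketch of the
chunked send path (all names here are invented for illustration and are not the patch's
actual API):

/* Only message types known to carry bulk data ('d' = CopyData,
 * 'D' = DataRow) go through the compressor; everything else is sent
 * raw.  This is what lets uncompressed messages be injected at any
 * point, and in principle lets the algorithm change between chunks. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool
is_compressible(char msg_type, size_t payload_len)
{
    /* Skip tiny payloads where container overhead would dominate. */
    return (msg_type == 'd' || msg_type == 'D') && payload_len >= 64;
}

static void
send_message(char msg_type, const char *payload, size_t len)
{
    (void) payload;             /* a real send path writes the bytes */
    if (is_compressible(msg_type, len))
        printf("'%c' (%zu bytes) -> wrapped in a compressed chunk\n",
               msg_type, len);
    else
        printf("'%c' (%zu bytes) -> sent uncompressed\n", msg_type, len);
}

int main(void)
{
    char big[256];

    memset(big, 'x', sizeof(big));
    send_message('D', big, sizeof(big));  /* bulk row: compressed */
    send_message('Z', "I", 1);            /* ReadyForQuery: raw   */
    return 0;
}
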



