Re: libpq compression
From | Daniil Zakhlystov
Subject | Re: libpq compression
Date |
Msg-id | 6A45DFAA-1682-4EF2-B835-C5F46615EC49@yandex-team.ru
In reply to | Re: libpq compression (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List | pgsql-hackers
Hi!

I've contacted Yann Collet (developer of ZSTD) and told him about our discussion. Here is his comment:

> Hi Daniil
>
> • Is this an expected behavior of ZSTD to consume more memory during the decompression of data that was compressed with a high compression ratio?
>
> I assume that the target application is employing the streaming mode.
> In which case, yes, the memory usage is directly dependent on the window size, and the window size tends to increase with compression level.
>
> • How can we restrict the maximal memory usage during decompression?
>
> There are several ways.
>
> • From a decompression perspective
>
> The first method is to _not_ use the streaming mode, and employ direct buffer-to-buffer compression instead, like ZSTD_decompress() for example. In which case, the decompressor will not need additional memory; it will only employ the provided buffers.
>
> This however entirely depends on the application and can therefore be impractical. It's fine when decompressing small blocks; it's not when decompressing gigantic streams of data.
>
> The second method is more straightforward: set a limit to the window size that the decoder accepts to decode. This is the ZSTD_d_windowLogMax parameter, documented here: https://github.com/facebook/zstd/blob/v1.4.7/lib/zstd.h#L536
>
> This can be set to any arbitrary power-of-2 limit. A frame requiring more than this value will be rejected by the decoder, precisely to avoid sustaining large memory requirements.
>
> Lastly, note that, in presence of a large window size requirement, the decoder will allocate a correspondingly large buffer, but will not necessarily use it. For example, if a frame generated with streaming mode at level 22 declares a 128 MB window size, but effectively only contains ~200 KB of data, the buffer will only use 200 KB. The rest of the buffer is "allocated" from an address space perspective but is not "used" and therefore does not really occupy physical RAM space.
> This is a capability of all modern OSes and contributes to minimizing the impact of outsized window sizes.
>
> • From a compression perspective
>
> Knowing the set limitation, the compressor should be compliant and avoid going above the threshold. One way to do it is to limit the compression level to those which remain below the set limit. For example, if the limit is 8 MB, all levels <= 19 will be compatible, as they require 8 MB max (and generally less).
>
> Another method is to manually set a window size, so that it doesn't exceed the limit. This is the ZSTD_c_windowLog parameter, which is documented here: https://github.com/facebook/zstd/blob/v1.4.7/lib/zstd.h#L289
>
> Another complementary way is to provide the source size when it's known. By default, the streaming mode doesn't know the input size, since it's supposed to receive it in multiple blocks. It will only discover it at the end, by which point it's too late to use this information in the frame header. This can be solved by providing the source size upfront, before starting compression. This is the function ZSTD_CCtx_setPledgedSrcSize(), documented here: https://github.com/facebook/zstd/blob/v1.4.7/lib/zstd.h#L483
> Of course, then the total amount of data in the frame must be exact, otherwise it's detected as an error.
>
> Taking again the previous example of compressing 200 KB with level 22: on knowing the source size, the compressor will resize the window to fit the input, and therefore employ 200 KB instead of 128 MB. This information will be present in the header, and the decompressor will also be able to use 200 KB instead of 128 MB. Also, presuming the decompressor has a hard limit set to 8 MB (for example), the header using a 200 KB window size will pass and be properly decoded, while the header using 128 MB will be rejected. This method is cumulative with the one setting a manual window size (the compressor will select the smallest of both).
> So yes, memory consumption is a serious topic, and there are tools in the `zstd` library to deal with it.
>
> Hope it helps
>
> Best Regards
>
> Yann Collet

After reading Yann's advice, I repeated yesterday's single-directional decompression benchmarks with ZSTD_d_windowLogMax set to 23, i.e. an 8 MB max window size. Total committed memory (Committed_AS) size for ZSTD compression levels 1-19 was pretty much the same.

Committed_AS baseline (size without any benchmark running): 42.4 GiB

Scenario         Committed_AS   Committed_AS - Baseline
no compression   44.36 GiB      1.05 GiB
ZSTD:1           45.03 GiB      1.06 GiB
ZSTD:5           46.06 GiB      1.09 GiB
ZSTD:9           46.00 GiB      1.08 GiB
ZSTD:13          47.46 GiB      1.12 GiB
ZSTD:17          50.23 GiB      1.18 GiB
ZSTD:19          50.21 GiB      1.18 GiB

As for ZSTD levels higher than 19, the decompressor returned the appropriate error (excerpt from the PostgreSQL server log):

LOG: failed to decompress data: Frame requires too much memory for decoding

Full benchmark report: https://docs.google.com/document/d/1LI8hPzMkzkdQLf7pTN-LXPjIJdjN33bEAqVJj0PLnHA

Pull request with max window size limit: https://github.com/postgrespro/libpq_compression/pull/5

This should fix the possible attack vectors related to high ZSTD compression levels.

—
Daniil Zakhlystov
In pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Message: Re: Feature request: Connection string parsing for postgres_fdw

Next
From: Bharath Rupireddy
Message: Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs