Re: libpq compression (part 3)

From: Jacob Burroughs
Subject: Re: libpq compression (part 3)
Msg-id: CACzsqT6XbZ9bX=mgjHgv-h_6gZEhMB1rLbz0RT4=aWsoGaTvNw@mail.gmail.com
In reply to: Re: libpq compression (part 3)  (Jelte Fennema-Nio <postgres@jeltef.nl>)
Responses: Re: libpq compression (part 3)  (Jacob Champion <jacob.champion@enterprisedb.com>)
List: pgsql-hackers
On Tue, May 21, 2024 at 10:43 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> To help get everyone on the same page I wanted to list all the
> security concerns in one place:
>
> 1. Triggering excessive CPU usage before authentication, by asking for
> very high compression levels
> 2. Triggering excessive memory/CPU usage before authentication, by a
> client sending a zipbomb
> 3. Triggering excessive CPU after authentication, by asking for a very
> high compression level
> 4. Triggering excessive memory/CPU after authentication due to
> zipbombs (i.e. small amount of data extracting to lots of data)
> 5. CRIME style leakage of information about encrypted data
>
> 1 & 2 can easily be solved by not allowing any authentication packets
> to be compressed. This also has benefits for 5.

This is already addressed by only compressing certain message types.
If we think it is important that the server reject compressed packets
of other types, I can add that, but it seemed reasonable to just make
the client never send such packets compressed.
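As a sketch of what a server-side rejection could look like (message
type codes are from the protocol docs; the function name and structure
are hypothetical, not from the patch):

```python
# Frontend message types we expect to benefit from compression
# (per Jelte's list further down); anything else -- notably
# authentication-related packets like PasswordMessage ('p') --
# would be rejected if received compressed.
COMPRESSIBLE_FRONTEND_TYPES = {
    b'Q',  # Query
    b'P',  # Parse
    b'D',  # Describe
    b'B',  # Bind
    b'd',  # CopyData
}

def server_allows_compressed(msg_type: bytes) -> bool:
    """Hypothetical allowlist check: only these message types may
    arrive compressed; everything else must be sent uncompressed."""
    return msg_type in COMPRESSIBLE_FRONTEND_TYPES
```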

> 3 & 4 are less of a concern than 1&2 imho. Once authenticated a client
> deserves some level of trust. But having knobs to limit impact
> definitely seems useful.
>
> 3 can be solved in two ways afaict:
> a. Allow the server to choose the maximum compression level for each
> compression method (using some GUC), and downgrade the level
> transparently when a higher level is requested
> b. Don't allow the client to choose the compression level that the server uses.
>
> I'd prefer option a

3a would seem preferable given the discussion upthread. It would
probably be worth doing some measurement of how much extra compute
effort the max level adds over the default for all 3 algorithms,
because I would really prefer to avoid adding even more configuration
knobs if the max compression level is sufficiently performant for the
streaming data usecase.
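The kind of measurement I have in mind could be as simple as the
following (Python's zlib as a stand-in here; the real test would use
the patch's zstd/lz4/zlib codepaths and representative row data):

```python
import time
import zlib

def time_compress(data: bytes, level: int, reps: int = 5) -> tuple[float, float]:
    """Return (seconds spent compressing `data` `reps` times at `level`,
    compression ratio achieved)."""
    start = time.perf_counter()
    for _ in range(reps):
        compressed = zlib.compress(data, level)
    return time.perf_counter() - start, len(compressed) / len(data)

payload = b"some moderately repetitive row data " * 4096
for level in (1, 6, 9):  # fastest, default, max
    secs, ratio = time_compress(payload, level)
    print(f"level={level}: {secs:.4f}s, ratio={ratio:.3f}")
```

If max vs. default turns out to be a small constant factor for the
levels that matter, a server-side max-level GUC may not be worth its
complexity.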

> 4 would require some safety limits on the amount of data that a
> (small) compressed message can be decompressed to, and stop
> decompression of that message once that limit is hit. What that limit
> should be seems hard to choose though. A few ideas:
> a. The size of the message reported by the uncompressed header. This
> would mean that at most the 4GB will be uncompressed, since maximum
> message length is 4GB (limited by 32bit message length field)
> b. Allow servers to specify maximum client decompressed message length
> lower than this 4GB, e.g. messages of more than 100MB of uncompressed
> size should not be allowed.

Because we are using streaming decompression, this is much less of an
issue than for things that decompress wholesale onto disk/into memory.
We only read PQ_RECV_BUFFER_SIZE (8k) bytes off the stream at once,
and when reading a packet we already have a `maxmsglen` of
PQ_LARGE_MESSAGE_LIMIT (1gb), and "We abort the connection (by
returning EOF) if client tries to send more than that."  Therefore,
we effectively already have a limit of 1gb that applies to regular
messages too, and I think we should rely on this mechanism for
compressed data too (if we really think we need to make that number
configurable we probably could, but again, the fewer new knobs we need
to add the better).
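For illustration, streaming decompression with a hard cap on produced
output looks roughly like this (Python's zlib as a stand-in for the
patch's C code; the constants mirror PQ_RECV_BUFFER_SIZE and
PQ_LARGE_MESSAGE_LIMIT):

```python
import zlib

PQ_RECV_BUFFER_SIZE = 8192       # bytes read off the wire at once
PQ_LARGE_MESSAGE_LIMIT = 1 << 30  # 1gb, same cap as uncompressed messages

def decompress_capped(compressed: bytes,
                      limit: int = PQ_LARGE_MESSAGE_LIMIT) -> bytes:
    """Streaming decompression that aborts as soon as more than `limit`
    bytes have been produced, so a small zipbomb cannot expand without
    bound in memory."""
    d = zlib.decompressobj()
    out = bytearray()
    data = compressed
    while data:
        chunk, data = data[:PQ_RECV_BUFFER_SIZE], data[PQ_RECV_BUFFER_SIZE:]
        while chunk:
            # max_length bounds how much output this call may produce;
            # unconsumed input is kept in d.unconsumed_tail
            out += d.decompress(chunk, max_length=PQ_RECV_BUFFER_SIZE)
            if len(out) > limit:
                raise ValueError("decompressed message exceeds limit; "
                                 "abort connection")
            chunk = d.unconsumed_tail
    out += d.flush()
    if len(out) > limit:
        raise ValueError("decompressed message exceeds limit; "
                         "abort connection")
    return bytes(out)
```

The point is that the existing 1gb message-length mechanism naturally
extends to decompressed size, without a new knob.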


> I think 5 is the most complicated to deal with, especially as it
> depends on the actual usage to know what is safe. I believe we should
> let users have the freedom to make their own security tradeoffs, but
> we should protect them against some of the most glaring issues
> (especially ones that benefit little from compression anyway). As
> already shown by Andrey, sending LDAP passwords in a compressed way
> seems extremely dangerous. So I think we should disallow compressing
> any authentication related packets. To reduce similar risks further we
> can choose to compress only the message types that we expect to
> benefit most from compression. IMHO those are the following (marked
> with (B)ackend or (F)rontend to show who sends them):
> - Query (F)
> - Parse (F)
> - Describe (F)
> - Bind (F)
> - RowDescription (B)
> - DataRow (B)
> - CopyData (B/F)

That seems like a reasonable list (the current implementation is just
CopyData/DataRow/Query, but I really just copied that fairly blindly
from the previous incarnation of this effort). See also my comment
above under 1 & 2 regarding whether we need to block decompressing the
other types too.

> Then I think we should let users choose how they want to compress and
> where they want their compression stream to restart. Something like
> this:
> a. compression_restart=query: Restart the stream after every query.
> Recommended if queries across the same connection are triggered by
> different end-users. I think this would be a sane default
> b. compression_restart=message: Restart the stream for every message.
> Recommended if the amount of correlation between rows of the same
> query is a security concern.
> c. compression_restart=manual: Don't restart the stream automatically,
> but only when the client user calls a specific function. Recommended
> only if the user can make trade-offs, or if no encryption is used
> anyway.

I rather like this idea, though I think maybe we should also (instead
of per-query?) add a per-transaction restart on the backend side.  I'm
curious what other people think of this.
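To illustrate what restarting the stream buys (Python's zlib as a
stand-in; a fresh compressor discards shared history, so one message's
plaintext cannot shrink the compressed size of a later attacker-chosen
message, which is the CRIME-style signal):

```python
import zlib

def compress_messages(messages, restart_each=True):
    """Compress a sequence of messages either as one continuous stream
    (shared history across messages) or restarting the stream per
    message (no shared history)."""
    out = []
    comp = zlib.compressobj()
    for msg in messages:
        if restart_each:
            comp = zlib.compressobj()  # fresh history for every message
        out.append(comp.compress(msg) + comp.flush(zlib.Z_SYNC_FLUSH))
    return out

secret = b"super-secret-token-123"
probe = b"super-secret-token-"   # attacker-controlled guess
# With a shared stream the probe back-references the earlier secret and
# compresses smaller; with per-message restart it cannot.
shared = compress_messages([secret, probe], restart_each=False)
fresh = compress_messages([secret, probe], restart_each=True)
```

A per-transaction granularity would be the same mechanism, just with
the restart triggered at transaction boundaries instead of per message
or per query.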


--
Jacob Burroughs | Staff Software Engineer
E: jburroughs@instructure.com


