Re: libpq compression

From: Robbie Harwood
Subject: Re: libpq compression
Date:
Msg-id: jlgo9g5qjst.fsf@redhat.com
In response to: Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

> On 20.06.2018 00:04, Robbie Harwood wrote:
>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>> On 18.06.2018 23:34, Robbie Harwood wrote:
>>>
>>>> I also don't like that you've injected into the *startup* path -
>>>> before authentication takes place.  Fundamentally, authentication
>>>> (if it happens) consists of exchanging some combination of short
>>>> strings (e.g., usernames) and binary blobs (e.g., keys).  None of
>>>> this will compress well, so I don't see it as worth performing this
>>>> negotiation there - it can wait.  It's also another message in
>>>> every startup.  I'd leave it to connection parameters, personally,
>>>> but up to you.
>>>
>>> From my point of view, compression of libpq traffic is similar to
>>> SSL and should be toggled at the same stage.
>>
>> But that's not what you're doing.  This isn't where TLS gets toggled.
>>
>> TLS negotiation happens as the very first packet: after completing
>> the TCP handshake, the client will send a TLS negotiation request.
>> If it doesn't happen there, it doesn't happen at all.
>>
>> (You *could* configure it where TLS is toggled.  This is, I think,
>> not a good idea.  TLS encryption is a probe: the server can reject
>> it, at which point the client tears everything down and connects
>> without TLS.  So if you do the same with compression, that's another
>> point of tearing down and starting over.  The scaling on it isn't good
>> either: if we add another encryption method into the mix, you've
>> doubled the number of teardowns.)
>
> Yes, you are right. There is a special message for enabling the TLS
> protocol.  But I do not think that the same thing is needed for
> compression.  This is why I prefer to specify compression in
> connection options, so compression may be enabled straight after
> processing of the startup packet.  Frankly speaking, I still do not
> see any reason to postpone enabling compression until some later
> moment.

I'm arguing for connection option only (with no additional negotiation
round-trip).  See below.

>>> Definitely authentication parameters are not so large as to be
>>> efficiently compressed, but compression (maybe password-protected
>>> in the future) can somehow protect this data.  In any case I do
>>> not think that compression of authentication data has any influence
>>> on negotiation speed.  So I am not 100% sure that toggling
>>> compression just after receiving the startup packet is the only
>>> right solution.  But I am also not convinced that there is some
>>> better place where the compressor should be configured.  Do you
>>> have some concrete suggestions for it? In InitPostgres just after
>>> PerformAuthentication?
>>
>> Hmm, let me try to explain this differently.
>>
>> pq_configure() (as you've called it) shouldn't send a packet.  At its
>> callsite, we *already know* whether we want to use compression -
>> that's what the port->use_compression option says.  So there's no
>> point in having a negotiation there - it's already happened.
>
> My idea was the following: the client wants to use compression, but
> the server may reject this attempt (for any reason: it doesn't
> support it, has no proper compression library, does not want to
> spend CPU on decompression, ...).  Right now the compression
> algorithm is hardcoded, but in the future client and server may
> negotiate to choose the proper compression protocol.  This is why I
> prefer to perform negotiation between client and server to enable
> compression.

Well, for negotiation you could put the name of the algorithm you want
in the startup.  It doesn't have to be a boolean for compression, and
then you don't need an additional round-trip.
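
For what it's worth, a minimal sketch of how that could look from the
client side, assuming a hypothetical "compression" connection option
whose value names the algorithm (the option name and value syntax here
are illustrative, not the actual interface of the patch):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /*
         * "compression" and "zstd" are placeholders for illustration; the
         * point is that the algorithm name rides along in the startup
         * packet, so no extra negotiation round-trip is needed.
         */
        PGconn *conn = PQconnectdb("host=localhost dbname=test compression=zstd");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

        PQfinish(conn);
        return 0;
    }

The client states its preference up front instead of probing, and the
server can accept or ignore it when it processes the startup packet.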

>>>> I don't like where you've put the entry points to the compression
>>>> logic: it's a layering violation.  A couple people have expressed
>>>> similar reservations I think, so let me see if I can explain using
>>>> `pqsecure_read()` as an example.  In pseudocode, `pqsecure_read()`
>>>> looks like this:
>>>>
>>>>       if conn->is_tls:
>>>>           n = tls_read(conn, ptr, len)
>>>>       else:
>>>>           n = pqsecure_raw_read(conn, ptr, len)
>>>>       return n
>>>>
>>>> I want to see this extended by your code to something like:
>>>>
>>>>       if conn->is_tls:
>>>>           n = tls_read(conn, ptr, len)
>>>>       else:
>>>>           n = pqsecure_raw_read(conn, ptr, len)
>>>>
>>>>       if conn->is_compressed:
>>>>           n = decompress(ptr, n)
>>>>
>>>>       return n
>>>>
>>>> In conjunction with the above change, this should also
>>>> significantly reduce the size of the patch (I think).
>>>
>>> Yes, it will simplify the patch, but it will make libpq compression
>>> completely useless (see my explanation above).  We need to use
>>> streaming compression, and to be able to use streaming compression
>>> I have to pass a function for fetching more data to the compression
>>> library.
>>
>> I don't think you need that, even with the streaming API.
>>
>> To make this very concrete, let's talk about ZSTD_decompressStream (I'm
>> pulling information from
>> https://facebook.github.io/zstd/zstd_manual.html#Chapter7 ).
>>
>> Using the pseudocode I'm asking for above, the decompress() function
>> would look vaguely like this:
>>
>>      decompress(ptr, n)
>>          ZSTD_inBuffer in = {0}
>>          ZSTD_outBuffer out = {0}
>>
>>          in.src = ptr
>>          in.size = n
>>
>>          while in.pos < in.size:
>>              ret = ZSTD_decompressStream(out, in)
>>              if ZSTD_isError(ret):
>>                  give_up()
>>
>>          memcpy(ptr, out.dst, out.pos)
>>          return out.pos
>>
>> (and compress() would follow a similar pattern, if we were to talk
>> about it).
>
> It will not work in this way. We do not know how much input data we
> need to be able to decompress a message.

Well, that's a design decision you've made.  You could put lengths on
chunks that are sent out - then you'd know exactly how much is needed.
(For instance, 4 bytes of network-order length followed by a complete
payload.)  Then you'd absolutely know whether you have enough to
decompress or not.
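
To make the framing concrete, here is a minimal sketch of the receive
side (the helper name and signature are made up for illustration; only
the idea of a 4-byte network-order length followed by a complete
payload comes from the suggestion above):

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    /*
     * Returns the payload length if a complete frame is already buffered,
     * or -1 if more bytes must arrive before decompression is attempted.
     */
    static int
    frame_ready(const char *buf, size_t buffered, const char **payload)
    {
        uint32_t netlen;
        uint32_t len;

        if (buffered < sizeof(netlen))
            return -1;              /* length prefix not complete yet */

        memcpy(&netlen, buf, sizeof(netlen));
        len = ntohl(netlen);

        if (buffered < sizeof(netlen) + len)
            return -1;              /* payload not complete yet */

        *payload = buf + sizeof(netlen);
        return (int) len;
    }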

> So the loop should be something like this:
>
>   decompress(ptr, n)
>          ZSTD_inBuffer in = {0}
>          ZSTD_outBuffer out = {0}
>
>          in.src = ptr
>          in.size = n
>
>          while true:
>              ret = ZSTD_decompressStream(out, in)
>              if ZSTD_isError(ret):
>                  give_up()
>              if out.pos != 0:
>                  // if we decompressed something
>                  return out.pos
>              read_input(in)

The last line is what I'm taking issue with.  The interface we have
already in postgres's network code has a notion of partial reads - that
is, reads might not return data.  (Same with writing and partial
writes.)  So you'd buffer what compressed data you have and return -
this is the preferred idiom so that we don't block or spin on a
nonblocking socket.
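
Sketched out under those assumptions (a persistent ZSTD_DStream per
connection and a caller-supplied output buffer - the names and signature
here are mine, for illustration only), decompress() consumes whatever it
already has and reports "nothing yet" instead of reading more input
itself; any input left over when the output buffer fills stays in the
caller's buffer, bookkeeping elided:

    #include <stddef.h>
    #include <sys/types.h>
    #include <zstd.h>

    /*
     * Decompress whatever compressed bytes are already in hand.  Returns
     * the number of plaintext bytes produced, 0 if more compressed input
     * is needed (the caller waits for the socket to become readable
     * again), or -1 on error.
     */
    static ssize_t
    decompress(ZSTD_DStream *dstream, const char *src, size_t srclen,
               char *dst, size_t dstlen)
    {
        ZSTD_inBuffer in = {src, srclen, 0};
        ZSTD_outBuffer out = {dst, dstlen, 0};

        while (in.pos < in.size)
        {
            size_t ret = ZSTD_decompressStream(dstream, &out, &in);

            if (ZSTD_isError(ret))
                return -1;
            if (out.pos == out.size)
                break;              /* output full; caller keeps the rest */
        }

        /* out.pos == 0 is fine: not enough input for any output yet. */
        return (ssize_t) out.pos;
    }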

This is how the TLS code works already.  Look at, for instance,
pgtls_read().  If we get back SSL_ERROR_WANT_READ (i.e., not enough data
to decrypt), we return no data and wait until the socket becomes
readable again.
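
In simplified form (this is the shape of the pattern, not the literal
pgtls_read source), that path does roughly the following, and a
compressed read path can follow the same convention:

    #include <stddef.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <openssl/ssl.h>

    static ssize_t
    tls_read_sketch(SSL *ssl, void *ptr, size_t len)
    {
        int n = SSL_read(ssl, ptr, (int) len);

        if (n > 0)
            return n;

        switch (SSL_get_error(ssl, n))
        {
            case SSL_ERROR_WANT_READ:
                /*
                 * Not enough ciphertext to decrypt anything yet: report
                 * "no data" and let the caller retry when the socket
                 * becomes readable again.
                 */
                errno = EWOULDBLOCK;
                return -1;
            default:
                return -1;          /* genuine error */
        }
    }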

Thanks,
--Robbie
