Re: libpq compression

Поиск
Список
Период
Сортировка
От Robbie Harwood
Тема Re: libpq compression
Дата
Msg-id jlgin6a6cdm.fsf@redhat.com
обсуждение исходный текст
Ответ на Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Ответы Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Список pgsql-hackers
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

> On 21.06.2018 20:14, Robbie Harwood wrote:
>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>> On 21.06.2018 17:56, Robbie Harwood wrote:
>>>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>>>> On 20.06.2018 23:34, Robbie Harwood wrote:
>>>>>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>>>>>
>>>>>> Well, that's a design decision you've made.  You could put
>>>>>> lengths on chunks that are sent out - then you'd know exactly how
>>>>>> much is needed.  (For instance, 4 bytes of network-order length
>>>>>> followed by a complete payload.)  Then you'd absolutely know
>>>>>> whether you have enough to decompress or not.
>>>>>
>>>>> Do you really suggest to send extra header for each chunk of data?
>>>>> Please notice that chunk can be as small as one message: dozen of
>>>>> bytes because libpq is used for client-server communication with
>>>>> request-reply pattern.
>>>>
>>>> I want you to think critically about your design.  I *really* don't
>>>> want to design it for you - I have enough stuff to be doing.  But
>>>> again, the design I gave you doesn't necessarily need that - you
>>>> just need to properly buffer incomplete data.
>>>
>>> Right now secure_read may return any number of available bytes. But
>>> in case of using streaming compression, it can happen that available
>>> number of bytes is not enough to perform decompression. This is why
>>> we may need to try to fetch additional portion of data. This is how
>>> zpq_stream is working now.
>>
>> No, you need to buffer and wait until you're called again.  Which is
>> to say: decompress() shouldn't call secure_read().  secure_read()
>> should call decompress().
>
> I this case I will have to implement this code twice: both for backend
> and frontend, i.e. for secure_read/secure_write and
> pqsecure_read/pqsecure_write.

Likely, yes.  You can see that this is how TLS does it (which you should
be using as a model, architecture-wise).

> Frankly speaking i was very upset by design of libpq communication
> layer in Postgres: there are two different implementations of almost
> the same stuff for cbackend and frontend.

Changing the codebases so that more could be shared is not necessarily a
bad idea; however, it is a separate change from compression.

>>> I do not understand how it is possible to implement in different way
>>> and what is wrong with current implementation.
>>
>> The most succinct thing I can say is: absolutely don't modify
>> pq_recvbuf().  I gave you pseudocode for how to do that.  All of your
>> logic should be *below* the secure_read/secure_write functions.
>>
>> I cannot approve code that modifies pq_recvbuf() in the manner you
>> currently do.
>
> Well. I understand you arguments.  But please also consider my
> argument above (about avoiding code duplication).
>
> In any case, secure_read function is called only from pq_recvbuf() as
> well as pqsecure_read is called only from pqReadData.  So I do not
> think principle difference in handling compression in secure_read or
> pq_recvbuf functions and do not understand why it is "destroying the
> existing model".
>
> Frankly speaking, I will really like to destroy existed model, moving
> all system dependent stuff in Postgres to SAL and avoid this awful mix
> of code sharing and duplication between backend and frontend. But it
> is a another story and I do not want to discuss it here.

I understand you want to avoid code duplication.  I will absolutely
agree that the current setup makes it difficult to share code between
postmaster and libpq clients.  But the way I see it, you have two
choices:

1. Modify the code to make code sharing easier.  Once this has been
   done, *then* build a compression patch on top, with the nice new
   architecture.

2. Leave the architecture as-is and add compression support.
   (Optionally, you can make code sharing easier at a later point.)

Fundamentally, I think you value code sharing more than I do.  So while
I might advocate for (2), you might personally prefer (1).

But right now you're not doing either of those.

> If we are speaking about the "right design", then neither your
> suggestion, neither my implementation are good and I do not see
> principle differences between them.
>
> The right approach is using "decorator pattern": this is how streams
> are designed in .Net/Java. You can construct pipe of "secure",
> "compressed" and whatever else streams.

I *strongly* disagree, but I don't think you're seriously suggesting
this.

>>>>>> My idea was the following: client want to use compression. But
>>>>>> server may reject this attempt (for any reasons: it doesn't support
>>>>>> it, has no proper compression library, do not want to spend CPU for
>>>>>> decompression,...) Right now compression algorithm is
>>>>>> hardcoded. But in future client and server may negotiate to choose
>>>>>> proper compression protocol.  This is why I prefer to perform
>>>>>> negotiation between client and server to enable compression.  Well,
>>>>>> for negotiation you could put the name of the algorithm you want in
>>>>>> the startup.  It doesn't have to be a boolean for compression, and
>>>>>> then you don't need an additional round-trip.
>>>>> Sorry, I can only repeat arguments I already mentioned:
>>>>> - in future it may be possible to specify compression algorithm
>>>>> - even with boolean compression option server may have some reasons to
>>>>> reject client's request to use compression
>>>>>
>>>>> Extra flexibility is always good thing if it doesn't cost too
>>>>> much. And extra round of negotiation in case of enabling compression
>>>>> seems to me not to be a high price for it.
>>>> You already have this flexibility even without negotiation.  I don't
>>>> want you to lose your flexibility.  Protocol looks like this:
>>>>
>>>> - Client sends connection option "compression" with list of
>>>>    algorithms it wants to use (comma-separated, or something).
>>>>
>>>> - First packet that the server can compress one of those algorithms
>>>>    (or none, if it doesn't want to turn on compression).
>>>>
>>>> No additional round-trips needed.
>>> This is exactly how it works now...  Client includes compression
>>> option in connection string and server replies with special message
>>> ('Z') if it accepts request to compress traffic between this client
>>> and server.
>>
>> No, it's not.  You don't need this message.  If the server receives a
>> compression request, it should just turn compression on (or not), and
>> then have the client figure out whether it got compression back.
>
> How it will managed to do it. It receives some reply and first of all
> it should know whether it has to be decompressed or not.

You can tell whether a message is compressed by looking at it.  The way
the protocol works, every message has a type associated with it: a
single byte, like 'R', that says what kind of message it is.

Thanks,
--Robbie

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rui DeSousa
Дата:
Сообщение: Re: New function pg_stat_statements_reset_query() to reset statisticsof a specific query
Следующее
От: Nico Williams
Дата:
Сообщение: Re: libpq compression