Re: libpq compression

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: libpq compression
Дата
Msg-id 5b673a96-b16e-9a79-c0ec-f7f03f207c62@postgrespro.ru
обсуждение исходный текст
Ответ на Re: libpq compression  (Robbie Harwood <rharwood@redhat.com>)
Ответы Re: libpq compression  (Robbie Harwood <rharwood@redhat.com>)
Список pgsql-hackers

On 21.06.2018 20:14, Robbie Harwood wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>
>> On 21.06.2018 17:56, Robbie Harwood wrote:
>>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>>> On 20.06.2018 23:34, Robbie Harwood wrote:
>>>>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>>>>
>>>>> Well, that's a design decision you've made.  You could put lengths
>>>>> on chunks that are sent out - then you'd know exactly how much is
>>>>> needed.  (For instance, 4 bytes of network-order length followed by
>>>>> a complete payload.)  Then you'd absolutely know whether you have
>>>>> enough to decompress or not.
>>>> Do you really suggest to send extra header for each chunk of data?
>>>> Please notice that chunk can be as small as one message: dozen of
>>>> bytes because libpq is used for client-server communication with
>>>> request-reply pattern.
>>> I want you to think critically about your design.  I *really* don't
>>> want to design it for you - I have enough stuff to be doing.  But
>>> again, the design I gave you doesn't necessarily need that - you just
>>> need to properly buffer incomplete data.
>> Right now secure_read may return any number of available bytes. But in
>> case of using streaming compression, it can happen that available
>> number of bytes is not enough to perform decompression. This is why we
>> may need to try to fetch additional portion of data. This is how
>> zpq_stream is working now.
> No, you need to buffer and wait until you're called again.  Which is to
> say: decompress() shouldn't call secure_read().  secure_read() should
> call decompress().
>

I this case I will have to implement this code twice: both for backend 
and frontend, i.e. for secure_read/secure_write and 
pqsecure_read/pqsecure_write.
Frankly speaking i was very upset by design of libpq communication layer 
in Postgres: there are two different implementations of almost the same 
stuff for cbackend and frontend.
I do not see any meaningful argument for it except "historical reasons". 
The better decision was to encapsulate socket communication layer (and 
some other system dependent stuff) in SAL (system abstraction layer) and 
use it both in backend and frontend.

By passing secure_read/pqsecure_read functions to zpq_stream I managed 
to avoid such code duplication at least for compression.

>> I do not understand how it is possible to implement in different way
>> and what is wrong with current implementation.
> The most succinct thing I can say is: absolutely don't modify
> pq_recvbuf().  I gave you pseudocode for how to do that.  All of your
> logic should be *below* the secure_read/secure_write functions.
>
> I cannot approve code that modifies pq_recvbuf() in the manner you
> currently do.

Well. I understand you arguments.
But please also consider my argument above (about avoiding code 
duplication).

In any case, secure_read function is called only from pq_recvbuf() as 
well as pqsecure_read is called only from pqReadData.
So I do not think principle difference in handling compression in 
secure_read or pq_recvbuf functions and do not understand why it is
"destroying the existing model".

Frankly speaking, I will really like to destroy existed model, moving 
all system dependent stuff in Postgres to SAL and avoid this awful mix 
of code
sharing and duplication between backend and frontend. But it is a 
another story and I do not want to discuss it here.

If we are speaking about the "right design", then neither your 
suggestion, neither my implementation are good and I do not see 
principle differences between them.
The right approach is using "decorator pattern": this is how streams are 
designed in .Net/Java. You can construct pipe of "secure", "compressed" 
and whatever else streams.
Yes, it is first of all pattern for object-oriented approach and 
Postgres is implemented in C. But it is actually possible to use OO 
approach in pure C (X-Windows!).
But once again, this discussion may lead other too far away from the 
topic of libpq compression.

As far as I already wrote, the main  points of my design were:
1. Minimize changes in Postgres code
2. Avoid code duplication
3. Provide abstract (compression stream) which can be used somewhere 
else except libpq itself.


>
>>>>> My idea was the following: client want to use compression. But
>>>>> server may reject this attempt (for any reasons: it doesn't support
>>>>> it, has no proper compression library, do not want to spend CPU for
>>>>> decompression,...) Right now compression algorithm is
>>>>> hardcoded. But in future client and server may negotiate to choose
>>>>> proper compression protocol.  This is why I prefer to perform
>>>>> negotiation between client and server to enable compression.  Well,
>>>>> for negotiation you could put the name of the algorithm you want in
>>>>> the startup.  It doesn't have to be a boolean for compression, and
>>>>> then you don't need an additional round-trip.
>>>> Sorry, I can only repeat arguments I already mentioned:
>>>> - in future it may be possible to specify compression algorithm
>>>> - even with boolean compression option server may have some reasons to
>>>> reject client's request to use compression
>>>>
>>>> Extra flexibility is always good thing if it doesn't cost too
>>>> much. And extra round of negotiation in case of enabling compression
>>>> seems to me not to be a high price for it.
>>> You already have this flexibility even without negotiation.  I don't
>>> want you to lose your flexibility.  Protocol looks like this:
>>>
>>> - Client sends connection option "compression" with list of
>>>    algorithms it wants to use (comma-separated, or something).
>>>
>>> - First packet that the server can compress one of those algorithms
>>>    (or none, if it doesn't want to turn on compression).
>>>
>>> No additional round-trips needed.
>> This is exactly how it works now...  Client includes compression
>> option in connection string and server replies with special message
>> ('Z') if it accepts request to compress traffic between this client
>> and server.
> No, it's not.  You don't need this message.  If the server receives a
> compression request, it should just turn compression on (or not), and
> then have the client figure out whether it got compression back.
How it will managed to do it. It receives some reply and first of all it 
should know whether it has to be decompressed or not.


>   This
> is of course made harder by your refusal to use packet framing, but
> still shouldn't be particularly difficult.
But how?

>
> Thanks,
> --Robbie

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: PATCH: backtraces for error messages
Следующее
От: Antonin Houska
Дата:
Сообщение: Re: Push down Aggregates below joins