Re: libpq compression

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: libpq compression
Дата
Msg-id b331206f-77eb-e002-78a2-4d5cd6c8984d@postgrespro.ru
обсуждение исходный текст
Ответ на Re: libpq compression  (Robbie Harwood <rharwood@redhat.com>)
Список pgsql-hackers

On 08.02.2019 19:26, Robbie Harwood wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>
>> On 08.02.2019 10:01, Iwata, Aya wrote:
>>
>>>> I fixed all issues you have reported except using list of supported
>>>> compression algorithms.
>>> Sure. I confirmed that.
>>>
>>>> It will require extra round of communication between client and
>>>> server to make a decision about used compression algorithm.
>>> In beginning of this thread, Robbie Harwood said that no extra
>>> communication needed.  I think so, too.
>> Well, I think that this problem is more complex and requires more
>> discussion.
>> There are three places determining choice of compression algorithm:
>> 1. Specification of compression algorithm by client. Right now it is
>> just boolean "compression" parameter in connection string,
>> but it is obviously possible to specify concrete algorithm here.
>> 2. List of compression algorithms supported by client.
>> 3. List of compression algorithms supported by server.
>>
>> Concerning first option I have very serious doubt that it is good idea
>> to let client choose compression protocol.
>> Without extra round-trip it can be only implemented in this way:
>> if client toggles compression option in connection string, then libpq
>> includes in startup packet list of supported compression algorithms.
>> Then server intersects this list with its own set of supported
>> compression algorithms and if result is not empty, then
>> somehow choose  one of the commonly supported algorithms and sends it to
>> the client with 'z' command.
> The easiest way, which I laid out earlier in
> id:jlgfu1gqjbk.fsf@redhat.com, is to have the server perform selection.
> The client sends a list of supported algorithms in startup.  Startup has
> a reply, so if the server wishes to use compression, then its startup
> reply contains which algorithm to use.  Compression then begins after
> startup.
>
> If you really wanted to compress half the startup for some reason, you
> can even have the server send a packet which consists of the choice of
> compression algorithm and everything else in it subsequently
> compressed.  I don't see this being useful.  However, you can use a
> similar approach to let the client choose the algorithm if there were
> some compelling reason for that (there is none I'm aware of to prefer
> one to the other) - startup from client requests compression, reply from
> server lists supported algorithms, next packet from client indicates
> which one is in use along with compressed payload.
I already replied you that that next package cannot indicate which 
algorithm the client has choosen.
Using magics or some other tricks are not reliable and acceptable solution/
It can be done only by introducing extra message type.

Actually it is not really needed, because if client sends to the server 
list of supported algorithms in startup packet,
then server can made a decision and inform client about it (using 
special message type as it is done now).
Then client doesn't need to make a choice. I can do it if everybody 
think that choice of compression algorithm is so important feature.
Just wonder: Postgres now is living good with hardcoded zlib and 
built-in LZ compression algorithm implementation.

> It may help to keep in mind that you are defining your own message type
> here.
>
>> Frankly speaking, I do not think that such flexibility in choosing
>> compression algorithms is really needed.
>> I do not expect that there will be many situations where old client has
>> to communicate with new server or visa versa.
>> In most cases both client and server belongs to the same postgres
>> distributive and so implements the same compression algorithm.
>> As far as we are compressing only temporary data (traffic), the problem
>> of providing backward compatibility seems to be not so important.
> Your comments have been heard, but this is the model that numerous folks
> from project has told you we have.  Your code will not pass review
> without algorithm agility.
>
>>> src/backend/libpq/pqcomm.c :
>>> In current Postgres source code, pq_recvbuf() calls secure_read()
>>> and pq_getbyte_if_available() also calls secure_read().
>>> It means these functions are on the same level.
>>> However in your change, pq_getbyte_if_available() calls pq_recvbuf(),
>>> and  pq_recvbuf() calls secure_read(). The level of these functions is different.
>>>
>>> I think the purpose of pq_getbyte_if_available() is to get a
>>> character if it exists and the purpose of pq_recvbuf() is to acquire
>>> data up to the expected length.  In your change,
>>> pq_getbyte_if_available() may have to do unnecessary process waiting
>>> or something.
>> Sorry, but this change is essential. We can have some available data
>> in compression buffer and we need to try to fetch it in
>> pq_getbyte_if_available() instead of just returning EOF.
> Aya is correct about the purposes of these functions.  Take a look at
> how the buffering in TLS or GSSAPI works for an example of how to do
> this correctly.
>
> As with agility, this is what multiple folks from the project have told
> you is a hard requirement.  None of us will be okaying your code without
> proper transport layering.
Guys, I wondering which layering violation you are talking about?
Right now there are two cut&pasted peace of almost the same code in 
pqcomm.c:

static int
pq_recvbuf(void)
...
     /* Ensure that we're in blocking mode */
     socket_set_nonblocking(false);

     /* Can fill buffer from PqRecvLength and upwards */
     for (;;)
     {
         int            r;

         r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
                         PQ_RECV_BUFFER_SIZE - PqRecvLength);

         if (r < 0)
         {
             if (errno == EINTR)
                 continue;        /* Ok if interrupted */

             /*
              * Careful: an ereport() that tries to write to the client 
would
              * cause recursion to here, leading to stack overflow and core
              * dump!  This message must go *only* to the postmaster log.
              */
             ereport(COMMERROR,
                     (errcode_for_socket_access(),
                      errmsg("could not receive data from client: %m")));
             return EOF;
         }
         if (r == 0)
         {
             /*
              * EOF detected.  We used to write a log message here, but it's
              * better to expect the ultimate caller to do that.
              */
             return EOF;
         }
         /* r contains number of bytes read, so just incr length */
         PqRecvLength += r;
         return 0;
     }
}

int
pq_getbyte_if_available(unsigned char *c)
{
...
     /* Put the socket into non-blocking mode */
     socket_set_nonblocking(true);

     r = secure_read(MyProcPort, c, 1);
     if (r < 0)
     {
         /*
          * Ok if no data available without blocking or interrupted (though
          * EINTR really shouldn't happen with a non-blocking socket). 
Report
          * other errors.
          */
         if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
             r = 0;
         else
         {
             /*
              * Careful: an ereport() that tries to write to the client 
would
              * cause recursion to here, leading to stack overflow and core
              * dump!  This message must go *only* to the postmaster log.
              */
             ereport(COMMERROR,
                     (errcode_for_socket_access(),
                      errmsg("could not receive data from client: %m")));
             r = EOF;
         }
     }
     else if (r == 0)
     {
         /* EOF detected */
         r = EOF;
     }

     return r;
}


The only difference between them is that in first case we are using 
block mode and in the second case non-blocking mode.
Also there is loop in first case handling ENITR.

I have added nowait parameter to  pq_recvbuf(bool nowait)
and remove second fragment of code so it is changed just to one line:

if (PqRecvPointer < PqRecvLength || (r = pq_recvbuf(true)) > 0)

Both functions (pq_recvbuf and pq_getbyte_if_available) are defined in 
the same file.
So there is not any layering violation here. It is just elimination of 
duplicated code.

If the only objection is that pq_getbyte_if_available is calling 
pq_recvbuf, then I can add some other function compress_read (as analog 
of secure read) and call it from both functions. But frankly speaking I 
do not see any advantages of such approach.
It just introduce extra function call and gives no extra 
encapsulation.modularity, flexibility or whatever else.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: s.cherkashin@postgrespro.ru
Дата:
Сообщение: Re: Add client connection check during the execution of the query
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: pgsql: Remove references to Majordomo