Re: libpq compression

From: Konstantin Knizhnik
Subject: Re: libpq compression
Date:
Msg-id: 3b0260da-fe47-2341-da01-51383fe9172d@postgrespro.ru
In reply to: Re: libpq compression  (Robbie Harwood <rharwood@redhat.com>)
Responses: Re: libpq compression  (Robbie Harwood <rharwood@redhat.com>)
Re: libpq compression  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List: pgsql-hackers


On 18.06.2018 23:34, Robbie Harwood wrote:


Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

On 06.06.2018 02:03, Thomas Munro wrote:
On Wed, Jun 6, 2018 at 2:06 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
Thank you for review. Updated version of the patch fixing all reported
problems is attached.
Small problem on Windows[1]:
  C:\projects\postgresql\src\include\common/zpq_stream.h(17): error
C2143: syntax error : missing ')' before '*'
[C:\projects\postgresql\libpq.vcxproj]

You used ssize_t in zpq_stream.h, but Windows doesn't have that type.
We have our own typedef in win32_port.h.  Perhaps zpq_stream.c should
include postgres.h/postgres_fe.h (depending on FRONTEND) like the
other .c files in src/common, before it includes zpq_stream.h?
Instead of "c.h".

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.1106

Thank you very much for reporting the problem.
I have attached a new patch that adds the include of postgres_fe.h to zpq_stream.c.
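The include block now follows the pattern used by the other files in src/common (a minimal sketch; the rest of zpq_stream.c is elided):

    /* zpq_stream.c: choose the top-level header depending on whether we
     * are compiled for backend or frontend, so that types like ssize_t
     * are provided (via win32_port.h) on Windows. */
    #ifndef FRONTEND
    #include "postgres.h"
    #else
    #include "postgres_fe.h"
    #endif

    #include "common/zpq_stream.h"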
Hello!

Due to being in a similar place, I'm offering some code review.  I'm
excited that you're looking at throughput on the network stack - it's
not usually what we think of in database performance.  Here are some
first thoughts, which have some overlap with what others have said on
this thread already:

###

This build still doesn't pass Windows:
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.2277

You can find more about what the bot is doing here:
http://cfbot.cputube.org/
Thank you.
Looks like I found the reason: Mkvcbuild.pm has to be patched.



###

I have a few misgivings about pq_configure(), starting with the name.
The *only* thing this function does is set up compression, so it's
mis-named.  (There's no point in making something generic unless it's
needed - it's just confusing.)
Well, my intention was that this function *may* in the future perform other configuration steps not related to compression.
And it is better to encapsulate this knowledge in pqcomm than to make the postmaster (BackendStartup) worry about it.
But I can rename this function to pq_configure_compression() or whatever you prefer.


I also don't like that you've injected into the *startup* path - before
authentication takes place.  Fundamentally, authentication (if it
happens) consists of exchanging some combination of short strings (e.g.,
usernames) and binary blobs (e.g., keys).  None of this will compress
well, so I don't see it as worth performing this negotiation there - it
can wait.  It's also another message in every startup.  I'd leave it to
connection parameters, personally, but up to you.
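(For illustration, a connection-parameter approach could look like the following from the client side; the "compression" keyword here is hypothetical, not an existing libpq parameter:)

    #include <libpq-fe.h>

    /* Compression requested per connection; "compression" as a keyword
     * is an assumption for illustration only. */
    PGconn *conn = PQconnectdb("host=db.example.com dbname=test "
                               "compression=on");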

From my point of view, compression of libpq traffic is similar to SSL and should be toggled at the same stage.
Certainly, authentication parameters are not large enough to be compressed efficiently, but compression (perhaps combined with password protection in the future) can somewhat obscure this data.
In any case, I do not think that compressing authentication data can have any noticeable influence on negotiation speed.
So I am not 100% sure that enabling compression right after receiving the startup packet is the only right solution.
But I am also not convinced that there is a better place to configure the compressor.
Do you have a concrete suggestion for it? In InitPostgres, just after PerformAuthentication?
Also, please notice that compression is useful not only for client-server communication but also for replication channels.
Right now it is definitely used in both cases, but if we move pq_configure somewhere else, we should check that this code is invoked both for normal backends and for walsender.


###

Documentation!  You're going to need it.  There needs to be enough
around for other people to implement the protocol (or if you prefer,
enough for us to debug the protocol as it exists).

In conjunction with that, please add information on how to set up
compressed vs. uncompressed connections - similarly to how we've
documentation on setting up TLS connection (though presumably compressed
connection documentation will be shorter).

Sorry about that; I will definitely add documentation for configuring compression.


###

Using terminology from https://facebook.github.io/zstd/zstd_manual.html :

Right now you use streaming (ZSTD_{compress,decompress}Stream()) as the
basis for your API.  I think this is probably a mismatch for what we're
doing here - we're doing one-shot compression/decompression of packets,
not sending video or something.

I think our use case is better served by the non-streaming interface, or
what they call the "Simple API" (ZSTD_{decompress,compress}()).
Documentation suggests it may be worth it to keep an explicit context
around and use that interface instead (i.e.,
ZSTD_{compressCCtx,decompressDCtx}()), but that's something you'll have
to figure out.

You may find making this change helpful for addressing the next issue.
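(For concreteness, a minimal sketch of that context-reusing interface, with error handling mostly elided:)

    #include <zstd.h>

    /* One-shot compression of a single packet, reusing a long-lived
     * ZSTD_CCtx to avoid per-call allocation. */
    static size_t
    compress_packet(ZSTD_CCtx *cctx, void *dst, size_t dst_cap,
                    const void *src, size_t src_len)
    {
        size_t n = ZSTD_compressCCtx(cctx, dst, dst_cap, src, src_len, 1);

        if (ZSTD_isError(n))
            return 0;       /* caller reports ZSTD_getErrorName(n) */
        return n;
    }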

Sorry, but here I completely disagree with you.
What we are doing really is streaming compression, not compression of individual messages/packets.
Yes, it is not video, but COPY data actually has the same nature as video data.
The main benefit of streaming compression is that we can use the same dictionary for compressing all messages (and adjust this dictionary based on new data).
We do not need to write a dictionary and a separate header for each record. Otherwise, compression of libpq messages would be completely useless: the typical message is too short to be compressed efficiently on its own. The main drawback of streaming compression is that you cannot decompress a particular message without decompressing all previous messages.
This is why streaming compression cannot be used to compress database pages (as is done in CFS, provided in PostgresPro EE), but for libpq it is not needed.
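To make the difference concrete, here is a minimal sketch of the zstd streaming interface with one long-lived stream per connection (error handling elided; the stream is created once with ZSTD_createCStream() and ZSTD_initCStream()):

    #include <zstd.h>

    /* One long-lived stream per connection: the window/dictionary
     * accumulated from earlier messages improves the ratio for each
     * short message that follows. */
    static size_t
    compress_message(ZSTD_CStream *zcs, const void *msg, size_t msg_len,
                     void *out_buf, size_t out_cap)
    {
        ZSTD_inBuffer  in  = {msg, msg_len, 0};
        ZSTD_outBuffer out = {out_buf, out_cap, 0};

        ZSTD_compressStream(zcs, &out, &in);  /* reuses shared history */
        ZSTD_flushStream(zcs, &out);          /* message decodable now */
        return out.pos;
    }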


###

I don't like where you've put the entry points to the compression logic:
it's a layering violation.  A couple people have expressed similar
reservations I think, so let me see if I can explain using
`pqsecure_read()` as an example.  In pseudocode, `pqsecure_read()` looks
like this:
   if conn->is_tls:
       n = tls_read(conn, ptr, len)
   else:
       n = pqsecure_raw_read(conn, ptr, len)
   return n

I want to see this extended by your code to something like:
   if conn->is_tls:
       n = tls_read(conn, ptr, len)
   else:
       n = pqsecure_raw_read(conn, ptr, len)
   if conn->is_compressed:
       n = decompress(ptr, n)
   return n

In conjunction with the above change, this should also significantly
reduce the size of the patch (I think).

Yes, it would simplify the patch, but it would also make libpq compression completely useless (see my explanation above).
We need to use streaming compression, and to be able to use streaming compression I have to pass the compression library a function for fetching more data.
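The interface is roughly of the following shape (a simplified sketch; the zpq_* names are from the patch, but the exact signatures here are not the actual declarations):

    #include <sys/types.h>  /* ssize_t; win32_port.h on Windows */

    /* Callbacks the compressor uses to pull/push raw socket bytes. */
    typedef ssize_t (*zpq_rx_func) (void *arg, void *buf, size_t size);
    typedef ssize_t (*zpq_tx_func) (void *arg, const void *buf, size_t size);

    typedef struct ZpqStream ZpqStream;

    extern ZpqStream *zpq_create(zpq_tx_func tx, zpq_rx_func rx, void *arg);
    extern ssize_t zpq_read(ZpqStream *zs, void *buf, size_t size);
    extern ssize_t zpq_write(ZpqStream *zs, const void *buf, size_t size);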


###

The compression flag has proven somewhat contentious, as you've already
seen on this thread.  I recommend removing it for now and putting it in
a separate patch to be merged later, since it's separable.

###

It's not worth flagging style violations in your code right now, but you
should be aware that there are quite a few style and whitespace
problems.  In particular, please be sure that you're using hard tabs
when appropriate, and that line lengths are fewer than 80 characters
(unless they contain error messages), and that pointers are correctly
declared (`void *arg`, not `void* arg`).
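(For instance, a fragment in the expected style; the function itself is just an illustration:)

    /* PostgreSQL style: hard tabs for indentation, lines kept under 80
     * columns, and the '*' attached to the variable name. */
    static void
    example(void *arg)          /* "void *arg", not "void* arg" */
    {
        char   *p = arg;

        (void) p;
    }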

###

Ok, I will fix it.


Thanks,
--Robbie

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachments
