Re: libpq compression

From:     Konstantin Knizhnik
Subject:  Re: libpq compression
Msg-id:   3b0260da-fe47-2341-da01-51383fe9172d@postgrespro.ru
Reply to: Re: libpq compression (Robbie Harwood <rharwood@redhat.com>)
Replies:  Re: libpq compression (Robbie Harwood <rharwood@redhat.com>)
          Re: libpq compression (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List:     pgsql-hackers
On 18.06.2018 23:34, Robbie Harwood wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>
>> On 06.06.2018 02:03, Thomas Munro wrote:
>>
>>> On Wed, Jun 6, 2018 at 2:06 AM, Konstantin Knizhnik
>>> <k.knizhnik@postgrespro.ru> wrote:
>>>
>>>> Thank you for review. Updated version of the patch fixing all
>>>> reported problems is attached.
>>>
>>> Small problem on Windows[1]:
>>>
>>>   C:\projects\postgresql\src\include\common/zpq_stream.h(17): error
>>>   C2143: syntax error : missing ')' before '*'
>>>   [C:\projects\postgresql\libpq.vcxproj]
>>>
>>> You used ssize_t in zpq_stream.h, but Windows doesn't have that type.
>>> We have our own typedef in win32_port.h. Perhaps zpq_stream.c should
>>> include postgres.h/postgres_fe.h (depending on FRONTEND) like the
>>> other .c files in src/common, before it includes zpq_stream.h?
>>> Instead of "c.h".
>>>
>>> [1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.1106
>>
>> Thank you very much for reporting the problem. I attached a new patch
>> with the include of postgres_fe.h added to zpq_stream.c.
>
> Hello! Due to being in a similar place, I'm offering some code review.
> I'm excited that you're looking at throughput on the network stack -
> it's not usually what we think of in database performance. Here are
> some first thoughts, which have some overlap with what others have
> said on this thread already:
>
> ### This build still doesn't pass Windows:
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.2277
>
> You can find more about what the bot is doing here:
> http://cfbot.cputube.org/

Thank you.
Looks like I found the reason: Mkvsbuild.pm has to be patched.
> ### I have a few misgivings about pq_configure(), starting with the
> name. The *only* thing this function does is set up compression, so
> it's mis-named. (There's no point in making something generic unless
> it's needed - it's just confusing.)

Well, my intention was that this function *may* in the future perform other configuration settings not related to compression.
And it is better to encapsulate this knowledge in pqcomm, rather than make postmaster (BackendStartup) worry about it.
But I can rename this function to pq_configure_compression() or whatever you prefer.
> I also don't like that you've injected into the *startup* path -
> before authentication takes place. Fundamentally, authentication (if
> it happens) consists of exchanging some combination of short strings
> (e.g., usernames) and binary blobs (e.g., keys). None of this will
> compress well, so I don't see it as worth performing this negotiation
> there - it can wait. It's also another message in every startup. I'd
> leave it to connection parameters, personally, but up to you.
From my point of view, compression of libpq traffic is similar to SSL and should be toggled at the same stage.
Definitely, authentication parameters are not large enough to be compressed efficiently, but compression (maybe password-protected in the future) can somehow protect this data.
In any case, I do not think that compression of authentication data can have any influence on negotiation speed.
So I am not 100% sure that toggling compression just after receiving the startup packet is the only right solution.
But neither am I convinced that there is a better place where the compressor should be configured.
Do you have a concrete suggestion? In InitPostgres, just after PerformAuthentication?
Also, please notice that compression is useful not only for client-server communication but also for replication channels.
Right now it is definitely used in both cases, but if we move pq_configure somewhere else, we should check that this code is invoked both for normal backends and for walsender.
> ### Documentation! You're going to need it. There needs to be enough
> around for other people to implement the protocol (or if you prefer,
> enough for us to debug the protocol as it exists). In conjunction with
> that, please add information on how to set up compressed vs.
> uncompressed connections - similarly to how we've documentation on
> setting up TLS connections (though presumably compressed connection
> documentation will be shorter).
Sorry; I will definitely add documentation for configuring compression.
> ### Using terminology from https://facebook.github.io/zstd/zstd_manual.html :
>
> Right now you use streaming (ZSTD_{compress,decompress}Stream()) as
> the basis for your API. I think this is probably a mismatch for what
> we're doing here - we're doing one-shot compression/decompression of
> packets, not sending video or something. I think our use case is
> better served by the non-streaming interface, or what they call the
> "Simple API" (ZSTD_{decompress,compress}()). Documentation suggests it
> may be worth it to keep an explicit context around and use that
> interface instead (i.e., ZSTD_{compressCCtx,decompressDCtx}()), but
> that's something you'll have to figure out. You may find making this
> change helpful for addressing the next issue.
Sorry, but here I completely disagree with you.
What we are doing is really streaming compression, not compression of individual messages/packets.
Yes, it is not video, but COPY data actually has the same nature as video data.
The main benefit of streaming compression is that we can use the same dictionary for compressing all messages (and adjust this dictionary based on new data).
We do not need to write a dictionary and a separate header for each record. Otherwise, compression of libpq messages would be completely useless: the typical message is too short to be compressed efficiently. The main drawback of streaming compression is that you cannot decompress a particular message without decompressing all previous messages.
This is why streaming compression cannot be used to compress database pages (as is done in CFS, provided in PostgresPro EE). But for libpq that is not needed.
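The size difference can be sketched with Python's stdlib zlib (zstd's streaming API behaves analogously). This is an illustrative sketch, not code from the patch; the message contents and counts are made up:

```python
import zlib

# 200 short, repetitive "messages", similar in spirit to libpq protocol
# traffic (e.g. DataRow messages produced by COPY).
messages = [b"INSERT INTO tbl VALUES (%d, 'row data');" % i for i in range(200)]

# One-shot: every message is compressed independently, so each output
# carries its own header and starts from an empty dictionary.
oneshot_total = sum(len(zlib.compress(m)) for m in messages)

# Streaming: one shared compressor; the dictionary built from earlier
# messages is reused for later ones. Z_SYNC_FLUSH makes each message's
# bytes immediately transmittable, as a network stream requires.
c = zlib.compressobj()
streamed_total = sum(
    len(c.compress(m)) + len(c.flush(zlib.Z_SYNC_FLUSH)) for m in messages
)

print(oneshot_total, streamed_total)
```

With inputs this short and repetitive, the shared-dictionary stream comes out much smaller than the sum of the one-shot outputs, which is the point being argued above.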
> ### I don't like where you've put the entry points to the compression
> logic: it's a layering violation. A couple of people have expressed
> similar reservations, I think, so let me see if I can explain using
> `pqsecure_read()` as an example. In pseudocode, `pqsecure_read()`
> looks like this:
>
>     if conn->is_tls:
>         n = tls_read(conn, ptr, len)
>     else:
>         n = pqsecure_raw_read(conn, ptr, len)
>     return n
>
> I want to see this extended by your code to something like:
>
>     if conn->is_tls:
>         n = tls_read(conn, ptr, len)
>     else:
>         n = pqsecure_raw_read(conn, ptr, len)
>     if conn->is_compressed:
>         n = decompress(ptr, n)
>     return n
>
> In conjunction with the above change, this should also significantly
> reduce the size of the patch (I think).
Yes, it would simplify the patch, but it would also make libpq compression completely useless (see my explanation above).
We need to use streaming compression, and to be able to use streaming compression I have to pass a function for fetching more data to the compression library.
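As an illustration of that shape (in Python with stdlib zlib rather than the patch's C/zstd code; the class and names here are invented for the sketch, not zpq_stream's actual API), a streaming decompressor can own a caller-supplied read function for pulling more compressed bytes:

```python
import io
import zlib

class StreamDecompressor:
    """Wraps a raw-read callback, the way a streaming-compression layer
    wraps libpq's low-level read function (illustrative sketch only)."""

    def __init__(self, raw_read):
        self.raw_read = raw_read        # fetches more compressed bytes
        self.zs = zlib.decompressobj()  # persistent cross-message state
        self.buf = b""

    def read(self, n):
        # Pull from the underlying stream until n decompressed bytes
        # are available (or the stream is exhausted).
        while len(self.buf) < n:
            chunk = self.raw_read(4096)
            if not chunk:
                break
            self.buf += self.zs.decompress(chunk)
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

# Simulate a compressed network stream carrying two messages.
c = zlib.compressobj()
wire = c.compress(b"first message|") + c.flush(zlib.Z_SYNC_FLUSH)
wire += c.compress(b"second message") + c.flush(zlib.Z_SYNC_FLUSH)

r = StreamDecompressor(io.BytesIO(wire).read)
assert r.read(14) == b"first message|"
assert r.read(14) == b"second message"
```

Because the decompressor's state spans message boundaries, it must be able to ask the transport for more input mid-message, which is why the read callback gets handed to the compression layer rather than the other way around.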
> ### The compression flag has proven somewhat contentious, as you've
> already seen on this thread. I recommend removing it for now and
> putting it in a separate patch to be merged later, since it's
> separable.
>
> ### It's not worth flagging style violations in your code right now,
> but you should be aware that there are quite a few style and
> whitespace problems. In particular, please be sure that you're using
> hard tabs when appropriate, that line lengths are fewer than 80
> characters (unless they contain error messages), and that pointers are
> correctly declared (`void *arg`, not `void* arg`).
Ok, I will fix it.
> Thanks,
> --Robbie
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company