libpq compression
From | Euler Taveira |
---|---|
Subject | libpq compression |
Date | |
Msg-id | 4FD9698F.2090407@timbira.com |
Responses |
Re: libpq compression (Tom Lane <tgl@sss.pgh.pa.us>)
Re: libpq compression ("Albe Laurenz" <laurenz.albe@wien.gv.at>) |
List | pgsql-hackers |
Hi,

There has already been some discussion about compressing libpq data [1][2][3]. Recently I faced a scenario that would have been less problematic if we had compression support: frequent data loads (aka COPY) over slow/unstable links. The load has to be executed on a few hundred PostgreSQL servers all over Brazil. Someone could argue that I could use an ssh tunnel to solve the problem, but (i) it is complex because it involves a different port in the firewall and (ii) this is an opportunity to improve other scenarios, like reducing bandwidth consumption during replication or normal operation over slow/unstable links.

AFAICS there are no objections to implementing compression in libpq. The problem is which algorithm to use for compression; there are a lot of patents in this area. As others pointed out at [4], we should not implement algorithms in core that possibly infringe patents. Derived products are free to plug in whatever algorithms they want. There will be an API to do it.

This work will be sponsored by a company that is interested in this feature.

=== Design ===

- algorithm: zlib, bzip2, (another patent-free and BSD-licensed one?)
- compiled-in option: --with-bzip2
- PGCOMPRESSMODE env
  * disable: only try a non-compressed connection (default)
  * prefer: try a compressed connection; if that fails, try a non-compressed connection
  * require: only try a compressed connection
- PGCOMPRESSALGO env
  * zlib
  * bzip2
- compressmode and compressalgo connection string options
- compress all data
- compress before send() and decompress after recv()

I am all ears for improving this design. Some of my choices are based on my research into compression in protocols and into PostgreSQL internals. Keep in mind that I prefer compressing all data instead of a selected set of messages because (i) every new data message could be coded with compression support and (ii) it keeps the protocol code from turning into spaghetti.

I'll try to post a patch soon with the ideas discussed in this thread.
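To make the design list above a bit more concrete, here is a minimal sketch in Python rather than libpq's C, using the standard `zlib` module. It is not the proposed patch. Only the `PGCOMPRESSMODE` variable and its three values come from the proposal; the function and class names (`compress_mode`, `negotiate`, `CompressedStream`) and the use of a sync flush after each message are assumptions made for illustration.

```python
import os
import zlib

# Only PGCOMPRESSMODE and its values come from the proposal; every other
# name in this sketch is hypothetical.
VALID_MODES = ("disable", "prefer", "require")


def compress_mode(environ=os.environ):
    """Read PGCOMPRESSMODE, defaulting to 'disable' as the proposal does."""
    mode = environ.get("PGCOMPRESSMODE", "disable")
    if mode not in VALID_MODES:
        raise ValueError(f"invalid PGCOMPRESSMODE: {mode!r}")
    return mode


def negotiate(mode, server_supports_compression):
    """Return True to compress, False to stay uncompressed.

    Raises ConnectionError when 'require' cannot be satisfied.
    """
    if mode == "disable":
        return False
    if server_supports_compression:
        return True
    if mode == "require":
        raise ConnectionError("compression required but not available")
    return False  # 'prefer' falls back to a non-compressed connection


class CompressedStream:
    """Compress everything before send() and decompress after recv()."""

    def __init__(self):
        self._tx = zlib.compressobj()
        self._rx = zlib.decompressobj()

    def before_send(self, data: bytes) -> bytes:
        # Assumption: flush after every message so the peer can decode it
        # immediately instead of waiting for later data.
        return self._tx.compress(data) + self._tx.flush(zlib.Z_SYNC_FLUSH)

    def after_recv(self, wire: bytes) -> bytes:
        return self._rx.decompress(wire)
```

Because a single compressor state spans the whole connection, repetitive COPY rows sent later benefit from the dictionary built by earlier ones, which is one practical argument for compressing all data rather than individual messages.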
[1] http://archives.postgresql.org/pgsql-hackers/2012-03/msg00929.php
[2] http://archives.postgresql.org/pgsql-hackers/2011-01/msg00337.php
[3] http://archives.postgresql.org/pgsql-hackers/2002-03/msg00664.php
[4] http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php

--
   Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento