libpq compression

From: Euler Taveira
Subject: libpq compression
Msg-id: 4FD9698F.2090407@timbira.com
Replies: Re: libpq compression  (Tom Lane <tgl@sss.pgh.pa.us>)
         Re: libpq compression  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
List: pgsql-hackers

Hi,

There has already been some discussion about compressing libpq data [1][2][3].
Recently, I faced a scenario that would be less problematic if we had
compression support. The scenario is frequent data loads (aka COPY) over
slow/unstable links. They have to be executed on a few hundred PostgreSQL
servers all over Brazil. Someone could argue that I could use an ssh tunnel to
solve the problem but (i) it is complex because it involves a different port
in the firewall and (ii) native compression is an opportunity to improve other
scenarios like reducing bandwidth consumption during replication or normal
operation over slow/unstable links.

AFAICS there are no objections to implementing compression in libpq. The
problem is which algorithm to use for compression. I mean, there are a lot of
patents in this area. As others pointed out at [4], we should not implement
algorithms that possibly infringe patents in core. Derived products are free
to plug in whatever algorithms they want. There will be an API to do it.

This work will be sponsored by a company that is interested in this feature.

=== Design ===

- algorithm: zlib, bzip2, (another patent free and bsd licensed?)
- compiled-in option: --with-bzip2
- PGCOMPRESSMODE env
  * disable: only try non-compressed connection (default)
  * prefer: try compressed connection; if that fails, try a non-compressed
    connection
  * require: only try compressed connection
- PGCOMPRESSALGO env
  * zlib
  * bzip2
- compressmode and compressalgo connection string options
- compress all data
- compress before send() and decompress after recv()
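
To make the mode semantics above concrete, here is a minimal Python sketch of
how a client might resolve the two settings and decide which connection
attempts to make. It is only an illustration, not the proposed libpq code: the
function names are hypothetical, the precedence of connection string options
over environment variables is assumed to follow what libpq does for its other
parameters, and zlib as the default algorithm is my assumption (the design
above does not name one).

```python
import os

VALID_MODES = ("disable", "prefer", "require")
VALID_ALGOS = ("zlib", "bzip2")


def resolve_compression(conn_params, environ=os.environ):
    """Return (mode, algo) for a connection.

    Assumption: connection string options win over environment variables.
    Assumption: "zlib" is used when no algorithm is given.
    """
    mode = conn_params.get("compressmode") or environ.get("PGCOMPRESSMODE") or "disable"
    algo = conn_params.get("compressalgo") or environ.get("PGCOMPRESSALGO") or "zlib"
    if mode not in VALID_MODES:
        raise ValueError("invalid compressmode: %r" % mode)
    if algo not in VALID_ALGOS:
        raise ValueError("invalid compressalgo: %r" % algo)
    return mode, algo


def connection_attempts(mode):
    """Ordered attempts for a mode: True = compressed, False = non-compressed."""
    if mode == "disable":
        return [False]
    if mode == "prefer":
        return [True, False]   # fall back to non-compressed if the first fails
    if mode == "require":
        return [True]          # never fall back
    raise ValueError("invalid compressmode: %r" % mode)
```

With "prefer", the caller would walk the returned list in order and stop at
the first attempt that succeeds; with "require", a failed compressed attempt
is a connection error.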

I am all ears for improving this design. Some of my choices are based on my
research into compression in other protocols and into PostgreSQL internals.

Keep in mind that I prefer compressing all data instead of a selected set of
messages because (i) every new data message gets compression support without
extra coding and (ii) it keeps the protocol code from turning into spaghetti.
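
As a rough illustration of "compress everything, right before send() and right
after recv()", the sketch below wraps a byte stream with Python's zlib module.
It is not the patch: the class and method names are hypothetical, and the use
of a sync flush after every write is my assumption about how to let the peer
decode each message without waiting for more data. The point is that the
protocol layer above never needs to know which message types exist.

```python
import zlib


class CompressedStream:
    """One compression context per direction, shared by all protocol messages."""

    def __init__(self):
        self._comp = zlib.compressobj()
        self._decomp = zlib.decompressobj()

    def wrap_outgoing(self, data: bytes) -> bytes:
        """Call on a buffer just before send()."""
        # Z_SYNC_FLUSH pushes out everything compressed so far, so the peer
        # can decompress this message without waiting for later ones.
        return self._comp.compress(data) + self._comp.flush(zlib.Z_SYNC_FLUSH)

    def unwrap_incoming(self, data: bytes) -> bytes:
        """Call on a buffer just after recv(); chunk boundaries do not matter."""
        return self._decomp.decompress(data)


if __name__ == "__main__":
    client, server = CompressedStream(), CompressedStream()
    rows = b"1\tSao Paulo\t2012-06-14\n" * 1000   # COPY-like, highly repetitive
    wire = client.wrap_outgoing(rows)
    assert server.unwrap_incoming(wire) == rows
    print(len(rows), "bytes sent as", len(wire), "bytes")
```

Because the compression state persists across messages, small messages that
repeat earlier content also benefit, which per-message compression would not
give us.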

I'll try to post a patch soon with the ideas discussed in this thread.


[1] http://archives.postgresql.org/pgsql-hackers/2012-03/msg00929.php
[2] http://archives.postgresql.org/pgsql-hackers/2011-01/msg00337.php
[3] http://archives.postgresql.org/pgsql-hackers/2002-03/msg00664.php
[4] http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php


--
Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training
 

