Re: Tweaking bytea / large object block sizes?

From Craig Ringer
Subject Re: Tweaking bytea / large object block sizes?
Msg-id 4DF5B52D.9000102@postnewspapers.com.au
In reply to Re: Tweaking bytea / large object block sizes?  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Tweaking bytea / large object block sizes?
List pgsql-general

On 13/06/11 09:27, Merlin Moncure wrote:

> want to use the binary protocol mode (especially for postgres versions
> that don't support hex mode)

Allowing myself to get a wee bit sidetracked:

I've been wondering lately why hex was chosen as the new input/output
format when the bytea_output change went in. The Base64 encoding is
trivial to implement, already supported by standard libraries for many
languages and add-ons for the rest, fast to encode/decode, and much more
compact than a hex encoding, so it seems like a more attractive option.
PostgreSQL already supports base64 in explicit encode() and decode() calls.
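
To put a number on "much more compact", here is a minimal sketch using
only the Python standard library. The 3000-byte payload is an arbitrary
sample chosen for illustration, not anything taken from PostgreSQL
itself, and the figures ignore any prefix or quoting the server adds.

```python
import base64
import binascii
import os

# Arbitrary binary sample (an assumption for illustration only).
payload = os.urandom(3000)

hex_text = binascii.hexlify(payload)   # 2 output characters per input byte
b64_text = base64.b64encode(payload)   # 4 output characters per 3 input bytes

hex_ratio = len(hex_text) / len(payload)
b64_ratio = len(b64_text) / len(payload)

print(f"hex expansion:    {hex_ratio:.3f}x")
print(f"base64 expansion: {b64_ratio:.3f}x")
```

On any input, hex doubles the size while unwrapped base64 grows it by
about a third.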

Was concern about input format ambiguity a motivator for avoiding
base64? Checking the archives:

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00238.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00192.php

... it was considered but knocked back for two reasons: it's sufficiently
more complex to encode that it could matter on big dumps, and
standards-compliant base64 appears to require newlines - something that
was viewed as ugly and problematic. Concerns about the reliability of
input format detection were also raised, but as the same solution used
for hex input would apply to base64 input too, it doesn't look like that
was a big factor.

Personally, even with the newline 'ick factor' I think it'd be pretty
nice to have as an option for dumps and COPY.

Ascii85 (base85) would be another alternative. It's used in PostScript
and PDF, but isn't anywhere near as widespread as base64. It's still
trivial to implement and is about 6% more compact than base64 before
line breaks (5 output characters per 4 input bytes versus 4 per 3).
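
A quick way to check the Ascii85 versus base64 size difference, again as
a rough sketch with Python's standard library. The payload is an
arbitrary sample with no all-zero 4-byte groups, since Ascii85 shortens
those to a single 'z'. Measured without line breaks the saving comes out
at 6.25%; wrapping the base64 output into lines widens the gap a little.

```python
import base64

# Arbitrary 3072-byte sample (divisible by both 3 and 4, so neither
# encoding needs padding); an assumption for illustration only.
payload = bytes(range(256)) * 12

a85_text = base64.a85encode(payload)   # 5 output characters per 4 input bytes
b64_text = base64.b64encode(payload)   # 4 output characters per 3 input bytes

# Fraction by which the Ascii85 output is smaller than the base64 output.
saving = 1 - len(a85_text) / len(b64_text)

print(f"ascii85: {len(a85_text)} chars, base64: {len(b64_text)} chars")
print(f"ascii85 is {saving:.2%} smaller")
```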

After a bit of digging, though, I can't help wondering if a binary dump
format that's machine-representation independent, fast and compact isn't
more practical. Tools like Thrift (http://thrift.apache.org), Protocol
Buffers, etc. might make it less painful. Maybe an interesting GSoC
project? Supporting binary COPY with a machine-independent format would
be a natural extension of that, too.

--
Craig Ringer
