Re: Re: COPY BINARY file format proposal

From Tom Lane
Subject Re: Re: COPY BINARY file format proposal
Date
Msg-id 13671.976217308@sss.pgh.pa.us
In reply to Re: Re: COPY BINARY file format proposal  (Philip Warner <pjw@rhyme.com.au>)
Responses Re: Re: COPY BINARY file format proposal  (Philip Warner <pjw@rhyme.com.au>)
Re: Re: COPY BINARY file format proposal  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Re: COPY BINARY file format proposal  (ncm@zembu.com (Nathan Myers))
List pgsql-hackers
Philip Warner <pjw@rhyme.com.au> writes:
>> Just thinking that the only way an endianness flag inside the header
>> would be useful is if we pick a magic number that's a bytewise
>> palindrome.

> You could just read the 1st, 2nd, 3rd, etc bytes and require that they be
> 'P', 'G', 'C', 'P', 'Y' or some such. I *think* reading five bytes and
> doing a strcmp works...ie. don't rely on the integer value, use a string.

Oh.  We could use a string instead of an integer, I suppose, although
I'm not sure I see the point for what's basically a binary format.

Given all that, here is a proposed spec for the header:

First 8 bytes: signature, ASCII "PGBCOPY\0" --- note that the null is a
required part of the signature.  (This is to catch files that have been
munged by a non-8-bit-clean transfer.)
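A minimal sketch of the signature check, in Python purely for illustration. The names SIGNATURE and has_valid_signature are assumptions made for this example, not part of the proposal. Comparing all 8 bytes means that a transfer which drops or truncates at the null byte is rejected.

```python
# Hypothetical names; only the 8-byte value comes from the proposed spec.
SIGNATURE = b"PGBCOPY\0"  # trailing null is a required part of the signature


def has_valid_signature(data: bytes) -> bool:
    """Return True if data begins with the full 8-byte signature."""
    return data[:len(SIGNATURE)] == SIGNATURE
```

A file whose null byte was dropped in transit starts with "PGBCOPY" followed by whatever came next, so the 8-byte comparison fails.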

Next 4 bytes: integer layout field.  This consists of the int32 constant
0x0A820D0A expressed in the source machine's endianness.  (Again, value
chosen with malice aforethought, to catch files munged by things like
DOS/Unix newline conversion or high-bit-stripping.)  Potentially, a
reader could engage in byte-flipping of subsequent fields if the wrong
byte order is detected here.
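One way a reader might use the layout field, sketched in Python under the assumption that the field has already been read as 4 raw bytes. The function name detect_byte_order and the choice to return a struct prefix are illustrative only. The constant is not a bytewise palindrome, so the two byte orders are distinguishable, and both newline conversion and high-bit stripping change the bytes enough to be detected.

```python
import struct

LAYOUT = 0x0A820D0A  # constant from the proposed spec


def detect_byte_order(field: bytes) -> str:
    """Return '<' (little-endian) or '>' (big-endian) for a 4-byte layout field.

    Raises ValueError if the field matches neither byte order, which is
    what happens after newline conversion or high-bit stripping.
    """
    if len(field) != 4:
        raise ValueError("layout field must be exactly 4 bytes")
    for prefix in ("<", ">"):
        if struct.unpack(prefix + "I", field)[0] == LAYOUT:
            return prefix
    raise ValueError("layout field corrupted")
```

The returned prefix can be passed to later struct.unpack calls, which is the byte-flipping the paragraph above alludes to.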

Next 4 bytes: version number, currently 1 (expressed in source machine's
endianness, as are all subsequent integer fields).  A reader should
abort if it does not recognize the version number.

Next 4 bytes: length of remainder of header, not including self.  In
the initial version this will be zero, and the first tuple follows
immediately.  Future changes to the format might allow additional data
to be present in the header.  A reader should silently ignore any header
extension data it does not know what to do with.

This allows for both backwards-compatible header additions (extend the
header without changing the version number) and non-backwards-compatible
changes (bump the version number).
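Putting the four header fields together, a reader could look roughly like the following Python sketch. The names read_header and KNOWN_VERSIONS are hypothetical, and treating the version and length fields as unsigned 32-bit values is an assumption; the proposal only says "integer". The version is checked before the extension length is used, and unknown extension data is skipped without complaint.

```python
import struct

SIGNATURE = b"PGBCOPY\0"
LAYOUT = 0x0A820D0A
KNOWN_VERSIONS = {1}  # assumption: this reader only understands version 1


def read_header(stream):
    """Consume the header from a binary stream.

    Returns (byte_order, version), leaving the stream positioned at the
    first tuple. byte_order is a struct prefix, '<' or '>'.
    """
    if stream.read(8) != SIGNATURE:
        raise ValueError("bad signature")

    layout = stream.read(4)
    if len(layout) != 4:
        raise ValueError("truncated header")
    for order in ("<", ">"):
        if struct.unpack(order + "I", layout)[0] == LAYOUT:
            break
    else:
        raise ValueError("bad layout field")

    fixed = stream.read(8)
    if len(fixed) != 8:
        raise ValueError("truncated header")
    version, ext_len = struct.unpack(order + "II", fixed)

    # Abort on an unrecognized version before trusting any later field.
    if version not in KNOWN_VERSIONS:
        raise ValueError("unrecognized version %d" % version)

    # Silently skip header extension data this reader does not understand.
    if len(stream.read(ext_len)) != ext_len:
        raise ValueError("truncated header extension")
    return order, version
```

For example, a file written on a big-endian machine with three bytes of extension data would be accepted, with the three bytes skipped and the stream left at the first tuple.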

Since we don't yet know what we might do about the issue of
floating-point format, I left that out of the spec.  It can be added to
the header extension area when and if we figure out how to do it.

Likewise, addons such as column names are also punted until later.

Comments?
        regards, tom lane

