Re: Re: COPY BINARY file format proposal

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Re: COPY BINARY file format proposal
Дата
Msg-id 10336.976334295@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Re: COPY BINARY file format proposal  (Philip Warner <pjw@rhyme.com.au>)
Список pgsql-hackers
Philip Warner <pjw@rhyme.com.au> writes:
> More a matter of not thinking it was important enough to worry about, and
> not really wanting to drag the MD5/MD4/CRC64/etc debate into this one.

I'd just as soon not drag that debate in here either ;-) ... but once we
settle on an appropriate CRC method for WAL it's easy enough to call the
same routine for this code.

> Sounds good to me. I'm not sure you need it on a per-tuple basis - but it
> can't hurt, assuming it's cheap to generate. Does the backend send tuples
> or blocks of tuples? If the latter, and if CRC is expensive, then maybe 1
> CRC for each group of tuples.

Extending the CRC over multiple tuples would just complicate life,
I think.  The per-byte cost is the biggest factor, so you don't really
save all that much.

>> Next 4 bytes: length of remainder of header, not including self.  In
>> the initial version this will be zero, and the first tuple follows
>> immediately.  Future changes to the format might allow additional data
>> to be present in the header.  A reader should silently ignore any header
>> extension data it does not know what to do with.

> Don't you need to at least define how to specify non-essential chunks,
> since the flags are not to be used to describe the header extensions. Or
> are we going to make the initial version barf when it encounters any header
> extension?

No, the initial version will just silently skip the whole header
extension; it's defined so that that's a legal behavior (everything
in the header extension is inessential).  We can come back and define
a format for the entries in the header extension area when we need some.

> Another option would be to:
> - dump the field sizes in the header somewhere (they will all be the same), 
> - for each row output a bitmap of non-null fields, followed by the data.
> - varlena would have a -1 length in the header, an an int32 length in the row.

That would work if you are willing to assume that all the tuples indeed
always have the same set of fields --- you're not, for example, doing an
inheritance-tree-walk "COPY FROM foo*".  But Chris Bitmead still has a
gleam in his eye about that sort of thing, so we might want it someday.
I think it's worth a small amount of extra space to avoid that
assumption, especially since it simplifies the code too.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Philip Warner
Дата:
Сообщение: Re: Re: COPY BINARY file format proposal
Следующее
От: "Jonathan Ellis"
Дата:
Сообщение: Re: [GENERAL] Oracle-compatible lpad/rpad behavior