I/O support for composite types

From: Tom Lane
Subject: I/O support for composite types
Message-ID: 19594.1086454647@sss.pgh.pa.us
Replies: Re: I/O support for composite types  (Thomas Hallgren <thhal@mailblocks.com>)
         Re: I/O support for composite types  (elein <elein@varlena.com>)
List: pgsql-hackers

There's just one thing left to do to make composite types useful as
table columns: we have to support I/O of composite values.  (Without
this, pg_dump would fail to work on such columns, rendering them not
very useful in the real world.)  This means we have to hammer out a
definition for what the external representation is.  Here are my
thoughts on the subject.


Textual representation:

I am inclined to define this similarly to the representation for arrays;
however, we need to allow for NULLs.  I suggest

	{item,item,item}

The separator is always comma (it can't be type-specific since the items
might have different types).  Backslashes and double quotes can be used
in the usual ways to quote characters in the item strings.  If an item
string is completely empty it is taken as NULL; to write an actual
empty-string value, you must write "".  There is an ambiguity whether
'{}' represents a zero-column row or a one-column row containing a NULL,
but I don't think this is a problem since the input converter will
always know how many columns it is expecting.
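
As an illustration only, the rules above could be sketched as the
following small parser.  The function name parse_record_text and the use
of Python's None for NULL are assumptions made for this sketch, not part
of the proposal; here a double quote simply toggles quoting, and
whitespace is ignored only outside the outer braces.

```python
from typing import List, Optional


def parse_record_text(text: str, ncolumns: int) -> List[Optional[str]]:
    """Parse '{item,item,...}' into strings, using None for NULL.

    Hypothetical helper: name and signature are assumptions for this sketch.
    """
    body = text.strip()  # whitespace outside the outer braces is ignored
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("record literal must be wrapped in braces")
    body = body[1:-1]

    items: List[Optional[str]] = []
    buf: List[str] = []
    in_quotes = False
    saw_quote = False  # tells "" (empty string) apart from nothing (NULL)
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash takes the next character literally.
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            buf.append(body[i + 1])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
            saw_quote = True
        elif ch == "," and not in_quotes:
            # A completely empty, unquoted item is NULL.
            items.append("".join(buf) if (buf or saw_quote) else None)
            buf, saw_quote = [], False
        else:
            buf.append(ch)  # includes whitespace: items are not trimmed
        i += 1
    if in_quotes:
        raise ValueError("unterminated double quote")
    items.append("".join(buf) if (buf or saw_quote) else None)

    # '{}' is ambiguous; the expected column count resolves it.
    if ncolumns == 0 and items == [None]:
        return []
    if len(items) != ncolumns:
        raise ValueError("expected %d columns, got %d" % (ncolumns, len(items)))
    return items
```

With this sketch, parse_record_text('{}', 0) yields an empty list while
parse_record_text('{}', 1) yields a single NULL, which is how the
expected column count resolves the ambiguity.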

There are a couple of fine points of the array I/O behavior that I think
we should not emulate.  One is that leading whitespace in an item string
is discarded.  This seems inconsistent, mainly because trailing
whitespace isn't discarded.  In the cases where it really makes sense to
discard whitespace (namely numeric datatypes), the underlying datatype's
input converter can do that just fine, and so I suggest that the record
converter itself should not discard whitespace.  It seems OK to ignore
whitespace before and after the outer braces, however.

The other fine point has to do with double quoting.  In the array code,

	{a"b""c"d}

is legal input representing an item 'abcd'.  I think it would be more
consistent with usual SQL conventions to treat it as meaning 'ab"cd';
that is, a doubled double quote within double quotes should represent a
double quote, not nothing.  Anyone have a strong feeling one way or the
other?
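
To make the two readings concrete, here is a minimal sketch that
dequotes a single item under either rule.  The function unquote_item and
its flag are hypothetical names invented for this illustration; it is
not the array_in code.

```python
def unquote_item(item: str, doubled_quote_is_literal: bool) -> str:
    """Remove quoting from one item string (hypothetical helper).

    doubled_quote_is_literal=False: every double quote toggles quoting,
        so "" inside quotes contributes nothing (array-style reading).
    doubled_quote_is_literal=True: "" inside quotes yields one literal
        double quote (the SQL-style reading).
    """
    out = []
    in_quotes = False
    i = 0
    while i < len(item):
        ch = item[i]
        if ch == "\\" and i + 1 < len(item):
            # Backslash takes the next character literally.
            out.append(item[i + 1])
            i += 2
            continue
        if ch == '"':
            if (doubled_quote_is_literal and in_quotes
                    and item[i + 1:i + 2] == '"'):
                out.append('"')
                i += 2
                continue
            in_quotes = not in_quotes
        else:
            # A trailing lone backslash is kept as-is in this sketch.
            out.append(ch)
        i += 1
    return "".join(out)
```

For the item a"b""c"d, the first mode returns abcd and the second
returns ab"cd.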

(In the long run we might want to think about making these same changes
in array_in, but that's a can of worms I don't wish to open today.)


Binary representation:

This seems relatively easy.  I propose we send the number of fields
(int4) followed by, for each field: type oid (sizeof(Oid)), data length
(int4), and the data according to the binary representation of the field
datatype.
The field count and type oids are not strictly necessary but seem like
a good idea for error-checking purposes.
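
A rough sketch of that layout, under stated assumptions: integers are
sent in network byte order, an Oid is 4 bytes, and a NULL field is
marked with a data length of -1.  None of these three details is
specified above; they are assumptions of this example, as are the
function names.

```python
import struct
from typing import List, Optional, Tuple

# (type OID, field data already in that type's binary format, or None for NULL)
Field = Tuple[int, Optional[bytes]]


def encode_record(fields: List[Field]) -> bytes:
    """Field count, then per field: type OID, data length, data."""
    parts = [struct.pack("!i", len(fields))]
    for type_oid, data in fields:
        parts.append(struct.pack("!I", type_oid))
        if data is None:
            parts.append(struct.pack("!i", -1))  # assumption: -1 means NULL
        else:
            parts.append(struct.pack("!i", len(data)))
            parts.append(data)
    return b"".join(parts)


def decode_record(buf: bytes, expected_oids: List[int]) -> List[Optional[bytes]]:
    """Decode, using the count and OIDs purely as error checks."""
    (nfields,) = struct.unpack_from("!i", buf, 0)
    if nfields != len(expected_oids):
        raise ValueError("wrong number of fields")
    offset = 4
    values: List[Optional[bytes]] = []
    for expected in expected_oids:
        type_oid, length = struct.unpack_from("!Ii", buf, offset)
        offset += 8
        if type_oid != expected:
            raise ValueError("unexpected field type OID %d" % type_oid)
        if length < 0:
            values.append(None)
        else:
            if offset + length > len(buf):
                raise ValueError("truncated field data")
            values.append(buf[offset:offset + length])
            offset += length
    if offset != len(buf):
        raise ValueError("trailing bytes after last field")
    return values
```

The decoder shows why the redundant count and OIDs are useful: a
mismatch with what the receiver expects is reported before any field
data is handed to a per-type converter.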


Infrastructure changes:

record_out/record_send can extract the needed type info right from the
Datum, but record_in/record_recv really need to be told what data type
to expect, and the current call conventions for input converters don't
pass them any useful information.  I propose that we adjust the present
definitions so that the second argument passed to I/O conversion
routines, rather than being always pg_type.typelem, is defined as
"if pg_type.typtype is 'c' then pg_type.oid else pg_type.typelem".
That is, for composite types we'll pass the type's own OID in place of
typelem.

This does not affect I/O routines for user-defined types, since there
are no user-defined I/O routines for composite types.  It could break
any user-written code that calls I/O routines, if it's been hard-wired
to pass typelem instead of using one of the support routines like
getTypeInputInfo() or get_type_io_data() to collect the parameters to
pass.  By my count there are about a dozen places in the backend code
that will need to be fixed to use one of these routines instead of
having a hard-wired typelem reference.

An alternative definition that might be more useful in the long run is
to define the second parameter as
"if pg_type.typelem is not zero then pg_type.typelem else pg_type.oid".
In other words, for everything *except* arrays we'd pass the type OID.
This would allow I/O routines to be written to support multiple
datatypes.  However, there seems to be a larger chance of breaking things
if we do this, and I'm also fuzzy on which OID to pass for domain types.
So I'm inclined to keep it conservative for now, and change the
behavior only for composite types.
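
For comparison, the two candidate rules for the second argument can be
written out as below.  PgType is a stand-in for the three pg_type
columns involved, and the function names are invented for this sketch;
the composite type's OID in the usage note is a made-up value.

```python
from typing import NamedTuple


class PgType(NamedTuple):
    """Stand-in for the pg_type columns the rules depend on."""
    oid: int
    typtype: str  # 'c' for composite types
    typelem: int  # element type OID, or 0 if none


def second_arg_conservative(t: PgType) -> int:
    # "if pg_type.typtype is 'c' then pg_type.oid else pg_type.typelem"
    return t.oid if t.typtype == "c" else t.typelem


def second_arg_alternative(t: PgType) -> int:
    # "if pg_type.typelem is not zero then pg_type.typelem else pg_type.oid"
    return t.typelem if t.typelem != 0 else t.oid
```

Taking int4 as PgType(23, 'b', 0), its array type as
PgType(1007, 'b', 23), and a composite type with the hypothetical OID
16385 as PgType(16385, 'c', 0), the two rules agree for the array (23)
and the composite (16385) and differ only for the plain base type: the
conservative rule still passes 0, the alternative passes 23.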


Comments, objections, better ideas?
        regards, tom lane

