Обсуждение: Verifying embedded oids in *recv is a bad idea

Поиск
Список
Период
Сортировка

Verifying embedded oids in *recv is a bad idea

От
Andres Freund
Дата:
Hi,

for performance reasons it's a good idea to use the binary protocol. But
doing so between two postgres installations is made unnecessarily hard
by the choice of embedding and verifying oids in composite and array
types.  When using extensions, even commonly used ones like hstore, the
datatype oids will often not be the same between systems, even when
using exactly the same version on the same OS.

Specifically I'm talking about

Datum
array_recv(PG_FUNCTION_ARGS)
{
...element_type = pq_getmsgint(buf, sizeof(Oid));if (element_type != spec_element_type){    /* XXX Can we allow taking
theinput element type in any cases? */    ereport(ERROR,            (errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("wrongelement type")));}
 
...
}

and

Datum
record_recv(PG_FUNCTION_ARGS)
{
.../* Process each column */for (i = 0; i < ncolumns; i++)
...    /* Verify column datatype */    coltypoid = pq_getmsgint(buf, sizeof(Oid));    if (coltypoid != column_type)
  ereport(ERROR,                (errcode(ERRCODE_DATATYPE_MISMATCH),                 errmsg("wrong data type: %u,
expected%u",                        coltypoid, column_type)));
 
...
}


given that we're giving up quite some speed and adding complexity to
make send/recv functions portable, this seems like a shame.

I'm failing to see what these checks are buying us? I mean the text
representation of a composite doesn't include type information about
contained columns either, I don't see why the binary version has to?

I'm basically thinking that we should remove the checks on the receiving
side, but leave the sending of oids in place for backward compat.

Greetings,

Andres Freund



Re: Verifying embedded oids in *recv is a bad idea

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Mon, 25 Apr 2016 17:17:13 -0700, Andres Freund <andres@anarazel.de> wrote in
<20160426001713.hbqdiwvf4mkzkg55@alap3.anarazel.de>
> Hi,
> 
> for performance reasons it's a good idea to use the binary protocol. But
> doing so between two postgres installations is made unnecessarily hard
> by the choice of embedding and verifying oids in composite and array
> types.  When using extensions, even commonly used ones like hstore, the
> datatype oids will often not be the same between systems, even when
> using exactly the same version on the same OS.
> 
> Specifically I'm talking about
> 
> Datum
> array_recv(PG_FUNCTION_ARGS)
> {
> ...
>     element_type = pq_getmsgint(buf, sizeof(Oid));
>     if (element_type != spec_element_type)
>     {
>         /* XXX Can we allow taking the input element type in any cases? */
>         ereport(ERROR,
>                 (errcode(ERRCODE_DATATYPE_MISMATCH),
>                  errmsg("wrong element type")));
>     }
> ...
> }
> 
> and
> 
> Datum
> record_recv(PG_FUNCTION_ARGS)
> {
> ...
>     /* Process each column */
>     for (i = 0; i < ncolumns; i++)
> ...
>         /* Verify column datatype */
>         coltypoid = pq_getmsgint(buf, sizeof(Oid));
>         if (coltypoid != column_type)
>             ereport(ERROR,
>                     (errcode(ERRCODE_DATATYPE_MISMATCH),
>                      errmsg("wrong data type: %u, expected %u",
>                             coltypoid, column_type)));
> ...
> }
> 
> 
> given that we're giving up quite some speed and adding complexity to
> make send/recv functions portable, this seems like a shame.
> 
> I'm failing to see what these checks are buying us? I mean the text
> representation of a composite doesn't include type information about
> contained columns either, I don't see why the binary version has to?
> 
> I'm basically thinking that we should remove the checks on the receiving
> side, but leave the sending of oids in place for backward compat.

FWIW, I think that typsend/typreceive are intended for the use by
COPY FROM/TO and the doc says that the following.

http://www.postgresql.org/docs/devel/static/sql-copy.html

> Binary Format
> 
> The binary format option causes all data to be stored/read as
> binary format rather than as text. It is somewhat faster than the
> text and CSV formats, but a binary-format file is less portable
> across machine architectures and PostgreSQL versions. Also, the
> binary format is very data type specific; for example it will not
> work to output binary data from a smallint column and read it
> into an integer column, even though that would work fine in text
> format.

I haven't considered much about the usefulness of this
restriction, but receiving into an array of a different type
easily leads to a crash.

# However, false matching of type oids also would do so.

I suppose that we should reconsider here if removing the checks.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Verifying embedded oids in *recv is a bad idea

От
Jelte Fennema
Дата:
So I ran into this as well and I think this should actually be considered a bug, since COPY ... BINARY does not work between nodes now.
Which is why I have posted it to the pgsql-bugs list with some reproducible steps to trigger the bug: https://www.postgresql.org/message-id/16485-f7b2dddca52ef2ae%40postgresql.org

Like Andres said, the oids are not checked for the text protocol versions either. And like  Kyotaro says, if the type that has a different OID would actually have a different binary encoding, parsing will simply fail at that point.
I don't think there's any practical reason to keep these checks. They only get in the way of using the more efficient binary encoding for anything other than simple types.
The only case where these *_recv functions would currently work is if you COPY ... BINARY to the exact same postgres instance (when not dropping and recreating types). But in that case there's little reason to use COPY at all.

On Tue, 9 Jun 2020 at 18:33, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,

At Mon, 25 Apr 2016 17:17:13 -0700, Andres Freund <andres@anarazel.de> wrote in <20160426001713.hbqdiwvf4mkzkg55@alap3.anarazel.de>
> Hi,
>
> for performance reasons it's a good idea to use the binary protocol. But
> doing so between two postgres installations is made unnecessarily hard
> by the choice of embedding and verifying oids in composite and array
> types.  When using extensions, even commonly used ones like hstore, the
> datatype oids will often not be the same between systems, even when
> using exactly the same version on the same OS.
>
> Specifically I'm talking about
>
> Datum
> array_recv(PG_FUNCTION_ARGS)
> {
> ...
>       element_type = pq_getmsgint(buf, sizeof(Oid));
>       if (element_type != spec_element_type)
>       {
>               /* XXX Can we allow taking the input element type in any cases? */
>               ereport(ERROR,
>                               (errcode(ERRCODE_DATATYPE_MISMATCH),
>                                errmsg("wrong element type")));
>       }
> ...
> }
>
> and
>
> Datum
> record_recv(PG_FUNCTION_ARGS)
> {
> ...
>       /* Process each column */
>       for (i = 0; i < ncolumns; i++)
> ...
>               /* Verify column datatype */
>               coltypoid = pq_getmsgint(buf, sizeof(Oid));
>               if (coltypoid != column_type)
>                       ereport(ERROR,
>                                       (errcode(ERRCODE_DATATYPE_MISMATCH),
>                                        errmsg("wrong data type: %u, expected %u",
>                                                       coltypoid, column_type)));
> ...
> }
>
>
> given that we're giving up quite some speed and adding complexity to
> make send/recv functions portable, this seems like a shame.
>
> I'm failing to see what these checks are buying us? I mean the text
> representation of a composite doesn't include type information about
> contained columns either, I don't see why the binary version has to?
>
> I'm basically thinking that we should remove the checks on the receiving
> side, but leave the sending of oids in place for backward compat.

FWIW, I think that typsend/typreceive are intended for the use by
COPY FROM/TO and the doc says that the following.

http://www.postgresql.org/docs/devel/static/sql-copy.html

> Binary Format
>
> The binary format option causes all data to be stored/read as
> binary format rather than as text. It is somewhat faster than the
> text and CSV formats, but a binary-format file is less portable
> across machine architectures and PostgreSQL versions. Also, the
> binary format is very data type specific; for example it will not
> work to output binary data from a smallint column and read it
> into an integer column, even though that would work fine in text
> format.

I haven't considered much about the usefulness of this
restriction, but receiving into an array of a different type
easily leads to a crash.

# However, false matching of type oids also would do so.

I suppose that we should reconsider here if removing the checks.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers