Обсуждение: 7.4 COPY BINARY Format Change

Поиск
Список
Период
Сортировка

7.4 COPY BINARY Format Change

От
Lee Kindness
Дата:
Guys,

I've been testing 7.4 against an application today. Recompiled
everything against the new libraries. However the application includes
code which builds up a bulkload file in the documented 7.1 format. The
documentation for COPY goes on at length about the format being
forward compatible...

However of course it's changed in 7.4 for the following minor reasons:

1. Header is now PGCOPY\n\377\r\n\0 rather than PGBCOPY\n\377\r\n\0

2. Integers within the header (but not the data itself) are now in
network order, rather than native order

3. The "integer layout field" has disappeared.

4. typlen is now an int32 rather than an int16 plus an additional
int32 if a varlen type.

I've attached a patch which lets COPY read in the 7.1 format. However
i'm not convinced this is the right way to go - I think the format
which is output by 7.4 should be identical to the 7.1 format. The
"integer layout field" could be used to determine if byteswapping is
required on reading. The other changes seem to be unnecessary? If the
typlen change is kept it should be flagged in the flags field rather
than requiring a completely new format - this would allow old readers
to gracefully fail and old & new files to be read in by 7.4...

It's extremely counter-productive to break backward compatibility for
such whimsical changes! This will hurt those updating to 7.4 once it
is released...

So, I expect the patch below to be rejected - I'll happily rework the
patch to revert 7.4 to a version of the 7.1 format which results in
the same feature gain but without forfeiting backward
compatibility. Let me know.

Thanks. Lee.

Index: src/backend/commands/copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.205
diff -c -b -r1.205 copy.c
*** src/backend/commands/copy.c    1 Aug 2003 00:15:19 -0000    1.205
--- src/backend/commands/copy.c    1 Aug 2003 14:53:35 -0000
***************
*** 91,102 ****
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
                       char *delim, char *null_print);
  static char *CopyReadAttribute(const char *delim, CopyReadResult *result);
! static Datum CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo,
!                                      Oid typelem, bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);

! static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";

  /*
   * Static communication variables ... pretty grotty, but COPY has
--- 91,107 ----
  static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
                       char *delim, char *null_print);
  static char *CopyReadAttribute(const char *delim, CopyReadResult *result);
! static Datum CopyReadBinaryAttribute(int version, int column_no,
!                      FmgrInfo *flinfo, Oid typelem,
!                      bool *isnull);
  static void CopyAttributeOut(char *string, char *delim);
  static List *CopyGetAttnums(Relation rel, List *attnamelist);

! static const char BinarySignature74[11] = "PGCOPY\n\377\r\n\0";
! static const char BinarySignature71[12] = "PGBCOPY\n\377\r\n\0";
! #define BINARY_FMT_74x 740
! #define BINARY_FMT_71x 710
! #define BINARY_FMT_CUR BINARY_FMT_74x

  /*
   * Static communication variables ... pretty grotty, but COPY has
***************
*** 140,148 ****
  static int    CopyPeekChar(void);
  static void CopyDonePeek(int c, bool pickup);
  static void CopySendInt32(int32 val);
! static int32 CopyGetInt32(void);
  static void CopySendInt16(int16 val);
! static int16 CopyGetInt16(void);

  /*
   * Send copy start/stop messages for frontend copies.  These have changed
--- 145,153 ----
  static int    CopyPeekChar(void);
  static void CopyDonePeek(int c, bool pickup);
  static void CopySendInt32(int32 val);
! static int32 CopyGetInt32(int version);
  static void CopySendInt16(int16 val);
! static int16 CopyGetInt16(int version);

  /*
   * Send copy start/stop messages for frontend copies.  These have changed
***************
*** 571,581 ****
   * CopyGetInt32 reads an int32 that appears in network byte order
   */
  static int32
! CopyGetInt32(void)
  {
      uint32        buf;

      CopyGetData(&buf, sizeof(buf));
      return (int32) ntohl(buf);
  }

--- 576,589 ----
   * CopyGetInt32 reads an int32 that appears in network byte order
   */
  static int32
! CopyGetInt32(int version)
  {
      uint32        buf;

      CopyGetData(&buf, sizeof(buf));
+     if( version == BINARY_FMT_71x )
+       return buf;
+     else
        return (int32) ntohl(buf);
  }

***************
*** 595,605 ****
   * CopyGetInt16 reads an int16 that appears in network byte order
   */
  static int16
! CopyGetInt16(void)
  {
      uint16        buf;

      CopyGetData(&buf, sizeof(buf));
      return (int16) ntohs(buf);
  }

--- 603,617 ----
   * CopyGetInt16 reads an int16 that appears in network byte order
   */
  static int16
! CopyGetInt16(int version)
  {
      uint16        buf;

      CopyGetData(&buf, sizeof(buf));
+
+     if( version == BINARY_FMT_71x )
+       return buf;
+     else
        return (int16) ntohs(buf);
  }

***************
*** 972,978 ****
          int32        tmp;

          /* Signature */
!         CopySendData((char *) BinarySignature, 11);
          /* Flags field */
          tmp = 0;
          if (oids)
--- 984,990 ----
          int32        tmp;

          /* Signature */
!         CopySendData((char *) BinarySignature74, 11);
          /* Flags field */
          tmp = 0;
          if (oids)
***************
*** 1136,1141 ****
--- 1148,1154 ----
      ExprContext *econtext;        /* used for ExecEvalExpr for default atts */
      MemoryContext oldcontext = CurrentMemoryContext;
      ErrorContextCallback errcontext;
+     int             version = BINARY_FMT_CUR;

      tupDesc = RelationGetDescr(rel);
      attr = tupDesc->attrs;
***************
*** 1261,1272 ****

          /* Signature */
          CopyGetData(readSig, 11);
!         if (CopyGetEof() || memcmp(readSig, BinarySignature, 11) != 0)
              ereport(ERROR,
                      (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                       errmsg("COPY file signature not recognized")));
          /* Flags field */
!         tmp = CopyGetInt32();
          if (CopyGetEof())
              ereport(ERROR,
                      (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
--- 1274,1311 ----

          /* Signature */
          CopyGetData(readSig, 11);
!         if( CopyGetEof() )
!           ereport(ERROR,
!               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                errmsg("invalid COPY file header (missing signature)")));
!         if( memcmp(readSig, BinarySignature74, 11) == 0 )
!           version = BINARY_FMT_74x; /* PostgreSQL 7.4 format */
!         else if( memcmp(readSig, BinarySignature71, 11) == 0 )
!           {
!             version = BINARY_FMT_71x; /* PostgreSQL 7.1 format */
!             ereport(WARNING,
!                 (errmsg("Old version 7.1 binary COPY file")));
!             CopyGetData(readSig, 1); /* the 12th byte of the header */
!             if( CopyGetEof() )
!               ereport(ERROR,
!                   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                    errmsg("invalid COPY file header (signature too short)")));
!           }
!         else
            ereport(ERROR,
                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                 errmsg("COPY file signature not recognized")));
+         /* integer layout field */
+         if( version == BINARY_FMT_71x )
+           {
+             tmp = CopyGetInt32(version);
+             if( CopyGetEof() )
+               ereport(ERROR,
+                   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                    errmsg("invalid COPY file header (missing integer layout field)")));
+           }
          /* Flags field */
!         tmp = CopyGetInt32(version);
          if (CopyGetEof())
              ereport(ERROR,
                      (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
***************
*** 1278,1284 ****
                      (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                       errmsg("unrecognized critical flags in COPY file header")));
          /* Header extension length */
!         tmp = CopyGetInt32();
          if (CopyGetEof() || tmp < 0)
              ereport(ERROR,
                      (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
--- 1317,1323 ----
                      (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                       errmsg("unrecognized critical flags in COPY file header")));
          /* Header extension length */
!         tmp = CopyGetInt32(version);
          if (CopyGetEof() || tmp < 0)
              ereport(ERROR,
                      (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
***************
*** 1458,1464 ****
              /* binary */
              int16        fld_count;

!             fld_count = CopyGetInt16();
              if (CopyGetEof() || fld_count == -1)
              {
                  done = true;
--- 1497,1503 ----
              /* binary */
              int16        fld_count;

!             fld_count = CopyGetInt16(version);
              if (CopyGetEof() || fld_count == -1)
              {
                  done = true;
***************
*** 1474,1480 ****
              if (file_has_oids)
              {
                  loaded_oid =
!                     DatumGetObjectId(CopyReadBinaryAttribute(0,
                                                               &oid_in_function,
                                                               oid_in_element,
                                                               &isnull));
--- 1513,1519 ----
              if (file_has_oids)
              {
                  loaded_oid =
!                     DatumGetObjectId(CopyReadBinaryAttribute(version, 0,
                                                               &oid_in_function,
                                                               oid_in_element,
                                                               &isnull));
***************
*** 1491,1497 ****
                  int            m = attnum - 1;

                  i++;
!                 values[m] = CopyReadBinaryAttribute(i,
                                                      &in_functions[m],
                                                      elements[m],
                                                      &isnull);
--- 1530,1536 ----
                  int            m = attnum - 1;

                  i++;
!                 values[m] = CopyReadBinaryAttribute(version, i,
                                      &in_functions[m],
                                      elements[m],
                                      &isnull);
***************
*** 1879,1895 ****
   * Read a binary attribute
   */
  static Datum
! CopyReadBinaryAttribute(int column_no, FmgrInfo *flinfo, Oid typelem,
!                         bool *isnull)
  {
      int32        fld_size;
      Datum        result;

!     fld_size = CopyGetInt32();
      if (CopyGetEof())
          ereport(ERROR,
                  (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                   errmsg("unexpected EOF in COPY data")));
      if (fld_size == -1)
      {
          *isnull = true;
--- 1918,1957 ----
   * Read a binary attribute
   */
  static Datum
! CopyReadBinaryAttribute(int version, int column_no, FmgrInfo *flinfo,
!             Oid typelem, bool *isnull)
  {
      int32        fld_size;
      Datum        result;

!     if( version == BINARY_FMT_71x )
!       {
!         int16 tmp_fld_size;
!         tmp_fld_size = CopyGetInt16(version);
!         if( CopyGetEof() )
!           ereport(ERROR,
!               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                errmsg("unexpected EOF in COPY data")));
!         if( tmp_fld_size == -1 )
!           {
!         fld_size = CopyGetInt32(version);
!         if( CopyGetEof() )
!           ereport(ERROR,
!               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
!                errmsg("unexpected EOF in COPY data")));
!         fld_size -= 4; /* includes sizeof these 4 bytes too */
!           }
!         else
!           fld_size = tmp_fld_size;
!       }
!     else
!       {
!         fld_size = CopyGetInt32(version);
          if (CopyGetEof())
            ereport(ERROR,
                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                 errmsg("unexpected EOF in COPY data")));
+       }
      if (fld_size == -1)
      {
          *isnull = true;

Re: 7.4 COPY BINARY Format Change

От
Tom Lane
Дата:
Lee Kindness <lkindness@csl.co.uk> writes:
> I've attached a patch which lets COPY read in the 7.1 format. However
> i'm not convinced this is the right way to go - I think the format
> which is output by 7.4 should be identical to the 7.1 format.

You are greatly underestimating the changes that occurred in COPY BINARY.
If the format difference had been as minor as you think, I would not
have gratuitously broken compatibility.

The real change that occurred here is that the individual data fields
go through per-datatype send/receive routines, which in addition to
implementing a mostly machine-independent binary format also provide
defenses against bad input data.

To continue to read the old COPY BINARY format, we'd have to bypass
those routines and allow direct read of the internal data formats.
This was a security risk before and would be a much bigger one now,
seeing that we allow COPY BINARY FROM STDIN to unprivileged users.  It
is trivial to crash the backend by feeding it bad internal-format data.

(I don't believe that the patch works anyway, given that you aren't doing
anything to disable use of the per-datatype receive routine.  It might
work as-is for text fields, and for integers on bigendian machines, but
not for much else.)

We are not going back to the pre-7.4 format.  Sorry.

            regards, tom lane

Re: 7.4 COPY BINARY Format Change

От
Lee Kindness
Дата:
Tom,

Tom Lane writes:> Lee Kindness <lkindness@csl.co.uk> writes:> > I've attached a patch which lets COPY read in the 7.1
format.However> > i'm not convinced this is the right way to go - I think the format> > which is output by 7.4 should
beidentical to the 7.1 format.> > You are greatly underestimating the changes that occurred in COPY BINARY.> If the
formatdifference had been as minor as you think, I would not> have gratuitously broken compatibility.> > The real
changethat occurred here is that the individual data fields> go through per-datatype send/receive routines, which in
additionto> implementing a mostly machine-independent binary format also provide> defenses against bad input data.> >
Tocontinue to read the old COPY BINARY format, we'd have to bypass> those routines and allow direct read of the
internaldata formats.> This was a security risk before and would be a much bigger one now,> seeing that we allow COPY
BINARYFROM STDIN to unprivileged users.  It> is trivial to crash the backend by feeding it bad internal-format> data.
 

Well in that case the docs need attention. They describe the
"envelope" surrounding the tuples, but no mention is made of the
format they are in. It is reasonable to assume that this format was
the native binary format, as in earlier releases.

I've got applications which create binary "bulkload" files which are
loaded into the database using COPY FROM. Currently they write the
data out using simple fwrite calls. What do I need to do to make this
code work with 7.4? Is there any docs describing the "binary" format
for each of the datatypes or do I need to reverse-engineer a dump file
or look in the source? Are the routines in libpq/pqformat.c intended
to be used by client applications to read/write the binary COPY files?
If so they also need documented in the libpq docs and that
documentation linked to from the COPY docs.
> (I don't believe that the patch works anyway, given that you aren't doing> anything to disable use of the
per-datatypereceive routine.  It might> work as-is for text fields, and for integers on bigendian machines, but> not
formuch else.)
 

Yeah, I didn't spend a lot of effort in that respect - after all I
said myself I didn't see the patch being accepted...
> We are not going back to the pre-7.4 format.  Sorry.

Well as pointed out in my earlier message nothing has changed which
requires the format to change - there is no real reason it's now
"PGCOPY" and the integer layout field has disappeared. The change for
the byte swapping should have been indicated by an entry in the flags
field.

I am still willing to make a patch which does this (to aid those
writing COPY format files) and to fully support the reading of the old
format tuples. However i'm not going to waste both our time if this
patch is not going to be positively considered...

I think it's worthwhile reiterating that this change will be a real
pain for PostgreSQL users when migrating to 7.4. To be honest i'd
probably stick with 7.3 until the subsequent major release. Have a
think what benefit this incompatibility gives users of COPY
BINARY... I can't think of much use of byte swapping when 99% of the
use of COPY BINARY FROM is to improve performance over using
INSERT. Both the reader and writer will be using the same binary
integer/float/etc formats!

So, will I look at implementing these changes? Or not?

L.


Re: 7.4 COPY BINARY Format Change

От
Tom Lane
Дата:
Lee Kindness <lkindness@csl.co.uk> writes:
>>> The real change that occurred here is that the individual data fields
>>> go through per-datatype send/receive routines, which in addition to
>>> implementing a mostly machine-independent binary format also provide
>>> defenses against bad input data.

> Well in that case the docs need attention. They describe the
> "envelope" surrounding the tuples, but no mention is made of the
> format they are in. It is reasonable to assume that this format was
> the native binary format, as in earlier releases.

Yeah, there should be some mention of that in the COPY ref page I guess
--- it's mentioned in the frontend protocol chapter, but not under COPY.
In my defense I'd point out that the contents of individual fields have
never been documented under COPY.

> What do I need to do to make this
> code work with 7.4? Is there any docs describing the "binary" format
> for each of the datatypes or do I need to reverse-engineer a dump file
> or look in the source?

ATM, I'd recommend looking in the sources to see what the datatype
send/receive routines do.

I have been thinking about documenting the binary formats during beta,
but am unsure where to put the info.  We never documented the internal
formats before either, so there's no obvious place.

> Are the routines in libpq/pqformat.c intended
> to be used by client applications to read/write the binary COPY files?

They are not designed to be used outside the backend environment,
although possibly some enterprising person could adapt them.  I am not
sure there's any value in it though.  Copying the backend code helps
only if what you want to get out of the transmission is the same as the
backend's internal format, which for anything more complex than
int/float/text seems a bit dubious.

>>> We are not going back to the pre-7.4 format.  Sorry.

> Well as pointed out in my earlier message nothing has changed which
> requires the format to change - there is no real reason it's now
> "PGCOPY" and the integer layout field has disappeared.

Given that the interpretation of the field contents has changed
drastically, I thought it better to make an obvious incompatible
change.  We could perhaps have kept the skeleton the same, but to
what end?  An app trying to read or write the file as if it were
pre-7.4 data would fail miserably anyway.

> I am still willing to make a patch which does this (to aid those
> writing COPY format files) and to fully support the reading of the old
> format tuples. However i'm not going to waste both our time if this
> patch is not going to be positively considered...

My vote will be to reject it because of the security problem.

> I can't think of much use of byte swapping when 99% of the
> use of COPY BINARY FROM is to improve performance over using
> INSERT. Both the reader and writer will be using the same binary
> integer/float/etc formats!

You must think that the universe consists exclusively of Intel hardware.
In my view, standardizing on a machine-independent binary format will
greatly *expand* the usefulness of COPY BINARY, since the files will not
be tied to a single architecture.
        regards, tom lane


Re: 7.4 COPY BINARY Format Change

От
Lee Kindness
Дата:
Tom Lane writes:> Lee Kindness <lkindness@csl.co.uk> writes:> > Well in that case the docs need attention. They
describethe> > "envelope" surrounding the tuples, but no mention is made of the> > format they are in. It is reasonable
toassume that this format was> > the native binary format, as in earlier releases.> Yeah, there should be some mention
ofthat in the COPY ref page I guess> --- it's mentioned in the frontend protocol chapter, but not under COPY.> In my
defenseI'd point out that the contents of individual fields have> never been documented under COPY.
 

True, the docs have always skipped the specifics for the
tuples. But now that the format has evolved beyond a simple dump of
the bytes the tuple format does need discussing.
> > What do I need to do to make this> > code work with 7.4? Is there any docs describing the "binary" format> > for
eachof the datatypes or do I need to reverse-engineer a dump file> > or look in the source?> ATM, I'd recommend looking
inthe sources to see what the datatype> send/receive routines do.> > I have been thinking about documenting the binary
formatsduring beta,> but am unsure where to put the info.  We never documented the internal> formats before either, so
there'sno obvious place.
 

Perhaps the documentation of the binary format should be taken out of
the COPY docs and moved into the client interfaces documentation? the
COPY docs would of course reference the new location. Just now the
tuples could be "documented" simply by referring the reader to the
relevant functions in the relevant source files. After all the source
is the best documentation for this sort of thing.
> > Are the routines in libpq/pqformat.c intended> > to be used by client applications to read/write the binary COPY
files?>They are not designed to be used outside the backend environment,> although possibly some enterprising person
couldadapt them.  I am not> sure there's any value in it though.  Copying the backend code helps> only if what you want
toget out of the transmission is the same as the> backend's internal format, which for anything more complex than>
int/float/textseems a bit dubious.
 

I think there is a lot of use for a binary COPY file API within libpq
- routines to open a file, write/read a header and write/read common
datatypes. This would remove the need for most people using the binary
version of COPY to even know the file format. This would also isolate
people who use this API from any future changes.

Would libpq or contrib be the best place for this? Would you agree
this is a good idea for 7.4? I've already got something along these
lines:
extern FILE *lofsdb_Bulk_Open(char **filename);extern void  lofsdb_Bulk_Close(FILE *f, char *filename);extern void
lofsdb_Bulk_Write_NCols(FILE*f, short ncols);extern void  lofsdb_Bulk_Write(FILE *f, void *data, size_t sz, size_t
count,short ind);extern void  lofsdb_Bulk_WriteText(FILE *f, char *data, short ind);extern void
lofsdb_Bulk_WriteBytea(FILE*f, char *data, size_t len, short ind);extern void  lofsdb_Bulk_WriteTime(FILE *f, double t,
shortind);extern void  lofsdb_Bulk_WriteTimeNow(FILE *f);
 

which could form the basis of a contrib module to handle writing out
7.1 through to 7.4 format files. Naturally lofsdb_Bulk_Write needs to
go and be replaced by specific functions.
> > Well as pointed out in my earlier message nothing has changed which> > requires the format to change - there is no
realreason it's now> > "PGCOPY" and the integer layout field has disappeared.> Given that the interpretation of the
fieldcontents has changed> drastically, I thought it better to make an obvious incompatible> change.  We could perhaps
havekept the skeleton the same, but to> what end?  An app trying to read or write the file as if it were> pre-7.4 data
wouldfail miserably anyway.
 

Yeah, but someone (actually you!) went to the effort of making the 7.1
format extensible and documenting it as such... It could have handled
the changes.
> > I am still willing to make a patch which does this (to aid those> > writing COPY format files) and to fully support
thereading of the old> > format tuples. However i'm not going to waste both our time if this> > patch is not going to
bepositively considered...> My vote will be to reject it because of the security problem.
 

In which case I think my time would be better spent looking at the API
described above.
> > I can't think of much use of byte swapping when 99% of the> > use of COPY BINARY FROM is to improve performance
overusing> > INSERT. Both the reader and writer will be using the same binary> > integer/float/etc formats!> You must
thinkthat the universe consists exclusively of Intel hardware.> In my view, standardizing on a machine-independent
binaryformat will> greatly *expand* the usefulness of COPY BINARY, since the files will not> be tied to a single
architecture.

Well my testing (or lack of) of the earlier patch would seem to
indicate it was done on non-Intel box (Solaris)! I've got access here
to Solaris (2.5 through to 9), AIX (4.1 to 4.3.3), HPUX (9, 10, 11)
and of course Linux flavours - our apps run on these UNIX versions. So
i'm well aware of binary format issues (for fun look into the SEG-D
and SEG-Y formats used within the seismic industry).

However, is COPY BINARY meant/designed to be used as transfer or
backup mechanism? I have trouble coming up with many uses where a
binary file generated on one server would be loaded into another
server running on a different architecture.

Regards, Lee.


Re: 7.4 COPY BINARY Format Change

От
Tom Lane
Дата:
Lee Kindness <lkindness@csl.co.uk> writes:
> However, is COPY BINARY meant/designed to be used as transfer or
> backup mechanism?

I think you're overlooking a key consideration: COPY BINARY is not
an isolated feature anymore.  By design it uses the same data
representations as are used for binary query parameters and results
in the rest of the 7.4 FE/BE protocol.

I could see some value in providing byte-swapping routines in libpq
to convert between local and network representations for integers and
floats.  The underlying facilities (ntohl etc) are readily available,
of course, but it's a small matter that is easy to get wrong.

I'm not sure it's worth packaging up COPY BINARY logic per se.  I think
you'd end up with an API not materially simpler than dealing with the
format directly.  And I'm unconvinced it'd actually be used widely,
whereas I do expect binary transfer of individual values to be common.
        regards, tom lane


Re: 7.4 COPY BINARY Format Change

От
Lee Kindness
Дата:
Tom,

Tom Lane writes:> Lee Kindness <lkindness@csl.co.uk> writes:> > However, is COPY BINARY meant/designed to be used as
transferor> > backup mechanism?> > I think you're overlooking a key consideration: COPY BINARY is not> an isolated
featureanymore.  By design it uses the same data> representations as are used for binary query parameters and results>
inthe rest of the 7.4 FE/BE protocol.
 

Yeah, what i've overlooked is that an implementation detail now forms
part of an external interface into PostgreSQL - this is a major
change.
> I could see some value in providing byte-swapping routines in libpq> to convert between local and network
representationsfor integers and> floats.  The underlying facilities (ntohl etc) are readily available,> of course, but
it'sa small matter that is easy to get wrong.>> I'm not sure it's worth packaging up COPY BINARY logic per se.  I
think>you'd end up with an API not materially simpler than dealing with the> format directly.  And I'm unconvinced it'd
actuallybe used widely,> whereas I do expect binary transfer of individual values to be common.
 

Would I be right is guessing a binary CURSOR would return in values in
the same format as a binary COPY, hence your expectation of more
individual transfers/conversions? Actually with the new FE/BE protocol
there is little call for the binary cursor now, yeah?

What I proposed in my email yesterday is really just completing the
new functions (PQnfields, PQputCopyData, PQgetCopyData and friends)
described at:
http://developer.postgresql.org/docs/postgres/libpq-copy.html

so they don't stop at just giving you a blob of binary data and saying
it has n fields - functions would be available to iterate over the
fields and get the data out in a format which is immediately
useful. Without this do you not think PQgetCopyData is of limited use
except for being used by psql (which I guess isn't using it yet). Same
for the writing functions.

This is slightly different from my earlier example (on the connection
rather than file-based) but functionally similar.

BTW, do you have any examples of using PQgetCopyData - none in the
source and can't find anything with Google.

Regards, Lee.


Re: 7.4 COPY BINARY Format Change

От
Tom Lane
Дата:
Lee Kindness <lkindness@csl.co.uk> writes:
> Would I be right is guessing a binary CURSOR would return in values in
> the same format as a binary COPY, hence your expectation of more
> individual transfers/conversions? Actually with the new FE/BE protocol
> there is little call for the binary cursor now, yeah?

Binary cursors per se are obsolete --- you can get the result of any
query in binary form, if you ask politely.  And you can send data in
binary form, too, using parameters.  I have not gotten around to
benchmarking a prepared INSERT with binary parameters against a binary
COPY, but I expect the differential is not nearly as bad as it is for a
source-form INSERT.  It could well be that binary COPY is obsolete for
the purposes you're using it for.

> so they don't stop at just giving you a blob of binary data and saying
> it has n fields - functions would be available to iterate over the
> fields and get the data out in a format which is immediately
> useful.

You can iterate over the fields of a text COPY and get the data out,
too, if you think it useful.  I haven't seen a huge call for support
for that, and so I don't believe it's important for binary COPY either.
I do see a need for converting individual binary data values in the
context of query parameters and results.
        regards, tom lane


Re: 7.4 COPY BINARY Format Change

От
Lee Kindness
Дата:
I've just sent off patches to pgsql-patches to:

1. Slight clarification to the COPY BINARY format docs

2. A contrib/binarycopy module which wraps-up the detail of creating a
file which can be used as input to COPY BINARY. User can create either
7.1 or 7.4 format files using the same API, without needing to know
the file format, without needing to know the individual binary
format of each field and without needing to explicitly byte-swap.

#2 will be used extensively within Concept Systems code which
interfaces to PostgreSQL. It really simplifies the creation of the
binary files.

Thanks, Lee.

Lee Kindness writes:> Tom Lane writes:>  > Lee Kindness <lkindness@csl.co.uk> writes:>  > > Well in that case the docs
needattention. They describe the>  > > "envelope" surrounding the tuples, but no mention is made of the>  > > format
theyare in. It is reasonable to assume that this format was>  > > the native binary format, as in earlier releases.>  >
Yeah,there should be some mention of that in the COPY ref page I guess>  > --- it's mentioned in the frontend protocol
chapter,but not under COPY.>  > In my defense I'd point out that the contents of individual fields have>  > never been
documentedunder COPY.> > True, the docs have always skipped the specifics for the> tuples. But now that the format has
evolvedbeyond a simple dump of> the bytes the tuple format does need discussing.> >  > > What do I need to do to make
this> > > code work with 7.4? Is there any docs describing the "binary" format>  > > for each of the datatypes or do I
needto reverse-engineer a dump file>  > > or look in the source?>  > ATM, I'd recommend looking in the sources to see
whatthe datatype>  > send/receive routines do.>  > >  > I have been thinking about documenting the binary formats
duringbeta,>  > but am unsure where to put the info.  We never documented the internal>  > formats before either, so
there'sno obvious place.> > Perhaps the documentation of the binary format should be taken out of> the COPY docs and
movedinto the client interfaces documentation? the> COPY docs would of course reference the new location. Just now the>
tuplescould be "documented" simply by referring the reader to the> relevant functions in the relevant source files.
Afterall the source> is the best documentation for this sort of thing.> >  > > Are the routines in libpq/pqformat.c
intended> > > to be used by client applications to read/write the binary COPY files?>  > They are not designed to be
usedoutside the backend environment,>  > although possibly some enterprising person could adapt them.  I am not>  >
surethere's any value in it though.  Copying the backend code helps>  > only if what you want to get out of the
transmissionis the same as the>  > backend's internal format, which for anything more complex than>  > int/float/text
seemsa bit dubious.> > I think there is a lot of use for a binary COPY file API within libpq> - routines to open a
file,write/read a header and write/read common> datatypes. This would remove the need for most people using the binary>
versionof COPY to even know the file format. This would also isolate> people who use this API from any future changes.>
>Would libpq or contrib be the best place for this? Would you agree> this is a good idea for 7.4? I've already got
somethingalong these> lines:> >  extern FILE *lofsdb_Bulk_Open(char **filename);>  extern void  lofsdb_Bulk_Close(FILE
*f,char *filename);>  extern void  lofsdb_Bulk_Write_NCols(FILE *f, short ncols);>  extern void  lofsdb_Bulk_Write(FILE
*f,void *data, size_t sz, size_t count, short ind);>  extern void  lofsdb_Bulk_WriteText(FILE *f, char *data, short
ind);> extern void  lofsdb_Bulk_WriteBytea(FILE *f, char *data, size_t len, short ind);>  extern void
lofsdb_Bulk_WriteTime(FILE*f, double t, short ind);>  extern void  lofsdb_Bulk_WriteTimeNow(FILE *f);> > which could
formthe basis of a contrib module to handle writing out> 7.1 through to 7.4 format files. Naturally lofsdb_Bulk_Write
needsto> go and be replaced by specific functions.> >  > > Well as pointed out in my earlier message nothing has
changedwhich>  > > requires the format to change - there is no real reason it's now>  > > "PGCOPY" and the integer
layoutfield has disappeared.>  > Given that the interpretation of the field contents has changed>  > drastically, I
thoughtit better to make an obvious incompatible>  > change.  We could perhaps have kept the skeleton the same, but to>
> what end?  An app trying to read or write the file as if it were>  > pre-7.4 data would fail miserably anyway.> >
Yeah,but someone (actually you!) went to the effort of making the 7.1> format extensible and documenting it as such...
Itcould have handled> the changes.> >  > > I am still willing to make a patch which does this (to aid those>  > >
writingCOPY format files) and to fully support the reading of the old>  > > format tuples. However i'm not going to
wasteboth our time if this>  > > patch is not going to be positively considered...>  > My vote will be to reject it
becauseof the security problem.> > In which case I think my time would be better spent looking at the API> described
above.>>  > > I can't think of much use of byte swapping when 99% of the>  > > use of COPY BINARY FROM is to improve
performanceover using>  > > INSERT. Both the reader and writer will be using the same binary>  > > integer/float/etc
formats!> > You must think that the universe consists exclusively of Intel hardware.>  > In my view, standardizing on a
machine-independentbinary format will>  > greatly *expand* the usefulness of COPY BINARY, since the files will not>  >
betied to a single architecture.> > Well my testing (or lack of) of the earlier patch would seem to> indicate it was
doneon non-Intel box (Solaris)! I've got access here> to Solaris (2.5 through to 9), AIX (4.1 to 4.3.3), HPUX (9, 10,
11)>and of course Linux flavours - our apps run on these UNIX versions. So> i'm well aware of binary format issues (for
funlook into the SEG-D> and SEG-Y formats used within the seismic industry).> > However, is COPY BINARY meant/designed
tobe used as transfer or> backup mechanism? I have trouble coming up with many uses where a> binary file generated on
oneserver would be loaded into another> server running on a different architecture.> > Regards, Lee.