Обсуждение: ECGP - varchar in struct?
I looked at comp.databases.postgresql.questions
and posted a brief form of this question there.
I am porting an existing Oracle/ PRO-C Application
over to PostgreSQL 7.2.1 (ecpg 2.9.0).
This application takes advantage of the fact that
PRO-C lets you use 'varchar' as part of 'struct'
definitions by re-writing 'varchar' during INCLUDE
or DECLARE processing either one. This application
can (and presently does) make a lot of use of:
struct rowdef { ... /* Column */ varchar nameOFcolumn[maxlen] ; ... /* Column */
};
to define an interface buffer. PRO-C will re-write
this, during EXEC SQL INCLUDE processing, as:
struct rowdef {
... /* Column */
struct {int len;  char arr[maxlen];} nameOFcolumn;
... /* Column */
};
This makes it possible to easily declare an I/O buffer
that can hold many rows:
EXEC SQL BEGIN DECLARE SECTION;   struct rowdef ManyRows[25];
EXEC SQL END DECLARE SECTION;
and code can easily refer to members of this buffer:
/* Length column instance 7 */
ManyRows [7].nameOFcolumn.len;
/* Address of column instance 7 */
ManyRows [7].nameOFcolumn.arr;
or send them to other compile units:
foo (ManyRows + 7);
can run 'foo' on row 7 without 'foo' needing to be
able to see the ManyRows DECLARE. The 'foo' compile
unit only needs the INCLUDE to provide the 'rowdef'
struct layout, since INCLUDE processing takes care
of the C re-write that a varchar always needs.
====
It would appear that ecpg will only re-write a
varchar if it is found in a DECLARE statement,
because the varchar gets past the ecpg preprocessor's
INCLUDE as-is ... and, of course, the real C compiler
does not know what a 'varchar' is, so it complains.
As a PostgreSQL 'newbie', I am not sure if it is more
work to fix ecpg or to re-write how the application
does things. I see it is a 'yacc' unit that handles the
re-writing that ecpg now does, but I do not know
'yacc-ese' and do not know what else would be involved
to fix ecpg to work like PRO-C?
- Did I miss something here?
- Any insights/ advice from the gurus?
- Would it be a really big deal to support varchar re-writes during INCLUDE and not just during DECLARE?
			
		On Fri, Jul 26, 2002 at 09:38:25PM -0400, William West wrote: > It would appear that ecpg will only re-write a > varchar if it is found in a DECLARE statement, > because the varchar gets past the ecpg preprocessor's > INCLUDE as-is ... and, of course, the real C compiler > does not know what a 'varchar' is, so it complains. Yes, ecpg only parses stuff inside the declare section. A full syntax checking mode would be a nice add-on (and is listed as todo) but IMO it is not really needed urgently. > As a PostgreSQL 'newbie', I am not sure if it is more > work to fix ecpg or to re-write how the application > does things. I see it is a 'yacc' unit that handles the > re-writing that ecpg now does, but I do not know > 'yacc-ese' and do not know what else would be involved > to fix ecpg to work like PRO-C? Yes, it's quite some work. :-) But if you volunteer I gladly accept patches. :-) > - Did I miss something here? Yes, at first you can list you struct definition inside the declare section and it will work nicely. And second you can use a typedef command that AFAIK Pro*C does not have. This should help you without much work. > varchar re-writes during INCLUDE and not > just during DECLARE? Actually it does not have anything to do with INCLUDE or I misunderstood your problem comletely. Hope this helps. Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
>> As a PostgreSQL 'newbie', I am not sure if it is more >> work to fix ecpg or to re-write how the application >> does things. I see it is a 'yacc' unit that handles the >> re-writing that ecpg now does, but I do not know >> 'yacc-ese' and do not know what else would be involved >> to fix ecpg to work like PRO-C? > Yes, it's quite some work. :-) But if you volunteer I gladly accept > patches. :-) The 'encapsulated C' approach would require few changes to Pro-C based application, and would make it easier to support both Oracle and PostgreSQL from the same code-base, if ecpg had a capabilities closer to Pro-C's. But I am leery of the time it would take for me to continue with that more general and portable approach (first upgrade ecpg and then port the application) ... so I have sort of decided to convert to using the libpq (or maybe libpqeasy + libpq) instead ... at least then it is only one project (albeit much bigger than at first expected). Perhaps later, when I have mastered the libpq interface, I'll reconsider taking you up on that, since the 'encapsulated' approach really would be better from a maintenance point of view :-) >> - Did I miss something here? > Yes, at first you can list you struct definition inside the declare > section and it will work nicely. I need for the struct to define a shape and to be able to *separately* instantiate instances (or arrays-of-instances) of that shape. As I understand it, a DECLARE section will instantiate an instance of the struct each time I INCLUDE it in a compile unit, and I am not sure how I would declare a multiple-rows array? To clarify what I am faced with: The existing application has many different structs, each of which has members for the columns affected by any of a number of different realtime message-driven update accesses. That is, there are a couple of dozen different messages, each needing to revise different columns of the tables defined (one of which has in excess of 60 columns). There are also 'update all' messages that need to INSERT or UPDATE into *all* columns of the 60+ columns table. Each message is backed by a struct that has storage shapes for the column(s) to be updated by that message. The functions that perform message data-to-struct (or data-from-struct) operations are in different compile units than are the functions that issue the SQL operations that exchange data with the database. The application currently does pass-by-reference operations among many functions across many separate compile units. Of course, every compile unit having to DECLARE the struct will give each compile unit its own *instance* of the struct and won't allow the application to do the SQL to load/ unload the struct from/to the database in one function in one compile unit and then pass by reference to/ from other functions in other compile units (as the application currently does). In addition, the application processes some messages that work with many rows from a given table all at once. Being unable to separate 'row shapes' (struct layouts) from instantiation thereof (DECLARE thereof) appears to make it difficult or impossible to instantiate front-end space for arrays-of-rows. It also appears to make it impossible to have ecpg issue SELECTS/ INSERTS/ UPDATES that affect many rows as well as columns with each front-end to back-end SQL exchange (each EXEC SQL) ... something that the present application does a *lot* of and that - at least with Oracle - gives an order of magnitude performance improvement. It appears to me that the libpq layer front-end to back-end interface is able to pass an open-ended number of columns and rows per PQexec() call ... that the problem would be a vectoring/ mapping definition capable of properly delivering PQgetvalue() returns into stuct members. That is why, unless I am still missing something, I am inclined to convert to a direct-to-libpq interface (or maybe a hybrid of libpgeasy plus libpq). Does that seem reasonable, given the present organization of the application?
On Sun, Aug 11, 2002 at 05:14:07PM -0400, William West wrote:
> The 'encapsulated C' approach would require few changes to
> Pro-C based application, and would make it easier to support
> both Oracle and PostgreSQL from the same code-base, if ecpg
> had a capabilities closer to Pro-C's.
But moving to ECPG's encapsulated approach does not prevent the app from
running under Pro*C, i.e. Pro*C can also work with everything listed
inside the declare section etc.
> But I am leery of the time it would take for me to continue with
> that more general and portable approach (first upgrade ecpg and
> then port the application) ... so I have sort of decided to convert
> to using the libpq (or maybe libpqeasy + libpq) instead ... at
> least then it is only one project (albeit much bigger than at
> first expected).
This certainly will be much more time to port the app. If you have a
Pro*C app, porting it to ECPG will almost definitely be less work than
proting to libpq.
> I need for the struct to define a shape and to be able to
> *separately* instantiate instances (or arrays-of-instances)
> of that shape.
But you can add the definition for each instance. Yes, C doesn't require
this, but it does not forbid it either. Also you can use a typedef to
define the struct so ecpg does the job of adding the definition.
> As I understand it, a DECLARE section will instantiate an
> instance of the struct each time I INCLUDE it in a compile unit,
What I mean is instaed of
struct foo {...};
struct foo a;
struct foo b;
you can use
exec sql begin declare section;
struct foo {...} a;
exec sql end declare section;
exec sql begin declare section;
struct foo {...} b;
exec sql end declare section;
or instead
exec sql typedef sf struct foo {...};
exec sql begin declare section;
sf a;
exec sql end declare section;
exec sql begin declare section;
sf b;
exec sql end declare section;
> and I am not sure how I would declare a multiple-rows array?
I'm not sure what you mean with this.
Also I have to admit that I didn't completely understand your mail in
terms of seeing the problem that you cannot solve with ecpg. Maybe a
code example would help me understand it better.
> It also appears to make it impossible to have ecpg issue SELECTS/ INSERTS/
> UPDATES that affect many rows as well as columns with each front-end to
> back-end SQL exchange (each EXEC SQL) ... something that the present
> application does a *lot* of and that - at least with Oracle - gives an
> order of
> magnitude performance improvement.
Once again I do not understand that. Do you mean for instance one select
that reads several tuples at the same time?
> It appears to me that the libpq layer front-end to back-end interface
> is able to pass an open-ended number of columns and rows per PQexec()
> call ... that the problem would be a vectoring/ mapping definition capable
> of
> properly delivering PQgetvalue() returns into stuct members.
But that's exactly what ecpg does. It just encapsulates the libpq calls.
It seems I did not fully understand your question nor answer it
accordingly. :-)
Michael
-- 
Michael Meskes
Michael@Fam-Meskes.De
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!
			
		
I really appreciate your time. I know a lot of what I am saying
must sound silly, due to how low on the learning curve I am.
> exec sql typedef sf struct foo {...};
>
> exec sql begin declare section;
> sf a;
> exec sql end declare section;
This looks promising for what I am trying to do. Is
'exec sql typedef' ecpg specific? I do not remember seeing
it called out as anything special in the books I have looked at?
I have looked at Bruce Momjian's "PostgreSQL - Introduction
and Concepts" book, the Geshwinde-Shonig "Developers Handbook"
put out by SAMS, and the 'PostgreSQL 7.2 Programmers Guide'
from the postgesql.org website.  Anyway, I tried:
exec sql typedef sf struct foo {int intOne; int intTwo};
and I get:
ERROR: parse error, unexpected IDENT at or near "typedef"
from ecpg? I am using ecpg 2.9.0 (Postgres 7.2.1).
===============
> exec sql begin declare section;
> struct foo {...} a;
> exec sql end declare section;
I can see that I have a very fundamental understanding problem
or misconception of some kind, in how one can/ should use structures
inside of an ecpg declare section, and appreciate your patience
(and suppose I am giving you and others a good laugh). I know I am
being prejudiced by what a structure inside a declare section
means to Pro*C, which is "here is the layout of some columns
that I intend to get from or write to a row".
It seems that ecpg does not have any such understanding about
the meaning of a struct and its members and that is (I'm sure)
one of my biggest problems. To see what I am up against, I tried to
make a struct defining two  varchar columns, per your example, but
then I could not for the life of me figure out how to use the struct.
Here is what I setup:
exec sql begin declare section;
struct foo {varchar col1[15]; varchar col2[20];} bar;
exec sql end declare section;
and then made a simple two varchar columns table:
exec sql create table
mytable (col1 varchar(15), col2 varchar(20));
and then checked that this gave me a new row containing Literal1
in col1 and Literal2 in col2:
exec sql insert into mytable values ('Literal1', 'Literal2');
That worked, so then I tried to make ecpg work the way Pro*C works
(bear with me). I setup 'bar' in the declarations section by:
sprintf(bar.col1.arr, "Variable1")
bar.col1.len = strlen(bar.col1.arr);
sprintf(bar.col2.len, "Variable1")
bar.col2.len = strlen(bar.col2.arr);
then tried to load a row with Variable1 in col1 and Variable2
in col2 by:
exec sql insert into mytable values (:bar);
Pro*C would take this to mean to put the present values of
bar.col1 and bar.col2 into columns 1 and 2 of a new row
added to mytable. This builds cleanly in ecpg, seems to emit
a sensible ECPGdo() call, but gives "-201, too many arguments"
at run time. I can't figure out what the library wants ... ECPGdo()
is emitted with two vectors, one to each of the struct members?
I then tried to figure out how to mention the struct members
individually in the values() clause:
exec sql insert into mytable values (:bar.col1.arr, :bar.col2.arr);
But this causes ecpg to issue the error "'bar.col1.arr' is not
declared". I can't figure out what ecpg would want, to be
able to refer to the individual members of the struct?
=============================
So as you can see, I surely have a some kind of very big gap
in my understanding of how one would/ could use structures in
the declare section of ecpg in ways that even remotely
resemble how the existing Pro*C based application now uses
structures in the declare section.
To make my job even harder, Pro*C uses its understanding
of "struct is row"/ "member of struct is column" in order to
support arrays-of-rows exchanges between the front-end
and the back-end.
Pro*C supports an "EXEC SQL FOR :batchsize" construct that
allows you to INSERT, UPDATE, or DELETE many rows
with one "EXEC SQL".  The current application, to exploit this
capability, is full of declarations that are roughly equivalent to:
exec sql begin declare section;
struct foo {varchar col1[15]; varchar col2[20];} bar [ROWS];
exec sql end declare section;
where 'ROWS' establishes the maximum batch size. The Pro*C
understanding of the meaning of a structure in the declare section
(as signifying a row) and its members (as signifying a column) is
fundamental to being able to declare space for batched operations
via C-Language array declaration syntax.
Claims have been made that this can increase an Oracle application's
performance by a factor of 100 or more. Indeed, with our application,
using a modest ':batchsize' (ROWS value), we observe an increase
in performance of a factor of 10 or more over one front-end/ back-end
exchange per insert/ delete/ update.
I know this is already *VERY* long, and if you got this far, thanks.
If you have not lost patience, here is a "simplified as much as possible"
example of how the application is presently organized, in an attempt to
show why the application needs for 'row shaping' to be independent
of 'declare space for row':
/* begin row_shape.h */
typedef struct
{  varchar col1 [15];     ...  varchar coln [140];
} row_shape_t;
/* end row shape .h */
/* begin multi_row_insert.pc */
=== DB INTERFACE COMPILE UNIT ===
#define ROWS_N 50
exec sql include row_shape.h
int
multi_row_insert (msg_t *msg_p)
{
  row_shape_t *row_p;
  exec sql begin declare section;  int row_n;  row_shape_t rowray [ROWS_N];  exec sql end declare section;
  for (row_n = 0, row_p = rowray; row_n < ROWS_N; row_n++, row_p++)  {      if (!load_row_from_msg (msg_p, row_p,
row_n))   break;  }
 
  exec sql begin transaction;
  exec sql whenever sqlerror goto errexit;  /* inserts 'row_n' rows into "table_in_question" */  exec sql for :row_n
insertinto table_in_question     values (:rowray);
 
  exec sql commit work;
  return row_n;
errexit:  exec sql rollback work;  return    -1;
}
/* end multi_row_insert.pc */
=== MESSAGE HANDLING COMPILE UNIT ===
/* begin load_row_from_msg.pc */
exec sql include row_shape.h
/* returns TRUE if message loaded another row_shape_t */
/* returns FALSE if message has no more rows */
bool
load_row_from_msg (msg_t *msg_p, row_shape_t *row_p, int row_n)
{
   /* Message indicates how many rows it has in msg_p->rows_n */   if (msg_p->rows_n < row_n)    return    FALSE;
   memset (row_p, 0, sizeof (*row_p));
  /* Transfers message data for each column of the   *  current row into the current row_shape_t   */
memcpy(row_p->col1.arr,         msg_p[row_n]col1_data,          msg_p[row_n]col1_data_len);   row_p->col1.len =
msg_p[row_n]col1_data_len);    ....   memcpy(row_p->coln.arr,          msg_p[row_n]coln_data,
msg_p[row_n]coln_data_len);  row_p->coln.len = msg_p[row_n]coln_data_len);
 
   return TRUE
}
/* end load_row_from_msg.pc */
============
Lack of the "exec sql for :batchsize" capability is a big problem to
overcome,
and inability to treat structs as 'row shapes' is another big problem to
overcome ... partly because of the inability to pass rows by reference
across
different compile units and partly because there is no longer any way to
clear
all columns in row (prior to insert) by using 'memset()'.
The combined effect of no longer being able to pass-by-reference, needing
to
perform one front-end to back-end  exchange for each row insert/ update/
delete,
and needing to clear each and every column buffer separately puts us in
danger
of being unable to handle the necessary realtime message arrival rates.
This fear of the effects on performance, along with the scope of
application-internal
interface changes that seem to be needed, is what is making me inclined to
consider,
much as I hate it, going to libpq and/ or libpqeasy directly.
I really do need to get going due to deadline pressures, and would hate to
make
a dumb/ bad decision about the basic approach just because of something I
am
missing or do not correctly understand.
			
		On Mon, Aug 12, 2002 at 09:53:03PM -0400, William West wrote:
> I really appreciate your time. I know a lot of what I am saying
> must sound silly, due to how low on the learning curve I am.
Actually it sounds very reasonable, I just don't fully understand it.
> > exec sql typedef sf struct foo {...};
Oops. This is incorrect syntax, must read:
exec sql type sf is struct foo {...};
> This looks promising for what I am trying to do. Is
> 'exec sql typedef' ecpg specific? I do not remember seeing
> it called out as anything special in the books I have looked at?
Informix has it also. 
> exec sql typedef sf struct foo {int intOne; int intTwo};
> 
> and I get:
> 
> ERROR: parse error, unexpected IDENT at or near "typedef"
Sure. My fault. Should be "exec sql type ... is ...;"
> Here is what I setup:
> ...
> exec sql insert into mytable values (:bar);
Hmm, that should work. Could you please send me the complete file?
> Pro*C would take this to mean to put the present values of
> bar.col1 and bar.col2 into columns 1 and 2 of a new row
> added to mytable. This builds cleanly in ecpg, seems to emit
> a sensible ECPGdo() call, but gives "-201, too many arguments"
> at run time. I can't figure out what the library wants ... ECPGdo()
> is emitted with two vectors, one to each of the struct members?
Yes, that should be the correct ECPGdo() call.
> I then tried to figure out how to mention the struct members
> individually in the values() clause:
> 
> exec sql insert into mytable values (:bar.col1.arr, :bar.col2.arr);
> 
> But this causes ecpg to issue the error "'bar.col1.arr' is not
> declared". I can't figure out what ecpg would want, to be
> able to refer to the individual members of the struct?
To get them individually you have to use:
exec sql insert into mytable values (:bar.col1, :bar.col2);
> where 'ROWS' establishes the maximum batch size. The Pro*C
> understanding of the meaning of a structure in the declare section
> (as signifying a row) and its members (as signifying a column) is
> fundamental to being able to declare space for batched operations
> via C-Language array declaration syntax.
You can use arrays of struct with ecpg as well. Please look into
ecpg/test/*.pgc for some examples.
> Lack of the "exec sql for :batchsize" capability is a big problem to
> overcome,
Hmm, looks like a good idea to implement in ecpg.
> This fear of the effects on performance, along with the scope of
> application-internal
> interface changes that seem to be needed, is what is making me inclined to
> consider,
> much as I hate it, going to libpq and/ or libpqeasy directly.
Of course there may be reasons to use libpq, but I will certainly work
on minimizing these. As soon as I learn about a missing feature and find
time I will implement it. 
Now that won't help you of course. :-)
Michael
-- 
Michael Meskes
Michael@Fam-Meskes.De
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!
			
		
I have worked out a method to export references to
C-Variables in a DECLARE section such that I can
pass pointer-to-struct for the 'ecpg' compile unit
to other compile units that do message and
communications processing (as the original Oracle
based package now does).
I did this by writing a tool to parse the CREATE TABLE
psql SQL directives that setup the database tables,
to emit a C-Language structure definition into one
file that non-ecpg compile units can '#include', while
emitting a separate ecpg syntax DECLARE statement
into another file that the ecpg compile unit can
EXEC SQL INCLUDE; the two are guaranteed to
be synchronized with each other because they both
come from the same CREATE TABLE statement via
the tool.
====
However, I am having a problem with the SET
arguments to UPDATE.
I have a DECLARE section that looks like this:
EXEC SQL BEGIN DECLARE SECTION;
static struct hcs_status_info_dd {
     varchar   h_f242a_opsts_3c [3];
} H_hcs_status_info;
EXEC SQL END DECLARE SECTION;
which ecpg translates to:
/* exec sql begin declare section */
 static  struct hcs_status_info_dd {
       struct varchar_h_f242a_opsts_3c  { int len; char arr[ 3 ]; }
h_f242a_opsts_3c   ;
 }  H_hcs_status_info   ;
/* exec sql end declare section */
(In the real system, the structure has a much larger
number of members, because the external message
communication and processing compile units need
to access a much larger number of columns).
ecpg does all the things I wanted for SELECT, INSERT,
etc ... but I cannot get the 'SET' arguments needed for
UPDATE to work.
I have an UPDATE statement that looks like this
(again, simplified because the 'real'; system has
*many* members-of-structure needing to be SET):
EXEC SQL UPDATE hcs_status_info SET
     f242a_opsts_3c = :H_hcs_status_info.h_f242a_opsts_3c;
ecpg translates this (chopped a little to isolate
the variable references) into:
{ ECPGdo(__LINE__, NULL, "update hcs_status_info set f242a_opsts_3c  = ?
",
ECPGt_varchar,
&(H_hcs_status_info.h_f242a_opsts_3c),
3L,1L,
sizeof(struct varchar_H_hcs_status_info.h_f242a_opsts_3c),
ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
ECPGt_EOIT, ECPGt_EORT);}
The "&(H_hcs_status_info.h_f242a_opsts_3c)" reference is
exactly correct to get to the 'f242a_opsts_3c' column value.
But I get an error from gcc (rightly so) because it does not like
"sizeof(struct varchar_H_hcs_status_info.h_f242a_opsts_3c)"
because the whole structure is not a vachar, only the
member-of-structure is a varchar. In fact, the struct name
"varchar_H_hcs_status_info" does *not* exist, it is the struct
name "varchar_h_f242a_opsts_3c" that *does* exist.
I believe the expression (to be correct for C) would need
to read something like:
"sizeof(struct varchar_h_f242a_opsts_3c)"
         -or-
"sizeof (H_hcs_status_info.h_f242a_opsts_3c)"
        -?or?-
====
Is there any approach that can get around this?
Note that it is far too impractical to make those
varchar members separate variables, due to the need
to 'lasso' them all into a structure, so the address
of the structure can be passed around to other compile
units that do communications and message processing.
====
I have attached a tarbun that has the entire .pgc,
the '.c' file that I get from ecpg, and a makefile
file (upd_hcs_tst.m) that runs .pcg through ecpg,
and then gcc the .c file.
(See attached file: upd_tst.tar.gz)
			
		Вложения
On Tue, Aug 27, 2002 at 08:11:58PM -0400, William West wrote: > However, I am having a problem with the SET > arguments to UPDATE. > ... This appears to be a bug in ecpg. It should only list the part behind the dot. But I won't be able to look for it for at least another week sorry. Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!