Обсуждение: DBD::PgSQL: More Queestions

Поиск
Список
Период
Сортировка

DBD::PgSQL: More Queestions

От
David Wheeler
Дата:
Greetings fellow DBI and PostgreSQL hackers. Yes, it's me the C and XS 
newbie who wants to write a new PostgreSQL DBI driver, back with more 
questions. Thanks for bearing with me.

Status: I've just finished reading through DBD::Pg's dbd_preparse() and 
dbd_st_execute() functions, and they've left me with a *lot* of 
questions, which follow. Depending on the answers to these questions, 
I'll likely be deciding how to proceed with the development. This will 
naturally affect how long it will take to do the development (and I'm 
currently doing this in my [non-existent] spare time); I think it's 
going to take several months.

Anyway, to the questions. First, a couple of general questions for the 
DBI folks:

* What are the guidelines for handling trace? I note that DBD::Pg 
checks dbis->debug far more often than, say, DBD::mysql, and generally 
sets a lower threshold (1). What sorts of things should a driver be 
printing at what debug/trace levels?

* In several of the functions, DBD::Pg starts with the statement 
"dTHR;". DBD::mysql, meanwhile, starts with this:

#ifdef dTHR  dTHR;
#endif

Which is correct, and what is this thing (variable) for?

* In dbd_st_prepare(), is there a way to determine NUM_OF_FIELDS, NAME, 
etc. -- that is, before executing the statement?

These next questions relate to the dbd_preparse() function in DBD::Pg.

* DBD::Pg is doing a fair amount of work to work around SQL-style, 
C-style, and C++-style comments in SQL statements. Is this necessary? 
Are comments officially supported by the DBI spec? I notice that 
DBD::ODBC, for example, doesn't appear to do anything special with 
regard to comments. And if comments *are* supported by DBI, and DBD::Pg 
is doing the right thing by watching out for them, wouldn't it be 
smarter for DBD::Pg to cut them *out* of its internal copy of the SQL 
statement so that dbd_st_execute() doesn't also have to deal with them?

* Dumb question: Can someone explain to me what's happening with the 
literal stuff? What is the parse doing, exactly? I have a general idea, 
but some of it is a bit opaque to my Perl-trained brain. :-)

These questions related to the dbd_st_execute() function in DBD::Pg.

* If the answer to the last question is "no", then in dbd_st_execute(), 
where DBD::Pg sets the number of fields, (sh|c)ouldn't this be done 
only once per prepared statement? I mean, no matter how many times a 
prepared statement executes, its number of fields won't change, will it?

* With regard to the svp and phs variables, which get their values like 
this:
  svp = hv_fetch(imp_sth->all_params_hv, namebuf, i, 0);  phs = (phs_t*)(void*)SvPVX(*svp);

What are these variables, and where do they come from? I realize I'm 
showing my ignorance of Perl internals here, but I'm also trying to 
understand whether these variables are retrieving metadata from the 
PostgreSQL server. I understand that phs->ftype is checked to see if 
the value needs to be quoted, but I'm not sure where phs->ftype comes 
from.





-- 
David Wheeler                                     AIM: dwTheory
david@wheeler.net                                 ICQ: 15726394
http://david.wheeler.net/                      Yahoo!: dew7e                                               Jabber:
Theory@jabber.org



Re: DBD::PgSQL: More Queestions

От
David Wheeler
Дата:
On Tuesday, November 19, 2002, at 03:13  PM, David Wheeler wrote:

> Greetings fellow DBI and PostgreSQL hackers. Yes, it's me the C and XS 
> newbie who wants to write a new PostgreSQL DBI driver, back with more 
> questions. Thanks for bearing with me.

Gah, I wasn't ready to send that. I was working on it on a plane, and 
was planning to finish it later. Not sure how I managed to send it. 
Apologies. More tomorrow when I've had a chance to finish it and 
proof-read it.

Thanks,

David

-- 
David Wheeler                                     AIM: dwTheory
david@wheeler.net                                 ICQ: 15726394
http://david.wheeler.net/                      Yahoo!: dew7e                                               Jabber:
Theory@jabber.org



Re: DBD::PgSQL: More Queestions

От
David Wheeler
Дата:
First, I want to thank everyone who has responded to my posts. I know 
that they're chock full 'o newbie questions, and I *really* appreciate 
the patient explanations that all you folks are taking the time to send 
to me. I *really* appreciate it. The only side affects are that I now 
have to spend much more time writing and responding to emails, and that 
I'm much more motivated to spend the time getting to know the code so 
that I have the knowledge to create an effective update. I couldn't 
have done it without you all. Thanks!

On Wednesday, November 20, 2002, at 02:49  AM, Tim Bunce wrote:

> Drivers should rarely if ever print anything below trace level 3.
> Quite a few drivers get this wrong and it can be quite annoying to
> the users to have to wade through lots of driver output when all
> they want is the basic DBI level 1 or 2 trace.  Use levels around
> 4 through 7 to add more (obscure) detail.

So, 4 for more informative stuff, and down through 7 to report, say, 
every function executed?

> I believe dTHR is only needed for the old "5.005 threads", not the
> new iThreads, and the DBI will no longer support the old 5.005 threads
> so you can delete them all for your new driver.

Okay. Thanks for the tip.

> Some databases, like Oracle, have what you could call a "server-side
> prepare" so DBD::Oracle's prepare() method sends the statement to
> the server and then asks the server to 'describe' it.

PostgreSQL 7.3 has a "server-side prepare" that I'd like to take 
advantage of, but I don't think it offers any way to then "describe" 
the prepared statement.

> (Without that you'd need a full sql parser available on the client
> side.  Even then you wouldn't know the TYPE of the database fields
> without much more work.)

Yep, I'll steer *far* clear of that! :-)

> That's the wrong question. The DBI spec actually makes no assumptions
> about the syntax of the statements. But it does say that question marks
> should be used for placeholders.
>
> Thing is, users tend to get upset (quite reasonably) when a driver
> interprets question marks that are inside comments as real 
> placeholders.

I guess I've never seen users use comments in SQL they're passing to 
the DBI. I personally tend to use Perl comments.

> Personally I'd agree with them that it's a driver, er, limitation
> (so I'm not over-happy with DBD::ODBC ducking the issue, but I
> understand that DBD::ODBC faces a much tougher issue here than most 
> drivers).

Okay. Perhaps I'll leave it in, then. But if I decide not to use 
PostgreSQL 7.3's server-side prepare functionality, and I still need to 
parse statements for every execute, I think that I might then eliminate 
them from the query in dbd_preparse() so that dbd_st_execute() doesn't 
have to worry about them.

> Why should dbd_st_execute have to "deal" with them? The whole string
> should just be sent off to the server.

I believe that others have answered better than I. The short answer: no 
support for sever-side prepare up to now, so dbd_st_execute has to 
parse the statement again every time.

> If it's possible for you to _realiably_ determine the number of
> field at prepare() time, then you should do it. If not, then do it
> on just the *first* execute().

Yeah, I'm not sure that I want to do that kind of parsing in 
dbd_preparse.

> phs stands for placeholder structure. The structure (typedef) is 
> declared
> in one of the driver *.h files. There's one per placeholder and they're
> store in the hash pointed to by imp_sth->all_params_hv. All that's done
> by dbd_preparse() as it scans the statement string.

Thanks. With the help of some of the other replies (notably Jeff's), 
and careful examination of DBD::Pg's and DBD::ODBC's code, I think I'm 
starting to follow what's happening there.

Thanks again for the help, Tim!

Regards,

David

-- 
David Wheeler                                     AIM: dwTheory
david@wheeler.net                                 ICQ: 15726394
http://david.wheeler.net/                      Yahoo!: dew7e                                               Jabber:
Theory@jabber.org



Re: DBD::PgSQL: More Queestions

От
David Wheeler
Дата:
On Wednesday, November 20, 2002, at 07:02  AM, Tim Bunce wrote:

> Ah, okay.
>
> The goal is always to reduce the work that execute() does as far
> as possible. I'd suggest that the preparse code generate a list of
> the 'fragments' of SQL between the placeholders. All execute has
> to do is effectively join the parts together.

Ooh, hey, that's not a bad idea.

> But it's probably not worth worrying about. The cost of the scan
> is tiny and would be dwarfed by other issues like how you handle
> memory allocation (like, measure all fragments and placeholder values
> first then allocate and merge, or concatenate each part into an SV).

Hrm, yes, good point. If I end up using PostgtreSQL 7.3's server-side 
prepare, though, I'll likely to something similar to this, though, as 
the execute syntax is (roughly):
  EXECUTE prepared_statement_name('param1', 'param2', 'paramn')

So I would just have to append the escaped parameters in turn to 
"EXECUTE prepared_statement_name(" plus a closing ")".

I need to think about how to create a prepared statement name, though. 
Some sort of hash (like MD5) would be good, but that might incur too 
much overhead. Ideas?

Regards,

David

-- 
David Wheeler                                     AIM: dwTheory
david@wheeler.net                                 ICQ: 15726394
http://david.wheeler.net/                      Yahoo!: dew7e                                               Jabber:
Theory@jabber.org



Re: DBD::PgSQL: More Queestions

От
David Wheeler
Дата:
On Thursday, November 21, 2002, at 01:44  AM, Tim Bunce wrote:

> MD5 is pretty quick and the base64 result string is short.

Yeah, although then I'd have to hunt up and learn that C API, and then 
make sure that it's available on all platforms.

> But if prepared_statement_name's are local to the session then
> you could just use a sequential integer. When a statement handle
> is destroyed then you can tell the server to discard the statement.

Ah, yes, *much* better idea -- thanks!

> You could also let the application specify a name:
>
>     $dbh->prepare("...", { pg_sth_name => "..." });

Sounds like a good feature to add later.

Thanks Tim.

David

-- 
David Wheeler                                     AIM: dwTheory
david@wheeler.net                                 ICQ: 15726394
http://david.wheeler.net/                      Yahoo!: dew7e                                               Jabber:
Theory@jabber.org



Re: DBD::PgSQL: More Queestions

От
David Wheeler
Дата:
On Thursday, November 21, 2002, at 01:47  AM, Tim Bunce wrote:

> No problem. You're questions are good and we're probably all learning 
> something.

And I'm keeping all these messages, too, and will definitely go back 
and update DBI::DBD at some point.

>> So, 4 for more informative stuff, and down through 7 to report, say,
>> every function executed?
>
> Yeap. (DBD::Oracle uses 9 for that now but I may change it.)

Okay, cool.

David

-- 
David Wheeler                                     AIM: dwTheory
david@wheeler.net                                 ICQ: 15726394
http://david.wheeler.net/                      Yahoo!: dew7e                                               Jabber:
Theory@jabber.org



Re: DBD::PgSQL: More Queestions

От
Tim Bunce
Дата:
On Tue, Nov 19, 2002 at 03:13:54PM -0800, David Wheeler wrote:
> 
> * What are the guidelines for handling trace? I note that DBD::Pg 
> checks dbis->debug far more often than, say, DBD::mysql, and generally 
> sets a lower threshold (1). What sorts of things should a driver be 
> printing at what debug/trace levels?

Drivers should rarely if ever print anything below trace level 3.
Quite a few drivers get this wrong and it can be quite annoying to
the users to have to wade through lots of driver output when all
they want is the basic DBI level 1 or 2 trace.  Use levels around
4 through 7 to add more (obscure) detail.

> * In several of the functions, DBD::Pg starts with the statement 
> "dTHR;". DBD::mysql, meanwhile, starts with this:
> 
> #ifdef dTHR
>   dTHR;
> #endif
> 
> Which is correct, and what is this thing (variable) for?

I believe dTHR is only needed for the old "5.005 threads", not the
new iThreads, and the DBI will no longer support the old 5.005 threads
so you can delete them all for your new driver.

> * In dbd_st_prepare(), is there a way to determine NUM_OF_FIELDS, NAME, 
> etc. -- that is, before executing the statement?

Some databases, like Oracle, have what you could call a "server-side
prepare" so DBD::Oracle's prepare() method sends the statement to
the server and then asks the server to 'describe' it.

(Without that you'd need a full sql parser available on the client
side.  Even then you wouldn't know the TYPE of the database fields
without much more work.)

> These next questions relate to the dbd_preparse() function in DBD::Pg.
> 
> * DBD::Pg is doing a fair amount of work to work around SQL-style, 
> C-style, and C++-style comments in SQL statements. Is this necessary? 
> Are comments officially supported by the DBI spec?

That's the wrong question. The DBI spec actually makes no assumptions
about the syntax of the statements. But it does say that question marks
should be used for placeholders.

Thing is, users tend to get upset (quite reasonably) when a driver
interprets question marks that are inside comments as real placeholders.

Personally I'd agree with them that it's a driver, er, limitation
(so I'm not over-happy with DBD::ODBC ducking the issue, but I
understand that DBD::ODBC faces a much tougher issue here than most drivers).

> I notice that 
> DBD::ODBC, for example, doesn't appear to do anything special with 
> regard to comments. And if comments *are* supported by DBI, and DBD::Pg 
> is doing the right thing by watching out for them, wouldn't it be 
> smarter for DBD::Pg to cut them *out* of its internal copy of the SQL 
> statement so that dbd_st_execute() doesn't also have to deal with them?

Why should dbd_st_execute have to "deal" with them? The whole string
should just be sent off to the server.

> * Dumb question: Can someone explain to me what's happening with the 
> literal stuff? What is the parse doing, exactly? I have a general idea, 
> but some of it is a bit opaque to my Perl-trained brain. :-)

Same issue as comments. Consider this statement:
   SELECT * FROM table WHERE a = ? AND b = '?' /* is this right? */

How many question marks are there? Three. And how many placeholders? One.


> These questions related to the dbd_st_execute() function in DBD::Pg.
> 
> * If the answer to the last question is "no", then in dbd_st_execute(), 
> where DBD::Pg sets the number of fields, (sh|c)ouldn't this be done 
> only once per prepared statement? I mean, no matter how many times a 
> prepared statement executes, its number of fields won't change, will it?

If it's possible for you to _realiably_ determine the number of
field at prepare() time, then you should do it. If not, then do it
on just the *first* execute().

> * With regard to the svp and phs variables, which get their values like 
> this:
> 
>   svp = hv_fetch(imp_sth->all_params_hv, namebuf, i, 0);
>   phs = (phs_t*)(void*)SvPVX(*svp);
> 
> What are these variables, and where do they come from?

Your dbd_preparse function set them up as it recognized the placeholders.

> I realize I'm 
> showing my ignorance of Perl internals here, but I'm also trying to 
> understand whether these variables are retrieving metadata from the 
> PostgreSQL server. I understand that phs->ftype is checked to see if 
> the value needs to be quoted, but I'm not sure where phs->ftype comes 
> from.

phs stands for placeholder structure. The structure (typedef) is declared
in one of the driver *.h files. There's one per placeholder and they're
store in the hash pointed to by imp_sth->all_params_hv. All that's done
by dbd_preparse() as it scans the statement string.

Tim.


Re: DBD::PgSQL: More Queestions

От
Rudy Lippan
Дата:
On Wed, 20 Nov 2002, Tim Bunce wrote:

> > I notice that 
> > DBD::ODBC, for example, doesn't appear to do anything special with 
> > regard to comments. And if comments *are* supported by DBI, and DBD::Pg 
> > is doing the right thing by watching out for them, wouldn't it be 
> > smarter for DBD::Pg to cut them *out* of its internal copy of the SQL 
> > statement so that dbd_st_execute() doesn't also have to deal with them?
> 
> Why should dbd_st_execute have to "deal" with them? The whole string
> should just be sent off to the server.
> 

PostgreSQL itself does not (yet) support placeholders, so DBD::Pg emulates
them by scanning the string again in dbd_st_execute() looking for
placeholders and substituting them quoted values of what was passed to 
execute().

or maybe more clearly in pseudo code:

preparse {  look for :pn :0 ? type place holders.  replace with :pn type placehoders.
}

execute {   look for :pn type placeholders   replace with for quote(valueof(:pn))   send to db.
}

-r




Re: DBD::PgSQL: More Queestions

От
Tim Bunce
Дата:
On Wed, Nov 20, 2002 at 08:58:59AM -0500, Rudy Lippan wrote:
> On Wed, 20 Nov 2002, Tim Bunce wrote:
> 
> > > I notice that 
> > > DBD::ODBC, for example, doesn't appear to do anything special with 
> > > regard to comments. And if comments *are* supported by DBI, and DBD::Pg 
> > > is doing the right thing by watching out for them, wouldn't it be 
> > > smarter for DBD::Pg to cut them *out* of its internal copy of the SQL 
> > > statement so that dbd_st_execute() doesn't also have to deal with them?
> > 
> > Why should dbd_st_execute have to "deal" with them? The whole string
> > should just be sent off to the server.
> 
> PostgreSQL itself does not (yet) support placeholders, so DBD::Pg emulates
> them by scanning the string again in dbd_st_execute() looking for
> placeholders and substituting them quoted values of what was passed to 
> execute().

Ah, okay.

The goal is always to reduce the work that execute() does as far
as possible. I'd suggest that the preparse code generate a list of
the 'fragments' of SQL between the placeholders. All execute has
to do is effectively join the parts together.

But it's probably not worth worrying about. The cost of the scan
is tiny and would be dwarfed by other issues like how you handle
memory allocation (like, measure all fragments and placeholder values
first then allocate and merge, or concatenate each part into an SV).

Tim.


Re: DBD::PgSQL: More Queestions

От
Tim Bunce
Дата:
On Wed, Nov 20, 2002 at 07:02:12PM -0800, David Wheeler wrote:
> On Wednesday, November 20, 2002, at 07:02  AM, Tim Bunce wrote:
> 
> Hrm, yes, good point. If I end up using PostgtreSQL 7.3's server-side 
> prepare, though, I'll likely to something similar to this, though, as 
> the execute syntax is (roughly):
> 
>   EXECUTE prepared_statement_name('param1', 'param2', 'paramn')
> 
> So I would just have to append the escaped parameters in turn to 
> "EXECUTE prepared_statement_name(" plus a closing ")".
> 
> I need to think about how to create a prepared statement name, though. 
> Some sort of hash (like MD5) would be good, but that might incur too 
> much overhead. Ideas?

MD5 is pretty quick and the base64 result string is short.

But if prepared_statement_name's are local to the session then
you could just use a sequential integer. When a statement handle
is destroyed then you can tell the server to discard the statement.

You could also let the application specify a name:
$dbh->prepare("...", { pg_sth_name => "..." });

Tim.


Re: DBD::PgSQL: More Queestions

От
Tim Bunce
Дата:
On Wed, Nov 20, 2002 at 06:03:00PM -0800, David Wheeler wrote:
> First, I want to thank everyone who has responded to my posts. I know 
> that they're chock full 'o newbie questions, and I *really* appreciate 
> the patient explanations that all you folks are taking the time to send 
> to me. I *really* appreciate it. The only side affects are that I now 
> have to spend much more time writing and responding to emails, and that 
> I'm much more motivated to spend the time getting to know the code so 
> that I have the knowledge to create an effective update. I couldn't 
> have done it without you all. Thanks!

No problem. You're questions are good and we're probably all learning something.

> On Wednesday, November 20, 2002, at 02:49  AM, Tim Bunce wrote:
> 
> >Drivers should rarely if ever print anything below trace level 3.
> >Quite a few drivers get this wrong and it can be quite annoying to
> >the users to have to wade through lots of driver output when all
> >they want is the basic DBI level 1 or 2 trace.  Use levels around
> >4 through 7 to add more (obscure) detail.
> 
> So, 4 for more informative stuff, and down through 7 to report, say, 
> every function executed?

Yeap. (DBD::Oracle uses 9 for that now but I may change it.)

Tim.