Обсуждение: DBD::PgSQL: More Queestions
Greetings fellow DBI and PostgreSQL hackers. Yes, it's me the C and XS newbie who wants to write a new PostgreSQL DBI driver, back with more questions. Thanks for bearing with me. Status: I've just finished reading through DBD::Pg's dbd_preparse() and dbd_st_execute() functions, and they've left me with a *lot* of questions, which follow. Depending on the answers to these questions, I'll likely be deciding how to proceed with the development. This will naturally affect how long it will take to do the development (and I'm currently doing this in my [non-existent] spare time); I think it's going to take several months. Anyway, to the questions. First, a couple of general questions for the DBI folks: * What are the guidelines for handling trace? I note that DBD::Pg checks dbis->debug far more often than, say, DBD::mysql, and generally sets a lower threshold (1). What sorts of things should a driver be printing at what debug/trace levels? * In several of the functions, DBD::Pg starts with the statement "dTHR;". DBD::mysql, meanwhile, starts with this: #ifdef dTHR dTHR; #endif Which is correct, and what is this thing (variable) for? * In dbd_st_prepare(), is there a way to determine NUM_OF_FIELDS, NAME, etc. -- that is, before executing the statement? These next questions relate to the dbd_preparse() function in DBD::Pg. * DBD::Pg is doing a fair amount of work to work around SQL-style, C-style, and C++-style comments in SQL statements. Is this necessary? Are comments officially supported by the DBI spec? I notice that DBD::ODBC, for example, doesn't appear to do anything special with regard to comments. And if comments *are* supported by DBI, and DBD::Pg is doing the right thing by watching out for them, wouldn't it be smarter for DBD::Pg to cut them *out* of its internal copy of the SQL statement so that dbd_st_execute() doesn't also have to deal with them? * Dumb question: Can someone explain to me what's happening with the literal stuff? What is the parse doing, exactly? I have a general idea, but some of it is a bit opaque to my Perl-trained brain. :-) These questions related to the dbd_st_execute() function in DBD::Pg. * If the answer to the last question is "no", then in dbd_st_execute(), where DBD::Pg sets the number of fields, (sh|c)ouldn't this be done only once per prepared statement? I mean, no matter how many times a prepared statement executes, its number of fields won't change, will it? * With regard to the svp and phs variables, which get their values like this: svp = hv_fetch(imp_sth->all_params_hv, namebuf, i, 0); phs = (phs_t*)(void*)SvPVX(*svp); What are these variables, and where do they come from? I realize I'm showing my ignorance of Perl internals here, but I'm also trying to understand whether these variables are retrieving metadata from the PostgreSQL server. I understand that phs->ftype is checked to see if the value needs to be quoted, but I'm not sure where phs->ftype comes from. -- David Wheeler AIM: dwTheory david@wheeler.net ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: Theory@jabber.org
On Tuesday, November 19, 2002, at 03:13 PM, David Wheeler wrote: > Greetings fellow DBI and PostgreSQL hackers. Yes, it's me the C and XS > newbie who wants to write a new PostgreSQL DBI driver, back with more > questions. Thanks for bearing with me. Gah, I wasn't ready to send that. I was working on it on a plane, and was planning to finish it later. Not sure how I managed to send it. Apologies. More tomorrow when I've had a chance to finish it and proof-read it. Thanks, David -- David Wheeler AIM: dwTheory david@wheeler.net ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: Theory@jabber.org
First, I want to thank everyone who has responded to my posts. I know that they're chock full 'o newbie questions, and I *really* appreciate the patient explanations that all you folks are taking the time to send to me. I *really* appreciate it. The only side affects are that I now have to spend much more time writing and responding to emails, and that I'm much more motivated to spend the time getting to know the code so that I have the knowledge to create an effective update. I couldn't have done it without you all. Thanks! On Wednesday, November 20, 2002, at 02:49 AM, Tim Bunce wrote: > Drivers should rarely if ever print anything below trace level 3. > Quite a few drivers get this wrong and it can be quite annoying to > the users to have to wade through lots of driver output when all > they want is the basic DBI level 1 or 2 trace. Use levels around > 4 through 7 to add more (obscure) detail. So, 4 for more informative stuff, and down through 7 to report, say, every function executed? > I believe dTHR is only needed for the old "5.005 threads", not the > new iThreads, and the DBI will no longer support the old 5.005 threads > so you can delete them all for your new driver. Okay. Thanks for the tip. > Some databases, like Oracle, have what you could call a "server-side > prepare" so DBD::Oracle's prepare() method sends the statement to > the server and then asks the server to 'describe' it. PostgreSQL 7.3 has a "server-side prepare" that I'd like to take advantage of, but I don't think it offers any way to then "describe" the prepared statement. > (Without that you'd need a full sql parser available on the client > side. Even then you wouldn't know the TYPE of the database fields > without much more work.) Yep, I'll steer *far* clear of that! :-) > That's the wrong question. The DBI spec actually makes no assumptions > about the syntax of the statements. But it does say that question marks > should be used for placeholders. > > Thing is, users tend to get upset (quite reasonably) when a driver > interprets question marks that are inside comments as real > placeholders. I guess I've never seen users use comments in SQL they're passing to the DBI. I personally tend to use Perl comments. > Personally I'd agree with them that it's a driver, er, limitation > (so I'm not over-happy with DBD::ODBC ducking the issue, but I > understand that DBD::ODBC faces a much tougher issue here than most > drivers). Okay. Perhaps I'll leave it in, then. But if I decide not to use PostgreSQL 7.3's server-side prepare functionality, and I still need to parse statements for every execute, I think that I might then eliminate them from the query in dbd_preparse() so that dbd_st_execute() doesn't have to worry about them. > Why should dbd_st_execute have to "deal" with them? The whole string > should just be sent off to the server. I believe that others have answered better than I. The short answer: no support for sever-side prepare up to now, so dbd_st_execute has to parse the statement again every time. > If it's possible for you to _realiably_ determine the number of > field at prepare() time, then you should do it. If not, then do it > on just the *first* execute(). Yeah, I'm not sure that I want to do that kind of parsing in dbd_preparse. > phs stands for placeholder structure. The structure (typedef) is > declared > in one of the driver *.h files. There's one per placeholder and they're > store in the hash pointed to by imp_sth->all_params_hv. All that's done > by dbd_preparse() as it scans the statement string. Thanks. With the help of some of the other replies (notably Jeff's), and careful examination of DBD::Pg's and DBD::ODBC's code, I think I'm starting to follow what's happening there. Thanks again for the help, Tim! Regards, David -- David Wheeler AIM: dwTheory david@wheeler.net ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: Theory@jabber.org
On Wednesday, November 20, 2002, at 07:02 AM, Tim Bunce wrote: > Ah, okay. > > The goal is always to reduce the work that execute() does as far > as possible. I'd suggest that the preparse code generate a list of > the 'fragments' of SQL between the placeholders. All execute has > to do is effectively join the parts together. Ooh, hey, that's not a bad idea. > But it's probably not worth worrying about. The cost of the scan > is tiny and would be dwarfed by other issues like how you handle > memory allocation (like, measure all fragments and placeholder values > first then allocate and merge, or concatenate each part into an SV). Hrm, yes, good point. If I end up using PostgtreSQL 7.3's server-side prepare, though, I'll likely to something similar to this, though, as the execute syntax is (roughly): EXECUTE prepared_statement_name('param1', 'param2', 'paramn') So I would just have to append the escaped parameters in turn to "EXECUTE prepared_statement_name(" plus a closing ")". I need to think about how to create a prepared statement name, though. Some sort of hash (like MD5) would be good, but that might incur too much overhead. Ideas? Regards, David -- David Wheeler AIM: dwTheory david@wheeler.net ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: Theory@jabber.org
On Thursday, November 21, 2002, at 01:44 AM, Tim Bunce wrote: > MD5 is pretty quick and the base64 result string is short. Yeah, although then I'd have to hunt up and learn that C API, and then make sure that it's available on all platforms. > But if prepared_statement_name's are local to the session then > you could just use a sequential integer. When a statement handle > is destroyed then you can tell the server to discard the statement. Ah, yes, *much* better idea -- thanks! > You could also let the application specify a name: > > $dbh->prepare("...", { pg_sth_name => "..." }); Sounds like a good feature to add later. Thanks Tim. David -- David Wheeler AIM: dwTheory david@wheeler.net ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: Theory@jabber.org
On Thursday, November 21, 2002, at 01:47 AM, Tim Bunce wrote: > No problem. You're questions are good and we're probably all learning > something. And I'm keeping all these messages, too, and will definitely go back and update DBI::DBD at some point. >> So, 4 for more informative stuff, and down through 7 to report, say, >> every function executed? > > Yeap. (DBD::Oracle uses 9 for that now but I may change it.) Okay, cool. David -- David Wheeler AIM: dwTheory david@wheeler.net ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e Jabber: Theory@jabber.org
On Tue, Nov 19, 2002 at 03:13:54PM -0800, David Wheeler wrote: > > * What are the guidelines for handling trace? I note that DBD::Pg > checks dbis->debug far more often than, say, DBD::mysql, and generally > sets a lower threshold (1). What sorts of things should a driver be > printing at what debug/trace levels? Drivers should rarely if ever print anything below trace level 3. Quite a few drivers get this wrong and it can be quite annoying to the users to have to wade through lots of driver output when all they want is the basic DBI level 1 or 2 trace. Use levels around 4 through 7 to add more (obscure) detail. > * In several of the functions, DBD::Pg starts with the statement > "dTHR;". DBD::mysql, meanwhile, starts with this: > > #ifdef dTHR > dTHR; > #endif > > Which is correct, and what is this thing (variable) for? I believe dTHR is only needed for the old "5.005 threads", not the new iThreads, and the DBI will no longer support the old 5.005 threads so you can delete them all for your new driver. > * In dbd_st_prepare(), is there a way to determine NUM_OF_FIELDS, NAME, > etc. -- that is, before executing the statement? Some databases, like Oracle, have what you could call a "server-side prepare" so DBD::Oracle's prepare() method sends the statement to the server and then asks the server to 'describe' it. (Without that you'd need a full sql parser available on the client side. Even then you wouldn't know the TYPE of the database fields without much more work.) > These next questions relate to the dbd_preparse() function in DBD::Pg. > > * DBD::Pg is doing a fair amount of work to work around SQL-style, > C-style, and C++-style comments in SQL statements. Is this necessary? > Are comments officially supported by the DBI spec? That's the wrong question. The DBI spec actually makes no assumptions about the syntax of the statements. But it does say that question marks should be used for placeholders. Thing is, users tend to get upset (quite reasonably) when a driver interprets question marks that are inside comments as real placeholders. Personally I'd agree with them that it's a driver, er, limitation (so I'm not over-happy with DBD::ODBC ducking the issue, but I understand that DBD::ODBC faces a much tougher issue here than most drivers). > I notice that > DBD::ODBC, for example, doesn't appear to do anything special with > regard to comments. And if comments *are* supported by DBI, and DBD::Pg > is doing the right thing by watching out for them, wouldn't it be > smarter for DBD::Pg to cut them *out* of its internal copy of the SQL > statement so that dbd_st_execute() doesn't also have to deal with them? Why should dbd_st_execute have to "deal" with them? The whole string should just be sent off to the server. > * Dumb question: Can someone explain to me what's happening with the > literal stuff? What is the parse doing, exactly? I have a general idea, > but some of it is a bit opaque to my Perl-trained brain. :-) Same issue as comments. Consider this statement: SELECT * FROM table WHERE a = ? AND b = '?' /* is this right? */ How many question marks are there? Three. And how many placeholders? One. > These questions related to the dbd_st_execute() function in DBD::Pg. > > * If the answer to the last question is "no", then in dbd_st_execute(), > where DBD::Pg sets the number of fields, (sh|c)ouldn't this be done > only once per prepared statement? I mean, no matter how many times a > prepared statement executes, its number of fields won't change, will it? If it's possible for you to _realiably_ determine the number of field at prepare() time, then you should do it. If not, then do it on just the *first* execute(). > * With regard to the svp and phs variables, which get their values like > this: > > svp = hv_fetch(imp_sth->all_params_hv, namebuf, i, 0); > phs = (phs_t*)(void*)SvPVX(*svp); > > What are these variables, and where do they come from? Your dbd_preparse function set them up as it recognized the placeholders. > I realize I'm > showing my ignorance of Perl internals here, but I'm also trying to > understand whether these variables are retrieving metadata from the > PostgreSQL server. I understand that phs->ftype is checked to see if > the value needs to be quoted, but I'm not sure where phs->ftype comes > from. phs stands for placeholder structure. The structure (typedef) is declared in one of the driver *.h files. There's one per placeholder and they're store in the hash pointed to by imp_sth->all_params_hv. All that's done by dbd_preparse() as it scans the statement string. Tim.
On Wed, 20 Nov 2002, Tim Bunce wrote: > > I notice that > > DBD::ODBC, for example, doesn't appear to do anything special with > > regard to comments. And if comments *are* supported by DBI, and DBD::Pg > > is doing the right thing by watching out for them, wouldn't it be > > smarter for DBD::Pg to cut them *out* of its internal copy of the SQL > > statement so that dbd_st_execute() doesn't also have to deal with them? > > Why should dbd_st_execute have to "deal" with them? The whole string > should just be sent off to the server. > PostgreSQL itself does not (yet) support placeholders, so DBD::Pg emulates them by scanning the string again in dbd_st_execute() looking for placeholders and substituting them quoted values of what was passed to execute(). or maybe more clearly in pseudo code: preparse { look for :pn :0 ? type place holders. replace with :pn type placehoders. } execute { look for :pn type placeholders replace with for quote(valueof(:pn)) send to db. } -r
On Wed, Nov 20, 2002 at 08:58:59AM -0500, Rudy Lippan wrote: > On Wed, 20 Nov 2002, Tim Bunce wrote: > > > > I notice that > > > DBD::ODBC, for example, doesn't appear to do anything special with > > > regard to comments. And if comments *are* supported by DBI, and DBD::Pg > > > is doing the right thing by watching out for them, wouldn't it be > > > smarter for DBD::Pg to cut them *out* of its internal copy of the SQL > > > statement so that dbd_st_execute() doesn't also have to deal with them? > > > > Why should dbd_st_execute have to "deal" with them? The whole string > > should just be sent off to the server. > > PostgreSQL itself does not (yet) support placeholders, so DBD::Pg emulates > them by scanning the string again in dbd_st_execute() looking for > placeholders and substituting them quoted values of what was passed to > execute(). Ah, okay. The goal is always to reduce the work that execute() does as far as possible. I'd suggest that the preparse code generate a list of the 'fragments' of SQL between the placeholders. All execute has to do is effectively join the parts together. But it's probably not worth worrying about. The cost of the scan is tiny and would be dwarfed by other issues like how you handle memory allocation (like, measure all fragments and placeholder values first then allocate and merge, or concatenate each part into an SV). Tim.
On Wed, Nov 20, 2002 at 07:02:12PM -0800, David Wheeler wrote: > On Wednesday, November 20, 2002, at 07:02 AM, Tim Bunce wrote: > > Hrm, yes, good point. If I end up using PostgtreSQL 7.3's server-side > prepare, though, I'll likely to something similar to this, though, as > the execute syntax is (roughly): > > EXECUTE prepared_statement_name('param1', 'param2', 'paramn') > > So I would just have to append the escaped parameters in turn to > "EXECUTE prepared_statement_name(" plus a closing ")". > > I need to think about how to create a prepared statement name, though. > Some sort of hash (like MD5) would be good, but that might incur too > much overhead. Ideas? MD5 is pretty quick and the base64 result string is short. But if prepared_statement_name's are local to the session then you could just use a sequential integer. When a statement handle is destroyed then you can tell the server to discard the statement. You could also let the application specify a name: $dbh->prepare("...", { pg_sth_name => "..." }); Tim.
On Wed, Nov 20, 2002 at 06:03:00PM -0800, David Wheeler wrote: > First, I want to thank everyone who has responded to my posts. I know > that they're chock full 'o newbie questions, and I *really* appreciate > the patient explanations that all you folks are taking the time to send > to me. I *really* appreciate it. The only side affects are that I now > have to spend much more time writing and responding to emails, and that > I'm much more motivated to spend the time getting to know the code so > that I have the knowledge to create an effective update. I couldn't > have done it without you all. Thanks! No problem. You're questions are good and we're probably all learning something. > On Wednesday, November 20, 2002, at 02:49 AM, Tim Bunce wrote: > > >Drivers should rarely if ever print anything below trace level 3. > >Quite a few drivers get this wrong and it can be quite annoying to > >the users to have to wade through lots of driver output when all > >they want is the basic DBI level 1 or 2 trace. Use levels around > >4 through 7 to add more (obscure) detail. > > So, 4 for more informative stuff, and down through 7 to report, say, > every function executed? Yeap. (DBD::Oracle uses 9 for that now but I may change it.) Tim.