Re: A PGsendQuery API design for your review
| From | Karl O. Pinc |
| --- | --- |
| Subject | Re: A PGsendQuery API design for your review |
| Date | |
| Msg-id | 20160205014455.65ce869b@slate.meme.com |
| In reply to | Re: A PGsendQuery API design for your review (Federico Di Gregorio <fog@dndg.it>) |
| List | psycopg |
On Wed, 3 Feb 2016 09:15:52 +0100
Federico Di Gregorio <fog@dndg.it> wrote:

> On 02/02/16 21:04, Karl O. Pinc wrote:
> > On Tue, 02 Feb 2016 19:58:16 +0100
> > Federico Di Gregorio <fog@dndg.it> wrote:
> >
> >> I can say with some confidence that we'll never implement such an
> >> API. Even the UX, i.e., accepting multi-queries from the user, is
> >> wrong, imho. Let the user write single queries, organize them,
> >> execute them using bound variables and NOT mogrify(), and then
> >> return the result. Anything less is lazy.

Dear Federico,

Thanks for your reply.

> I never said that your users are lazy.

I'm sorry about my crack about lazy users; it was a joke. There was
something about your statement that bothered me and prompted a joking
response, and I've figured out what it was. Your mention of mogrify()
made me think your concern was that my proposed API is too dangerous,
that people could not be trusted with it. Such a notion bothers me.
Naturally, an API should not go out of its way to encourage misuse.
But when an underlying functionality exists, even though it could be
misused, I believe it is overly paternalistic to keep that
functionality from the programmer solely because it can be misused.

Of course there are other legitimate reasons for limiting the scope of
a set of APIs, and reasons for inclusion. Since psycopg2's purpose is
to expose the libpq API to Python users, I'd hope that one of the
project's goals would be to make _all_ of the functionality of libpq
available in Python, and that this would weigh in on the side of
adding an API as proposed.

> I said your design of the UX was lazy. Let me explain: if you want
> to provide everything you describe in the rest of your email, then
> splitting SQL in multiple queries is the least of your problems (see
> below). What you really want is an interactive environment with
> multiple "data pads", each associated to a query, that the user can
> easily compare, save, restore and so on.
You should know up-front that we have, and have had, a user interface.
It takes the form of a modified phpPgAdmin. They have, rightly IMO,
rejected introducing an SQL parser into their code base. Moreover,
phpPgAdmin is starting to show its age and is no longer well
maintained. Someday we'll have to replace our interface, and that's
what I'm looking toward.

Sure, multiple "data pads" would be nice. But I must disagree that
each would be associated with a single query/statement. Each must, at
minimum, be associated with multiple SQL statements, if not queries.
The users often begin a transaction, run various statements which may
create temporary tables and use them to update data in real tables --
or at minimum alter data in real tables without first creating
temporary tables -- and then run one or more queries that test the
result. Finally they roll the transaction back. When their SQL is
working as desired they change the rollback to a commit and
re-execute. Naturally, more than one table is often updated within a
single transaction, so more than one insert/update statement is
executed. Sometimes the process can take days before the final SQL is
settled upon, or the resulting SQL is re-executed periodically.

No fancy interface is required in these cases. Cut and paste of SQL
between "data pad" and a text document is a simple and universal
interface, and it gives the user the ultimate in power and flexibility
when it comes to archiving, documenting data flow, dissemination of
SQL, comparison between past and present, etc. One "data pad" is good
enough, so long as it accepts multiple statements. Without multiple
statements you can't try data modifications out within transactions
and report on the result unless, critically, you make permanent,
potentially erroneous modifications to your data.

The user experience is greatly enhanced if "data pads" accept multiple
queries, not just multiple statements.
Otherwise, in order to see more than one aspect of a series of
proposed data updates, the user would have to do something like copy
those portions of the SQL work flow that update data from "data pad"
to "data pad" in order to, in the end, have executed multiple queries
investigating various aspects of the changes to db content. The user
should not have to think about such issues. It's also helpful to the
analytic process to be able to drop "debugging" queries into the
middle of a bunch of data updates so as to understand the effect of
each update. In fact this happens all the time.

> You want interactive SQL completion and that's much much more
> difficult than splitting a query.

Interactive SQL completion might be nice, but only in that it saves a
bit of typing. People got along fine for years without completion in
shell, and my users get along fine without SQL completion. If it were
free, sure. But who's going to pay to do the difficult job of
programming SQL completion? (And if you were going to implement such
a feature, shouldn't it be part of Postgres, so as to both leverage
existing code and be available to everybody?)

Likewise, the whole pointy-clicky business of constructing SQL by way
of a GUI might be nice when one is just getting started with SQL.
After that, typing is a whole lot faster. Graphical SQL construction
becomes a giant annoyance unless it can be entirely ignored.

> Just sending a multi-query to the database, leveraging your proposed
> API, won't help your users much.

I have to disagree. It's extremely useful to be able to send an
arbitrary mix of SQL to the server and see the effect of every
statement, not just the output of the last query. Imagine a
programming IDE with the limitations you propose.

I hope this better explains why a truly generic SQL execution
environment is a critical UI feature. What really matters is what my
users think, and I can assure you the verdict is in.
Submitting multiple SQL statements, sometimes containing multiple
queries, and getting both counts and query results back from every
statement is a nearly indispensable part of their user interface.

> Now, how to split a query? It's easy: you split at ";" unless inside
> a text literal. Parsing text literals in SQL is incredibly easy:
> just count the number of "'" you encounter: if odd then you're
> inside a literal and you ignore the ";". PostgreSQL also has E''
> literals but they are just a little bit more complex - you have to
> check for "\;". That's all. Now, parsing the SQL to implement
> completion, that's where things get interesting... :)

Actually, this is what we've been doing, although it's not as simple
as you think. There is also dollar quoting. And there's the hassle of
having to go back and revisit the code. We've had to do this twice:
once for the E'' literals and once for dollar quoting. (We've had this
interface going for a long time, although not in Python.) Not to
mention that upstream wants nothing to do with putting a parser in
their code, which greatly increases the hassle when upgrading to a new
upstream version. At the same time we've been loath to produce an
entirely custom interface when upstream has 98% of what we need. I've
tried parsing. It's time to try something different.

> TL;DR. Why add an API that nobody will use when what you need will
> probably cost you just a couple of hours of coding?

Yours is an excellent argument. Let me address both aspects of your
question: first, the approach our project takes to programming and
where we'd like to invest our couple of hours of coding; and second,
the larger issue of why psycopg2 would want such an API.

A guiding principle of ours is to use toolsets and interfaces that are
as generic as possible, and obtained from 3rd parties whenever
possible. The amount of work dedicated to user interface is to be
minimized.
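To make the splitting problem concrete, here is a minimal sketch of the
kind of client-side splitter discussed above (my own illustration, not
code from our interface or from psycopg2). It handles plain literals,
E'' backslash escapes, and dollar quoting, and its limitations show why
this code keeps needing revisits:

```python
import re

def split_statements(sql):
    """Naively split SQL on ';', skipping text inside literals.

    Handles '...' literals, E'...' literals (backslash escapes), and
    $tag$...$tag$ dollar quoting.  Comments, "quoted identifiers",
    and Unicode escape literals are NOT handled: each new quoting
    syntax PostgreSQL grows means revisiting this code.
    """
    statements, current, i, n = [], [], 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch == "'":
            # An E (or e) immediately before the quote enables
            # backslash escapes inside the literal.
            escaped = i > 0 and sql[i - 1] in "Ee"
            current.append(ch)
            i += 1
            while i < n:
                current.append(sql[i])
                if escaped and sql[i] == "\\" and i + 1 < n:
                    i += 1
                    current.append(sql[i])  # keep the escaped char
                elif sql[i] == "'":
                    i += 1
                    break
                i += 1
        elif ch == "$" and (m := re.match(r"\$[A-Za-z_]\w*\$|\$\$", sql[i:])):
            # Dollar-quoted block: copy verbatim through the closing tag.
            tag = m.group(0)
            end = sql.find(tag, i + len(tag))
            end = n if end == -1 else end + len(tag)
            current.append(sql[i:end])
            i = end
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Even this toy version is state-machine territory rather than "count the
quotes", and it still tracks PostgreSQL's lexical rules by hand --
which is exactly the maintenance burden PQsendQuery() avoids.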
The programming budget is limited and is to be directed to essential
functionality. User interfaces are expensive: as soon as a program
interacts with a human, 90% of the programmer's work goes toward
interaction and error handling and only about 10% goes to code that
actually manipulates data and its state. So, we presently use a 3rd
party database admin tool. Someday we may be forced to supplement
this.

Our users are sophisticated enough to eschew bling. The Amboseli
Baboon project has been going since 1974, switched to a "real"
database from ad-hoc SPSS in a punchcard format around 1992, and has
been creatively adapting the latest technology of every sort into all
aspects of their work in a fashion geared toward maximal return on
investment since the beginning. When it comes to databases they want
an interface that delivers power, and they know what that means. I'd
love to spend more time and make a more elegant user interface; my
users' priorities are elsewhere.

So, yes, I am lazy. Libpq provides access to a parser through
PQsendQuery() and friends. Why should I (continuously) re-invent the
wheel?

If you don't want to incorporate the suggested APIs I will probably
try subclassing psycopg2.extensions.cursor. This seems even easier
than writing an SQL parser. I'm pretty sure this would work, although
I'm also pretty sure it'd have ugly-ish bits that need to know the
internals of its super()s. Although this would be sad, I think I'd
prefer keeping up with psycopg2's internals to keeping up with changes
to PG's SQL syntax. I won't know until I look.

As to why psycopg2 should add the proposed API when "nobody will use
it": there is the argument, at top, regarding psycopg2 providing an
interface that exposes all the functionality available within libpq.
Clearly libpq makes it possible to create the user interface I desire
(whether or not you believe that such a UI should exist). Using
PQsendQuery()/PQgetResult() is a way to create such a UI.
(See this thread, where consensus seems to be that, in C at least,
parsing SQL is undesirable and PQsendQuery()/PQgetResult() is the way
to replicate psql's acceptance of multiple SQL queries:
http://marc.info/?l=postgresql-general&m=134877413732485&w=2)

As to "nobody will use it", this depends on how you count. I do
believe that you are right in that very few programs will ever provide
generic SQL "data pads", interfaces allowing submission of arbitrary
SQL. However very, very many _people_ do use an interface which allows
submission of multiple arbitrary SQL statements. Every "all purpose"
database administration tool has such an interface: phpPgAdmin,
pgAdmin, and Adminer are the PG examples, not to mention the various
MySQL admin interfaces. And of course psql. The only question is
whether the interface violates the principle of least surprise by
displaying a single result when supplied with multiple SQL queries, or
whether it does something more sophisticated and, in my opinion, much
more useful, and behaves as psql does.

I believe that psycopg2 would be a good choice if one were to write a
generic admin interface to PG in Python. If so, as psycopg2 exists
now, such a Python program would not be able to replicate psql's
behavior when piped an arbitrary collection of SQL statements -- not,
at least, unless the author implemented and maintained yet another SQL
parser for PG per your suggestion. Perhaps the world already has
enough generic database admin tools, or functional subsets thereof,
and doesn't need another written in Python. But is that for psycopg2
to judge?

In any case I believe that there are other legitimate use-cases for
the proposed API, although it may not be used often. Reporting is one
possible use, and other sorts of batch-related processing. I don't
have vast experience to back this up with specifics, but I believe
that multiple SQL statements are piped to psql on a not-uncommon
basis.
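To show the PQsendQuery()/PQgetResult() consumption pattern without a
live server, here is a pure-Python mock of the loop psql effectively
runs. All names here (FakeLibpq, send_query, get_result) are my own
stand-ins, not the libpq or psycopg2 API: the point is only the shape
of the protocol -- submit the whole string once, then drain one result
per statement until the NULL sentinel.

```python
class FakeLibpq:
    """Stand-in for a libpq connection: send_query submits the whole
    multi-statement string; get_result returns one result per
    statement, then None (libpq's NULL) when the queue is drained."""

    def send_query(self, sql):
        # The real server parses the multi-statement string itself;
        # the naive split below exists only to fake server behavior.
        self._results = [
            {"status": "ok", "statement": s.strip()}
            for s in sql.split(";") if s.strip()
        ]

    def get_result(self):
        return self._results.pop(0) if self._results else None

conn = FakeLibpq()
conn.send_query("UPDATE t SET x = 1; SELECT count(*) FROM t;")

# Drain one result per statement, psql-style.
results = []
result = conn.get_result()
while result is not None:
    results.append(result)
    result = conn.get_result()
```

With the real libpq calls, no SQL parsing happens client-side at all;
the server does the splitting, which is the whole attraction.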
Surely some of these rely on psql's ability to output a result
per-statement. Why shouldn't there be Python programs that support
this functionality?

> Hope this clarifies why I don't like your proposal.

Yes, thanks very much. It does help me understand your thoughts and I
appreciate you spending your time.

From another standpoint, though, no, I don't understand at all. Why
should psycopg2 be passing judgment on what user interface is
appropriate for interacting with a PG database? It's fine to reject
APIs that just won't get used, but I question whether it's possible to
know that a libpq feature won't be useful in unanticipated ways once
made available via Python. It seems only natural to expose all of
libpq's functionality. Picking and choosing ways of exposing features
is one thing; deciding that a libpq feature just shouldn't be
available is something else.

Something that's not been discussed at all is the proposed notion of
eliminating client-side buffering. This trades CPU on one side for
reduced memory consumption and lower startup latency when returning
query results on the other. It matters when result sets are large. As
it is now, when a query produces a million rows of results, all those
rows are sent to the client and buffered before libpq delivers any
rows to the application. (Absent server-side cursors, et al.) Of
course you could extend cursor.execute() in place of using the
proposed API.

I thought the "no buffering" feature would be useful. It would likely
involve more changes to existing code than the ability to execute
multiple queries, which mostly would require new code. Since "no
buffering" is useful when executing queries, both new features require
a PQsendQuery() call, and the existence of a "no buffering" feature
would be appreciated by my users, I figured I'd build the "no
buffering" feature into the new API proposals.

Thanks for listening.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

P.S. We call "data pads" "SQL windows".