Re: A PGsendQuery API design for your review

Поиск
Список
Период
Сортировка
От	Karl O. Pinc
Тема	Re: A PGsendQuery API design for your review
Дата	5 февраля 2016 г. 07:45:06
Msg-id	20160205014455.65ce869b@slate.meme.com обсуждение исходный текст
Ответ на	Re: A PGsendQuery API design for your review (Federico Di Gregorio <fog@dndg.it>)
Список	psycopg
Дерево обсуждения
On Wed, 3 Feb 2016 09:15:52 +0100
Federico Di Gregorio <fog@dndg.it> wrote:

> On 02/02/16 21:04, Karl O. Pinc wrote:
> > On Tue, 02 Feb 2016 19:58:16 +0100
> > Federico Di Gregorii <fog@dndg.it> wrote:
> >
> >> I can say with some confidence that we'll never implement such an
> >> API. Even the
> >> ux, i.e., accepting multi-queries from the user is wrong, imho. Let
> >> the user
> >> write single queries, organize them, exrcute them using bound
> >> variables and NOT
> >> mogrify() and then return the result. Anything less is lazy.

Dear Federico,

Thanks for your reply.

> I never said that your users are lazy.

I'm sorry about my crack about lazy users, it was a joke.  There was
something about your statement which bothered me and prompted a
joking response.  I've figured out what it was.

Your mention of mongrify() made me think that your concern was that my
proposed API was too dangerous.  That people could not be trusted with
it.  Such a notion bothers me.  Naturally, an API should not go out of
it's way to encourage mis-use.  But when an underlying functionality
exists, even though it could be mis-used, I believe that it is overly
paternalistic to keep that functionality from the programmer solely
because it can be mis-used.

Of course there are other legitimate reasons for limiting the scope of
a set of APIs.  And reasons for inclusion.  Since psycopg2's purpose
is to expose the libpq API to Python users I'd hope that one of the
project's goals would be to make _all_ of the functionality of libpq
available in Python; and that this would weigh in on the side of
adding an API as proposed.

> I said your design of the UX
> was lazy. Let me explain: if you want to provide everything you
> describe in the rest of your email, then splitting SQL in multiple
> queries is the least of your problems (see below). What you really
> want is an interactive environment with multiple "data pads", each
> asociated to a query, that user can easily compare, save, restore and
> so on.

You should know up-front that we have, and have had, a user interface.
It takes the form of a modified phpPgAdmin.  They have, rightly IMO,
rejected introducing an SQL parser into their code base.  Moreover
phpPgAdmin is starting to show it's age and is no longer well
maintained.  Someday we'll have to replace our interface and that's
what I'm looking toward.

Sure.  Multiple "data pads" would be nice.  I must disagree that each
would be associated with a single query/statement.  Each must be
associated with multiple SQL statements, if not queries, at minimum.
The users often begin a transaction, run various statements which may
create temporary tables and use them to update data in real tables --
at minimum alter data in real tables without first creating temporary
tables, and then run one or more queries that test the result.
Finally they roll the transaction back.  When their SQL is working as
desired they change the rollback to a commit and re-execute.

Naturally, more than one table is often updated within a single
transaction so there is more than one insert/update statement
executed.

Sometimes the process can take days before the final SQL is settled
upon.  Or the resulting SQL re-executed periodically.  No fancy
interface is required in these cases.  Cut and paste of SQL between
"data pad" and a text document is a simple and universal interface.
And it gives the user the ultimate in power and flexibility when it
comes to archiving, documenting data flow, dissemination of SQL,
comparison between past and present, etc.

One "data pad" is good enough, so long as it accepts multiple
statements.  Without multiple statements you can't try data
modifications out within transactions and report on the result unless,
critically, you make permanent potentially erroneous modifications to
your data.

The user experience is greatly enhanced if "data pads" accept multiple
queries, not just multiple statements.  Otherwise, in order to see
more than one aspect of a series of proposed data updates, the user
would have to do something like copy those portions of the SQL work
flow that update data from "data pad" to "data pad" in order to, in
the end, have executed multiple queries to investigate various aspects
of the result of the changes to db content.  The user should not have
to think about such issues.

It's also helpful to the analytic process to be able to drop
"debugging" queries into the middle of a bunch of data updates so as
to understand the effect of each update.  In fact this happens all the
time.

> You want interactive SQL completion and that's much much more
> difficult than splitting a query.

Interactive SQL completion might be nice, but only in that it saves a
bit of typing.  People got along fine for years without completion in
shell and my users get along fine without SQL completion.  If it was
free, sure.  But who's going to pay to do the difficult job of
programming SQL completion?  (And if you were going to implement such
a feature, shouldn't it be part of Postgres so as to both leverage
existing code and be available to everybody?)

Likewise the whole pointy-clicky business of constructing SQL by way
of GUI might be nice when one is just getting started with SQL.  After
that, typing is a whole lot faster.  Graphical SQL construction
becomes a giant annoyance unless it can be entirely ignored.

> Just sending a multi-query to the
> database, leveraging your proposed API, won't help your users much.

I have to disagree.  It's extremely useful to be able to send an
arbitrary mix of SQL to the server and see the effect of every
statement.  Not just the output of the last query.  Imagine a
programming IDE with the limitations you propose.

I hope this better explains why a truly generic SQL execution
environment is a critical UI feature.

What really matters is what my users think.  I can assure you the
verdict is in.  Submitting multiple SQL statements, sometimes
containing multiple queries, and getting both counts and query results
back from every statement is a nearly indispensable part of their user
interface.

> Now, how to split a query? It's easy: you split at ";" unless inside
> a text literal. Parsing text literals in SQL is incredibly easy: just
> count the number of "'" you encounter: if odd then you're inside an
> literal and you ignore the ";". PostgreSQL also has E'' literals but
> they are just a little bit more complex - you have to check for "\;".
> That's all. Now, parsing the SQL to implement completion, that's
> where things get interesting... :)

Actually, this is what we've been doing.  Although it's not as simple
as you think.  There is also dollar quoting.  And there's the hassle
of having to go back and revisit the code.  We've had to do this
twice.  Once for the 'E' literals and once for dollar quoting.  (We've
had this interface going for a long time.  Although not in Python.)

Not to mention that upstream wants nothing to do with putting a parser
in their code.  This greatly increases the hassle when upgrading to a
new upstream version.  At the same time we've been loath to produce an
entirely custom interface when upstream has 98% of what we need.

I've tried parsing.  It's time to try something different.

> TL;DR. Why add an API that nobody will use when what you need will
> probably cost you just a couple of hours of coding?

Yours is an excellent argument.  Let me address both aspects of your
question.  First the approach our project takes to programming and
where we'd like to invest our couple of hours of programming.  And
second the larger issue of why psycopg2 would want such an API.

A guiding principle of ours is to use toolsets and interfaces that are
as generic as possible, and obtained from 3rd parties whenever
possible.  The amount of work dedicated to user interface is to be
minimized.  The programming budget is limited and is to be directed to
essential functionality.  User interfaces are expensive.  As soon
as a program interacts with a human 90% of the programmer's work goes
toward interaction and error handling and only about 10% goes to code
that actually manipulates data and it's state.

So, we presently use a 3rd party database admin tool.  Someday we may
be forced to supplement this.

Our users are sophisticated enough to eschew bling.  The Amboseli
Baboon project's been going since 1974, switched to a "real" database
from ad-hoc SPSS in a punchcard format around 1992, and have been
creatively adapting the latest technology of every sort into all
aspects their work in a fashion geared toward maximal return on
investment since the beginning.  When it comes to databases they want
an interface that delivers power, and know what that means.

I'd love to spend more time and make a more elegant user interface.
My user's priorities are elsewhere.

So, yes.  I am lazy. Libpq provides access to a parser through
PQsendQuery() and friends.  Why should I (continuously) re-invent the
wheel?  If you don't want to incorporate the suggested APIs I will
probably try subclassing psycopg2.extensions.cursor().  This seems
even easier than writing an SQL parser.  I'm pretty sure this would
work, although I'm also pretty sure it'd have ugly-ish bits that need
to know the internals of it's super()s.  Although this would be sad, I
think I'd prefer keeping up with psycopg2's internals to keeping up
with changes to PG's SQL syntax.  I won't know until I look.

As to why psycopg2 should add the proposed API when "nobody will use
it":

There is the argument, at top, regards psycopg2 providing an interface
that exposes all the functionality available within libpq.  Clearly
libpq makes it possible to create the user interface I desire.
(Whether or not you believe that such a UI should exist.)  Using
PQsendQuery()/PQgetResult() is a way to create such a UI.  (See this
thread, where consensus seems to be that, in C at least, parsing SQL
is undesirable and PQsendQuery()/PQgetResult() is the way to replicate
psql's acceptance of multiple SQL queries:
http://marc.info/?l=postgresql-general&m=134877413732485&w=2)

As to "nobody will use it", this depends on how you count.  I do
believe that you are right in that very few programs will ever provide
generic SQL "data pads"; interfaces allowing submission of arbitrary
SQL.  However very, very many _people_ do use an interface which
allows submission of multiple arbitrary SQL statements.  Every "all
purpose" database administration tool has such an interface.
phpPgAdmin, PgAdmin, adminer editor are the PG examples.  Not to
mention the various MySQL admin interfaces.  And of course psql.

The only question is whether or not the interface violates the
principle of least surprise by displaying a single result when supplied
with multiple SQL queries.  Or whether it does something more
sophisticated, and in my opinion much more useful, and behaves as psql
does.

I believe that psycopg2 would be a good choice if one was to write a
generic admin interface to PG in Python.  If so, as psycopg2 exists
now, such a Python program would not be able to replicate psql's
behavior when piped an arbitrary collection of SQL statements.  Not,
at least, unless the author implemented and maintained yet another SQL
parser for PG per your suggestion.

Perhaps the world already has enough generic database admin tools, or
functional subsets thereof.  It doesn't need another written in
Python.  But is that for psycopg2 to judge?

In any case I believe that there are other legitimate use-cases for
the proposed API, although it may not be used often.  Reporting is one
possible use, and other sorts of batch-related processing.  I don't
have vast experience to back this up with specifics but I believe that
multiple SQL statements are piped to psql on a not-uncommon basis.
Surely some of these rely on psql's ability to output a result
per-statement.  Why shouldn't there be Python programs that support
this functionality?

> Hope this clarifies why I don't like your proposal.

Yes.  Thanks very much.  It does help me understand your thoughts and
I appreciate you spending your time.

From another standpoint though, no.  I don't understand at all.  Why
should psycopg2 be passing judgment on what user interface is
appropriate for interacting with a PG database?  It's fine to reject
APIs that just won't get used, but I question whether it's
possible to know that a libpq feature won't be useful in unanticipated
ways once made available via Python.  It seems only natural to
expose all of libpq's functionality.  Picking and choosing ways
of exposing features is one thing.  Deciding that a libpq feature
just shouldn't be available is something else.


Something that's not been discussed at all is the proposed notion of
eliminating client-side buffering.  This is trading CPU on one side
for reduced memory consumption and lower startup latency when
returning query results on the other.  It matters when result sets are
large.  As it is now, when a query produces a million rows of results
all these rows are sent to the client and buffered before libpq
delivers any rows to the application.  (Absent server-side cursors,
et-al.)

Of course you could extend cursor.execute() in place of using the
proposed API.  I thought the "no buffering" feature would be useful.
It would likely involve more changes to existing code than the ability
to execute multiple queries, which mostly would require new code.
Since "no buffering" is useful when executing queries, and both new
features require a PQsendQuery() call, and the existence of a "no
buffering" feature would be appreciated by my users, I figured I'd
build the "no buffering" feature into the new API proposals.

Thanks for listening.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

P.S.  We call "data pads" "SQL windows".
В списке psycopg по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: A PGsendQuery API design for your review