Thread: A PGsendQuery API design for your review

A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
Hi,

I've got users who write and submit arbitrary sequences of SQL
to the server, as supported by psql, phpPgAdmin, pgAdmin, etc.
I'd like to see this sort of functionality in psycopg2.

Someday I may submit a code patch.  In the meantime I'd
appreciate any review of an API design that supports
such functionality.

(Attached is a patch to the docs if you'd prefer to look
at it that way.)

Thanks for the work.

----------------------------------------------------

The "cursor" class
******************
<snip>
   -[ Commands execution methods ]-
<snip>
   executequery([buffered=True])

      Execute the next database operation (query or command) submitted
      to the server with "submitqueries()".

      "buffered" determines whether all query output is collected in a
      buffer by the client before delivery to the application or
      whether results are returned to the application as they are
      received from the server.  (The "cursor.execute()" and
      "cursor.executemany()" methods always buffer.)

      "True" is returned if a database operation was executed. "None"
      is returned if all submitted database operations have been
      executed.

      If a query was executed, the returned values can be retrieved
      using "fetch*()" methods.

      DB API extension: The "executequery()" method is a Psycopg
      extension to the DB API 2.0.

<snip>
   submitqueries(operations)

      Submit a string containing one or more database operations
      (queries or commands) to the database server for execution.

      Warning: Never, **never**, **NEVER** use Python string
        concatenation ("+") or string parameters interpolation ("%")
        to pass variables to a SQL query string.  See Passing
        parameters to SQL queries.  Always use the "mogrify()" method
        to construct the "operations" string if the SQL contains
        variable data.

      This function is primarily useful in the context of executing
      large reports, pre-packaged collections of SQL reports, and
      arbitrary user-supplied SQL.

      Use the "executequery()" method to execute the submitted SQL.

      DB API extension: The "submitqueries()" method is a Psycopg
      extension to the DB API 2.0.
<snip>
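
To make the intended call sequence concrete, here is a minimal sketch of
how an application might drive the two proposed methods.  It is
illustrative only: submitqueries() and executequery() do not exist in
psycopg2, and the connection string, table, and statements are made up.

import psycopg2

conn = psycopg2.connect("dbname=test")   # made-up connection string
cur = conn.cursor()

# Arbitrary user-supplied text containing several statements.
sql_text = "SELECT 1; SELECT * FROM cell_per; UPDATE cell_per SET checked = true;"

# Per the warning above, variable data would be folded in beforehand with
# mogrify(), e.g.  cur.mogrify("SELECT * FROM cell_per WHERE sex = %s;", ('F',))

cur.submitqueries(sql_text)        # proposed: hand the whole string to the server

while cur.executequery():          # proposed: True per statement, None when done
    if cur.description is not None:    # the statement returned rows
        for row in cur.fetchall():
            print(row)
    else:                               # a command; report the affected-row count
        print(cur.rowcount, "rows affected")

conn.commit()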

Regards,


Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
Christophe Pettus
Date:
On Feb 2, 2016, at 9:49 AM, "Karl O. Pinc" <kop@meme.com> wrote:

> I've got users who write and submit arbitrary sequences of SQL
> to the server, as supported by psql, phpPgAdmin, pgAdmin, etc.

Note that those are all applications, rather than interface libraries.

> I'd like to see this sort of functionality in psycopg2.

The client of psycopg2 can do this already, of course.
--
-- Christophe Pettus
   xof@thebuild.com



Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Tue, 2 Feb 2016 09:52:33 -0800
Christophe Pettus <xof@thebuild.com> wrote:

>
> On Feb 2, 2016, at 9:49 AM, "Karl O. Pinc" <kop@meme.com> wrote:
>
> > I've got users who write and submit arbitrary sequences of SQL
> > to the server, as supported by psql, phpPgAdmin, pgAdmin, etc.
>
> Note that those are all applications, rather than interface libraries.

Yes.  And I take it back.  phpPgAdmin does not support the
submission of arbitrary sequences of SQL.  Or rather it does,
but you can only get the results of the last query.  This is
due to a limitation in the PHP PG API.

>
> > I'd like to see this sort of functionality in psycopg2.
>
> The client of psycopg2 can do this already, of course.

How?  In order to submit multiple statements to libpq
and get back results for all of them PQsendQuery() must
be called.

cursor.execute() will return results only from the last
query executed, like calling PQexec().  Or am I wrong?

Meanwhile, I don't see PQsetSingleRowMode() called anywhere
in psycopg2.  I figured as long as I was making an API
I'd provide a way to get rid of the client-side buffering.
I figure that's a useful feature.

Regards,


Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
Christophe Pettus
Date:
On Feb 2, 2016, at 10:04 AM, Karl O. Pinc <kop@meme.com> wrote:

> How?  In order to submit multiple statements to libpq
> and get back results for all of them PQsendQuery() must
> be called.

The same way the other clients do it; split the text into queries and send them over.

--
-- Christophe Pettus
   xof@thebuild.com



Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Tue, 2 Feb 2016 10:15:40 -0800
Christophe Pettus <xof@thebuild.com> wrote:

>
> On Feb 2, 2016, at 10:04 AM, Karl O. Pinc <kop@meme.com> wrote:
>
> > How?  In order to submit multiple statements to libpq
> > and get back results for all of them PQsendQuery() must
> > be called.
>
> The same way the other clients do it; split the text into queries and
> send them over.

That requires my application contain an SQL parser.  This seems
onerous.  The whole point is that I'm getting unfiltered SQL directly
from a user.  Multiple statements.  "Splitting the text into queries"
is non-trivial.

I can call libpq directly.  First PQsendQuery(), then repeatedly
call PQgetResult() (optionally calling PQsetSingleRowMode()).
But I like the psycopg2 API.  Seems like it ought to be able
to make these calls for me.

By the by.  The other clients probably don't split the text
into queries.  psql does, but to do so it uses the SQL
parser internal to pg.  phpPgAdmin does not.  And I don't
know what pgAdmin does.  My point in mentioning these interfaces
is that they all allow the user to submit arbitrary SQL.
And the only one that does it "right" is psql, and it's
got "special access" to a SQL parser.

psql is the only client
I know of that delivers query results when multiple queries
are supplied (on stdin, say).  I want to be able to write
a similar application.

(Thanks for taking your time to look at this by the way.)

(And, oops.  The "buffered" argument should belong to the
"submitqueries()" method in my design.)

Regards,

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
Federico Di Gregorio
Date:

Hi,

I can say with some confidence that we'll never implement such an API. Even the ux, i.e., accepting multi-queries from the user is wrong, imho. Let the user write single queries, organize them, execute them using bound variables and NOT mogrify() and then return the result. Anything less is lazy.

federico

On Tue, Feb 02, 2016 at 6:50 PM, Karl O. Pinc <kop@meme.com> wrote:

<snip>

Re: A PGsendQuery API design for your review

From:
Adrian Klaver
Date:
On 02/02/2016 10:34 AM, Karl O. Pinc wrote:
> On Tue, 2 Feb 2016 10:15:40 -0800
> Christophe Pettus <xof@thebuild.com> wrote:
>
>>
>> On Feb 2, 2016, at 10:04 AM, Karl O. Pinc <kop@meme.com> wrote:
>>
>>> How?  In order to submit multiple statements to libpq
>>> and get back results for all of them PQsendQuery() must
>>> be called.
>>
>> The same way the other clients do it; split the text into queries and
>> send them over.
>
> That requires my application contain an SQL parser.  This seems
> onerous.  The whole point is that I'm getting unfiltered SQL directly
> from a user.  Multiple statements.  "Splitting the text into queries"
> is non-trivial.

How about using the statement terminator and doing:

sql_str = 'select * from cell_per; select count(*) from cell_per;'

sql_str.split(';')
['select * from cell_per', ' select count(*) from cell_per', '']

for sql in sql_str.split(';'):
    if sql.strip():
        cur.execute(sql)
        rs = cur.fetchall()
        print(rs)

>
> I can call libpq directly.  First PQsendQuery(), then repeatedly
> call PQgetResult() (optionally calling PQgetSingleRowMode().
> But I like the psycopg2 API.  Seems like it ought to be able
> to make these calls for me.
>
> By the by.  The other clients probably don't split the text
> into queries.  psql does, but to do it it uses the SQL
> parser internal to pg.  phpPgAdmin does not.  And I don't
> know what pgAdmin does.  My point in mentioning these interfaces
> is that they all allow the user to submit arbitrary sql.
> And the only one that does it "right" is psql, and it's
> got "special access" to a SQL parser.
>
> psql is the only client
> I know of that delivers query results when multiple queries
> are supplied (on stdin, say).  I want to be able to write
> a similar application.
>
> (Thanks for taking your time to look at this by the way.)
>
> (And, oops.  The "buffered" argument must be to the
> submitquery() method in my design.)
>
> Regards,
>
> Karl <kop@meme.com>
> Free Software:  "You don't pay back, you pay forward."
>                   -- Robert A. Heinlein
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: A PGsendQuery API design for your review

From:
Christophe Pettus
Date:
On Feb 2, 2016, at 11:01 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> sql_str = 'select * from cell_per; select count(*) from cell_per;'

sql_str = "select * from thing where a like ";"; select count(*) from thing;"

--
-- Christophe Pettus
   xof@thebuild.com



Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Tue, 02 Feb 2016 19:58:16 +0100
Federico Di Gregorio <fog@dndg.it> wrote:

> I can say with some confidence that we'll never implement such an
> API. Even the
> ux, i.e., accepting multi-queries from the user is wrong, imho. Let
> the user
> write single queries, organize them, execute them using bound
> variables and NOT
> mogrify() and then return the result. Anything less is lazy.

I don't think you have a complete picture of my users, what
they're doing, and why it is useful to submit multiple queries
at once.

I support these people:
http://amboselibaboons.nd.edu/
http://gombechimpanzees.org/

Who use interfaces like:
http://papio.biology.duke.edu/babasewiki/MyStartingPage
https://gombemi.ccas.gwu.edu/gombemiwiki/Main_Page

My users are scientists, from undergraduates to senior
researchers and American Academy of Science members, with a wide
variety of SQL expertise.  They have direct access to their
databases, writing SQL to explore their data.  They want a
simple, comprehensive interface.  (One that not only provides
support for SQL but also allows the unsophisticated to create
their own tables, etc.  Which is neither here nor there.)

They repeatedly try different queries and compare results,
iterating until they tease the data apart.  They might write a
query, run it, modify it, and then run both the modified and the
original version.  They want to see the results in a single
window so they can compare.  They want to be able to edit and
tweak either or both and hit a "resubmit" button and again review
the results.

Some are sophisticated SQL users; regardless most don't want to
think about SQL.  It's easier for them to cut and paste the same
query 3 times and tweak the SQL to get 3 results (adult males,
adult females, juveniles) than to try to write a single query
that does a GROUP BY.  But they do want to be able to see and
compare the results of the 3 queries.

They use SQL to post-validate their data and check for sanity after
a bulk data upload.  They have a file containing from a few to
10s of SQL queries that they cut and paste into a "SQL window"
and run.  Usually many of the queries produce a single number as
output.  Then they examine the output and see if the numbers add
up.

They use sets of SELECT INTO queries to copy, or pre-join, or
summarize data in the master schema, moving it into their own
schema for further analysis.  They again use cut-and-paste to
replay these operations when there's new data from a fresh bulk
data upload.  (They likewise have full permissions, within their
own schemas, to create whatever tables they need to support their
analysis.)

Etc.

Of course everything _could_ be done by submitting a single query
at a time.  (Or an interface that requires such parsing by the
end-user, and then runs them in bulk.  Or some sort of arbitrary
file format that contains easily parse-able statement delimiters.
Etc.)  But an interface that allows submission of multiple
statements in a single go, and review of all the results at once,
greatly simplifies their work flow.  (And it's nice that their
interface has point-and-click table creation, deletion, etc.)

I'm surprised (Shocked I say!  Shocked! :-) that you'd want to
place arbitrary restrictions on the user interfaces that can be
created with psycopg2.  And Sir!  I must take offense at the
implication that my users are lazy.  They work hard to press the
semicolon key to end their statements!  ;-)

Seriously, the real world is complex.  If PG allows me to create
a user interface where the SQL parsing into separate statements
is done by PG then psycopg2 should let me do the same.  There's
a reason you can pipe multiple SQL statements to psql without
having to pre-parse them into separate statements.

If you really want to reject the idea of accepting multiple
queries from the user let me know and I'll not pursue the
matter.

Regards,


Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Tue, 2 Feb 2016 14:04:08 -0600
"Karl O. Pinc" <kop@meme.com> wrote:

> On Tue, 02 Feb 2016 19:58:16 +0100
> Federico Di Gregorio <fog@dndg.it> wrote:
>
> > I can say with some confidence that we'll never implement such an
> > API. Even the
> > ux, i.e., accepting multi-queries from the user is wrong, imho.

> Seriously, the real world is complex.

(Reminds me of an interaction I had with the Mozilla people.
They'd changed Firefox so that a response had to show up
within 5 minutes.  My folks occasionally write queries
that run hours.  Mozilla changed their mind.  Hope you will
too.)

Regards,

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
Adrian Klaver
Date:
On 02/02/2016 12:04 PM, Karl O. Pinc wrote:
> On Tue, 02 Feb 2016 19:58:16 +0100
> Federico Di Gregorio <fog@dndg.it> wrote:
>
>> I can say with some confidence that we'll never implement such an
>> API. Even the
>> ux, i.e., accepting multi-queries from the user is wrong, imho. Let
>> the user
>> write single queries, organize them, execute them using bound
>> variables and NOT
>> mogrify() and then return the result. Anything less is lazy.
>
> I don't think you have a complete picture of my users, what
> they're doing, and why it is useful to submit multiple queries
> at once.
>
> I support these people:
> http://amboselibaboons.nd.edu/
> http://gombechimpanzees.org/
>
> Who use interfaces like:
> http://papio.biology.duke.edu/babasewiki/MyStartingPage
> https://gombemi.ccas.gwu.edu/gombemiwiki/Main_Page
>
> My users are scientists, from undergraduates to senior
> researchers and American Academy of Science members, with a wide
> variety of SQL expertise.  They have direct access to their
> databases, writing SQL to explore their data.  They want a
> simple, comprehensive interface.  (One that not only provides
> support for SQL but also allows the unsophisticated to create
> their own tables, etc.  Which is neither here nor there.)
>
> They repeatedly try different queries and compare results,
> iterating until they tease the data apart.  They might write a
> query, run it, modify it, and then run both the modified and the
> original version.  They want to see the results in a single
> window so they can compare.  They want to be able to edit and
> tweak either or both and hit a "resubmit" button and again review
> the results.
>
> Some are sophisticated SQL users; regardless most don't want to
> think about SQL.  It's easier for them to cut and paste the same
> query 3 times and tweak the SQL to get 3 results (adult males,
> adult females, juveniles) than to try to write a single query
> that does a GROUP BY.  But they do want to be able to see and
> compare the results of the 3 queries.
>
> They use SQL to post-validate their data and check for sanity after
> a bulk data upload.  They have a file containing from a few to
> 10s of SQL queries that they cut and paste into a "SQL window"
> and run.  Usually many of the queries produce a single number as
> output.  Then they examine the output and see if the numbers add
> up.
>
> They use sets of SELECT INTO queries to copy, or pre-join, or
> summarize data in the master schema, moving it into their own
> schema for further analysis.  They again use cut-and-paste to
> replay these operations when there's new data from a fresh bulk
> data upload.  (They likewise have full permissions, within their
> own schemas, to create whatever tables they need to support their
> analysis.)

I think what they are looking for is this:

http://ipython.org/

and this:

http://pandas.pydata.org/



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: A PGsendQuery API design for your review

From:
Adrian Klaver
Date:
On 02/02/2016 12:10 PM, Karl O. Pinc wrote:
> On Tue, 2 Feb 2016 14:04:08 -0600
> "Karl O. Pinc" <kop@meme.com> wrote:
>
>> On Tue, 02 Feb 2016 19:58:16 +0100
>> Federico Di Gregorio <fog@dndg.it> wrote:
>>
>>> I can say with some confidence that we'll never implement such an
>>> API. Even the
>>> ux, i.e., accepting multi-queries from the user is wrong, imho.
>
>> Seriously, the real world is complex.
>
> (Reminds me of an interaction I had with the Mozilla people.
> They'd changed Firefox so that a response had to show up
> within 5 minutes.  My folks occasionally write queries
> that run hours.  Mozilla changed their mind.  Hope you will
> too.)

Not sure that was a good idea; holding a resource open for hours to do
nothing seems counter-productive to me.  That is what LISTEN/NOTIFY,
Websockets, name your async method, are for.


>
> Regards,
>
> Karl <kop@meme.com>
> Free Software:  "You don't pay back, you pay forward."
>                   -- Robert A. Heinlein
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Tue, 2 Feb 2016 13:20:34 -0800
Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> I think what they are looking for is this:
>
> http://ipython.org/
>
> and this:
>
> http://pandas.pydata.org/

Thanks, but I don't think so.  Those are interactive
python shells.  What they want is a web-enabled psql;
a SQL shell.

Many years ago I did a horrible hack of phpPgAdmin
to give them such a shell (it parses sql).  I'd
like to move forward.

Regards,

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Tue, 2 Feb 2016 13:24:24 -0800
Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 02/02/2016 12:10 PM, Karl O. Pinc wrote:
> >
> > (Reminds me of an interaction I had with the Mozilla people.
> > They'd changed Firefox so that a response had to show up
> > within 5 minutes.  My folks occasionally write queries
> > that run hours.  Mozilla changed their mind.  Hope you will
> > too.)
>
> Not sure that was a good idea, holding a resource open for hours to
> do nothing seems counter-productive to me. That is what
> LISTEN/NOTIFY, Websockets, name your async method,  are for.

I understand.  On the other hand, there are maybe 2 people using
the box at once.  The limiting resource is programmer time.
Plain old HTTP is dirt simple.


Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
Adrian Klaver
Date:
On 02/02/2016 01:30 PM, Karl O. Pinc wrote:
> On Tue, 2 Feb 2016 13:20:34 -0800
> Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
>> I think what they are looking for is this:
>>
>> http://ipython.org/
>>
>> and this:
>>
>> http://pandas.pydata.org/
>
> Thanks, but I don't think so.  Those are interactive
> python shells.  What they want is a web-enabled psql;
> a SQL shell.

Actually more than that, as IPython is just one part of the Jupyter
stack, which encompasses over 40 languages and is web-enabled through
Notebooks. I would take a serious look at this before rolling your own
data analysis stack. Jupyter is becoming the de facto scientific toolset
and the community that surrounds it is huge. Pretty sure you will find
someone has already done a lot of the legwork for you.

http://jupyter.org/

>
> Many years ago I did a horrible hack of phpPgAdmin
> to give them such a shell (it parses sql).  I'd
> like to move forward.
>
> Regards,
>
> Karl <kop@meme.com>
> Free Software:  "You don't pay back, you pay forward."
>                   -- Robert A. Heinlein
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Tue, 2 Feb 2016 13:47:51 -0800
Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> > Thanks, but I don't think so.  Those are interactive
> > python shells.  What they want is a web-enabled psql;
> > a SQL shell.
>
> Actually more than that, as IPython is just one part of the Jupyter
> stack, which encompasses over 40 languages and is web-enabled through
> Notebooks. I would take a serious look at this before rolling your
> own data analysis stack. Jupyter is becoming the de facto scientific
> toolset and the community that surrounds it is huge. Pretty sure you
> will find someone has already done a lot of the legwork for you.
>
> http://jupyter.org/

Thanks.  I'd glanced at it some years ago, but forgotten
about it.

Anyway, we don't have a data analysis stack of our own
making.  They just want to get the data out or otherwise
query the db.  (Lordy, some of them are doing analysis
in Excel.)

Regards,

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
Federico Di Gregorio
Date:
On 02/02/16 21:04, Karl O. Pinc wrote:
> On Tue, 02 Feb 2016 19:58:16 +0100
> Federico Di Gregorio <fog@dndg.it> wrote:
>
>> I can say with some confidence that we'll never implement such an
>> API. Even the
>> ux, i.e., accepting multi-queries from the user is wrong, imho. Let
>> the user
>> write single queries, organize them, execute them using bound
>> variables and NOT
>> mogrify() and then return the result. Anything less is lazy.
>
> I don't think you have a complete picture of my users, what
> they're doing, and why it is useful to submit multiple queries
> at once.

Dear Karl,

I never said that your users are lazy. I said your design of the UX was
lazy. Let me explain: if you want to provide everything you describe in
the rest of your email, then splitting SQL into multiple queries is the
least of your problems (see below). What you really want is an
interactive environment with multiple "data pads", each associated with
a query, that the user can easily compare, save, restore and so on. You
want interactive SQL completion, and that's much, much more difficult
than splitting a query. Just sending a multi-query to the database,
leveraging your proposed API, won't help your users much.

Now, how to split a query? It's easy: you split at ";" unless inside a
text literal. Parsing text literals in SQL is incredibly easy: just
count the number of "'" you encounter: if odd then you're inside a
literal and you ignore the ";". PostgreSQL also has E'' literals but
they are just a little bit more complex - you have to check for "\;".
That's all. Now, parsing the SQL to implement completion, that's where
things get interesting... :)
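
For what it's worth, a minimal Python sketch of exactly that rule,
assuming plain '...' literals only; the function name is made up, and
(as comes up later in the thread) dollar quoting and E'' strings would
still need handling:

def split_sql(text):
    """Split a string of SQL statements on ';' outside single-quoted
    literals: semicolons are ignored while an odd number of "'" has
    been seen.  Dollar quoting, E'' escapes, comments and quoted
    identifiers are NOT handled."""
    statements, current, in_literal = [], [], False
    for ch in text:
        if ch == "'":
            in_literal = not in_literal
            current.append(ch)
        elif ch == ';' and not in_literal:
            stmt = ''.join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        statements.append(tail)
    return statements

# The example posted earlier in the thread now splits correctly:
print(split_sql("select * from thing where a like ';'; select count(*) from thing;"))
# ["select * from thing where a like ';'", 'select count(*) from thing']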

TL;DR. Why add an API that nobody will use when what you need will
probably cost you just a couple of hours of coding?

Hope this clarifies why I don't like your proposal.

federico




--
Federico Di Gregorio                         federico.digregorio@dndg.it
DNDG srl                                                  http://dndg.it
  There's no greys, only white that's got grubby. I'm surprised you
   don't know that. And sin, young man, is when you treat people as
   things. Including yourself.                       -- Granny Weatherwax


Re: A PGsendQuery API design for your review

From:
Joe Abbate
Date:
On 02/02/16 16:30, Karl O. Pinc wrote:
> Thanks, but I don't think so.  Those are interactive
> python shells.  What they want is a web-enabled psql;
> a SQL shell.
>
> Many years ago I did a horrible hack of phpPgAdmin
> to give them such a shell (it parses sql).  I'd
> like to move forward.

I haven't played much with your particular problem, but it seems that
adminer does what you want.  You can SELECT from a table from the left
panel, then restrict the data that you want using dropdowns, and you'll
see the generated SQL before the results with an Edit link.  If you
click on that and then add a semicolon and another query, it returns two
sets of results.

Joe



Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Wed, 3 Feb 2016 09:15:52 +0100
Federico Di Gregorio <fog@dndg.it> wrote:

> On 02/02/16 21:04, Karl O. Pinc wrote:
> > On Tue, 02 Feb 2016 19:58:16 +0100
> > Federico Di Gregorio <fog@dndg.it> wrote:
> >
> >> I can say with some confidence that we'll never implement such an
> >> API. Even the
> >> ux, i.e., accepting multi-queries from the user is wrong, imho. Let
> >> the user
> >> write single queries, organize them, execute them using bound
> >> variables and NOT
> >> mogrify() and then return the result. Anything less is lazy.

Dear Federico,

Thanks for your reply.

> I never said that your users are lazy.

I'm sorry about my crack about lazy users; it was a joke.  There was
something about your statement which bothered me and prompted a
joking response.  I've figured out what it was.

Your mention of mogrify() made me think that your concern was that my
proposed API was too dangerous.  That people could not be trusted with
it.  Such a notion bothers me.  Naturally, an API should not go out of
its way to encourage mis-use.  But when an underlying functionality
exists, even though it could be mis-used, I believe that it is overly
paternalistic to keep that functionality from the programmer solely
because it can be mis-used.

Of course there are other legitimate reasons for limiting the scope of
a set of APIs.  And reasons for inclusion.  Since psycopg2's purpose
is to expose the libpq API to Python users I'd hope that one of the
project's goals would be to make _all_ of the functionality of libpq
available in Python; and that this would weigh in on the side of
adding an API as proposed.

> I said your design of the UX
> was lazy. Let me explain: if you want to provide everything you
> describe in the rest of your email, then splitting SQL in multiple
> queries is the least of your problems (see below). What you really
> want is an interactive environment with multiple "data pads", each
> asociated to a query, that user can easily compare, save, restore and
> so on.

You should know up-front that we have, and have had, a user interface.
It takes the form of a modified phpPgAdmin.  They have, rightly IMO,
rejected introducing an SQL parser into their code base.  Moreover
phpPgAdmin is starting to show its age and is no longer well
maintained.  Someday we'll have to replace our interface and that's
what I'm looking toward.

Sure.  Multiple "data pads" would be nice.  I must disagree that each
would be associated with a single query/statement.  Each must be
associated with multiple SQL statements, if not queries, at minimum.
The users often begin a transaction, run various statements which may
create temporary tables and use them to update data in real tables (or,
at minimum, alter data in real tables without first creating temporary
tables), and then run one or more queries that test the result.
Finally they roll the transaction back.  When their SQL is working as
desired they change the rollback to a commit and re-execute.

Naturally, more than one table is often updated within a single
transaction so there is more than one insert/update statement
executed.

Sometimes the process can take days before the final SQL is settled
upon.  Or the resulting SQL re-executed periodically.  No fancy
interface is required in these cases.  Cut and paste of SQL between
"data pad" and a text document is a simple and universal interface.
And it gives the user the ultimate in power and flexibility when it
comes to archiving, documenting data flow, dissemination of SQL,
comparison between past and present, etc.

One "data pad" is good enough, so long as it accepts multiple
statements.  Without multiple statements you can't try data
modifications out within transactions and report on the result unless,
critically, you make permanent potentially erroneous modifications to
your data.

The user experience is greatly enhanced if "data pads" accept multiple
queries, not just multiple statements.  Otherwise, to see more than one
aspect of a series of proposed data updates, the user would have to
copy the data-updating portions of the SQL work flow from "data pad" to
"data pad", only to end up having executed multiple queries to
investigate various aspects of the resulting changes to db content.
The user should not have to think about such issues.

It's also helpful to the analytic process to be able to drop
"debugging" queries into the middle of a bunch of data updates so as
to understand the effect of each update.  In fact this happens all the
time.

> You want interactive SQL completion and that's much much more
> difficult than splitting a query.

Interactive SQL completion might be nice, but only in that it saves a
bit of typing.  People got along fine for years without completion in
shell and my users get along fine without SQL completion.  If it was
free, sure.  But who's going to pay to do the difficult job of
programming SQL completion?  (And if you were going to implement such
a feature, shouldn't it be part of Postgres so as to both leverage
existing code and be available to everybody?)

Likewise the whole pointy-clicky business of constructing SQL by way
of GUI might be nice when one is just getting started with SQL.  After
that, typing is a whole lot faster.  Graphical SQL construction
becomes a giant annoyance unless it can be entirely ignored.

> Just sending a multi-query to the
> database, leveraging your proposed API, won't help your users much.

I have to disagree.  It's extremely useful to be able to send an
arbitrary mix of SQL to the server and see the effect of every
statement.  Not just the output of the last query.  Imagine a
programming IDE with the limitations you propose.

I hope this better explains why a truly generic SQL execution
environment is a critical UI feature.

What really matters is what my users think.  I can assure you the
verdict is in.  Submitting multiple SQL statements, sometimes
containing multiple queries, and getting both counts and query results
back from every statement is a nearly indispensable part of their user
interface.

> Now, how to split a query? It's easy: you split at ";" unless inside
> a text literal. Parsing text literals in SQL is incredibly easy: just
> count the number of "'" you encounter: if odd then you're inside an
> literal and you ignore the ";". PostgreSQL also has E'' literals but
> they are just a little bit more complex - you have to check for "\;".
> That's all. Now, parsing the SQL to implement completion, that's
> where things get interesting... :)

Actually, this is what we've been doing.  Although it's not as simple
as you think.  There is also dollar quoting.  And there's the hassle
of having to go back and revisit the code.  We've had to do this
twice.  Once for the 'E' literals and once for dollar quoting.  (We've
had this interface going for a long time.  Although not in Python.)
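
To make the dollar-quoting hassle concrete: any splitter that only
tracks '...' literals (a plain str.split() is shown here for brevity)
cuts a perfectly legal dollar-quoted literal apart; the tag and text
are made up.

print("SELECT $q$Hello; World.$q$; SELECT 1;".split(';'))
# ['SELECT $q$Hello', ' World.$q$', ' SELECT 1', '']  -- wrongly split inside the literal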

Not to mention that upstream wants nothing to do with putting a parser
in their code.  This greatly increases the hassle when upgrading to a
new upstream version.  At the same time we've been loath to produce an
entirely custom interface when upstream has 98% of what we need.

I've tried parsing.  It's time to try something different.

> TL;DR. Why add an API that nobody will use when what you need will
> probably cost you just a couple of hours of coding?

Yours is an excellent argument.  Let me address both aspects of your
question.  First the approach our project takes to programming and
where we'd like to invest our couple of hours of programming.  And
second the larger issue of why psycopg2 would want such an API.

A guiding principle of ours is to use toolsets and interfaces that are
as generic as possible, and obtained from 3rd parties whenever
possible.  The amount of work dedicated to user interface is to be
minimized.  The programming budget is limited and is to be directed to
essential functionality.  User interfaces are expensive.  As soon
as a program interacts with a human 90% of the programmer's work goes
toward interaction and error handling and only about 10% goes to code
that actually manipulates data and its state.

So, we presently use a 3rd party database admin tool.  Someday we may
be forced to supplement this.

Our users are sophisticated enough to eschew bling.  The Amboseli
Baboon project's been going since 1974, switched to a "real" database
from ad-hoc SPSS in a punchcard format around 1992, and have been
creatively adapting the latest technology of every sort into all
aspects of their work in a fashion geared toward maximal return on
investment since the beginning.  When it comes to databases they want
an interface that delivers power, and know what that means.

I'd love to spend more time and make a more elegant user interface.
My users' priorities are elsewhere.

So, yes.  I am lazy. Libpq provides access to a parser through
PQsendQuery() and friends.  Why should I (continuously) re-invent the
wheel?  If you don't want to incorporate the suggested APIs I will
probably try subclassing psycopg2.extensions.cursor().  This seems
even easier than writing an SQL parser.  I'm pretty sure this would
work, although I'm also pretty sure it'd have ugly-ish bits that need
to know the internals of its super()s.  Although this would be sad, I
think I'd prefer keeping up with psycopg2's internals to keeping up
with changes to PG's SQL syntax.  I won't know until I look.
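
For illustration, a rough sketch of that subclassing route.  Only the
cursor_factory wiring is standard psycopg2; the proposed method names
are grafted on top of a client-side splitter (a naive placeholder
here), so this still does not reach PQsendQuery(), which is the whole
point of the proposal.

import psycopg2
import psycopg2.extensions

# Naive placeholder; a real version would need the quote-aware splitting
# discussed earlier in the thread.
def split_sql(text):
    return [stmt.strip() for stmt in text.split(';') if stmt.strip()]

class MultiStatementCursor(psycopg2.extensions.cursor):
    """Exposes the proposed method names on top of client-side splitting."""

    def submitqueries(self, operations):
        self._pending = split_sql(operations)

    def executequery(self):
        if not getattr(self, '_pending', None):
            return None
        self.execute(self._pending.pop(0))
        return True

conn = psycopg2.connect("dbname=test", cursor_factory=MultiStatementCursor)
cur = conn.cursor()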

As to why psycopg2 should add the proposed API when "nobody will use
it":

There is the argument, at top, regarding psycopg2 providing an interface
that exposes all the functionality available within libpq.  Clearly
libpq makes it possible to create the user interface I desire.
(Whether or not you believe that such a UI should exist.)  Using
PQsendQuery()/PQgetResult() is a way to create such a UI.  (See this
thread, where consensus seems to be that, in C at least, parsing SQL
is undesirable and PQsendQuery()/PQgetResult() is the way to replicate
psql's acceptance of multiple SQL queries:
http://marc.info/?l=postgresql-general&m=134877413732485&w=2)

As to "nobody will use it", this depends on how you count.  I do
believe that you are right in that very few programs will ever provide
generic SQL "data pads"; interfaces allowing submission of arbitrary
SQL.  However very, very many _people_ do use an interface which
allows submission of multiple arbitrary SQL statements.  Every "all
purpose" database administration tool has such an interface.
phpPgAdmin, pgAdmin, and the adminer editor are the PG examples.  Not to
mention the various MySQL admin interfaces.  And of course psql.

The only question is whether or not the interface violates the
principle of least surprise by displaying a single result when supplied
with multiple SQL queries.  Or whether it does something more
sophisticated, and in my opinion much more useful, and behaves as psql
does.

I believe that psycopg2 would be a good choice if one was to write a
generic admin interface to PG in Python.  If so, as psycopg2 exists
now, such a Python program would not be able to replicate psql's
behavior when piped an arbitrary collection of SQL statements.  Not,
at least, unless the author implemented and maintained yet another SQL
parser for PG per your suggestion.

Perhaps the world already has enough generic database admin tools, or
functional subsets thereof.  It doesn't need another written in
Python.  But is that for psycopg2 to judge?

In any case I believe that there are other legitimate use-cases for
the proposed API, although it may not be used often.  Reporting is one
possible use, and other sorts of batch-related processing.  I don't
have vast experience to back this up with specifics but I believe that
multiple SQL statements are piped to psql on a not-uncommon basis.
Surely some of these rely on psql's ability to output a result
per-statement.  Why shouldn't there be Python programs that support
this functionality?

> Hope this clarifies why I don't like your proposal.

Yes.  Thanks very much.  It does help me understand your thoughts and
I appreciate you spending your time.

From another standpoint though, no.  I don't understand at all.  Why
should psycopg2 be passing judgment on what user interface is
appropriate for interacting with a PG database?  It's fine to reject
APIs that just won't get used, but I question whether it's
possible to know that a libpq feature won't be useful in unanticipated
ways once made available via Python.  It seems only natural to
expose all of libpq's functionality.  Picking and choosing ways
of exposing features is one thing.  Deciding that a libpq feature
just shouldn't be available is something else.


Something that's not been discussed at all is the proposed notion of
eliminating client-side buffering.  This is trading CPU on one side
for reduced memory consumption and lower startup latency when
returning query results on the other.  It matters when result sets are
large.  As it is now, when a query produces a million rows of results
all these rows are sent to the client and buffered before libpq
delivers any rows to the application.  (Absent server-side cursors,
et-al.)
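
As an aside, the existing workaround alluded to above is a named
(server-side) cursor, which already avoids pulling the whole result set
into client memory, though only for a single query rather than an
arbitrary multi-statement string.  A brief sketch with a made-up table
name:

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor(name='big_scan')   # named cursor => server-side
cur.itersize = 2000                  # rows fetched per network round trip
cur.execute("SELECT * FROM big_table")
for row in cur:                      # rows arrive in batches, not buffered all at once
    print(row)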

Of course you could extend cursor.execute() in place of using the
proposed API.  I thought the "no buffering" feature would be useful.
It would likely involve more changes to existing code than the ability
to execute multiple queries, which mostly would require new code.
Since "no buffering" is useful when executing queries, and both new
features require a PQsendQuery() call, and the existence of a "no
buffering" feature would be appreciated by my users, I figured I'd
build the "no buffering" feature into the new API proposals.

Thanks for listening.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

P.S.  We call "data pads" "SQL windows".


Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Wed, 03 Feb 2016 15:27:25 -0500
Joe Abbate <jma@freedomcircle.com> wrote:

> On 02/02/16 16:30, Karl O. Pinc wrote:
> > Thanks, but I don't think so.  Those are interactive
> > python shells.  What they want is a web-enabled psql;
> > a SQL shell.

> I haven't played much with your particular problem, but it seems that
> adminer does what you want.  You can SELECT from a table from the left
> panel, then restrict the data that you want using dropdowns, and
> you'll see the generated SQL before the results with an Edit link.
> If you click on that and then add a semicolon and another query, it
> returns two sets of results.

Thanks very much for the link.  Adminer may be very helpful at some
point.

I did glance at it long ago.  I think then maybe it did not support
Postgres.  In any case I'd forgotten about it.

I'd prefer to dispense with the whole notion of building
SQL from a GUI.  Rarely is a single table queried.  Better to go
straight to the edit window.

I don't have time to look at this now, and have no
immediate need either.  It must parse though, since
PHP does not have support for PQsendQuery()/PQgetResult()
as far as I know.

Gotta wonder how it will do with:

  SELECT $;$Hello World.$;$;

And thanks for help with, and thinking about, my problem.

Regards,

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
Joe Abbate
Date:
On 05/02/16 03:02, Karl O. Pinc wrote:
> I did glance at it long ago.  I think then maybe it did not support
> Postgres.  In any case I'd forgotten about it.

It does show its MySQL roots in some places (e.g., SERIAL shown as Auto
Increment), but it has supported PG since at least February 2011
(because I wrote a blog post about it --
https://pyrseas.wordpress.com/2011/02/24/interesting-database-tool-adminer/
)

> I'd prefer to dispense with the whole notion of building
> SQL from a GUI.  Rarely is a single table queried.  Better to go
> straight to the edit window.

It does have an "SQL command" link that takes you to an "edit" window
directly, and you could specify it directly in the URL, e.g.,

http://localhost/adminer/?pgsql=localhost&username=yourusername&db=yourdbname&ns=public&sql=

> I don't have time to look at this now, and have no
> immediate need either.  It must parse though, since
> php does not have support for PQsendQuery()/PQgetResult()
> as far as I know.
>
> Gotta wonder how it will do with:
>
>   SELECT $;$Hello World.$;$;

It does give a syntax error on that exact statement (just as psql does),
but it returns

   ?column?
$;$Hello World.$;$

if you specify it as

SELECT '$;$Hello World.$;$';

What I'm still amazed at is how adminer manages to present a graphical
schema display.  I haven't looked at phpPgAdmin recently but that (and
its ease of use) won me over, even over pgAdmin.

Note: I'm using Adminer 3.3.3 (which is Debian stretch's default) but
there's a newer version: 4.2.3.

Joe


Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Fri, 05 Feb 2016 08:17:17 -0500
Joe Abbate <jma@freedomcircle.com> wrote:

> > Gotta wonder how it will do with:
> >
> >   SELECT $;$Hello World.$;$;
>
> It does give a syntax error on that exact statement

Ah.  That would make sense because that's illegal SQL.  :-)

"The tag, if any, of a dollar-quoted string follows the same rules as an
unquoted identifier, except that it cannot contain a dollar sign."



Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: A PGsendQuery API design for your review

From:
"Karl O. Pinc"
Date:
On Wed, 03 Feb 2016 15:27:25 -0500
Joe Abbate <jma@freedomcircle.com> wrote:

> On 02/02/16 16:30, Karl O. Pinc wrote:
> > Thanks, but I don't think so.  Those are interactive
> > python shells.  What they want is a web-enabled psql;
> > a SQL shell.

> I haven't played much with your particular problem, but it seems that
> adminer does what you want.

FYI.  I can't get the adminer editor to work.  Perhaps because
we have schemas, and no "public" schema.  It tells
me there are no tables and I have no further options.
Providing a (schema qualified, or not) table name to search
yields no change in the interface.

The adminer sql window interface shows some promise, although
there does not seem to be any provision for data export.
This is another essential feature.

Regards,

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein