Обсуждение: [psycopg] Solving the SQL composition problem
Hello, one recurring problem using psycopg is with the generation of dynamic SQL strings, e.g. where the table or the field names change. Usually the conversation goes: User: this doesn't work: cur.execute("insert into %s values (%s)", ['mytable', 42]) Maintainer: yes, you can't pass tables to the arguments User: what should I do? Maintainer: normal string concatenation: cur.execute("insert into %s values %%s" % 'mytable', [42]) User: what if the table name has special chars? Maintainer: ehm... User: what if the source of the table name is not secure? Maintainer: ehm... The problem is slightly improved by exposing `quote_ident()` but not solved altogether: people must remember to use `quote_ident()` every time, which is dangerous, because forgetting to use it will usually work anyway... until one table name contains a special char. Furthermore sometimes you will want to include a SQL value in the query, which should be done in a complete different way (calling adapt, prepare, getquoted...). [quote_ident()] http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.quote_ident My attempt to a solution: the `psycopg2.sql` module: http://initd.org/psycopg/docs/sql.html The design was sort of sketched in bugs #308 and #358. Unlike these first sketches there is no new method on connections or cursors: the new module exposes certain Composable objects (SQL, Literal, Identifier...) which can be composed using operators and methods and forming new Composables. Composables can be ultimately used in place of strings in queries. [bug #308]: https://github.com/psycopg/psycopg2/issues/308 [bug #358]: https://github.com/psycopg/psycopg2/issues/358 With these objects it is possible to call: from psycopg2 import sql cur.execute( sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')], [42]) If you could take a look at the documentation, or play with the feature, and let me know what you think about the feature design, or any other observation, it would be great. The code currently lives in the sql-compose branch of the github repository. [sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose Thank you very much! -- Daniele
On 01/01/2017 12:11 AM, Daniele Varrazzo wrote: > Hello, > > one recurring problem using psycopg is with the generation of dynamic > SQL strings, e.g. where the table or the field names change. Usually > the conversation goes: > > User: this doesn't work: cur.execute("insert into %s values (%s)", > ['mytable', 42]) > Maintainer: yes, you can't pass tables to the arguments > User: what should I do? > Maintainer: normal string concatenation: cur.execute("insert into %s > values %%s" % 'mytable', [42]) > User: what if the table name has special chars? > Maintainer: ehm... > User: what if the source of the table name is not secure? > Maintainer: ehm... > > The problem is slightly improved by exposing `quote_ident()` but not > solved altogether: people must remember to use `quote_ident()` every > time, which is dangerous, because forgetting to use it will usually > work anyway... until one table name contains a special char. > Furthermore sometimes you will want to include a SQL value in the > query, which should be done in a complete different way (calling > adapt, prepare, getquoted...). > > [quote_ident()] > http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.quote_ident > > My attempt to a solution: the `psycopg2.sql` module: > http://initd.org/psycopg/docs/sql.html > > The design was sort of sketched in bugs #308 and #358. Unlike these > first sketches there is no new method on connections or cursors: the > new module exposes certain Composable objects (SQL, Literal, > Identifier...) which can be composed using operators and methods and > forming new Composables. Composables can be ultimately used in place > of strings in queries. > > [bug #308]: https://github.com/psycopg/psycopg2/issues/308 > [bug #358]: https://github.com/psycopg/psycopg2/issues/358 > > With these objects it is possible to call: > > from psycopg2 import sql > > cur.execute( > sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')], > [42]) > > If you could take a look at the documentation, or play with the > feature, and let me know what you think about the feature design, or > any other observation, it would be great. The code currently lives in > the sql-compose branch of the github repository. > > [sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose > > Thank you very much! Wow, that is very cool. I have gotten as far reading the docs, especially like Placeholder. Still need to check out the branch and play. Will report back. > > > -- Daniele > > -- Adrian Klaver adrian.klaver@aklaver.com
On Sun, Jan 01, 2017 at 09:11:24AM +0100, Daniele Varrazzo wrote: > one recurring problem using psycopg is with the generation of dynamic > SQL strings, e.g. where the table or the field names change. Usually > the conversation goes: [...] The concept immediately appeals to me and I would surely apply it in GNUmed. One word of caution: pretty-please don't let psycopg2 slide into being some sort of mini/pseudo ORM. Thanks ! Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 01/01/2017 12:11 AM, Daniele Varrazzo wrote: > Hello, > > one recurring problem using psycopg is with the generation of dynamic > SQL strings, e.g. where the table or the field names change. Usually > the conversation goes: > > User: this doesn't work: cur.execute("insert into %s values (%s)", > ['mytable', 42]) > Maintainer: yes, you can't pass tables to the arguments > User: what should I do? > Maintainer: normal string concatenation: cur.execute("insert into %s > values %%s" % 'mytable', [42]) > User: what if the table name has special chars? > Maintainer: ehm... > User: what if the source of the table name is not secure? > Maintainer: ehm... > > The problem is slightly improved by exposing `quote_ident()` but not > solved altogether: people must remember to use `quote_ident()` every > time, which is dangerous, because forgetting to use it will usually > work anyway... until one table name contains a special char. > Furthermore sometimes you will want to include a SQL value in the > query, which should be done in a complete different way (calling > adapt, prepare, getquoted...). > > [quote_ident()] > http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.quote_ident > > My attempt to a solution: the `psycopg2.sql` module: > http://initd.org/psycopg/docs/sql.html > > The design was sort of sketched in bugs #308 and #358. Unlike these > first sketches there is no new method on connections or cursors: the > new module exposes certain Composable objects (SQL, Literal, > Identifier...) which can be composed using operators and methods and > forming new Composables. Composables can be ultimately used in place > of strings in queries. > > [bug #308]: https://github.com/psycopg/psycopg2/issues/308 > [bug #358]: https://github.com/psycopg/psycopg2/issues/358 > > With these objects it is possible to call: > > from psycopg2 import sql > > cur.execute( > sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')], > [42]) > > If you could take a look at the documentation, or play with the > feature, and let me know what you think about the feature design, or > any other observation, it would be great. The code currently lives in > the sql-compose branch of the github repository. > > [sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose Yes this is going to make building dynamic SQL cleaner then the hodgepodge of things I have created over the years. Only hiccup I had was more me then anything else. You have this example: from psycopg2 import sql cur.execute( sql.SQL("insert into %s values (%%s, %%s)") % [sql.Identifier('my_table')], [10, 20]) towards the top of the docs section. When I got down to the Placeholder section it did not occur to me at first that I could eliminate Placeholder instances by just putting %%s in the template. Going back over the docs got me to the above example and clarity. > > Thank you very much! > > > -- Daniele > > -- Adrian Klaver adrian.klaver@aklaver.com
On Sun, Jan 1, 2017 at 9:11 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
Hello,
one recurring problem using psycopg is with the generation of dynamic
SQL strings, e.g. where the table or the field names change. Usually
the conversation goes:
User: this doesn't work: cur.execute("insert into %s values (%s)",
['mytable', 42])
Maintainer: yes, you can't pass tables to the arguments
User: what should I do?
Maintainer: normal string concatenation: cur.execute("insert into %s
values %%s" % 'mytable', [42])
User: what if the table name has special chars?
Maintainer: ehm...
User: what if the source of the table name is not secure?
Maintainer: ehm...
The problem is slightly improved by exposing `quote_ident()` but not
solved altogether: people must remember to use `quote_ident()` every
time, which is dangerous, because forgetting to use it will usually
work anyway... until one table name contains a special char.
Furthermore sometimes you will want to include a SQL value in the
query, which should be done in a complete different way (calling
adapt, prepare, getquoted...).
[quote_ident()]
http://initd.org/psycopg/docs/extensions.html#psycopg2. extensions.quote_ident
My attempt to a solution: the `psycopg2.sql` module:
http://initd.org/psycopg/docs/sql.html
The design was sort of sketched in bugs #308 and #358. Unlike these
first sketches there is no new method on connections or cursors: the
new module exposes certain Composable objects (SQL, Literal,
Identifier...) which can be composed using operators and methods and
forming new Composables. Composables can be ultimately used in place
of strings in queries.
[bug #308]: https://github.com/psycopg/psycopg2/issues/308
[bug #358]: https://github.com/psycopg/psycopg2/issues/358
With these objects it is possible to call:
from psycopg2 import sql
cur.execute(
sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')],
[42])
If you could take a look at the documentation, or play with the
feature, and let me know what you think about the feature design, or
any other observation, it would be great. The code currently lives in
the sql-compose branch of the github repository.
[sql-compose] https://github.com/psycopg/psycopg2/tree/sql-compose
Thank you very much!
Hello Daniele,
I've only taken look at the docs and that totally makes sense to me. Thank you!
--
Alex
On 1/1/17 2:11 AM, Daniele Varrazzo wrote: > sql.SQL("insert into %s values (%%s)") % [sql.Identifier('mytable')], Since %s isn't standard parameter replacement anyway, I'm wondering if both considerations could just be handled by execute(), by using different replacement syntax. IE: execute('insert into %s values ($1)', [42], ['my table']) Obviously this would be backwards incompatible, but I think that's manageable. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 01/02/2017 07:05 AM, Jim Nasby wrote: > On 1/1/17 2:11 AM, Daniele Varrazzo wrote: >> sql.SQL("insert into %s values (%%s)") % >> [sql.Identifier('mytable')], > > Since %s isn't standard parameter replacement anyway, I'm wondering if > both considerations could just be handled by execute(), by using > different replacement syntax. IE: I thought it was?: https://www.python.org/dev/peps/pep-0249/#paramstyle format ANSI C printf format codes, e.g. ...WHERE name=%s > > execute('insert into %s values ($1)', [42], ['my table']) > > Obviously this would be backwards incompatible, but I think that's > manageable. -- Adrian Klaver adrian.klaver@aklaver.com
On Mon, Jan 2, 2017 at 4:05 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > On 1/1/17 2:11 AM, Daniele Varrazzo wrote: >> >> sql.SQL("insert into %s values (%%s)") % >> [sql.Identifier('mytable')], > > > Since %s isn't standard parameter replacement anyway, I'm wondering if both > considerations could just be handled by execute(), by using different > replacement syntax. IE: > > execute('insert into %s values ($1)', [42], ['my table']) > > Obviously this would be backwards incompatible, but I think that's > manageable. mmm... what I think is that if these objects' replacement rules were different one could leave the %s and %(name)s placeholder untouched for the query params. A natural choice could be to use the str.format syntax for the query composition, or a subset of it. Hence my example could be: cur.execute( sql.SQL("insert into {} values (%s, %s)").format(sql.Identifier('my_table')), [10, 20]) This would largely remove the need for double escaping. -- Daniele
On 01/02/2017 08:21 AM, Daniele Varrazzo wrote: > On Mon, Jan 2, 2017 at 4:05 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: >> On 1/1/17 2:11 AM, Daniele Varrazzo wrote: >>> >>> sql.SQL("insert into %s values (%%s)") % >>> [sql.Identifier('mytable')], >> >> >> Since %s isn't standard parameter replacement anyway, I'm wondering if both >> considerations could just be handled by execute(), by using different >> replacement syntax. IE: >> >> execute('insert into %s values ($1)', [42], ['my table']) >> >> Obviously this would be backwards incompatible, but I think that's >> manageable. > > mmm... what I think is that if these objects' replacement rules were > different one could leave the %s and %(name)s placeholder untouched > for the query params. > > A natural choice could be to use the str.format syntax for the query > composition, or a subset of it. Hence my example could be: > > cur.execute( > sql.SQL("insert into {} values (%s, > %s)").format(sql.Identifier('my_table')), > [10, 20]) I like this, makes it clearer what is an identifier versus a placeholder. > > This would largely remove the need for double escaping. > > -- Daniele > > -- Adrian Klaver adrian.klaver@aklaver.com
On 1/2/17 10:21 AM, Daniele Varrazzo wrote: >> execute('insert into %s values ($1)', [42], ['my table']) >> >> Obviously this would be backwards incompatible, but I think that's >> manageable. > mmm... what I think is that if these objects' replacement rules were > different one could leave the %s and %(name)s placeholder untouched > for the query params. > > A natural choice could be to use the str.format syntax for the query > composition, or a subset of it. Hence my example could be: > > cur.execute( > sql.SQL("insert into {} values (%s, > %s)").format(sql.Identifier('my_table')), > [10, 20]) > > This would largely remove the need for double escaping. What I'm suggesting is to let Postgres handle the replacement of normal values, using the prepared statement syntax of $1, $2, etc[1], and only do identifier replacement in python (using quote_ident). That means a lot less time spent parsing, and opens the door for eventually doing more efficient stuff over the wire, like using binary type formats. 1: https://www.postgresql.org/docs/current/static/sql-prepare.html -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Mon, Jan 2, 2017 at 5:21 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > > A natural choice could be to use the str.format syntax for the query > composition, or a subset of it. Hence my example could be: > > cur.execute( > sql.SQL("insert into {} values (%s, > %s)").format(sql.Identifier('my_table')), > [10, 20]) That's now implemented in the branch; docs updated too: http://initd.org/psycopg/docs/sql.html. In particular see http://initd.org/psycopg/docs/sql.html#psycopg2.sql.SQL.format which replaces the previous SQL % operator (no more supported). It is no more necessary to specify value placeholders as %%s so it's easier to switch between constant and parametrized queries. -- Daniele
On 01/03/2017 08:24 AM, Jim Nasby wrote: > On 1/2/17 10:21 AM, Daniele Varrazzo wrote: >>> execute('insert into %s values ($1)', [42], ['my table']) >>> >>> Obviously this would be backwards incompatible, but I think that's >>> manageable. >> mmm... what I think is that if these objects' replacement rules were >> different one could leave the %s and %(name)s placeholder untouched >> for the query params. >> >> A natural choice could be to use the str.format syntax for the query >> composition, or a subset of it. Hence my example could be: >> >> cur.execute( >> sql.SQL("insert into {} values (%s, >> %s)").format(sql.Identifier('my_table')), >> [10, 20]) >> >> This would largely remove the need for double escaping. > > What I'm suggesting is to let Postgres handle the replacement of normal > values, using the prepared statement syntax of $1, $2, etc[1], and only > do identifier replacement in python (using quote_ident). That means a > lot less time spent parsing, and opens the door for eventually doing > more efficient stuff over the wire, like using binary type formats. I believe a path to the above is explained here: https://www.postgresql.org/message-id/AANLkTi=ym3SCQKCQBtp8RJHUswwAPOpjXYKTXS=aHWzp@mail.gmail.com > > 1: https://www.postgresql.org/docs/current/static/sql-prepare.html -- Adrian Klaver adrian.klaver@aklaver.com
> On Jan 3, 2017, at 08:24, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote: > What I'm suggesting is to let Postgres handle the replacement of normal values, using the prepared statement syntax of$1, $2, etc[1], and only do identifier replacement in python (using quote_ident). If this is practical, I'd be very much in favor of it, for the reasons given. -- -- Christophe Pettus xof@thebuild.com
On Tue, Jan 3, 2017 at 5:24 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > What I'm suggesting is to let Postgres handle the replacement of normal > values, using the prepared statement syntax of $1, $2, etc[1], and only do > identifier replacement in python (using quote_ident). That means a lot less > time spent parsing, and opens the door for eventually doing more efficient > stuff over the wire, like using binary type formats. > > 1: https://www.postgresql.org/docs/current/static/sql-prepare.html Hi Jim, this is a very desirable feature. However it would be a major source of backward incompatibility. Even keeping the placeholders as %s or %(name)s some query would stop working because of different postgresql cast rules and because PQexecParams doesn't support passing several semicolon-separated statements at once. We will eventually get there in another major release of the adapter. Adrian has already posted a link with further discussion about the topic. -- Daniele
On 1/3/17 10:35 AM, Adrian Klaver wrote: >> What I'm suggesting is to let Postgres handle the replacement of normal >> values, using the prepared statement syntax of $1, $2, etc[1], and only >> do identifier replacement in python (using quote_ident). That means a >> lot less time spent parsing, and opens the door for eventually doing >> more efficient stuff over the wire, like using binary type formats. > > I believe a path to the above is explained here: > > https://www.postgresql.org/message-id/AANLkTi=ym3SCQKCQBtp8RJHUswwAPOpjXYKTXS=aHWzp@mail.gmail.com Seems reasonable at first glance. Speaking of type compatibility and what-not... I find it irritating that we need to maintain that in two different places (psycopg2 and plpython). It'd be damn nice if we could abstract that out somehow, as well as fixing plpython's brain-dead behavior with nested types (arrays of complex, or complex with arrays or more complex types). I've also been mucking around with having the SPI functions in plpython return something that looks more like a dataframe. I've got working code but have yet to figure out a good interface. Part of that is trying to figure out how to deal with a bunch of different possible arguments in the C code. I'm thinking the most flexible API would be something like plpy.execute(..., container=[], members={}) to match existing behavior. You could then do container={}, members=[] to get something that looks like a dataframe. Or (theoretically) you could pass in the desired Pandas types (I think that'd be contairner=pd.DataFrame, members=pd.Series). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 01/03/2017 08:33 AM, Daniele Varrazzo wrote: > On Mon, Jan 2, 2017 at 5:21 PM, Daniele Varrazzo > <daniele.varrazzo@gmail.com> wrote: >> >> A natural choice could be to use the str.format syntax for the query >> composition, or a subset of it. Hence my example could be: >> >> cur.execute( >> sql.SQL("insert into {} values (%s, >> %s)").format(sql.Identifier('my_table')), >> [10, 20]) > > That's now implemented in the branch; docs updated too: > http://initd.org/psycopg/docs/sql.html. In particular see > http://initd.org/psycopg/docs/sql.html#psycopg2.sql.SQL.format which > replaces the previous SQL % operator (no more supported). It is no > more necessary to specify value placeholders as %%s so it's easier to > switch between constant and parametrized queries. Not sure it applies here, but I just ran across a blog from Armin Ronacher. I don't always understand what he says, in this case I think I do and it might be worth a look: http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/ > > -- Daniele > > -- Adrian Klaver adrian.klaver@aklaver.com
On Thu, Jan 5, 2017 at 6:59 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > Not sure it applies here, but I just ran across a blog from Armin Ronacher. > I don't always understand what he says, in this case I think I do and it > might be worth a look: > > http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/ It's a reasonable concern, but no, it doesn't apply to us. From the Python library I'm only using the parser to parse the format micro-language, but not doing anything special with the field name, in particular not applying attribute lookup: trying `{0.__class__}` wouldn't try to extract the `__class__` attribute from the first positional argument, but would look up for a keyword argument with such name and fail with a KeyError. Also, we check and explicitly forbid placeholder modifier. https://github.com/psycopg/psycopg2/blob/a8a3a298/lib/sql.py#L227 -- Daniele
On 01/05/2017 11:26 AM, Daniele Varrazzo wrote: > On Thu, Jan 5, 2017 at 6:59 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> Not sure it applies here, but I just ran across a blog from Armin Ronacher. >> I don't always understand what he says, in this case I think I do and it >> might be worth a look: >> >> http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/ > > It's a reasonable concern, but no, it doesn't apply to us. From the > Python library I'm only using the parser to parse the format > micro-language, but not doing anything special with the field name, in > particular not applying attribute lookup: trying `{0.__class__}` > wouldn't try to extract the `__class__` attribute from the first > positional argument, but would look up for a keyword argument with > such name and fail with a KeyError. Also, we check and explicitly > forbid placeholder modifier. > > https://github.com/psycopg/psycopg2/blob/a8a3a298/lib/sql.py#L227 Alright, good to know. > > -- Daniele > -- Adrian Klaver adrian.klaver@aklaver.com