Обсуждение: Types pollution with iso-8859-1 oids and server-side parameters binding

Поиск
Список
Период
Сортировка

Types pollution with iso-8859-1 oids and server-side parameters binding

От
Daniele Varrazzo
Дата:
Hello,

A problem shown in https://github.com/psycopg/psycopg/discussions/289

In the query:

    UPDATE test289 SET num = $1, name = $2, ts = $3 WHERE ts::date = $3::date

passing a string with unknown oid as param $3 results in the column ts
receiving only the date part. Looks like the cast needed on the param
in the WHERE gets propagated to the other occurrences of the same
parameter.

Repro:

```psql
piro=# create table test289 (num int, name text, ts timestamp);
CREATE TABLE
piro=# insert into test289 values (300, 'Fred', '2022-03-03 11:00:00');
INSERT 0 1
piro=# insert into test289 values (200, 'Barney', '2022-03-02 11:00:00');
INSERT 0 1
piro=# select * from test289;
┌─────┬────────┬─────────────────────┐
│ num │  name  │         ts          │
├─────┼────────┼─────────────────────┤
│ 300 │ Fred   │ 2022-03-03 11:00:00 │
│ 200 │ Barney │ 2022-03-02 11:00:00 │
└─────┴────────┴─────────────────────┘
(2 rows)
```

```python
import psycopg
conn = psycopg.connect(DSN, autocommit=True)
conn.pgconn.exec_params(
    b'UPDATE test289 SET num = $1, name = $2, ts = $3 WHERE ts::date =
$3::date',
    [b"301", b"Fred2", b"2022-03-03 20:00:00"], [21, 0, 0])
```

```psql
piro=# select * from test289;
┌─────┬────────┬─────────────────────┐
│ num │  name  │         ts          │
├─────┼────────┼─────────────────────┤
│ 200 │ Barney │ 2022-03-02 11:00:00 │
│ 301 │ Fred2  │ 2022-03-03 00:00:00 │  <<< should have been time 20:00
└─────┴────────┴─────────────────────┘
(2 rows)
```

This doesn't happen if the parameter type is specified (e.g. using
[21, 0, 1114] as OIDs array) or if the type of param 3 is made
understood as timestamp, e.g. with $3::timestamp::date in the WHERE
condition, or if the timestamp value is copied and used in separate
placeholders $3 and $4).

I see why it happens... I don't think it's the right behaviour though.

-- Daniele

Re: Types pollution with iso-8859-1 oids and server-side parameters binding

От
Tom Lane
Дата:
Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> A problem shown in https://github.com/psycopg/psycopg/discussions/289

> In the query:
>     UPDATE test289 SET num = $1, name = $2, ts = $3 WHERE ts::date = $3::date

> passing a string with unknown oid as param $3 results in the column ts
> receiving only the date part. Looks like the cast needed on the param
> in the WHERE gets propagated to the other occurrences of the same
> parameter.

I think it's operating exactly as designed.  The parameter can only
have one type, and the cast is telling it to assume that that type
is "date".

> I see why it happens... I don't think it's the right behaviour though.

What do you think ought to happen?  The same parameter somehow having
different types in different places?

            regards, tom lane



Re: Types pollution with iso-8859-1 oids and server-side parameters binding

От
"David G. Johnston"
Дата:
On Tue, May 3, 2022 at 12:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> A problem shown in https://github.com/psycopg/psycopg/discussions/289

> In the query:
>     UPDATE test289 SET num = $1, name = $2, ts = $3 WHERE ts::date = $3::date

> passing a string with unknown oid as param $3 results in the column ts
> receiving only the date part. Looks like the cast needed on the param
> in the WHERE gets propagated to the other occurrences of the same
> parameter.

I think it's operating exactly as designed.  The parameter can only
have one type, and the cast is telling it to assume that that type
is "date".

> I see why it happens... I don't think it's the right behaviour though.

What do you think ought to happen?  The same parameter somehow having
different types in different places?


Based upon:

"When a parameter's data type is not specified or is declared as unknown, the type is inferred from the context in which the parameter is first referenced (if possible)."
(PREPARE Command docs)

and:

"""
postgres=# prepare prepupd as update table1 set ts = $1, tsd = $1 returning ts, tsd, pg_typeof($1);
ERROR:  inconsistent types deduced for parameter $1
LINE 1: ...epare prepupd as update table1 set ts = $1, tsd = $1 returni...
                                                             ^
DETAIL:  timestamp without time zone versus date
"""
(the above confirming the set clause does provide information for the deduction, ts is timestamp, tsd is date)

> UPDATE test289 SET num = $1, name = $2, ts = $3 WHERE ts::date = $3::date

The parameter would seem to be first referenced as a timestamp (at least in parsing order).  As the explicit cast from timestamp to date is valid there is no inconsistency if we don't use the explicit cast for deduction and simply allow the cast to do its thing.

I'm not sure I actually believe this to be an improvement, but I also wasn't expecting that specific error message given the documentation, which at least lets the timestamp deduction stand to get past the assignment of a type to the parameter - letting some other type problem pop up during the rest of the planning or, as in this case, letting the explicit cast actually produce the desired result.

David J.

Re: Types pollution with iso-8859-1 oids and server-side parameters binding

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, May 3, 2022 at 12:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What do you think ought to happen?  The same parameter somehow having
>> different types in different places?

> Based upon:
> "When a parameter's data type is not specified or is declared as unknown,
> the type is inferred from the context in which the parameter is first
> referenced (if possible)."

Right ...

> The parameter would seem to be first referenced as a timestamp (at least in
> parsing order).

Actually, if you look at transformUpdateStmt, you'll see the parsing
order is WITH, then FROM, then WHERE, then RETURNING, then the targetlist.
Some of this is constrained by semantics: we have to do WITH first because
it'll define pseudo-tables used in FROM, and we have to do FROM before
we can make sense of the rest.  We could possibly do the targetlist
before WHERE, but any assumption that it works left-to-right is just
doomed to failure by the poor consistency of SQL syntax.

            regards, tom lane



Re: Types pollution with iso-8859-1 oids and server-side parameters binding

От
Daniele Varrazzo
Дата:
On Tue, 3 May 2022 at 21:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think it's operating exactly as designed.  The parameter can only
> have one type, and the cast is telling it to assume that that type
> is "date".

The same problem was actually reported without that cast. The OP was
using a query such as:

    UPDATE test289 SET num = $1, name = $2, ts = $3 WHERE ts::date = $3

$3 is used in two different contexts, a datetime and a date, and it
seems arbitrary that the date wins.

Actually, it doesn't seem to be the cast to decide the type. Adding a
timestamp cast and dropping the date one:

    UPDATE test289 SET num = $1, name = $2, ts = $3::timestamp WHERE
ts::date = $3'

reproduces the same artifact. A self-contained, psql-only test is:

=# create table test289 (num int, name text, ts timestamp);
=# insert into test289 values (300, 'Fred', '2022-03-03 11:00:00');
=# prepare tmpstat as update test289 set num = $1, name = $2, ts =
$3::timestamp where ts::date = $3;
=# execute tmpstat ('301', 'Fred2', '2022-03-03 20:00:00');
=# select * from test289;
┌─────┬───────┬─────────────────────┐
│ num │ name  │         ts          │
├─────┼───────┼─────────────────────┤
│ 301 │ Fred2 │ 2022-03-03 00:00:00 │
└─────┴───────┴─────────────────────┘

It seems that the date type is chosen arbitrarily, possibly depending
on the way the parsed query is traversed building the plan?

> > I see why it happens... I don't think it's the right behaviour though.
>
> What do you think ought to happen?  The same parameter somehow having
> different types in different places?

Looking at the above variation of the problem, maybe the right thing
to do would be to throw an error because the parameter can be
interpreted in different ways in different places? I seem to
understand, from David's example, that such an error is thrown, in
other contexts.

Another possibility is to fix the users. Many of them will see a
parametric query more like a C macro than like a C function, so
performing a literal replacement, because this is the behaviour they
get in psql typing:

    UPDATE ..., ts = '2022-03-03 11:00:00' WHERE ts::date =
'2022-03-03 11:00:00'::date

In psycopg, we might document this difference as one of the glitches
that can be met moving from client- to server-side binding
(https://www.psycopg.org/psycopg3/docs/basic/from_pg2.html#server-side-binding).

-- Daniele



Re: Types pollution with iso-8859-1 oids and server-side parameters binding

От
"David G. Johnston"
Дата:
On Tue, May 3, 2022 at 3:35 PM Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Tue, 3 May 2022 at 21:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think it's operating exactly as designed.  The parameter can only
> have one type, and the cast is telling it to assume that that type
> is "date".

The same problem was actually reported without that cast. The OP was
using a query such as:

    UPDATE test289 SET num = $1, name = $2, ts = $3 WHERE ts::date = $3

$3 is used in two different contexts, a datetime and a date, and it
seems arbitrary that the date wins.

As Tom mentioned in reply to me - that WHERE is evaluated before SET isn't arbitrary - nor is "first one wins".


It seems that the date type is chosen arbitrarily, possibly depending
on the way the parsed query is traversed building the plan?

Correct.  Parsing of a plan is deterministic.  Though I am curious to what extent this example might change if there were some layers of subqueries involved where the common parameter was used.

> > I see why it happens... I don't think it's the right behaviour though.
>
> What do you think ought to happen?  The same parameter somehow having
> different types in different places?

Looking at the above variation of the problem, maybe the right thing
to do would be to throw an error because the parameter can be
interpreted in different ways in different places? I seem to
understand, from David's example, that such an error is thrown, in
other contexts.

That horse has already left the barn.  While the current behavior is at least in part organically grown we tend to not introduce errors where none previously existed and where doing so would require breaking a long-established rule - in this case "first planned instance wins".  I do think that trying to clarify "first" better, since it's not the in-your-face textual order that is being used.

The error popped up because (I presume)  the two SET column references are tied in terms of "coming first" - their effective order doesn't matter so the rule could not be applied.


Another possibility is to fix the users. Many of them will see a
parametric query more like a C macro than like a C function, so
performing a literal replacement, because this is the behaviour they
get in psql typing:

    UPDATE ..., ts = '2022-03-03 11:00:00' WHERE ts::date =
'2022-03-03 11:00:00'::date

Extrapolating this particular suggested change from one example seems dangerous.  Particularly since much of this is limited to the case of treating a timestamp as both a timestamp and date at the same time.  Making any change from what is seemingly a rare corner-case doesn't seem to provide a sufficient benefit/cost ratio.

I have to put this into the "unfortunate foot-gun" category at this point.  Users should be testing their code.  I'm sure there are other concerns given the layer of indirection, but a general rule to "always cast your parameters" goes a long way if one chooses to avoid the API where explicit parameter types can be supplied.  Or at least "cast once, cast always" for any given parameter.

David J.

Re: Types pollution with iso-8859-1 oids and server-side parameters binding

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, May 3, 2022 at 3:35 PM Daniele Varrazzo <daniele.varrazzo@gmail.com>
> wrote:
>> Looking at the above variation of the problem, maybe the right thing
>> to do would be to throw an error because the parameter can be
>> interpreted in different ways in different places? I seem to
>> understand, from David's example, that such an error is thrown, in
>> other contexts.

> That horse has already left the barn.

Well, perhaps not.  It's certainly not great that the behavior depends
on non-obvious implementation details.  I could imagine changing to
a rule that says "if an unknown-type parameter is referenced more than
once, we must infer the same type for it at each of those places",
rather than "first one in parsing order wins".

What's not very clear is whether such a change would make more people
happy than sad.  It'd almost inevitably break some queries that had
coincidentally gotten the behavior their authors wanted.  But the
key word there is "coincidentally", I suspect.

> I do
> think that trying to clarify "first" better, since it's not the
> in-your-face textual order that is being used.

I'm not thrilled with having to document the parsing order, especially
since it's arbitrary in some places (e.g., why RETURNING before the
UPDATE list --- that doesn't seem to make a lot of sense).

            regards, tom lane



Re: Types pollution with iso-8859-1 oids and server-side parameters binding

От
Daniele Varrazzo
Дата:
On Wed, 4 May 2022 at 01:06, David G. Johnston
<david.g.johnston@gmail.com> wrote:

> Extrapolating this particular suggested change from one example seems dangerous.  Particularly since much of this is
limitedto the case of treating a timestamp as both a timestamp and date at the same time.  Making any change from what
isseemingly a rare corner-case doesn't seem to provide a sufficient benefit/cost ratio. 
>
> I have to put this into the "unfortunate foot-gun" category at this point.  Users should be testing their code.  I'm
surethere are other concerns given the layer of indirection, but a general rule to "always cast your parameters" goes a
longway if one chooses to avoid the API where explicit parameter types can be supplied.  Or at least "cast once, cast
always"for any given parameter. 

Yes, this was admittedly a contrived case, not only for the use of ts,
but also because ts was passed as a string rather than a Python
datetime, which would have resulted in the parameter oid specified.
So, many layers of good practices weren't followed in the first
place... making me think that specifying such a corner case in the
documentation would have limited utility (a rare occurrence mostly
encountered if someone doesn't read the docs).

-- Daniele