Обсуждение: RealDictCursor behavior question

Поиск
Список
Период
Сортировка

RealDictCursor behavior question

От
Israel Brewster
Дата:
When using the RealDictCursor from psycopg2 extras, is the behavior of RealDictCursor well defined for the case where you have repeated column names (such as in a join)? That is, obviously a dictionary can only have one value for a key, so is it safe to assume that the value for the column name key in the case of repeated column names is always going to be the one that appears LAST in the query?
-----------------------------------------------
Israel Brewster
Systems Analyst II
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------


  



  


Вложения

Re: RealDictCursor behavior question

От
Christophe Pettus
Дата:
> On Apr 30, 2018, at 11:18, Israel Brewster <ibrewster@flyravn.com> wrote:
>
> When using the RealDictCursor from psycopg2 extras, is the behavior of RealDictCursor well defined for the case where
youhave repeated column names (such as in a join)? That is, obviously a dictionary can only have one value for a key,
sois it safe to assume that the value for the column name key in the case of repeated column names is always going to
bethe one that appears LAST in the query? 

I don't believe it is guaranteed.  The best approach is to rename duplicate column names using AS so you are sure you
aregetting back what you expect. 

--
-- Christophe Pettus
   xof@thebuild.com



Re: RealDictCursor behavior question

От
Daniele Varrazzo
Дата:
On Mon, Apr 30, 2018 at 7:18 PM, Israel Brewster <ibrewster@flyravn.com> wrote:
When using the RealDictCursor from psycopg2 extras, is the behavior of RealDictCursor well defined for the case where you have repeated column names (such as in a join)? That is, obviously a dictionary can only have one value for a key, so is it safe to assume that the value for the column name key in the case of repeated column names is always going to be the one that appears LAST in the query?

Not at all: the result is at best an error, at worse an undefined behaviour. Even if at the moment the result appears consistent we may change the implementation in the future.

Do specify explicit names for the columns, and never use `select *` in production, which may result in accidental duplications (maybe not now, but in the future, when you add an apparently unrelated field to a table and grep can't help you).


-- Daniele

Re: RealDictCursor behavior question

От
Israel Brewster
Дата:
On Apr 30, 2018, at 10:58 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:

On Mon, Apr 30, 2018 at 7:18 PM, Israel Brewster <ibrewster@flyravn.com> wrote:
When using the RealDictCursor from psycopg2 extras, is the behavior of RealDictCursor well defined for the case where you have repeated column names (such as in a join)? That is, obviously a dictionary can only have one value for a key, so is it safe to assume that the value for the column name key in the case of repeated column names is always going to be the one that appears LAST in the query?

Not at all: the result is at best an error, at worse an undefined behaviour. Even if at the moment the result appears consistent we may change the implementation in the future.

Understood. Of course, that's why I asked.

Just out of curiosity though: why? The behavior of Postgresql with repeated column names is neither an error or undefined. The following query works fine, and produces well-defined results:

SELECT 'one' AS one, 'two' AS two, 'three' AS one;

No errors are given, and the columns are returned in the same order as given in the query. In fact, this behavior is required for the basic non-dict cursor to work, otherwise you wouldn't be able to reference by index.

The behavior of python dictionaries when presented with multiple values for the same key is also well-defined and not an error: the last value presented is the value it gets.

So, given that both the input and the output have well-defined, non-error behavior for the situation, why does the middle step - assigning the raw result from the query to a dictionary - become "at best an error, at worse an undefined behavior"?

PLEASE NOTE: I am not arguing about the implementation here, if it is good or bad. I'm just curious about the design choices that went into it, and why this behavior *isn't* well-defined, especially given the ease with which it could arise.


Do specify explicit names for the columns, and never use `select *` in production, which may result in accidental duplications (maybe not now, but in the future, when you add an apparently unrelated field to a table and grep can't help you).

Yes, that is good reasoning *in general*, however in my particular use case, using `select *` makes things so much cleaner as to be worth the -again, in my use case - quite small risk. At least, until now that was the case. :-) 



-- Daniele

Re: RealDictCursor behavior question

От
Christophe Pettus
Дата:
> On Apr 30, 2018, at 12:19, Israel Brewster <ibrewster@flyravn.com> wrote:
> Yes, that is good reasoning *in general*, however in my particular use case, using `select *` makes things so much
cleaneras to be worth the -again, in my use case - quite small risk. At least, until now that was the case. :-)  

Remember that you can always introspect the column names, cache them locally, and then create whatever dict you want.
Thatway, it's 100% guaranteed to do the right thing for your application. 

--
-- Christophe Pettus
   xof@thebuild.com



Re: RealDictCursor behavior question

От
Israel Brewster
Дата:
On Apr 30, 2018, at 11:24 AM, Christophe Pettus <xof@thebuild.com> wrote:


On Apr 30, 2018, at 12:19, Israel Brewster <ibrewster@flyravn.com> wrote:
Yes, that is good reasoning *in general*, however in my particular use case, using `select *` makes things so much cleaner as to be worth the -again, in my use case - quite small risk. At least, until now that was the case. :-)

Remember that you can always introspect the column names, cache them locally, and then create whatever dict you want.  That way, it's 100% guaranteed to do the right thing for your application.

Good point. I'll have to do that :-) Should be easy enough, since I know which columns I need to mess with. Sometimes it's the simple solutions...

-----------------------------------------------
Israel Brewster
Systems Analyst II
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------


--
-- Christophe Pettus
  xof@thebuild.com


Re: RealDictCursor behavior question

От
Karsten Hilbert
Дата:
Well, both Python's and PostgreSQL's choice how to handle
duplicates are just that, clear-cut choices.

But they cannot be reconciled without violating POLA.

Karsten
-- 


Re: RealDictCursor behavior question

От
Daniele Varrazzo
Дата:
On Mon, Apr 30, 2018 at 8:19 PM, Israel Brewster <ibrewster@flyravn.com> wrote:

> Just out of curiosity though: why? The behavior of Postgresql with repeated
> column names is neither an error or undefined. The following query works
> fine, and produces well-defined results:
>
> SELECT 'one' AS one, 'two' AS two, 'three' AS one;

The above might work as a mechanism but it's totally ambiguous as
semantic, if you use the name as mapping to the position or the value.
What is the value of the column 'one' in the above query? The only
valid answer is "I don't know". The answer 'three' is as valid as
'one'. The answer "the field 'one' is the third in the recordset" is
as valid as "the field 'one' is the first in the recordset".

> No errors are given, and the columns are returned in the same order as given
> in the query. In fact, this behavior is required for the basic non-dict
> cursor to work, otherwise you wouldn't be able to reference by index.
>
> The behavior of python dictionaries when presented with multiple values for
> the same key is also well-defined and not an error: the last value presented
> is the value it gets.

This is an implementation detail. I wasn't aware of this:

    >>> {'one': 'one', 'two': 'two', 'one': 'three'}
    {'one': 'three', 'two': 'two'}

and even now that I know it, I wouldn't touch this "feature" with a bargepole.

    >>> def f(**kwargs): pass
    >>> f(one='one', two='two', one='three')
     File "<stdin>", line 1
    SyntaxError: keyword argument repeated

It seems inconsistent to me. I would have expected the semantic of the
above to be pretty much the same of f(**{'one': 'one', 'two': 'two',
'one': 'three'}).

> So, given that both the input and the output have well-defined, non-error
> behavior for the situation, why does the middle step - assigning the raw
> result from the query to a dictionary - become "at best an error, at worse
> an undefined behavior"?

For some entirely random reasons, arbitrary or justifiable or just for
lulz, we may decide to start populating the dictionary from the last
to the first field, or boustrophedonically, or in a spiral. Or we may
start relying on the python function keywords argument semantic. You
are just relying on the implementation detail that the order of
population is from left to right. The only interface the DictCursor
guarantees is that the results will be dictionaries: there is no
promise about how they were created.

> Do specify explicit names for the columns, and never use `select *` in
> production, which may result in accidental duplications (maybe not now, but
> in the future, when you add an apparently unrelated field to a table and
> grep can't help you).
>
>
> Yes, that is good reasoning *in general*, however in my particular use case,
> using `select *` makes things so much cleaner as to be worth the -again, in
> my use case - quite small risk. At least, until now that was the case. :-)

I know, it's tempting... :P

Even if the implementation doesn't change, good luck hoping that the
second table of your query will not grow a field overriding one of the
first - not raising an error but causing a change in the content of
the dicts. If your level of confidence that the system will not change
in a way that goes against your assumption you are totally free to use
the accidental behaviour of the current implementation. But we will
not guarantee it to be maintained even across minor releases, enforce
the behaviour in the test suite, or accept a bug report if you got
bitten by a change.

-- Daniele