Re: PG 7.2b4 bug?

Поиск
Список
Период
Сортировка
От Don Baccus
Тема Re: PG 7.2b4 bug?
Дата
Msg-id 3C1FBDAA.6080101@pacifier.com
обсуждение исходный текст
Ответ на PG 7.2b4 bug?  (Don Baccus <dhogaza@pacifier.com>)
Список pgsql-hackers
Tom Lane wrote:

> Don Baccus <dhogaza@pacifier.com> writes:
> 
>>Maybe the behavior's implementation defined ... if not, I'd presume SQL3 
>>  states that a function in the above context is called either once per 
>>row or once per query, not sometimes one or sometimes the other.


This is still bothering me so I decided to plunge into the standard 
myself.  First of all...


> 
> AFAICT, the relevant concept in SQL99 is whether a function is
> "deterministic" or not:


Actually this argument may well apply to the function all within the 
subselect or view, but I fail to see any language in the standard that 
suggests that this trumps the following declaration about the execution 
of a <query specification> (what many of us informally refer to as a 
"SELECT"):

(from section 7.12, Query Specification)

a) If T is not a grouped table, then
              Case:

(I deleted Case i, which refers to standard aggregates like COUNT)
             ii) If the <select list> does not include a <set function                 specification> that contains a
referenceto T, then                 each <value expression> is applied to each row of T 
 
                 yielding a table of M rows, where M is the cardinality 
                 of T ...

(FYI a <set function specification> is a standard aggregate like COUNT, 
i.e. Case ii pertains to those queries that don't fall into Case i)

ISTM that this quite clearly states that a subselect in a target list 
should be applied to each row to be returned in M.  I don't see any 
waffle-room here.  I would have to dig more deeply into the standard's 
view regarding VIEW semantics but I would assume it would knit together 
in a consistent manner.

For instance, earlier we saw the following exchange between Stephen and Tom:


Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > However, I'd think:> > "select (select nextval from test_seq) from multiple_rows;"> > should give you different
valuesand doesn't, although> > "select (select nextval from test_seq where i IS NULL or i IS NOT NULL)> > from
multiple_rows;"does give you different values.
 
>> In the first case, the subselect is visibly not dependent on the outer> query, so it's evaluated only once; in the
secondcase it has to be> re-evaluated for each row using that row's value of i.
 

Note that the standard does not give you this freedom.  It says the 
<value expression> (in this case the subselect, and yes subselects are 
valid <value expressions> in SQL3, at least in my draft) must be applied 
to each row.

IMO this means that the optimizer can choose to evaluate the <value 
expression> once only if it knows for certain that multiple calls will 
return the same value.  For example "my_cachable_function()", not 
"my_uncachable_function()" or "nextval()".

Or IMO a view built using a non-cachable function.

In other words it can only suppress evaluation if it can be certain that 
doing so doesn't change the result.

Another nit-pick, the claim's not even strictly true.  "i IS NULL OR i 
IS NOT NULL" can be folded to true, so the subselect's not  "visibly 
dependent on i". In fact, it is quite visibly *not* dependent on the 
outer query.  PG just isn't smart enough to fold the boolean expression 
into the known value "true".

It's this kind of uncertainty that makes the current behavior so ... 
ugly.  You get different answers depending on various optimization 
values, the complexity of the query, etc.

ISTM that the standard is quite clearly worded to avoid this unpleasantness.

...


> It looks to me like the spec does NOT attempt to nail down the behavior
> of non-deterministic functions; in the places where they talk about
> non-deterministic functions at all, it's mostly to forbid their use in
> contexts where nondeterminism would affect the final result.  Otherwise
> the results are implementation-dependent.


I've been looking at a few of the "non-deterministic" clauses in the 
General Rules, out of curiousity.

They generally aren't involved with the execution or non-execution of 
expressions.  Ordering of execution is in many cases non-deterministic 
and implementation-dependent.  There are plenty of General Rules of this 
sort.

We also have this big gaping black hole of non-determinism due to 
character set collation.

In other words:

select foo
from bar
order by foo;

is non-deterministic (we don't know the order in which the rows will be 
returned) if foo is a character type.  This can even be true within 
implementations, for instance in PG it changes with when you change 
locales (and have locale support enabled).

However, it seems clear that:

select foo, my_function()
from bar
order by foo;

requires my_function() to be called for every row - we just can't depend 
on the order in which it will be applied to those rows in the case where 
foo is a character type.  Of course, iscachable tells the optimizer that 
it's OK to just call it once but that's an extension outside SQL3's 
domain.

Obviously if you run this query over and over again with the same 
collation order the "order by" is deterministic.  The non-determinism is  in respect to the portability of the query to
implementationsbuilt on 
 
differing character sets.

I'm just not seeing justification for claiming that Section 7.12 can be 
ignored if the subselect or view happens to contain a function that's 
not cachable.

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hiroshi Inoue"
Дата:
Сообщение: Re: problems with table corruption continued
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Explicit config patch 7.2B4