Обсуждение: Buggy logic in nodeIndexscan.c queue handling

Поиск
Список
Период
Сортировка

Buggy logic in nodeIndexscan.c queue handling

От
Tom Lane
Дата:
When deciding whether to pop entries from the queue (line 191), the
comparison is done against scandesc->xs_orderbyvals, which as the comment
states are the last values physically returned by the index.  I think this
is wrong, or at least pretty inefficient, in the case that the last index
tuple was lossy.  What will frequently happen is that the comparison value
will be too small so we'll not be able to pop the queue item, whereas if
we'd compared against the corrected values in node->iss_OrderByValues,
we would have popped the item.  I don't think this can result in a visible
failure, but for sure it can result in unnecessary queue traffic.
So ISTM we ought to preserve the state currently represented only in the
local variables lastfetched_vals/lastfetched_nulls, and use that in the
pop comparison.

OTOH, I've not yet consumed any caffeine today so maybe I missed
something.

One thing that is sadly lacking from this whole patch is any clear
specification of the behavior required from the index AM.  Is it expected
to return items in the exact order implied by their reported (possibly
lossy) distance values?  Or would it be okay, for instance, to return them
in the actually correct final order even though it returns lossy distance
values that are then not in order?  (And if the latter is disallowed, why?)
Without such a specification it's impossible to construct an argument that
this queuing mechanism delivers correct answers.
        regards, tom lane



Re: Buggy logic in nodeIndexscan.c queue handling

От
Heikki Linnakangas
Дата:
On 05/25/2015 06:35 PM, Tom Lane wrote:
> When deciding whether to pop entries from the queue (line 191), the
> comparison is done against scandesc->xs_orderbyvals, which as the comment
> states are the last values physically returned by the index.  I think this
> is wrong, or at least pretty inefficient, in the case that the last index
> tuple was lossy.  What will frequently happen is that the comparison value
> will be too small so we'll not be able to pop the queue item, whereas if
> we'd compared against the corrected values in node->iss_OrderByValues,
> we would have popped the item.  I don't think this can result in a visible
> failure, but for sure it can result in unnecessary queue traffic.
> So ISTM we ought to preserve the state currently represented only in the
> local variables lastfetched_vals/lastfetched_nulls, and use that in the
> pop comparison.

No, that would be wrong.

> One thing that is sadly lacking from this whole patch is any clear
> specification of the behavior required from the index AM.  Is it expected
> to return items in the exact order implied by their reported (possibly
> lossy) distance values?

Yes.

> Or would it be okay, for instance, to return them
> in the actually correct final order even though it returns lossy distance
> values that are then not in order?  (And if the latter isa disallowed, why?)
> Without such a specification it's impossible to construct an argument that
> this queuing mechanism delivers correct answers.

No, that's not allowed. Returning them in actually correct final order 
might happen to work, because the executor would not reorder them to 
wrong order, but in general the index AM is expected to return in the 
order of the lossy distance values. The queueing mechanism can be 
described as:

1. Read tuple from index AM. Calculate actual distance and push the 
tuple to the queue. Remember the lossy distance returned by the index 
AM, as X.
2. Return all tuples from queue with actual distance < X
3. Goto 1.

Whenever the index AM returns a tuple with lossy distance X, it's a 
promise that it will not later return a tuple with actual distance >= X. 
So at any point, we can return all tuples with actual distance < X, 
because we know we won't get any more of those from the index.


Hmm. To be precise, I guess it's not strictly required that the index AM 
returns tuples in the exact order of the lossy distance values, as long 
as that promise holds. In practice, it's not a very useful difference. 
If the index AM calculates a distance estimate of X for one tuple, and < 
X for the next tuple, but it somehow knows that the first tuple comes 
before the second tuple when ordered by the actual distances, then the 
actual distance of the second tuple must in fact also be >= X or the 
promise would be broken. So it might as well return X as the estimate 
for both.


Yeah, admittedly the documentation is not too clear on that. I'll try to 
come up with something...

- Heikki