Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?

From Robert Haas
Subject Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?
Date
Msg-id CA+TgmoauhLf6R07sAUzQiRcstF5KfRw7nwiWn4VZgiSF8MaQaw@mail.gmail.com
In reply to Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?  ("Regina Obe" <lr@pcorp.us>)
Responses Re: [HACKERS] What is "index returned tuples in wrong order" for recheck supposed to guard against?  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Tue, Jan 3, 2017 at 12:36 AM, Regina Obe <lr@pcorp.us> wrote:
>> cmp would return 0 if the estimated distance returned by the index AM were greater than the actual distance.
>> The estimated distance can be less than the actual distance, but it isn't allowed to be more.  See
>> gist_bbox_distance for an example of a "lossy" distance calculation, and more generally "git show
>> 35fcb1b3d038a501f3f4c87c05630095abaaadab".
>
> Did you mean would return < 0 ?

Yes, sorry.

> Since I thought 0 meant exact and not where it's Erroring?
>
> I think for points then maybe we should turn it off, as this could just be floating point issues with the way we
> compute the index.
 
> That would explain why it doesn't happen for other cases like  polygon / point in our code
> or polygon /polygon in our code since the box box distance in our code would always be <= actual distance for those.
>
> So maybe the best course of action is just for us inspect the geometries and if both are points just disable
> recheck.
>
> It's still not quite clear to me even looking at that git commit, why those need to error instead of going thru
> recheck aside from efficiency.
 

The code that reorders the returned tuples assumes that (1) the actual
distance is always greater than or equal to the estimated distance and
(2) the index returns the tuples in order of increasing estimated
distance.  Imagine that the estimated distances are 0, 1, 2, 3... and
the real distances are 2,3,4,5...  When it sees the
estimated-distance-0 tuple it computes that the actual distance is 2,
but it doesn't know whether there's going to be a tuple later with an
actual distance between 0 and 2, so it buffers the tuple. When it sees
the estimated-distance-1 tuple it computes that the actual distance is
3, and now it knows there won't be any more estimated or actual
distances between 0 and 1, but there could still be a tuple with an
estimated distance between 1 and 2 whose actual distance is also between
1 and 2, so it buffers the second tuple as well.  When it sees the third
tuple, with estimated distance 2, it now knows that there won't be any
further tuples whose estimated or actual distance is less than 2.  So
now it can emit the first tuple that it saw, because that had an
actual distance of 2 and from this point forward the index will never
return anything with a smaller estimated or actual distance.  The
estimated-distance-1 tuple still has to stay in the buffer, though,
until we see a tuple whose estimated distance is greater than that
tuple's actual distance (3).
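
The buffering described above can be sketched in a few lines of Python.
This is only an illustration of the idea, not the executor code: the
names reorder and WrongOrderError are made up for this example, and the
input is assumed to be (estimated, actual, label) triples arriving in
order of increasing estimated distance.

```python
import heapq


class WrongOrderError(Exception):
    """Hypothetical error: the actual distance was below the estimate."""


def reorder(index_tuples):
    """Yield (actual, label) pairs in order of increasing actual distance.

    index_tuples: iterable of (estimated, actual, label) triples, assumed
    to arrive in order of increasing estimated distance.
    """
    queue = []  # min-heap of buffered tuples, keyed on actual distance
    seq = 0     # tie-breaker so labels are never compared
    for estimated, actual, label in index_tuples:
        # Assumption (1): the estimate is a lower bound on the actual
        # distance. If it is not, the output order cannot be trusted.
        if actual < estimated:
            raise WrongOrderError(
                f"estimate {estimated} exceeds actual distance {actual}")
        # Assumption (2): no later tuple has an estimate below this one,
        # so nothing later can have an actual distance below it either.
        # Any buffered tuple at or below this estimate is safe to emit.
        while queue and queue[0][0] <= estimated:
            buffered_actual, _, buffered_label = heapq.heappop(queue)
            yield buffered_actual, buffered_label
        heapq.heappush(queue, (actual, seq, label))
        seq += 1
    # The index is exhausted, so everything left can be emitted in order.
    while queue:
        buffered_actual, _, buffered_label = heapq.heappop(queue)
        yield buffered_actual, buffered_label
```

Feeding it the example from the text, list(reorder([(0, 2, "a"),
(1, 3, "b"), (2, 4, "c"), (3, 5, "d")])), emits "a" only once the
estimated-distance-2 tuple arrives, and "b" once the
estimated-distance-3 tuple arrives. A tuple whose actual distance is
smaller than its estimate breaks assumption (1), which is the case the
sketch rejects with an error.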

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


