Re: KNN-GiST with recheck

From	Emre Hasegeli
Subject	Re: KNN-GiST with recheck
Date	25 September 2014 20:00:31
Msg-id	20140925170046.GA29657@hasegeli.com
In reply to	Re: KNN-GiST with recheck (Alexander Korotkov <aekorotkov@gmail.com>)
Replies	Re: KNN-GiST with recheck (Alexander Korotkov <aekorotkov@gmail.com>)
List	pgsql-hackers

> Fixed, thanks.

Here are my questions and comments about the code.

doc/src/sgml/gist.sgml:812:
>        be rechecked from heap tuple before tuple is returned.  If
>        <literal>recheck</> flag isn't set then it's true by default for
>        compatibility reasons.  The <literal>recheck</> flag can be used only

The recheck flag is set to false in gistget.c, so I think the
documentation should say "false by default".  On the other hand, it is
true by default for the consistent function, where the code comments
describe it as "the safest assumption".  I don't know why the safest
option was chosen over the backwards-compatible one for the consistent
function.

src/backend/access/gist/gistget.c:505:
>             /* Recheck distance from heap tuple if needed */
>             if (GISTSearchItemIsHeap(*item) &&
>                 searchTreeItemNeedDistanceRecheck(scan, so->curTreeItem))
>             {
>                 searchTreeItemDistanceRecheck(scan, so->curTreeItem, item);
>                 continue;
>             }

Why is so->curTreeItem passed to these functions?  They can use
scan->opaque->curTreeItem.

src/backend/access/gist/gistscan.c:49:
>         /*
>          * When all distance values are the same, items without recheck
>          * can be immediately returned.  So they are placed first.
>          */
>         if (recheckCmp == 0 && distance_a.recheck != distance_b.recheck)
>             recheckCmp = distance_a.recheck ? 1 : -1;

I don't understand why items without recheck can be returned
immediately.  Do you think it will work correctly when an operator
class returns recheck true for some items and false for others on
the same page?
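The ordering rule in the quoted gistscan.c hunk can be modeled with a qsort-style comparator. This is a minimal sketch assuming a single ORDER BY key (the hunk applies the recheck tie-break only after all distance values compare equal); `QueueDistance`, `queue_distance_cmp` and `sort_queue` are hypothetical names, not the patch's structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one queued distance; not the patch's struct. */
typedef struct
{
    double value;   /* distance, or a lower bound when recheck is true */
    bool   recheck; /* true if value must be recomputed from the heap tuple */
} QueueDistance;

/*
 * Smaller distance first; on a tie, exact entries (recheck == false)
 * come before entries that still need a recheck.
 */
static int queue_distance_cmp(const void *pa, const void *pb)
{
    const QueueDistance *a = pa;
    const QueueDistance *b = pb;

    if (a->value < b->value)
        return -1;
    if (a->value > b->value)
        return 1;
    if (a->recheck != b->recheck)
        return a->recheck ? 1 : -1;
    return 0;
}

/* Sort n queued distances in place using the ordering above. */
static void sort_queue(QueueDistance *items, size_t n)
{
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof *items, queue_distance_cmp);
}
```

Assuming a distance flagged for recheck is only a lower bound on the true distance, an exact entry that ties with it can be no farther than that entry's real distance, which appears to be the motivation for the tie-break; whether that holds when one opclass mixes both kinds of entries on a page is the question raised above.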

src/backend/access/index/indexam.c:258:
>     /* Prepare data structures for getting original indexed values from heap */
>     scan->indexInfo = BuildIndexInfo(scan->indexRelation);
>     scan->estate = CreateExecutorState();
>     scan->slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));

With the changes in indexam.c, heap access becomes legal for all index
access methods.  I think this is better than the previous version, but
I leave the judgement to someone more experienced.  I will try to
summarize the pros and cons of sorting the rows in the GiST access
method, as far as I understand them.

Pros:

* It does not require another queue.  It should be effective to sort the rows inside the queue the GiST access method already has.

* It does not complicate index access method infrastructure.

Cons:

* It could be done without additional heap access.
* Other access methods could make use of the sorting infrastructure one day.
* It could be more transparent to users.  Sorting information could be shown in the EXPLAIN output.
* A more suitable data structure, such as a binary heap, could be used for the queue to sort the rows.
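The last point suggests a binary heap as the queue for sorting rows by distance. A minimal standalone sketch of a binary min-heap keyed by distance follows; it is not PostgreSQL's own heap code, and `HeapEntry`, `MinHeap`, the `row_id` payload and the fixed `HEAP_CAPACITY` are assumptions made to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CAPACITY 64    /* fixed size to keep the sketch short */

typedef struct
{
    double distance;        /* ordering key, smallest first */
    int    row_id;          /* hypothetical payload identifying a row */
} HeapEntry;

typedef struct
{
    HeapEntry items[HEAP_CAPACITY];
    size_t    size;
} MinHeap;

static void swap_entries(HeapEntry *a, HeapEntry *b)
{
    HeapEntry tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Insert an entry and sift it up; returns false when the heap is full. */
static bool heap_push(MinHeap *h, double distance, int row_id)
{
    if (h->size == HEAP_CAPACITY)
        return false;
    size_t i = h->size++;
    h->items[i].distance = distance;
    h->items[i].row_id = row_id;
    while (i > 0)
    {
        size_t parent = (i - 1) / 2;
        if (h->items[parent].distance <= h->items[i].distance)
            break;
        swap_entries(&h->items[parent], &h->items[i]);
        i = parent;
    }
    return true;
}

/* Remove and return the entry with the smallest distance (heap must be non-empty). */
static HeapEntry heap_pop(MinHeap *h)
{
    assert(h->size > 0);
    HeapEntry top = h->items[0];
    h->items[0] = h->items[--h->size];
    size_t i = 0;
    for (;;)
    {
        size_t left = 2 * i + 1;
        size_t right = left + 1;
        size_t smallest = i;
        if (left < h->size && h->items[left].distance < h->items[smallest].distance)
            smallest = left;
        if (right < h->size && h->items[right].distance < h->items[smallest].distance)
            smallest = right;
        if (smallest == i)
            break;
        swap_entries(&h->items[i], &h->items[smallest]);
        i = smallest;
    }
    return top;
}
```

Push and pop are both O(log n), so rows come out in distance order without re-sorting the whole queue each time a new row arrives.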