Re: [PATCH] kNN for btree

Поиск
Список
Период
Сортировка
От Nikita Glukhov
Тема Re: [PATCH] kNN for btree
Дата
Msg-id 351c1866-5f38-5e7c-e848-73df9c6668b0@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [PATCH] kNN for btree  (Anastasia Lubennikova <lubennikovaav@gmail.com>)
Ответы Re: [PATCH] kNN for btree
Список pgsql-hackers

Attached 9th version of the patches.

On 03.03.2019 12:46, Anastasia Lubennikova wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi,
thank you for your work on this patch.
Thank you for you review.

Patch #1 is ready for commit.
It only fixes lack of refactoring after INCLUDE index patch.

Patches 2-7 are ready for committer's review.
I found no significant problems in algorithm or implementation.
Here are few suggestions to improve readability:

1) patch 0002.
* Returned matched index clause exression.
* Number of matched index column is returned in *indexcol_p.

Typos in comment:
exPression
index columnS 
"exPression" is fixed.
But there should be "column" because only single index column is matched.
2) patch 0002. 
+			/*
+			 * We allow any column of this index to match each pathkey; they
+			 * don't have to match left-to-right as you might expect.
+			 */

Before refactoring this comment had a line about gist and sp-gist specific:

-			 * We allow any column of the index to match each pathkey; they
-			 * don't have to match left-to-right as you might expect.  This is
-			 * correct for GiST, and it doesn't matter for SP-GiST because
-			 * that doesn't handle multiple columns anyway, and no other
-			 * existing AMs support amcanorderbyop.  We might need different
-			 * logic in future for other implementations.

Now, when the code was moved to a separate function, it is not clear why the
same logic is ok for b-tree and other index methods.  If I got it right, it
doesn't affect the correctness of the algorithm, because b-tree specific 
checks are performed in another function.  Still it would be better to 
explain it in the comment for future readers.
It seems that match_orderbyop_pathkey() is simply the wrong place for this
comment.  I moved it into match_orderbyop_pathkeys() which is implementation of
ammatchorderby() for GiST an SP-GiST.  Also I added old sentence about its
correctness for GiST and SP-GiST.
3) patch 0004if (!so->distanceTypeByVal){  so->state.currDistance = PointerGetDatum(NULL);  so->state.markDistance = PointerGetDatum(NULL); }

Why do we reset these fields only for !distanceTypeByVal?
These fields should be initialized (it is initialization, not reset) only for
by-ref types because before writing a new distance values to these fields, 
the previous by-ref values are pfreed.  The corresponding comment was added.

4) patch 0004
+ * _bt_next_item() -- Advance to next tuple on current page;
+ * or if there's no more, try to step to the next page with data.
+ *
+ * If there are no more matching records in the given direction
*/

Looks like the last sentence of the comment is unfinished.
Yes, "false is returned" is missing. Fixed.
5) patch 0004
_bt_start_knn_scan()

so->currRightIsNearest = _bt_compare_current_dist(so, rstate, lstate);
/* Reset right flag if the left item is nearer. */
right = so->currRightIsNearest;

If this comment relates to the string above it?
No, it relates only to string below. 'right' flag determines later the selected 
scan direction, so 'currRightIsNearest' should be assigned to it. This comment 
was rewritten.
6) patch 0004
_bt_parallel_seize()

+ scanPage = state == &so->state
+ ? &btscan->btps_scanPage
+ : &btscan->btps_knnScanPage;

This code looks a bit tricke to me. Why do we need to pass BTScanState state
to _bt_parallel_seize() at all? Won't it be enough to choose the page before
function call and pass it?
If we will pass page, then we will have to pass it through the whole function 
tree:
_bt_parallel_seize() _bt_steppage()   _bt_next_item()     _bt_next_nearest()   _bt_load_first_page()     _bt_init_knn_scan() _bt_readnextpage()   _bt_parallel_readpage()     _bt_first()

I decided simply to add flag 'isKnn' to BtScanState, so the code now looks like 
this: scanPage = state->isKnn   ? &btscan->btps_scanPage   : &btscan->btps_knnScanPage;

I also can offer to add 'id' (0/1) to BtScanState instead, then the code will 
look like this: scanPage = &btscan->btps_scanPages[state->id];

7) patch 0006
+  <title>Upgrade notes for version 1.6</title>
+
+  <para>
+   In version 1.6 <literal>btree_gist</literal> switched to using in-core
+   distance operators, and its own implementations were removed.  References to
+   these operators in <literal>btree_gist</literal> opclasses will be updated
+   automatically during the extension upgrade, but if the user has created
+   objects referencing these operators or functions, then these objects must be
+   dropped manually before updating the extension.

Is this comment still relevant after the latest changes?
Functions are not removed, they're still in contrib.
Yes, comment is still relevant. SQL functions and operators are dropped, 
but C functions remain (see [1]).
Patches #8 and #9 need more review and tests.
I'll try to give a feedback on them in the week.

P.S. many thanks for splitting the code into separate patches. It made review a lot easier.

The new status of this patch is: Waiting on Author
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: patch to allow disable of WAL recycling
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Incomplete startup packet errors