Re: KNN-GiST with recheck
От | Emre Hasegeli |
---|---|
Тема | Re: KNN-GiST with recheck |
Дата | |
Msg-id | 20140803134809.GA58181@hasegeli-2.local обсуждение исходный текст |
Ответ на | Re: KNN-GiST with recheck (Alexander Korotkov <aekorotkov@gmail.com>) |
Ответы |
Re: KNN-GiST with recheck
Re: KNN-GiST with recheck |
Список | pgsql-hackers |
> > 1. This patch introduces a new "polygon <-> point" operator. That seems > > useful on its own, with or without this patch. > > Yeah, but exact-knn cant come with no one implementation. But it would > better come in a separate patch. I tried to split them. Separated patches are attached. I changed the order of the arguments as point <-> polygon, because point was the first one on all the others. Its commutator was required for the index, so I added it on the second patch. I also added tests for the operator. I think it is ready for committer as a separate patch. We can add it to the open CommitFest. I have made some cosmetic changes on the patches. I hope they are useful. I added support to point <-> circle operator with the same GiST distance function you added for polygon. I can change it, if it is not the right way. > > 2. I wonder how useful it really is to allow mixing exact and non-exact > > return values from the distance function. The distance function included in > > the patch always returns recheck=true. I have a feeling that all other > > distance function will also always return either true or false. > > For geometrical datatypes recheck variations in consistent methods are also > very rare (I can't remember any). But imagine opclass for arrays where keys > have different representation depending on array length. For such opclass > and knn on similarity recheck flag could be useful. I also wonder how useful it is. Your example is convincing, but maybe setting it index-wide will make the decisions on the framework easier. For example, how hard would it be to decide if further sorting is required or not on the planner? > > 4. (as you mentioned in the other thread: ) It's a modularity violation > > that you peek into the heap tuple from gist. I think the proper way to do > > this would be to extend the IndexScan executor node to perform the > > re-shuffling of tuples that come from the index in wrong order, or perhaps > > add a new node type for it. > > > > Of course that's exactly what your partial sort patch does :-). I haven't > > looked at that in detail, but I don't think the approach the partial sort > > patch takes will work here as is. In the KNN-GiST case, the index is > > returning tuples roughly in the right order, but a tuple that it returns > > might in reality belong somewhere later in the ordering. In the partial > > sort patch, the "input stream" of tuples is divided into non-overlapping > > groups, so that the tuples within the group are not sorted, but the groups > > are. I think the partial sort case is a special case of the KNN-GiST case, > > if you consider the lower bound of each tuple to be the leading keys that > > you don't need to sort. > > Yes. But, for instance btree accesses heap for unique checking. Is really > it so crimilal? :-) > This is not only question of a new node or extending existing node. We need > to teach planner/executor access method can return value of some expression > which is lower bound of another expression. AFICS now access method can > return only original indexed datums and TIDs. So, I afraid that enormous > infrastructure changes are required. And I can hardly imagine what they > should look like. Unfortunately, I am not experienced enough to judge your implementation. As far as I understand the problem is partially sorting rows on the index scan node. It can lead the planner to choose non-optimal plans, because of not taking into account the cost of sorting.
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Bruce MomjianДата:
Сообщение: Re: Proposed changing the definition of decade for date_trunc and extract