Re: Understanding and implementing a GiST Index

Поиск
Список
Период
Сортировка
От Connor Wolf
Тема Re: Understanding and implementing a GiST Index
Дата
Msg-id 54374DD3.40308@imaginaryindustries.com
обсуждение исходный текст
Ответ на Re: Understanding and implementing a GiST Index  (Sergey Konoplev <gray.ru@gmail.com>)
Список pgsql-general
I had skimmed the presentation slides, but I hadn't looked that closely
because it appeared to be using cosine similarity metrics, and only
operated on matrices, neither of which are what I wanted.

On closer examination, I think I could explode my packed hash values to
a matrix. I'm not sure how the cosine similarity metric would work given
that the arrays would only contain the values either 0 or 1 (as my hash
value is fundamentally a integer of configurable length (you can alter
the hash size, I'm using 64 bits)).

I'll check out the code and poke it a bit.

I'll probably also move this discussion to the hackers mailing list (the
instructions say to ask here first, but Oleg suggested I go there, and I
generally agree).

Connor

On 10/9/2014 8:19 AM, Sergey Konoplev wrote:
> On Thu, Oct 9, 2014 at 12:09 AM, Connor Wolf
> <wolf@imaginaryindustries.com> wrote:
>> I'm trying to implement a custom indexing system using the GiST index
>> framework, and I have a few questions.
>> Basically, I'm trying to implement a system that allows me to search across
>> a set of 64 bit integers by hamming distance. This is intended to be used in
>> searching for similar images, where the 64 bit integer is actually a phash
>> of an image, and similarity between two images is reflected in the hamming
>> distance between two integers.
> Have you seen the smlar extension?
>
> http://www.pgcon.org/2012/schedule/events/443.en.html
> http://sigaev.ru/git/gitweb.cgi?p=smlar.git;a=blob;hb=HEAD;f=README
> http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/
>
>> Anyways, The appropriate approach here is to use something called a BK tree,
>> for which I've written some test code and I think I have a decent grip of
>> (my test code seems to work, in any event).
>>
>> That said:
>>
>> Is there a simple piece of example-code for implementing a GiST index for a
>> single index? I've been looking through the /contrib/ directory,
>> particularly the /contrib/btree_gist/ files, but it looks like 90% of the
>> complexity there is related to supporting all the column types Postgres has,
>> rather then actually tied to producing a functional index.
>> Once I have something compiling, how can I check to be sure that I'm
>> actually using the indexing module I created? I can enable my compiled
>> extension using `CREATE EXTENSION`, but is there an option for `CREATE INDEX
>> test_index ON testing USING gist (val);` that lets me specify *which* GiST
>> index is actually employed? Is this even a valid question?
>> This is probably something that's available in one of the system tables
>> somewhere, but I haven't had much luck with google finding out where.
>> Testing: What's the appropriate way to examine the generated tree structure
>> of the index? I certainly went through a few bugs with my test tree system
>> (and that was in python!). Is there any way to examine the index structure
>> for debugging purposes?
>> Also, it looks like `ereport()` is the proper way to emit debugging
>> information. Is this correct?
>> In that vein, is there any way to have information that is on the operations
>> of an entire query? Looking at the number of tree nodes touched for a scan
>> would be nice (and I would not be surprised if there is already a facility
>> for it).
>>
>> Project code is here if anyone is interested, any help would be great. I
>> have very little idea what I'm doing.
>>
>> Thanks,
>> Connor
>
>



В списке pgsql-general по дате отправления:

Предыдущее
От: "Leonardo M. Ramé"
Дата:
Сообщение: Re: psql generate insert command based on select
Следующее
От: "vibhor.kumar@enterprisedb.com"
Дата:
Сообщение: Re: psql generate insert command based on select