semi-PoC: kNN-gist for cubes

From: Jay Levitt
Subject: semi-PoC: kNN-gist for cubes
Message-ID: 4F30616D.3030509@gmail.com
Replies: Re: semi-PoC: kNN-gist for cubes  (David Fetter <david@fetter.org>)
List: pgsql-hackers

I have a rough proof-of-concept for getting nearest-neighbor searches 
working with cubes.  When I say "rough", I mean "I have no idea what I'm 
doing and I haven't written C for 15 years but I hear it got standardized 
please don't hurt me".  It seems to be about 400x faster for a 3D cube with 
1 million rows, more like 10-30x for a 6D cube with 10 million rows.

The patch adds the operator <-> (which is just the existing cube_distance 
function) and GiST support function 8, distance (which is just 
g_cube_distance, a wrapper around cube_distance).

The code is in no way production-quality; it is in fact right around "look! 
it compiles!", complete with pasted-in, commented-out code from something I 
was mimicking.  I thought I'd share at this early stage in the hopes I might 
get some pointers, such as:

- What unintended consequences should I be looking for?
- What benchmarks should I do?
- What kind of edge cases might I consider?
- I'm just wrapping cube_distance and calling it through DirectFunctionCall; 
it's probably more proper to extract out the "real" function and call it 
from both cube_distance and g_cube_distance. Right?
- What else don't I know?  (Besides C, funny man.)

The patch, such as it is, is at:

https://github.com/jaylevitt/postgres/commit/9cae4ea6bd4b2e582b95d7e1452de0a7aec12857

with an even-messier test at

https://github.com/jaylevitt/postgres/commit/daa33e30acaa2c99fe554d88a99dd7d78ff6c784

I initially thought this patch made inserting and indexing slower, but then 
I realized the fast version was doing 1 million rows, and the slow one did 
10 million rows.  Which means: dinnertime.

Jay Levitt

