indexing array columns
| От | Rajarshi Guha | 
|---|---|
| Тема | indexing array columns | 
| Дата | |
| Msg-id | 1176502190.6880.8.camel@panda обсуждение исходный текст | 
| Ответы | Re: indexing array columns Re: indexing array columns | 
| Список | pgsql-general | 
Hi, I have a table of about 10M rows. It has two columns A and B, where A is a text field and B is a real[12] field. Now when this table is queried it is usually of the form: select A from theTable where sim(B, C) > 0.8 Now C will be a 12-element array specified by the user and the value 0.8 can be arbitrary. The function, sim(), is essentially a similarity function which simply computes the inverse of the Manhattan distance between the query array and the rows of the array column. Right now the above query uses a seq scan. Furthermore, the values of the individual array elements for any given row can vary from 0 to infinity (but for most cases will be numbers less than 1000) My question is: how can I index the column B so that such queries are fast. I realize that my table is essentially a collection of 12-dimensional points and that I could replace my similarity function with a distance function. Thus my query boils down to asking 'find me rows of the table that are within X distance of my query' I know that the GIS community deals with 2D points, but I'm not familiar with this area and if I understand correctly, they use Euclidean distances, where as I need Manhattan distances. What type of indexing, available in Postgres could be used for my problem? Would it require me to implement my own indexing scheme? Any pointers would be appreciated Thanks, ------------------------------------------------------------------- Rajarshi Guha <rguha@indiana.edu> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE ------------------------------------------------------------------- Every little picofarad has a nanohenry all its own. -- Don Vonada
В списке pgsql-general по дате отправления: