Re: index structure for 114-dimension vector

Список
Период
Сортировка
От Arjen van der Meijden
Тема Re: index structure for 114-dimension vector
Дата
Msg-id 4631932A.6000000@tweakers.net
обсуждение исходный текст
Ответ на Re: index structure for 114-dimension vector  (Mark Kirkwood)
Список pgsql-performance
Дерево обсуждения
Basic Q on superfluous primary keys  ("Kynn Jones", )
 Re: Basic Q on superfluous primary keys  (Bill Moran, )
  Re: Basic Q on superfluous primary keys  ("Merlin Moncure", )
   Re: Basic Q on superfluous primary keys  ("Craig A. James", )
    Re: Basic Q on superfluous primary keys  ("Merlin Moncure", )
     Re: Basic Q on superfluous primary keys  (Greg Smith, )
      Re: Basic Q on superfluous primary keys  ("Merlin Moncure", )
       Re: Basic Q on superfluous primary keys  ("Craig A. James", )
        Re: Basic Q on superfluous primary keys  (Richard Huxton, )
         Re: Basic Q on superfluous primary keys  (Greg Smith, )
        Re: Basic Q on superfluous primary keys  ("Merlin Moncure", )
         Re: Basic Q on superfluous primary keys  ("Craig A. James", )
        Re: Basic Q on superfluous primary keys  (Jeff Davis, )
       Re: Basic Q on superfluous primary keys  ("Dave Dutcher", )
        Re: Basic Q on superfluous primary keys  ("Merlin Moncure", )
         index structure for 114-dimension vector  (Andrew Lazarus, )
          Re: index structure for 114-dimension vector  (Jeff Davis, )
           Re: index structure for 114-dimension vector  (Mark Kirkwood, )
            Re: index structure for 114-dimension vector  (Andrew Lazarus, )
             Re: index structure for 114-dimension vector  (Mark Kirkwood, )
             Re: index structure for 114-dimension vector  (Tom Lane, )
            Re: index structure for 114-dimension vector  (Arjen van der Meijden, )
          Re: index structure for 114-dimension vector  (C Storm, )
          Re: index structure for 114-dimension vector  ("Alexander Staubo", )
           Re: index structure for 114-dimension vector  (Oleg Bartunov, )
            Re: index structure for 114-dimension vector  (Andrew Lazarus, )
             Re: index structure for 114-dimension vector  ("Alexander Staubo", )
       Re: Basic Q on superfluous primary keys  ("Craig A. James", )
    Re: Basic Q on superfluous primary keys  (Ron Mayer, )
On 21-4-2007 1:42 Mark Kirkwood wrote:
> I don't think that will work for the vector norm i.e:
>
> |x - y| = sqrt(sum over j ((x[j] - y[j])^2))

I don't know if this is usefull here, but I was able to rewrite that
algorithm for a set of very sparse vectors (i.e. they had very little
overlapping factors) to something like:
|x - y| = sum over j (x[j]^2) + sum over j (y[j]^2)
        + for each j where x[j] and y[j] are both non-zero: - (x[j]^2 +
y[j]^2) + (x[j] - y[j])^2

The first two parts sums can be calculated only once. So if you have
very little overlap, this is therefore much more efficient (if there is
no overlap at all you end up with x[j]^2 + y[j]^2 anyway). Besides, this
rewritten calculation allows you to store the X and Y vectors using a
trivial table-layout vector(x,i,value) which is only filled with
non-zero's and which you can trivially self-join to find the closest
matches. You don't care about the j's where there is either no x or
y-value anyway with this rewrite.

I can compare over 1000 y's of on average 100 elements to two x's of
over 1000 elements on just a single 1.8Ghz amd processor. (I use it for
a bi-kmeans algorithm, so there are only two buckets to compare to).

So it might be possible to rewrite your algorithm to be less
calculation-intensive. Obviously, with a dense-matrix this isn't going
to work, but there may be other ways to circumvent parts of the
algorithm or to cache large parts of it.
It might also help to extract only the 6 relevant columns into a
seperate temporary table which will have much smaller records and thus
can fit more records per page.

Best regards,

Arjen

В списке pgsql-performance по дате отправления:

Предыдущее
От: Shane Ambler
Дата:
Сообщение: Re: Usage up to 50% CPU
Следующее
От: Michael Stone
Дата:
Сообщение: Re: Usage up to 50% CPU