Re: [GENERAL] Creation of tsearch2 index is very slow

От: Martijn van Oosterhout
Тема: Re: [GENERAL] Creation of tsearch2 index is very slow
Дата: ,
Msg-id: 20060121130830.GA9955@svana.org
(см: обсуждение, исходный текст)
Ответ на: Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane)
Ответы: Re: [GENERAL] Creation of tsearch2 index is very slow  (Oleg Bartunov)
Список: pgsql-performance

Скрыть дерево обсуждения

Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
 Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
  Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
   Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
    Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
     Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
      Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
      Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
        Re: [GENERAL] Creation of tsearch2 index is very slow  (Ron, )
         Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
       Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
      Re: [GENERAL] Creation of tsearch2 index is very slow  (Ron, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
       Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
        Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
         Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
          Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
           Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
            Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
            Re: [GENERAL] Creation of tsearch2 index is very slow  ("Craig A. James", )
            Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
             Re: [GENERAL] Creation of tsearch2 index is very  (Oleg Bartunov, )
              Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
             Re: [GENERAL] Creation of tsearch2 index is very  (Tom Lane, )
              Re: [GENERAL] Creation of tsearch2 index is very  (David Lang, )
               Re: [GENERAL] Creation of tsearch2 index is very  (Tom Lane, )
              Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
               Re: [GENERAL] Creation of tsearch2 index is very  (Alvaro Herrera <-ip.org>, )
                Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
                 Re: [GENERAL] Creation of tsearch2 index is very  ("Craig A. James", )
                  Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
        Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
         Re: [GENERAL] Creation of tsearch2 index is very slow  (Oleg Bartunov, )
          Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
           Re: [GENERAL] Creation of tsearch2 index is very slow  (Oleg Bartunov, )
            Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
           Re: [GENERAL] Creation of tsearch2 index is very slow  (Oleg Bartunov, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
      Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
 Re: [GENERAL] Creation of tsearch2 index is very  (Oleg Bartunov, )

On Fri, Jan 20, 2006 at 05:50:36PM -0500, Tom Lane wrote:
> Yeah, but fetching from a small constant table is pretty quick too;
> I doubt it's worth getting involved in machine-specific assembly code
> for this.  I'm much more interested in the idea of improving the
> furthest-distance algorithm in gtsvector_picksplit --- if we can do
> that, it'll probably drop the distance calculation down to the point
> where it's not really worth the trouble to assembly-code it.

I've played with another algorithm. Very simple but it's only O(N). It
doesn't get the longest distance but it does get close. Basically you
take the first two elements as your starting length. Then you loop over
each remaining string, each time finding the longest pair out of each
set of three.

I've only tried it on random strings. The maximum distance for 128
random strings tends to be around 291-295. This algorithm tends to find
lengths around 280. Pseudo code below (in Perl).

However, IMHO, this algorithm is optimising the wrong thing. It
shouldn't be trying to split into sets that are far apart, it should be
trying to split into sets that minimize the number of set bits (ie
distance from zero), since that's what's will speed up searching.
That's harder though (this algorithm does approximate it sort of)
and I havn't come up with an algorithm yet

sub MaxDistFast
{
  my $strings = shift;

  my $m1 = 0;
  my $m2 = 1;
  my $dist = -1;

  for my $i (2..$#$strings)
  {
    my $d1 = HammDist( $strings->[$i], $strings->[$m1] );
    my $d2 = HammDist( $strings->[$i], $strings->[$m2] );

    my $m = ($d1 > $d2) ? $m1 : $m2;
    my $d = ($d1 > $d2) ? $d1 : $d2;

    if( $d > $dist )
    {
      $dist = $d;
      $m1 = $i;
      $m2 = $m;
    }
  }
  return($m1,$m2,$dist);
}

Full program available at:
http://svana.org/kleptog/temp/picksplit.pl

Have a nice day,
--
Martijn van Oosterhout   <>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Вложения

В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: [GENERAL] Creation of tsearch2 index is very
От: Martijn van Oosterhout
Дата:
Сообщение: Re: [GENERAL] Creation of tsearch2 index is very slow