Re: [GENERAL] Creation of tsearch2 index is very slow

От: Tom Lane
Тема: Re: [GENERAL] Creation of tsearch2 index is very slow
Дата: ,
Msg-id: 27264.1137793817@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout)
Ответы: Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson")
Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout)
Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson")
Список: pgsql-performance

Скрыть дерево обсуждения

Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
 Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
  Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
   Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
    Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
     Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
      Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
      Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
        Re: [GENERAL] Creation of tsearch2 index is very slow  (Ron, )
         Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
       Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
      Re: [GENERAL] Creation of tsearch2 index is very slow  (Ron, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
       Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
        Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
         Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
          Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
           Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane, )
            Re: [GENERAL] Creation of tsearch2 index is very slow  ("Steinar H. Gunderson", )
            Re: [GENERAL] Creation of tsearch2 index is very slow  ("Craig A. James", )
            Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
             Re: [GENERAL] Creation of tsearch2 index is very  (Oleg Bartunov, )
              Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
             Re: [GENERAL] Creation of tsearch2 index is very  (Tom Lane, )
              Re: [GENERAL] Creation of tsearch2 index is very  (David Lang, )
               Re: [GENERAL] Creation of tsearch2 index is very  (Tom Lane, )
              Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
               Re: [GENERAL] Creation of tsearch2 index is very  (Alvaro Herrera <-ip.org>, )
                Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
                 Re: [GENERAL] Creation of tsearch2 index is very  ("Craig A. James", )
                  Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
        Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
         Re: [GENERAL] Creation of tsearch2 index is very slow  (Oleg Bartunov, )
          Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
           Re: [GENERAL] Creation of tsearch2 index is very slow  (Oleg Bartunov, )
            Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
           Re: [GENERAL] Creation of tsearch2 index is very slow  (Oleg Bartunov, )
       Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout, )
      Re: [GENERAL] Creation of tsearch2 index is very  (Ron, )
 Re: [GENERAL] Creation of tsearch2 index is very  (Oleg Bartunov, )

Martijn van Oosterhout <> writes:
> Given that all it's doing is counting bits, a simple fix would be to
> loop over bytes, use XOR and count ones. For extreme speedup create a
> lookup table with 256 entries to give you the answer straight away...

Yeah, I just finished doing that and got about a 20x overall speedup
(30-some seconds to build the index instead of 10 minutes).  However,
hemdistsign is *still* 70% of the runtime after doing that.  The problem
seems to be that gtsvector_picksplit is calling it an inordinate number
of times:

                0.53   30.02    1649/1649        FunctionCall2 <cycle 2> [19]
[20]    52.4    0.53   30.02    1649         gtsvector_picksplit [20]
               29.74    0.00 23519673/33035195     hemdistsign [18]
                0.14    0.00 22985469/22985469     hemdistcache [50]
                0.12    0.00  268480/10030581     makesign [25]
                0.02    0.00  270400/270400      fillcache [85]
                0.00    0.00    9894/4077032     AllocSetAlloc [34]
                0.00    0.00    9894/2787477     MemoryContextAlloc [69]

(hemdistcache calls hemdistsign --- I think gprof is doing something
funny with tail-calls here, and showing hemdistsign as directly called
from gtsvector_picksplit when control really arrives through hemdistcache.)

The bulk of the problem is clearly in this loop, which performs O(N^2)
comparisons to find the two entries that are furthest apart in hemdist
terms:

    for (k = FirstOffsetNumber; k < maxoff; k = OffsetNumberNext(k))
    {
        for (j = OffsetNumberNext(k); j <= maxoff; j = OffsetNumberNext(j))
        {
            if (k == FirstOffsetNumber)
                fillcache(&cache[j], GETENTRY(entryvec, j));

            size_waste = hemdistcache(&(cache[j]), &(cache[k]));
            if (size_waste > waste)
            {
                waste = size_waste;
                seed_1 = k;
                seed_2 = j;
            }
        }
    }

I wonder if there is a way to improve on that.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: "Craig A. James"
Дата:
Сообщение: Re: [GENERAL] Creation of tsearch2 index is very slow
От: Rikard Pavelic
Дата:
Сообщение: Re: [PERFORMANCE] Stored Procedures