Re: [GENERAL] Creation of tsearch2 index is very slow

From: Oleg Bartunov
Subject: Re: [GENERAL] Creation of tsearch2 index is very slow
Msg-id: Pine.GSO.4.63.0601211808490.14417@ra.sai.msu.su
In reply to: Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout)
Responses: Re: [GENERAL] Creation of tsearch2 index is very slow  (Martijn van Oosterhout)
List: pgsql-performance

On Sat, 21 Jan 2006, Martijn van Oosterhout wrote:

> On Sat, Jan 21, 2006 at 04:29:13PM +0300, Oleg Bartunov wrote:
>> Martijn, you're right! We want not only to split a page into very
>> different parts, but also to avoid increasing the number of set bits in
>> the resulting signatures, which are the union (OR) of all signatures
>> in each part. We need not only fast index creation (thanks, Tom!),
>> but also a better index. Some information is available here:
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>> There should be a more detailed document, but I don't remember where :)
>
> I see how it works, what I don't quite get is whether the "inverted
> index" you refer to is what we're working with here, or just what's in
> tsearchd?

Just tsearchd. We plan to implement an inverted index in the PostgreSQL core
and then adapt tsearch2 to use it as an option for read-only archives.
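
For readers following the signature discussion quoted above, here is a minimal Python sketch of why the split matters. It is not tsearch2's code: the 16-bit signatures are toy values, and the helper names `union_signature` and `popcount` are hypothetical. It only shows that grouping similar signatures keeps each OR'ed union sparser than grouping dissimilar ones.

```python
from functools import reduce

def union_signature(signatures):
    """OR together the bit signatures of all entries in one half of a split page."""
    return reduce(lambda x, y: x | y, signatures, 0)

def popcount(sig):
    """Count the set bits in a signature."""
    return bin(sig).count("1")

# Four toy 16-bit signatures (illustrative values only, not real gtsvector data).
a = 0b0000000000001111
b = 0b0000000000011110
c = 0b1111000000000000
d = 0b0111100000000000

# Split 1: similar signatures end up in the same half.
good = [popcount(union_signature([a, b])), popcount(union_signature([c, d]))]
# Split 2: dissimilar signatures end up in the same half.
bad = [popcount(union_signature([a, c])), popcount(union_signature([b, d]))]

print("similar grouped:   ", good)
print("dissimilar grouped:", bad)
```

The fewer bits set in a union signature, the more searches can skip that subtree, so the first split would be the better one under this toy model.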

>
>>> That's harder though (this algorithm does approximate it sort of)
>>> and I haven't come up with an algorithm yet
>>
>> Don't ask how hard we thought :)
>
> Well, looking at how other people are struggling with it, it's
> definitely a Hard Problem. One thing though, I don't think the picksplit
> algorithm as is really requires you to strictly have the longest
> distance, just something reasonably long. So I think the alternate
> algorithm I posted should produce equivalent results. No idea how to
> test it though...

You may try our development module 'gevel' to see how dense a signature is.

www=# \d v_pages
          Table "public.v_pages"
  Column   |       Type        | Modifiers
-----------+-------------------+-----------
 tid       | integer           | not null
 path      | character varying | not null
 body      | character varying |
 title     | character varying |
 di        | integer           |
 dlm       | integer           |
 de        | integer           |
 md5       | character(22)     |
 fts_index | tsvector          |
Indexes:
    "v_pages_pkey" PRIMARY KEY, btree (tid)
    "v_pages_path_key" UNIQUE, btree (path)
    "v_gist_key" gist (fts_index)

# select * from gist_print('v_gist_key') as t(level int, valid bool, a gtsvector) where level = 1;
 level | valid |               a
-------+-------+--------------------------------
     1 | t     | 1698 true bits, 318 false bits
     1 | t     | 1699 true bits, 317 false bits
     1 | t     | 1701 true bits, 315 false bits
     1 | t     | 1500 true bits, 516 false bits
     1 | t     | 1517 true bits, 499 false bits
(5 rows)
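
To put numbers on "how dense", the bit counts above can be turned into fill ratios. The following is a small Python sketch, assuming the strings are copied by hand from the gist_print output; the `density` helper is hypothetical and not part of gevel.

```python
import re

# The "a" column values copied from the gist_print output above.
rows = [
    "1698 true bits, 318 false bits",
    "1699 true bits, 317 false bits",
    "1701 true bits, 315 false bits",
    "1500 true bits, 516 false bits",
    "1517 true bits, 499 false bits",
]

def density(text):
    """Return (set bits, total bits, fraction set) for one gist_print value."""
    true_bits, false_bits = map(int, re.findall(r"\d+", text))
    total = true_bits + false_bits
    return true_bits, total, true_bits / total

results = [density(r) for r in rows]
for true_bits, total, frac in results:
    print(f"{true_bits}/{total} bits set = {frac:.1%}")
```

Every level-1 signature here adds up to 2016 bits, and roughly three quarters or more of them are set, which is the kind of density a better picksplit would aim to reduce.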



     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

