Re: full text indexing
От | Mitch Vincent |
---|---|
Тема | Re: full text indexing |
Дата | |
Msg-id | 016301c028c2$39325420$0200000a@doot обсуждение исходный текст |
Ответ на | full text indexing ("Poul L. Christiansen" <poulc@cs.auc.dk>) |
Список | pgsql-general |
> Hi > > In my PostgreSQL database I have a lot of newspaper articles (size: > 100mb now, growing beyond 1gb within few months). > I wan't to use full text indexing so that users can search the articles > with a keyword and have the results in less than one second. > > How do I accomplish that? It's complicated :-) > Does PostgreSQL have this feature? Nope. > Which 3rd party indexing tools are available that easily interoperate > with PostgreSQL? There is some code in contrib called FTI (Full Text Index - no doubt).. I re-wrote it for my uses but ended up not using because I was doing so many sorts and joins, it made a scan (even an index scan) of a table with 3 million rows in it very slow. However id you were just searching the keyword table itself it was very, very fast. The FTI trigger in the contrib breaks the words down to 2 letter bits (for substring searching) -- mine doesn't, it only indexes whole words without duplicates and looks at a list of words not to index (words like a, an, the, anything else you want -- I think it has the 300 most used English words in there already) It's drawback is speed, it does take a few seconds to index on INSERT (and UPDATE) -- that's if your text fields are 30ish k (very close to the max PG can store).. I would say you're going to run into the 32k limit pretty quick with newspaper articles -- I index resumes and I've run into it many times (32k of text really isn't all that much).. Anyway, I'll try and get that trigger together that I did and send it to the PG guys to see if it's worthy of being added to contrib -- I'll send you a copy in private if you'd like. Note: I think there have been more people to re-write that trigger, I haven't seen anything else though.. Good luck! <End of long-winded response> -Mitch
В списке pgsql-general по дате отправления: