Re: multi terabyte fulltext searching
From | Oleg Bartunov
---|---
Subject | Re: multi terabyte fulltext searching
Date |
Msg-id | Pine.LNX.4.64.0703211908400.12152@sn.sai.msu.ru
In reply to | Re: multi terabyte fulltext searching (Benjamin Arai <benjamin@araisoft.com>)
Responses | Re: multi terabyte fulltext searching
List | pgsql-general
On Wed, 21 Mar 2007, Benjamin Arai wrote:

> Hi Oleg,
>
> I am currently using GiST indexes because I receive about 10GB of new
> data a week (then again, I am not deleting any information). I do not
> expect to be able to stop receiving text for about 5 years, so the data
> is not going to become static any time soon. The reason I am concerned
> with performance is that I am providing a search system for several
> newspapers since essentially the beginning of time. Many bibliographers
> etc. would like to use this utility, but if each search takes too long I
> am not going to be able to support many concurrent users.

GiST is ok for your feed, but the archive part should use a GIN index.
Inheritance + CE (constraint exclusion) should make your life easier.

> Benjamin
>
> On Mar 21, 2007, at 8:42 AM, Oleg Bartunov wrote:
>
>> Benjamin,
>>
>> as one of the authors of tsearch2 I'd like to know more about your
>> setup. tsearch2 in 8.2 has GIN index support, which scales much better
>> than the old GiST index.
>>
>> Oleg
>>
>> On Wed, 21 Mar 2007, Benjamin Arai wrote:
>>
>>> Hi,
>>>
>>> I have been struggling with getting fulltext searching working for
>>> very large databases. I can fulltext index tens of gigs without any
>>> problem, but when I start getting to hundreds of gigs it becomes slow.
>>> My current system is a quad core with 8GB of memory. I have the
>>> resources to throw more hardware at it, but realistically it is not
>>> cost effective to buy a system with 128GB of memory. Are there any
>>> solutions that people have come up with for indexing very large text
>>> databases?
>>>
>>> Essentially I have several terabytes of text that I need to index.
>>> Each record is about 5 paragraphs of text. I am currently using
>>> TSearch2 (stemming, etc.) and getting sub-optimal results: queries
>>> take more than a second to execute. Has anybody implemented such a
>>> database using multiple systems or some special add-on to TSearch2 to
>>> make things faster? I want to do something like partitioning the data
>>> across multiple systems and merging the ranked results at some master
>>> node. Is something like this possible in PostgreSQL, or must it be a
>>> software solution?
>>>
>>> Benjamin
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>        choose an index scan if your joining column's datatypes do not
>>>        match
>>
>> Regards,
>> 	Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83

Regards,
	Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
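A minimal sketch of the layout Oleg is suggesting (inheritance plus constraint exclusion, with GiST on the live feed and GIN on the static archive), written for 8.2-era tsearch2. The table name `articles`, the `fts` column, and the date cutoff are illustrative assumptions, not details from the thread:

```sql
-- Parent table: queries go against this; constraint exclusion routes
-- them to only the child partitions whose CHECK constraints can match.
CREATE TABLE articles (
    id        serial,
    published date NOT NULL,
    body      text,
    fts       tsvector   -- kept up to date by a tsearch2 trigger (not shown)
);

-- Archive partition: static data, indexed with GIN. GIN is faster to
-- search but slower to update, which is fine once the data stops changing.
CREATE TABLE articles_archive (
    CHECK (published < DATE '2007-01-01')
) INHERITS (articles);
CREATE INDEX articles_archive_fts_idx
    ON articles_archive USING gin (fts);

-- Feed partition: receives the ~10GB/week of new text, indexed with
-- GiST, which tolerates a constant insert stream much better.
CREATE TABLE articles_current (
    CHECK (published >= DATE '2007-01-01')
) INHERITS (articles);
CREATE INDEX articles_current_fts_idx
    ON articles_current USING gist (fts);

-- With constraint exclusion on, a dated search touches only the
-- partitions that can contain matching rows.
SET constraint_exclusion = on;
SELECT id, published
FROM articles
WHERE fts @@ to_tsquery('newspaper & search');
```

In practice one would roll partitions forward periodically (say, one per month or year), so that settled data ends up under a GIN index while new text keeps landing in a GiST-indexed partition; the rotation scheme itself is left to the application.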