Re: advice on indexing email

From: Maarten Boekhold
Subject: Re: advice on indexing email
Date:
Msg-id: 39097E62.9FB27E81@tibcofinance.com
In reply to: advice on indexing email  (Marc Tardif <intmktg@CAM.ORG>)
Responses: Re: advice on indexing email  (Marc Tardif <intmktg@CAM.ORG>)
List: pgsql-general

Hi,

I wrote that fti stuff in contrib...

> My problem is how to create the full word index. The actual code to
> separate the email into separate words isn't a problem, but should I be
> using INSERT, BEGIN/END or COPY? In this last case, I would have to create
> a temporary file holding each word of the email and then use COPY... all
> of which also has its fair share of overhead.

You can go one of two ways.

1. The fti stuff in contrib uses triggers, so every time you insert, update
or delete a row in the 'fti-ed' table, the full text index is updated as well.
If your coding abilities are OK, you can just replace the word breakup code in
contrib/fti with your own.
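
To give an idea of what a replacement word breakup could look like, here is a rough sketch. It is in Python for brevity, not the C that contrib/fti is written in, and it is not the contrib code: the name break_up, the stop-word list and the minimum word length are all assumptions made up for this example.

```python
import re

# Hypothetical settings; tune both for your own mail corpus.
STOP_WORDS = {"the", "and", "for", "with", "that", "this"}
MIN_LENGTH = 3

def break_up(text):
    """Return the distinct indexable words in text, lowercased and sorted."""
    # Treat every run of ASCII letters or digits as one word.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Drop very short words and stop words, and collapse duplicates.
    return sorted({w for w in words
                   if len(w) >= MIN_LENGTH and w not in STOP_WORDS})
```

Whatever rules you pick, the same breakup has to be applied to search terms at query time, otherwise lookups will miss words that the index stored in a different form.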

2. If you have to insert large amounts of data, it is probably faster to *not*
create the triggers at first. Bulk load all your data, then write a little Perl
script that reads the data from your table, does the word breakup and inserts
those words into the full text index table. Running 'sort' on the output of
the Perl script will help performance, as the fti data will then already be
stored pre-sorted in the database (you could also use CLUSTER on the fti table
after the index has been created). I think I described this somewhat better in
the README in contrib/fti. If you take this approach, don't forget to create
the triggers after the bulk load of the fti table!
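
The bulk-load route could be sketched roughly like this, in Python instead of Perl. The function names, the (id, text) input shape and the (word, id) column order are assumptions for illustration; check the README in contrib/fti for the layout the fti table really expects. The sort happens inside the script, so no separate 'sort' step is needed.

```python
import re

def fti_rows(rows):
    """Yield (word, row_id) pairs for each distinct word in each row's text."""
    for row_id, text in rows:
        # Simple breakup: lowercase runs of ASCII letters or digits.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            yield word, row_id

def copy_lines(rows):
    """Return sorted, tab-separated lines in the default text format of COPY."""
    # Sorting by word, then id, means the data is loaded pre-sorted.
    return [f"{word}\t{row_id}" for word, row_id in sorted(fti_rows(rows))]
```

Writing those lines to a file and loading it with COPY avoids one INSERT per word. Because the words here contain only letters and digits, nothing needs escaping for COPY; a breakup that lets tabs or backslashes through would need to escape them first.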

Maarten

--

Maarten Boekhold, maarten.boekhold@tibcofinance.com
TIBCO Finance Technology Inc.
"Sevilla" Building
Entrada 308
1096 ED Amsterdam, The Netherlands
tel: +31 20 6601000 (direct: +31 20 6601066)
fax: +31 20 6601005
http://www.tibcofinance.com
