Simplifying the tsvector format for simple glossaries

From	Marc Mamin
Subject	Simplifying the tsvector format for simple glossaries
Date	29 January 2012, 19:39:52
Msg-id	C4DAC901169B624F933534A26ED7DF3103E91825@JENMAIL01.ad.intershop.net
Replies	Re: Simplifying the tsvector format for simple glossaries (Oleg Bartunov <oleg@sai.msu.su>)
List	pgsql-general

Hello,

We have a text search on data from error logs, and our application
offers a rather simple search on lexemes only (no weighting, no
neighbouring ...).
This works quite well, except when the applications generating the logs
go mad and we have to handle millions of messages per day :-)
We also have an ETL (Perl) tool that first transforms the logs into CSV
files for COPY.

My idea is to let perl create a list of single words for each message,
and run the search only on these "glossaries".
Going further, I'd like to import these lists directly as tsvectors to
save a processing step within Postgres.
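A minimal sketch of the glossary step, written in Python purely for illustration (the actual ETL tool is in Perl). The function name `glossary` and the `\w+` tokenization are assumptions; they are neither what the Perl tool does nor what PostgreSQL's own text search parser does. Because the tsvector type performs no word normalization itself, any lower-casing or other normalization must happen at this stage, consistently with how the queries are built.

```python
import re


def glossary(message: str) -> list[str]:
    """Return the distinct words of a log message, lower-cased,
    in order of first occurrence (hypothetical tokenization)."""
    # Assumption: a "word" is a run of letters, digits or underscores.
    words = re.findall(r"\w+", message.lower())
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(words))


if __name__ == "__main__":
    print(glossary("Disk full: disk /dev/sda1 full"))
    # ['disk', 'full', 'dev', 'sda1']
```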

The standard tsvector representation in CSV would then look like:

'lex_1':1 'lex_2':2 'lex_3':3 ...
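One way the ETL step could render this positional form and write it into a COPY-ready CSV file, sketched in Python. The helper names and the two-column layout (message id, tsvector) are hypothetical. Each lexeme is wrapped in single quotes with embedded quotes and backslashes doubled, as the PostgreSQL documentation requires for quoted tsvector lexemes; positions start at 1. The `csv` module takes care of CSV-level quoting when a lexeme happens to contain a comma.

```python
import csv
import io


def quote_lexeme(word: str) -> str:
    """Quote one lexeme for tsvector input: wrap it in single quotes,
    doubling any embedded backslashes and single quotes."""
    return "'" + word.replace("\\", "\\\\").replace("'", "''") + "'"


def to_positional_tsvector(words) -> str:
    """Render words as 'w1':1 'w2':2 ... (positions counted from 1)."""
    return " ".join(
        f"{quote_lexeme(w)}:{i}" for i, w in enumerate(words, start=1)
    )


def write_copy_csv(rows, out) -> None:
    """Write (msg_id, words) pairs as CSV lines for COPY ... (FORMAT csv)."""
    writer = csv.writer(out, lineterminator="\n")
    for msg_id, words in rows:
        writer.writerow([msg_id, to_positional_tsvector(words)])


if __name__ == "__main__":
    buf = io.StringIO()
    write_copy_csv([(1, ["disk", "full"])], buf)
    print(buf.getvalue(), end="")  # 1,'disk':1 'full':2
```

Such a file could then be loaded with something like `COPY log_words (msg_id, words) FROM STDIN WITH (FORMAT csv)`, where `log_words` is a hypothetical table whose `words` column has type `tsvector`.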

When casting from text to tsvector, I've noticed with 9.1 that this simpler format is valid too:

'lex_1 lex_2 lex_3 ...'
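The simpler form drops the positions and just separates lexemes by whitespace. Presumably the outer quotes in the cast above belong to the SQL string literal: inside a CSV field, the same quotes would instead make the whole list a single lexeme containing spaces. A Python sketch of rendering this form, again with hypothetical helper names; quoting each lexeme individually keeps words with punctuation or quotes valid. Order and duplicates need no client-side handling here, since PostgreSQL sorts the lexemes and removes duplicates on input.

```python
def quote_lexeme(word: str) -> str:
    """Wrap a lexeme in single quotes, doubling embedded
    backslashes and single quotes as tsvector input requires."""
    return "'" + word.replace("\\", "\\\\").replace("'", "''") + "'"


def to_plain_tsvector(words) -> str:
    """Render lexemes without positions: 'w1' 'w2' ..."""
    return " ".join(quote_lexeme(w) for w in words)


if __name__ == "__main__":
    print(to_plain_tsvector(["disk", "o'brien", "full"]))
    # 'disk' 'o''brien' 'full'
```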

So, my questions:
Is it safe to define tsvectors that way, or should I expect problems
with future releases being stricter about the tsvector format?

Do I have to respect the lexeme ordering within a tsvector (using which
NLS format)?

Is it an issue if some tsvectors contain stop words, or is it just
annoying noise?

If this simplification is fine, I'd suggest adding a description of
this possible tsvector representation to the docs.

best regards,

Marc Mamin
