Re: Multiple word synonyms (maybe?)

From: rob stone
Subject: Re: Multiple word synonyms (maybe?)
Date:
Msg-id 1445338679.1853.30.camel@gmail.com
In reply to: Multiple word synonyms (maybe?)  (Tim van der Linden <tim@shisaa.jp>)
Responses: Re: Multiple word synonyms (maybe?)
List: pgsql-general
On Tue, 2015-10-20 at 19:35 +0900, Tim van der Linden wrote:
> Hi All
>
> I have a question regarding PostgreSQL's full text capabilities and
> (presumably) the synonym dictionary.
>
> I'm currently implementing FTS on a medical themed setup which uses
> domain specific jargon to denote a bunch of stuff. One specific
> request I wish to implement here is support for the jargon synonyms
> that are heavily used.
>
> Of course, I can simply go ahead and create my own synonym dictionary
> with a jargon specific synonym file to feed it. However, most of the
> synonyms consist of more than a single word.
>
> The term "heart attack" for example has the following "synonyms":
>
> - Acute MI
> - MI
> - Myocardial infarction
>
> As far as I understand it, the tokenizer within PostgreSQL's FTS engine
> splits words on spaces to generate tokens which are then proposed to
> each dictionary. I think it is therefore impossible to have
> "multi-word synonyms" in this sense, as multiple words cannot reach
> the dictionary. The term "heart attack" would be presented as the
> tokens "heart" and "attack".
>
> From a technical standpoint I understand FTS is about looking at
> individual words and lexemizing them ... yet from a natural language
> lookup perspective you still wish to tie "Heart attack" to "Acute MI"
> so when a client searches on one, the other will turn up as well.
>
> Should I write my own tokenizer to catch all these words and present
> them as a single token? Or is this completely outside the realm of
> FTS (or FTS within PostgreSQL)?
>
> Cheers,
> Tim
>
>


Looking at this from an entirely different perspective, why are you not
using ICD codes to identify patient events?
It is a one-to-many relationship between a patient and their events,
each identified by the relevant ICD code and date.
Given that MI has several applicable ICD codes, you can use a select
along the lines of:
WHERE icd_code IN (  . . . )
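
To make the idea concrete, here is a minimal sketch of that one-to-many layout and the IN lookup. It uses Python's built-in sqlite3 only so that it runs anywhere; the same query shape should carry over to PostgreSQL. The table names, column names and sample rows are hypothetical, and the ICD-10 codes are just an illustrative subset of the myocardial infarction range, not a complete or authoritative list.

```python
import sqlite3

# Hypothetical schema: one patient has many events, each tagged with
# an ICD code and a date. None of these names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE patient_event (
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    icd_code   TEXT NOT NULL,
    event_date TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO patient (id, name) VALUES (?, ?)",
    [(1, "Patient A"), (2, "Patient B"), (3, "Patient C")],
)
conn.executemany(
    "INSERT INTO patient_event (patient_id, icd_code, event_date) VALUES (?, ?, ?)",
    [
        (1, "I21.0", "2015-01-10"),  # acute MI code
        (2, "J45.9", "2015-02-03"),  # unrelated (asthma) code
        (3, "I21.9", "2015-03-15"),  # acute MI, unspecified
        (1, "I22.0", "2015-06-01"),  # subsequent MI code
    ],
)

# Illustrative subset only; a real list would come from the ICD tables.
MI_CODES = ("I21.0", "I21.9", "I22.0")


def patients_with_codes(connection, codes):
    """Return sorted distinct patient ids having any event with one of the codes."""
    # Build one "?" placeholder per code so the values stay parameterised.
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT DISTINCT patient_id FROM patient_event "
        f"WHERE icd_code IN ({placeholders}) "
        "ORDER BY patient_id"
    )
    return [row[0] for row in connection.execute(sql, tuple(codes))]


mi_patients = patients_with_codes(conn, MI_CODES)
print(mi_patients)
```

With this layout the search term ("heart attack", "Acute MI", "MI") only needs to be mapped to a code list once, and the lookup itself no longer depends on how the text is tokenized.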


I know it doesn't answer your question!

Cheers,
Rob

