word/phrase extraction & ranking

From: Marius Andreiana
Subject: word/phrase extraction & ranking
Msg-id: 1352918050.97151.YahooMailNeo@web140704.mail.bf1.yahoo.com
List: pgsql-general
Hello,

From selected rows in a table, how can one extract and rank words/phrases based on how often they occur?
Here's an example: http://developer.yahoo.com/search/content/V1/termExtraction.html

INPUT:
CREATE TABLE phrases (
id BIGSERIAL,
phrase VARCHAR(10000)
);

INSERT INTO phrases (phrase) VALUES ('Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration.');
INSERT INTO phrases (phrase) VALUES ('Andrea Bolgi was an italian sculptor');

OUTPUT:
phrase           | weight
-----------------+-------
italian sculptor | 5
virgin mary      | 2
painters         | 1
renaissance      | 1
inspiration      | 1
Andrea Bolgi     | 1
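A minimal client-side sketch of the counting step, written in Python rather than SQL: it lowercases each row, splits it into words, and counts every word n-gram across all rows. The function names and the `max_n` cutoff are illustrative assumptions, not part of any PostgreSQL API. Inside PostgreSQL, `ts_stat()` run over `to_tsvector('english', phrase)` gives comparable per-lexeme counts (`ndoc`, `nentry`), but only for single words, not multi-word phrases. Raw counts like these do not yet fold plurals or drop stop words, so "italian sculptors" and "italian sculptor" are still counted separately.

```python
import re
from collections import Counter

# Example rows, copied from the INSERT statements in the question.
ROWS = [
    "Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration.",
    "Andrea Bolgi was an italian sculptor",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rank_ngrams(rows: list[str], max_n: int = 2) -> Counter:
    """Count every word n-gram (1 <= n <= max_n) across all rows."""
    counts: Counter = Counter()
    for row in rows:
        words = tokenize(row)
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent n-grams in the same "phrase | weight" layout.
    for phrase, weight in rank_ngrams(ROWS).most_common(5):
        print(f"{phrase} | {weight}")
```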

Some notes:
* Phrases may contain stop words, e.g. “easy to answer”.
* Ideally, English-language variations and synonyms would be grouped automatically.
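One way to handle both notes, sketched under assumptions: keep only n-grams that neither start nor end with a stop word (so “easy to answer” survives as a phrase while “to” alone is dropped), and fold simple plurals before counting. The stop-word list and the strip-a-trailing-"s" rule are hypothetical stand-ins. PostgreSQL's `english` text search configuration uses a Snowball stemmer instead, and synonym grouping would need something like its synonym or thesaurus dictionaries, which this sketch does not attempt.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "was", "is", "in", "on"}

# Example rows, copied from the INSERT statements in the question.
ROWS = [
    "Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration.",
    "Andrea Bolgi was an italian sculptor",
]


def normalize(word: str) -> str:
    """Crude plural folding ('sculptors' -> 'sculptor'); not a real stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def candidate_phrases(text: str, max_n: int = 3) -> list[str]:
    """Return n-grams (1 <= n <= max_n) that do not start or end with a stop word."""
    words = [normalize(w) for w in re.findall(r"[a-z]+", text.lower())]
    phrases = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Stop words may sit inside a phrase ("easy to answer") but not at its edges.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            phrases.append(" ".join(gram))
    return phrases


def rank_phrases(rows: list[str], max_n: int = 3) -> Counter:
    """Count candidate phrases across all rows."""
    counts: Counter = Counter()
    for row in rows:
        counts.update(candidate_phrases(row, max_n))
    return counts


if __name__ == "__main__":
    for phrase, weight in rank_phrases(ROWS).most_common(5):
        print(f"{phrase} | {weight}")
```

With the two example rows, “italian sculptor” gets a count of 2 because “sculptors” is folded to “sculptor” before counting.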

I understand one might use PostgreSQL's full text search support, and maybe pg_trgm, but how exactly?
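pg_trgm scores two strings by the trigrams they share; per its documentation, each word is padded with two leading spaces and one trailing space, so “cat” yields "  c", " ca", "cat" and "at ". The Python sketch here approximates that `similarity()` measure (shared trigrams divided by all distinct trigrams of both strings) and uses it to greedily group near-duplicate phrases. The 0.3 threshold mirrors pg_trgm's default similarity threshold; the greedy grouping function is a hypothetical illustration, not something pg_trgm itself provides.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram set: lowercase, split into alphanumeric
    words, pad each word with two leading spaces and one trailing space."""
    result: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the union of both trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def group_similar(phrases: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Hypothetical greedy clustering: attach each phrase to the first group
    whose first member is at least `threshold` similar, else start a new group."""
    groups: list[list[str]] = []
    for phrase in phrases:
        for group in groups:
            if similarity(phrase, group[0]) >= threshold:
                group.append(phrase)
                break
        else:
            groups.append([phrase])
    return groups


if __name__ == "__main__":
    print(group_similar(["italian sculptors", "virgin mary", "italian sculptor"]))
```

Greedy grouping keeps the sketch short but depends on input order; a production version would more likely compute pairwise scores in SQL with pg_trgm and cluster from there.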


Thanks
