Re: Replacement for Oracle Text

From	Oleg Bartunov
Subject	Re: Replacement for Oracle Text
Date	19 February 2016, 22:24:00
Msg-id	CAF4Au4zq=GLioTWF9byiQ_iSY3TYcbEaM31za5X2OKQu5yQJfA@mail.gmail.com
In reply to	Re: Replacement for Oracle Text (Josh Berkus <josh@agliodbs.com>)
List	pgsql-general

On Fri, Feb 19, 2016 at 8:28 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 02/19/2016 05:49 AM, s d wrote:
On 19 February 2016 at 14:19, Bruce Momjian <bruce@momjian.us> wrote:

I wonder if PLPerl could be used to extract the words from a PDF
document and create a tsvector column from it.

I don't know about PLPerl (though I'm pretty sure it could be used for
this purpose). On the other hand, I've written code for this in
Python, which should be easy to adapt for PLPython if necessary.

I'd swear someone already built something to do this. All you need is a library which reads PDF and transforms it into text, and then you can FTS it. I know there's a module for OpenOffice docs somewhere as well, but heck if I can remember where.

I used pdftotext for that.
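
A minimal Python sketch of that approach, in case it helps: shell out to pdftotext, then hand the text to to_tsvector. The table documents(id, body_tsv), the helper names, and the %s placeholder style (as used by psycopg2) are assumptions for illustration, not something from this thread.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, encoding="UTF-8"):
    """Build the argv for pdftotext; a text-file argument of '-' means stdout."""
    return ["pdftotext", "-enc", encoding, pdf_path, "-"]


def pdf_to_text(pdf_path):
    """Run pdftotext on one file and return the extracted text as a str."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,   # collect stdout/stderr as bytes
    )
    return result.stdout.decode("utf-8", errors="replace")


def store_tsvector(cursor, doc_id, text, config="english"):
    """Fill the tsvector column of one row through a DB-API cursor.

    Assumes a hypothetical table documents(id, body_tsv tsvector) and a
    driver that uses %s placeholders.
    """
    cursor.execute(
        "UPDATE documents SET body_tsv = to_tsvector(%s::regconfig, %s) "
        "WHERE id = %s",
        (config, text, doc_id),
    )
```

With a real connection this would be called as store_tsvector(cur, 42, pdf_to_text("paper.pdf")). The same body should adapt to PL/Python with little change, although that has not been tried here.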

I think it'd be useful to have extension(s) that can convert anything to text. I remember someone indexing chemical formulae, TeX/LaTeX, and DOC files.
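
One way such a "convert anything to text" layer might be organized is a small registry keyed by file extension. This is only a sketch: the names CONVERTERS, register and to_text are made up for illustration, and the TeX converter is a placeholder that returns the raw source rather than stripping markup.

```python
import os

# Maps a lowercase file extension (".txt") to a function bytes -> str.
CONVERTERS = {}


def register(*extensions):
    """Decorator that registers a converter for one or more extensions."""
    def decorator(func):
        for ext in extensions:
            CONVERTERS[ext.lower()] = func
        return func
    return decorator


@register(".txt")
def plain_text(data):
    """Plain text only needs decoding."""
    return data.decode("utf-8", errors="replace")


@register(".tex", ".latex")
def tex_source(data):
    """Placeholder: returns the raw TeX source without removing markup."""
    return data.decode("utf-8", errors="replace")


def to_text(filename, data):
    """Pick a converter by file extension and return the document text."""
    ext = os.path.splitext(filename)[1].lower()
    converter = CONVERTERS.get(ext)
    if converter is None:
        raise ValueError("no converter registered for %r" % ext)
    return converter(data)
```

A PDF or DOC converter would be registered the same way, each wrapping whatever external tool does the real work, so the indexing side only ever sees plain text.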

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
