Re: Replacement for Oracle Text

From: Chris Travers
Subject: Re: Replacement for Oracle Text
Msg-id: CAKt_ZftXtFD-8BddWc2QwgDc5dC+Xc2_FwO7vpu0pKYRXA8jLA@mail.gmail.com
In reply to: Re: Replacement for Oracle Text  (Stephen Davies <sdavies@sdc.com.au>)
Responses: Re: Replacement for Oracle Text  (Stephen Davies <sdavies@sdc.com.au>)
List: pgsql-general
A more general approach would be to write a function that takes a PDF as input and returns its text.  Mark it IMMUTABLE.

Then you can index the result of converting that text to a tsvector.

You may want to pull everything into a tsvector column for ease of review, but functional indexes make that less important.
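
As a rough sketch of the idea above (nothing here was posted in the thread), the extraction step could look like the Python below, which shells out to pdftotext. The helper name pdf_to_text, the table "docs", its bytea column "body", and the index name are all hypothetical. The SQL strings further assume that the helper has been wrapped in a SQL-callable function (for example via PL/Python) declared IMMUTABLE, since PostgreSQL only accepts immutable functions in index expressions.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def pdf_to_text(pdf_bytes: bytes, command: Sequence[str] = ("pdftotext",)) -> str:
    """Return the text of a PDF by running an external extractor.

    The extractor is invoked as `<command> <input-file> -`; for pdftotext,
    the trailing "-" means "write the text to standard output".
    """
    with tempfile.TemporaryDirectory() as workdir:
        pdf_path = os.path.join(workdir, "input.pdf")
        with open(pdf_path, "wb") as handle:
            handle.write(pdf_bytes)
        result = subprocess.run(
            [*command, pdf_path, "-"],
            stdout=subprocess.PIPE,
            check=True,  # raise if the extractor exits with an error
        )
    # Assumption: the extractor emits UTF-8.
    return result.stdout.decode("utf-8", errors="replace")


# Hypothetical DDL: assumes a table docs(body bytea) and a SQL function
# pdf_to_text(bytea) that wraps the helper above and is marked IMMUTABLE.
INDEX_DDL = """
CREATE INDEX docs_body_fts ON docs
    USING gin (to_tsvector('english', pdf_to_text(body)));
"""

# A query can use the index only if it repeats the indexed expression.
SEARCH_SQL = """
SELECT * FROM docs
WHERE to_tsvector('english', pdf_to_text(body)) @@ to_tsquery('english', %s);
"""
```

Declaring the function IMMUTABLE is a promise that the same input always yields the same text, so upgrading the extractor may call for a REINDEX.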

On Sat, Feb 20, 2016 at 1:10 AM, Stephen Davies <sdavies@sdc.com.au> wrote:
On 20/02/16 00:24, Bruce Momjian wrote:
On Fri, Feb 19, 2016 at 02:49:16PM +0100, s d wrote:
On 19 February 2016 at 14:19, Bruce Momjian <bruce@momjian.us> wrote:
     >     Ah, no. That's not possible
     >
     > ...not possible, yet.
     >
     > PostgreSQL grows by adding the features people need, and it's changing
     > rapidly.

     I wonder if PL/Perl could be used to extract the words from a PDF
     document and create a tsvector column from it.

  I don't know about PL/Perl (though I'm pretty sure it could be used for this
purpose).  On the other hand, I've written code for this in Python, which should
be easy to adapt for PL/Python if necessary.

Right, so you would write a PL/Perl or PL/Python trigger function that
would populate the tsvector column on every INSERT or UPDATE.

FWIW, I just use pdftotext in my CGI.

--
=============================================================================
Stephen Davies Consulting P/L                             Phone: 08-8177 1595
Adelaide, South Australia.                                Mobile:040 304 0583






--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.
