Re: Importing text file into a TEXT field

From: Sam Mason
Subject: Re: Importing text file into a TEXT field
Date:
Message-ID: 20081110124704.GC2459@frubble.xen.chris-lamb.co.uk
In reply to: Re: Importing text file into a TEXT field  (Bruno Lavoie <bruno.lavoie@gmail.com>)
Replies: Re: Importing text file into a TEXT field  (Bruno Lavoie <bruno.lavoie@gmail.com>)
List: pgsql-general
On Fri, Nov 07, 2008 at 01:20:27PM -0500, Bruno Lavoie wrote:
> The intent is to use pdftotext and store the resulting text in database
> for full text search purposes... I'm trying to develop a mini content
> server where I'll put pdf documents to make it searchable.

I've not tried this sort of thing before, but the full text search (FTS)
code (built into PG 8.3, a contrib module in earlier versions) sounds
like what you want.  As far as getting the data in, you're going to have
to write a bit of code.  A quick hack suggests that a few lines of
Python will get you going:

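  import sys
  import psycopg2

  # An empty DSN makes libpq use its defaults and the PG* environment variables.
  conn = psycopg2.connect("")
  cur = conn.cursor()
  # Read the whole document from stdin and store its tsvector.
  cur.execute("INSERT INTO tbl (tsvec) SELECT to_tsvector(%s);",
              [sys.stdin.read()])
  conn.commit()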

You can then do:

  pdftotext file.pdf - | python script.py
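
If you would rather not go through a shell pipe for every file, the same
two steps can be driven from one script.  This is only a rough sketch:
the helper names pdf_to_text and store_text are made up for
illustration, the table tbl with its tsvec column is the one from the
snippet above, and I'm assuming pdftotext is on the PATH and writes
UTF-8 (its default, as far as I know).

```python
import subprocess


def pdf_to_text(pdf_path, run=subprocess.run):
    """Return the text of a PDF by running `pdftotext <pdf_path> -`.

    `run` is injectable only so the function can be exercised without
    pdftotext installed; it defaults to subprocess.run.
    """
    # "-" tells pdftotext to write to stdout instead of a .txt file.
    result = run(["pdftotext", pdf_path, "-"], capture_output=True, check=True)
    # Assumption: pdftotext emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def store_text(conn, text):
    """Insert one document's tsvector through any DB-API connection."""
    cur = conn.cursor()
    cur.execute("INSERT INTO tbl (tsvec) SELECT to_tsvector(%s)", [text])
    conn.commit()
```

With psycopg2 that would be used as
store_text(psycopg2.connect(""), pdf_to_text("file.pdf")), reusing one
connection if you loop over many files.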

One performance issue with psycopg2 is that it always interpolates the
parameters into the SQL text on the client; you may want to find
something that uses PQexecParams() underneath, so you spend less time
escaping everything only for PG to undo that work.
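
As one possible example of such a driver: to my understanding, the later
psycopg 3 package (imported as psycopg) sends the query and its
parameters to the server separately by default, which is the
PQexecParams()-style path.  The sketch below assumes that package is
installed; insert_document and main are hypothetical names, and I have
not measured what difference it makes.

```python
import sys


def insert_document(conn, text):
    """Insert one document's tsvector using a psycopg 3 style connection."""
    with conn.cursor() as cur:
        # The explicit ::text cast states the parameter's type, since the
        # value is bound on the server rather than pasted into the SQL.
        cur.execute(
            "INSERT INTO tbl (tsvec) SELECT to_tsvector(%s::text)",
            (text,),
        )
    conn.commit()


def main():
    # Assumption: psycopg 3 is installed; imported here so the helper
    # above can be read and tested without it.
    import psycopg

    # An empty conninfo uses libpq defaults and the PG* environment variables.
    with psycopg.connect("") as conn:
        insert_document(conn, sys.stdin.read())
```

Calling main() at the bottom of the script lets it slot into the same
pdftotext pipe as before.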


  Sam
