Re: Importing text file into a TEXT field

From: Thomas Kellerer
Subject: Re: Importing text file into a TEXT field
Msg-id: gf9b50$pvm$1@ger.gmane.org
In reply to: Re: Importing text file into a TEXT field (Bruno Lavoie <bruno.lavoie@gmail.com>)
Responses: Re: Importing text file into a TEXT field (Bruno Lavoie <bruno.lavoie@gmail.com>)
List: pgsql-general

Bruno Lavoie, 07.11.2008 19:20:
> Hello,
>
> The intent is to use pdftotext and store the resulting text in a database
> for full text search purposes... I'm trying to develop a mini content
> server where I'll put PDF documents to make them searchable.
>
> Generally, the PDFs are 500 to 3000 pages long, resulting in text of
> 500 kB to 2 MB...
>
> I'm also looking at open source projects like Alfresco if it can serve
> with ease to my purpose... Anyone use this one? Comments are welcome.

If you are not bound to "native" Postgres tools, you might want to take a look at my SQL Workbench/J
(http://www.sql-workbench.net)

It can insert the contents of files (located on the client) into tables. One way to do this is with an extended SQL
syntax:

UPDATE pdf_table
  SET text_content = {$clobfile=c:/temp/convertet.txt encoding=utf8}
WHERE id = 42;

(Of course, this statement cannot be run with psql, because the {$clobfile=...} syntax is specific to SQL Workbench/J.)
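
For clients other than SQL Workbench/J, the same single-row update needs client code that reads the file and binds its contents as a query parameter. Below is a minimal Python sketch of that idea, not part of SQL Workbench/J. The function name store_text_file is hypothetical, the table and column names are taken from the statement above, and sqlite3 is used only as a self-contained stand-in for a DB-API driver. With a PostgreSQL driver such as psycopg2 the placeholder would be "%s" instead of "?".

```python
import sqlite3
import tempfile
from pathlib import Path


def store_text_file(conn, row_id, path, encoding="utf-8", placeholder="?"):
    """Read a client-side text file and store it in pdf_table.text_content.

    `conn` is any DB-API connection. `placeholder` is the driver's parameter
    marker: "?" for sqlite3, "%s" for psycopg2 (assumption: you pick the one
    matching your driver). Returns the number of rows updated.
    """
    content = Path(path).read_text(encoding=encoding)
    sql = (
        f"UPDATE pdf_table SET text_content = {placeholder} "
        f"WHERE id = {placeholder}"
    )
    cur = conn.cursor()
    # The file content travels as a bound parameter, so no quoting or
    # escaping of the text is needed.
    cur.execute(sql, (content, row_id))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Stand-in database so the sketch runs without a PostgreSQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pdf_table (id INTEGER PRIMARY KEY, text_content TEXT)")
    conn.execute("INSERT INTO pdf_table (id) VALUES (42)")
    with tempfile.TemporaryDirectory() as tmp:
        txt = Path(tmp) / "converted.txt"
        txt.write_text("text produced by pdftotext", encoding="utf-8")
        print(store_text_file(conn, 42, txt))
```

Binding the content as a parameter avoids building a multi-megabyte SQL literal by hand, which matters for files in the size range mentioned above.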

You could also bulk-upload several files at once using my flat-file import.
(http://www.sql-workbench.net/manual/command-import.html)

Assuming the table has two columns (id, text_content), the flat file would look like this:

id|text_content
1|content_1.txt
2|content_2.txt
3|content_3.txt

and the import would store the content of the files, not the literal 'content_1.txt', in the column text_content.
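
With many converted files, the index file itself can be generated rather than typed. Here is a small Python sketch under stated assumptions: the helper name write_import_index is hypothetical, ids are simply assigned sequentially in file-name order, and only bare file names are written. How the import resolves those names (for example, relative to the index file) should be checked in the manual linked above.

```python
import tempfile
from pathlib import Path


def write_import_index(text_dir, index_path, start_id=1):
    """Write an 'id|text_content' index listing every .txt file in text_dir.

    Ids are assigned sequentially in lexical file-name order (an assumption;
    note that 'content_10.txt' sorts before 'content_2.txt'). The index file
    itself is skipped if it lives in the same directory.
    Returns the lines that were written.
    """
    index_path = Path(index_path)
    names = sorted(
        p.name
        for p in Path(text_dir).glob("*.txt")
        if p.resolve() != index_path.resolve()
    )
    lines = ["id|text_content"]
    lines += [f"{i}|{name}" for i, name in enumerate(names, start=start_id)]
    index_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for n in (1, 2, 3):
            (Path(tmp) / f"content_{n}.txt").write_text("dummy", encoding="utf-8")
        for line in write_import_index(tmp, Path(tmp) / "import_index.txt"):
            print(line)
```

If the rows already exist and carry meaningful ids, the sequential numbering would need to be replaced by a lookup that maps each file to its row.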

You can either insert or update the content, depending on your needs. You could even store the original PDF file if the
table contains a bytea column for the blob data.

Contact me offline (contact information on my homepage) if you need help.

Regards
Thomas
