Re: more than 2GB data string save

From: Allan Kamau
Subject: Re: more than 2GB data string save
Date:
Msg-id: ab1ea6541002092334g7ef631b7p3a923d7266fe6d00@mail.gmail.com
In reply to: Re: more than 2GB data string save  (Steve Atkins <steve@blighty.com>)
List: pgsql-general
On Wed, Feb 10, 2010 at 10:11 AM, Steve Atkins <steve@blighty.com> wrote:
>
> On Feb 9, 2010, at 10:38 PM, AI Rumman wrote:
>
>> Thanks for your quick answers.
>>
>> But if I use a file and then store the name in the database, is it possible to use text search (tsvector and tsquery)
>> indexing on these external files?
>
> No, no way at all.
>
> A database isn't really the right way to do full text search for single files that big. Even if they'd fit in the
> database, it's way bigger than the underlying index types tsquery uses are designed for.
>
> Are you sure that the documents are that big? A single document of that size would be 400 times the size of the
> Bible. That's a ridiculously large amount of text, most of a small library.
>
> If the answer is "yes, it's really that big and it's really text" then look at clucene or, better, hiring a
specialist.
>
> Cheers,
>  Steve
>
>>
>> On Wed, Feb 10, 2010 at 12:26 PM, Steve Atkins <steve@blighty.com> wrote:
>>
>> On Feb 9, 2010, at 9:52 PM, Scott Marlowe wrote:
>>
>> > On Tue, Feb 9, 2010 at 9:38 PM, AI Rumman <rummandba@gmail.com> wrote:
>> >> How to save 2 GB or more text string in Postgresql?
>> >> Which data type should I use?
>> >
>> > If you have to you can use either the lo interface, or you can use
>> > bytea.  Large Object (i.e. lo) allows for access much like fopen /
>> > fseek  etc in C, but the actual data are not stored in a row with
>> > other data, but alone in the lo space.  Bytea is a legit type that you
>> > can have as one of many in a row, but you retrieve the whole thing at
>> > once when you get the row.
>>
>> Bytea definitely won't handle more than 1 GB. I don't think the lo interface
>> will handle more than 2GB.
>>
>> >
>> > Preferred way to store 2GB data is to put it into a file and put the
>> > name of the file into the database.
>>
>>
>> This.
>>
>> Cheers,
>>  Steve
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
>



Some simple (and possibly wrong) solutions are presented below.
a) Store only the file name in the DB and leave the file on the file
system, where you can access it after reading the file name from the
DB to determine its location.
This solution offers the possibility of using many of the very fast
and efficient file-content search and manipulation algorithms and
tools that are part of most OSes (for instance grep, sed, ...).
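A minimal sketch of approach (a), assuming a Unix-like system with grep available; the path would normally come from a row in the database (the `documents` table and `search_external_file` helper here are hypothetical, for illustration only):

```python
import os
import subprocess
import tempfile

def search_external_file(path, pattern):
    """Search a file stored outside the DB using the OS's grep tool.
    The database would hold only the file's path, e.g. from
    SELECT file_path FROM documents WHERE id = %s  (hypothetical schema)."""
    # grep exits with status 1 when there is no match; that is not an
    # error for our purposes, so we do not pass check=True.
    result = subprocess.run(
        ["grep", "-n", pattern, path],
        capture_output=True, text=True,
    )
    return result.stdout.splitlines()

# Demo with a throwaway file standing in for the externally stored document:
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("alpha\nbeta\ngamma\n")
    demo_path = f.name
print(search_external_file(demo_path, "beta"))  # -> ['2:beta']
os.unlink(demo_path)
```

This gives you fast scanning of huge files without loading them into the DB, at the cost of losing tsvector/tsquery-style ranked full-text indexing.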

b) If you really need to store the file contents in the DB, you may
write a small method which reads the file contents in batches of n
bytes and writes each batch to a new table row. One thing to remember
is to capture and store the batch number (which should be sequential)
in another field of the respective row, and finally to mark the last
batch, perhaps by setting another field in that row to 'true'.
Doing so gives you the opportunity to parallelize the persistence of
the batches to the DB while still having a way to correctly put the
sections of the file contents (the batches) back together when
reconstructing the file.
It also ensures that the size of each transaction will be no larger
than the size of a batch.
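A minimal sketch of the batching step in approach (b): a generator that yields (batch_number, data, is_last) tuples, one per row to insert. The batch size, the `file_chunks` table, and its columns are assumptions for illustration; each tuple could be written in its own small transaction.

```python
import os
import tempfile

def chunk_file(path, batch_size=8 * 1024 * 1024):
    """Yield (batch_number, data, is_last) for a large file, reading
    batch_size bytes at a time.  The sequential batch number lets you
    reassemble the file in order; is_last flags the final batch.
    Each tuple would map to one row, e.g. (hypothetical schema):
      INSERT INTO file_chunks (file_id, batch_number, data, is_last)
      VALUES (%s, %s, %s, %s)"""
    with open(path, "rb") as f:
        batch_number = 0
        data = f.read(batch_size)
        while True:
            # Read ahead one batch so we know whether this one is the last.
            next_data = f.read(batch_size)
            yield batch_number, data, next_data == b""
            if next_data == b"":
                break
            data = next_data
            batch_number += 1

# Demo with a tiny file and a tiny batch size:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"abcdefghij")
    demo_path = f.name
for batch in chunk_file(demo_path, batch_size=4):
    print(batch)  # (0, b'abcd', False), (1, b'efgh', False), (2, b'ij', True)
os.unlink(demo_path)
```

Reassembly is then a SELECT ordered by batch_number, concatenating the data column until the row with is_last set is reached.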



Allan.
