Re: [UNSURE] Re: Streaming large data into postgres [WORM like applications]

From Tom Allison
Subject Re: [UNSURE] Re: Streaming large data into postgres [WORM like applications]
Date
Msg-id 8918B544-42FC-4885-B807-3EBBFB824E80@tacocat.net
In response to Re: Streaming large data into postgres [WORM like applications]  ("Dhaval Shah" <dhaval.shah.m@gmail.com>)
List pgsql-general
One approach would be to spool all the data to a flat file and then
pull it into the database as you are able to.  This would give you
extremely high peak capacity.
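
To make that concrete, a minimal sketch of the spool-then-COPY pattern
follows; the table name, columns, and file path are only illustrative
assumptions, not something from this thread:

    -- hypothetical target table; columns match the spool file layout
    CREATE TABLE events (
        logged_at  timestamp,
        source_id  integer,
        message    varchar(1024)
    );

    -- the collector appends tab-separated rows to a spool file; whenever
    -- the database catches up, one COPY pulls the whole batch in at once,
    -- which is far cheaper than issuing 50k individual INSERTs per second
    COPY events FROM '/var/spool/events/batch-0001.tsv';

    -- or, run from psql on the client side instead of the server:
    --   \copy events from 'batch-0001.tsv'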


On May 11, 2007, at 10:35 PM, Dhaval Shah wrote:

> I do care about the following:
>
> 1. Basic type checking
> 2. Knowing failed inserts.
> 3. Non-corruption
> 4. Macro transactions. That is, minimal read consistency.
>
> The following is not necessary
>
> 1. Referential integrity
>
> In this particular scenario,
>
> 1. There is a sustained load and there are peak loads. As long as we
> can handle the peak loads, the sustained load can be half of the
> quoted figure.
> 2. The rows have a limited number of columns. That is, each row spans
> at most a dozen or so columns, mostly integer or varchar.
>
> It is more data-I/O heavy than CPU heavy.
>
> Regards
> Dhaval
>
> On 5/11/07, Ben <bench@silentmedia.com> wrote:
>> Inserting 50,000 rows a second is, uh... difficult to do, no matter
>> what database you're using. You'll probably have to spool the inserts
>> and insert them as fast as you can, and just hope you don't fall too
>> far behind.
>>
>> But I'm suspecting that you aren't going to be doing much, if any,
>> referential integrity checking, at least beyond basic type checking.
>> You probably aren't going to care about multiple inserts affecting
>> each other, or worry about corruption if a given insert fails... in
>> fact, you probably aren't even going to need transactions at all,
>> other than as a way to insert faster. Is SQL the right tool for you?
>>
>> On May 11, 2007, at 1:43 PM, Dhaval Shah wrote:
>>
>> > Here is the straight dope: one of the internal teams at my customer
>> > site is looking into MySQL and replacing its storage engine so that
>> > they can store large amounts of streamed data. The key here is that
>> > the data they are getting is several thousand rows in an extremely
>> > short duration. They say that only MySQL provides them the ability
>> > to replace the storage engine, which, granted, is easier.
>> >
>> > If I go with the statement that postgres can basically do what they
>> > intend to do for handling large datasets, I need to prepare my
>> > talking points.
>> >
>> > The requirements are as follows:
>> >
>> > 1. A large volume of streamed rows, on the order of 50-100k rows per
>> > second. I was thinking that the rows can be stored in a file, the
>> > file then copied into a temp table using COPY, and those rows then
>> > appended to the master table, with the index dropped and recreated
>> > very lazily [during the first query hit or something like that]. [A
>> > sketch of this flow follows below.]
>> >
>> > The table can grow extremely large, so it would of course help if
>> > it can be partitioned, either by range or list.
>> >
>> > 2. Most of the streamed rows are very similar. Think syslog rows,
>> > where in most cases only the timestamp changes. Of course, if the
>> > data can be compressed, it will result in savings in terms of
>> > disk space.
>> >
>> > The key issue here is that the ultimate data usage is Write Once
>> > Read Many, and in that sense I am looking for an optimal solution
>> > for bulk writes and for maintaining indexes during bulk writes.
>> >
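
A minimal sketch of the staging-COPY plus inheritance-partition plus
lazy-index approach described above; every table name, column, and the
day-based partition bound is an illustrative assumption, not something
stated in the original message:

    -- parent table plus one child partition per day (constraint exclusion
    -- lets the planner skip partitions outside a query's time range)
    CREATE TABLE syslog (
        logged_at  timestamp NOT NULL,
        host       varchar(255),
        message    varchar(1024)
    );
    CREATE TABLE syslog_2007_05_11 (
        CHECK (logged_at >= '2007-05-11' AND logged_at < '2007-05-12')
    ) INHERITS (syslog);
    CREATE TABLE syslog_staging (LIKE syslog);

    -- load one spooled batch into the staging table, append it to the
    -- current partition, then clear the staging table for the next batch
    COPY syslog_staging FROM '/var/spool/syslog/batch-0001.tsv';
    INSERT INTO syslog_2007_05_11 SELECT * FROM syslog_staging;
    TRUNCATE syslog_staging;

    -- build the partition's index lazily, e.g. just before the first
    -- query hits it, so the bulk writes never pay index maintenance
    CREATE INDEX syslog_2007_05_11_ts_idx
        ON syslog_2007_05_11 (logged_at);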
>> > So with some intelligent design, it is possible to use postgres.
>> > Any help in preparing my talking points is appreciated.
>> >
>> > Regards
>> > Dhaval
>> >
>>
>>
>
>
> --
> Dhaval Shah
>

