Re: [UNSURE] Re: Streaming large data into postgres [WORM like applications]
| From | Tom Allison |
|---|---|
| Subject | Re: [UNSURE] Re: Streaming large data into postgres [WORM like applications] |
| Date | |
| Msg-id | 8918B544-42FC-4885-B807-3EBBFB824E80@tacocat.net |
| In reply to | Re: Streaming large data into postgres [WORM like applications] ("Dhaval Shah" <dhaval.shah.m@gmail.com>) |
| List | pgsql-general |
One approach would be to spool all the data to a flat file and then pull it into the database as you are able to. This would give you extremely high peak capability.

On May 11, 2007, at 10:35 PM, Dhaval Shah wrote:

> I do care about the following:
>
> 1. Basic type checking
> 2. Knowing failed inserts.
> 3. Non-corruption
> 4. Macro transactions. That is, a minimal read consistency.
>
> The following is not necessary:
>
> 1. Referential integrity
>
> In this particular scenario:
>
> 1. There is a sustained load and there are peak loads. As long as we can
> handle peak loads, the sustained loads can be half of the quoted figure.
> 2. The row size has limited columns. That is, it spans at most a
> dozen or so columns, mostly integer or varchar.
>
> It is more data-I/O heavy than CPU heavy.
>
> Regards
> Dhaval
>
> On 5/11/07, Ben <bench@silentmedia.com> wrote:
>> Inserting 50,000 rows a second is, uh... difficult to do, no matter
>> what database you're using. You'll probably have to spool the inserts
>> and insert them as fast as you can, and just hope you don't fall too
>> far behind.
>>
>> But I'm suspecting that you aren't going to be doing much, if any,
>> referential integrity checking, at least beyond basic type checking.
>> You probably aren't going to care about multiple inserts affecting
>> each other, or worry about corruption if a given insert fails... in
>> fact, you probably aren't even going to need transactions at all,
>> other than as a way to insert faster. Is SQL the right tool for you?
>>
>> On May 11, 2007, at 1:43 PM, Dhaval Shah wrote:
>>
>>> Here is the straight dope: one of the internal teams at my customer
>>> site is looking into MySQL and replacing its storage engine so that
>>> they can store large amounts of streamed data. The key here is that
>>> the data they are getting is several thousand rows in an extremely
>>> short duration. They say that only MySQL gives them the ability to
>>> replace the storage engine, which, granted, is easier.
>>>
>>> If I go with the statement that Postgres can basically do what they
>>> intend to do for handling large datasets, I need to prepare my
>>> talking points.
>>>
>>> The requirements are as follows:
>>>
>>> 1. A large volume of streamed rows, on the order of 50-100k rows per
>>> second. I was thinking that the rows can be stored into a file, the
>>> file then copied into a temp table using COPY, and those rows then
>>> appended to the master table, with the index dropped and recreated
>>> very lazily [during the first query hit or something like that].
>>>
>>> The table size can grow extremely large, so it helps if the table can
>>> be partitioned, either by range or by list.
>>>
>>> 2. Most of the streamed rows are very similar. Think syslog rows,
>>> where in most cases only the timestamp changes. If the data can be
>>> compressed, that will yield savings in disk size.
>>>
>>> The key issue here is that the ultimate data usage is Write Once Read
>>> Many, and in that sense I am looking for an optimal solution for
>>> bulk writes and for maintaining indexes during bulk writes.
>>>
>>> So with some intelligent design, it is possible to use Postgres. Any
>>> help in preparing my talking points is appreciated.
>>>
>>> Regards
>>> Dhaval
>
> --
> Dhaval Shah
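Below is a minimal sketch of the spool-then-bulk-load flow discussed in this thread: a flat file loaded with COPY into a staging table, the batch appended to a range-partitioned master, and indexes rebuilt lazily. All names in it (the events tables, the staging table, the CSV path) are hypothetical placeholders rather than anything from the thread, and the partitioning uses the inheritance-based scheme of the PostgreSQL 8.x era.

```sql
-- Sketch only: table names and the spool path are hypothetical.

-- Master table; children are range partitions by month
-- (inheritance-based partitioning, as available in PostgreSQL 8.x).
CREATE TABLE events (
    logged_at  timestamptz  NOT NULL,
    host       varchar(64)  NOT NULL,
    severity   integer,
    message    varchar(1024)
);

CREATE TABLE events_2007_05 (
    CHECK (logged_at >= DATE '2007-05-01' AND logged_at < DATE '2007-06-01')
) INHERITS (events);

-- Staging table: bulk-load one spooled flat file with COPY, then append
-- the whole batch to the current partition in a single INSERT ... SELECT.
CREATE TEMP TABLE events_staging (LIKE events);

COPY events_staging FROM '/var/spool/events/batch-0001.csv' WITH CSV;
-- (from a client without server-side file access, psql's client-side
--  \copy events_staging FROM 'batch-0001.csv' WITH CSV does the same)

INSERT INTO events_2007_05 SELECT * FROM events_staging;
TRUNCATE events_staging;

-- Index maintenance is deferred: build (or rebuild) indexes on a partition
-- after its bulk loads, rather than paying per-row index cost during COPY.
CREATE INDEX events_2007_05_logged_at_idx ON events_2007_05 (logged_at);

-- Queries against the master see all partitions; with constraint exclusion
-- enabled, a time-bounded query scans only the matching partition.
SET constraint_exclusion = on;
SELECT count(*) FROM events
WHERE logged_at >= DATE '2007-05-10' AND logged_at < DATE '2007-05-12';
```

In practice the loader would rotate the spool file on a size or time threshold and run one COPY per completed batch; issuing COPY FROM STDIN through the client driver avoids the superuser and server-side-path requirements of copying from a file on the server.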