Re: 4G row table?
От | Josh Berkus |
---|---|
Тема | Re: 4G row table? |
Дата | |
Msg-id | 200212191115.20391.josh@agliodbs.com обсуждение исходный текст |
Ответ на | 4G row table? (george young <gry@ll.mit.edu>) |
Ответы |
Re: 4G row table?
("Charles H. Woloszynski" <chw@clearmetrix.com>)
|
Список | pgsql-performance |
George, > [linux, 700MHz athlon, 512MB RAM, 700GB 10kRPM SCSI HW RAID, postgresql 7.2] What kind of RAID? How many drives? Will you be updating the data frequently, or mostly just running reports on it? With 4G rows, you will have *heavy* disk access, so the configuration and quality of your disk array is a big concern. You also might think about upping th ememory if you can. > We're setting up a DB of IC test data, which is very simple and regular, but large. > One project (we get three or four per year) has ~4 giga bits, each specified by > a few small integer values, e.g.: > Name Type Values > ---------------------- > wafer int 1-50 > die_row int 2-7 > die_col int 2-7 > testtype string (~10 different short strings) > vdd int 0-25 > bit_col int 0-127 > bit_row int 0-511 > value bit 0 or 1 > > with 4 billion(4e9) rows. I would guess to make wafer, die_row, etc. be of > type "char", probably testtype a char too with a separate testtype lookup table. > Even so, this will be a huge table. 1. Use INT2 and not INT for the INT values above. If you can hire a PostgreSQL hacker, have them design a new data type for you, an unsigned INT1 which will cut your storage space even further. 2. Do not use CHAR for wafer & die-row. CHAR requries min 3bytes storage; INT2 is only 2 bytes. 3. If you can use a lookup table for testtype, make it another INT2 and create a numeric key for the lookup table. > Questions: How much overhead will there be in the table in addition to the > 9 bytes of data I see? There's more than 9 bytes in the above. Count again. > How big will the primary index on the first seven columns > be? Will this schema work at all? As large as the 7 columns themselves, plus a little more. I suggest creating a surrogate key as an int8 sequence to refer to most rows. > Of course, we could pack 128 bits into an 8 byte "text" field (or should we use bit(128)?), > but lose some ease of use, especially for naive (but important) users. This is also unlikely to be more efficient due to the translation<->conversion process requried to access the data when you query. > Comments, suggestions? Unless you have a *really* good RAID array, expect slow performance on this hardware platform. -- -Josh Berkus Aglio Database Solutions San Francisco
В списке pgsql-performance по дате отправления: