Re: Best way to handle multi-billion row read-only table?
| От | Greg Smith |
|---|---|
| Тема | Re: Best way to handle multi-billion row read-only table? |
| Дата | |
| Msg-id | 4B725760.8070905@2ndquadrant.com обсуждение исходный текст |
| Ответ на | Best way to handle multi-billion row read-only table? (Asher <asher@piceur.co.uk>) |
| Ответы |
Re: Best way to handle multi-billion row read-only table?
|
| Список | pgsql-general |
Asher wrote: > Once loaded into the database the data will never be deleted or > modified and will typically be accessed over a particular date range > for a particular channel (e.g. "sample_time >= X AND sample_time <= Y > AND channel=Z"). A typical query won't return more than a few million > rows and speed is not desperately important (as long as the time is > measured in minutes rather than hours). > > Are there any recommended ways to organise this? Should I partition my > big table into multiple smaller ones which will always fit in memory > (this would result in several hundreds or thousands of sub-tables)? > Are there any ways to keep the index size to a minimum? At the moment > I have a few weeks of data, about 180GB, loaded into a single table > and indexed on sample_time and channel and the index takes up 180GB too. One approach to consider is partitioning by sample_time and not even including the channel number in the index. You've got tiny records; there's going to be hundreds of channels of data on each data page pulled in, right? Why not minimize physical I/O by reducing the index size, just read that whole section of time in to memory (they should be pretty closely clustered and therefore mostly sequential I/O), and then push the filtering by channel onto the CPU instead. If you've got billions of rows, you're going to end up disk bound anyway; minimizing physical I/O and random seeking around at the expense of CPU time could be a big win. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
В списке pgsql-general по дате отправления: