Re: Table partitioning problem
From | Samba GUEYE
---|---
Subject | Re: Table partitioning problem
Date |
Msg-id | 4D7F8392.5060306@intesens.com
In response to | Re: Table partitioning problem (Shaun Thomas <sthomas@peak6.com>)
List | pgsql-performance
Hi,

Thanks again very much for these clear-cut answers. I think I'll try to
implement the partitioning despite all the difficulties you raise about it in
this thread, because I can't find any viable solution for this situation right
now. It will force me to change the data model to work around the inheritance
foreign key issue, but I will at least test it, because we have limited
resources and can't afford to add more servers or whatever to boost
performance...

Best Regards

On 15/03/2011 14:18, Shaun Thomas wrote:
> On 03/15/2011 05:10 AM, Samba GUEYE wrote:
>
>> 1. Why is "... partitioning is not a good idea ...", as you said,
>> Robert and Conor: "... I grant that it would be better to never need
>> to do that"?
>
> There are a number of difficulties the planner has with partitioned
> tables. Until very recently, MAX and MIN would scan every single
> partition, even if you performed the action on the constraint column.
> Basically, quite a few queries will either not have an ideal execution
> plan, or act in unexpected ways unless you physically target the
> exact partition you want.
>
> Even though we have several tables over 50 million rows, I'm
> reluctant to partition them because we have a very
> transaction-intensive database and can't risk the possible penalties.
>
>> 2. Is there another way or strategy to deal with very large tables
>> (over 100,000,000 rows per year in one table) beyond indexing and
>> partitioning?
>
> What you have is a very good candidate for partitioning... if you can
> effectively guarantee a column to partition the data on. If you're
> getting 100M rows per year, I could easily see some kind of
> created_date column and then having one partition per month.
>
> One of the things we hate most about very large tables is the amount
> of time necessary to vacuum or index them. CPU and disk IO can only go
> so fast, so eventually you reach a point where it can take hours
> to index a single column. If you let your table get too big, your
> maintenance costs will be prohibitive, and partitioning may be
> required at that point.
>
> As an example, we have a table that was over 100M rows, and we have
> enough memory that the entire table was in the system cache. Even so,
> rebuilding the indexes on that table required about an hour and ten
> minutes *per index*. We knew this would happen and ran the reindex in
> parallel, which we confirmed by watching five of our CPUs sit at 99%
> utilization for the whole interval.
>
> That wouldn't have happened if the table were partitioned.
>
>> 3. If you had to quantify a limit on the number of rows per table in a
>> single PostgreSQL database server, what would you say?
>
> I'd personally avoid having any tables over 10 million rows. We have
> quad Xeon E7450s, tons of RAM, and even NVRAM PCI cards to reduce IO
> contention, and still, large tables are a nuisance. Even the best CPU
> will balk at processing 10 million rows quickly.
>
> And yes, good queries and design will help. Always limiting result
> sets will help. Efficient, highly selective indexes will help. But
> maintenance grows linearly, despite our best efforts. The only way to
> sidestep that issue is to partition tables or rewrite your application
> to scale horizontally via data sharding or some other shared-nothing
> cluster with plProxy, GridSQL or PGPool.
>
> You'll have this problem with any modern database. Big tables are a
> pain in everybody's asses.
>
> It's too bad PostgreSQL can't assign one thread per data file and
> merge the results.
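For reference, here is a minimal sketch of the monthly partitioning scheme Shaun describes, using the inheritance-based approach PostgreSQL offered at the time of this thread (child tables with CHECK constraints plus an insert-routing trigger). The table name, column names, and the two example months are illustrative assumptions, not taken from the thread:

```sql
-- Parent table; the application queries this name as usual.
CREATE TABLE measurements (
    id           bigserial,
    created_date date NOT NULL,
    payload      text
);

-- One child table per month. The CHECK constraint lets
-- constraint_exclusion skip partitions that cannot match a query.
CREATE TABLE measurements_2011_03 (
    CHECK (created_date >= DATE '2011-03-01'
       AND created_date <  DATE '2011-04-01')
) INHERITS (measurements);

CREATE TABLE measurements_2011_04 (
    CHECK (created_date >= DATE '2011-04-01'
       AND created_date <  DATE '2011-05-01')
) INHERITS (measurements);

-- Route inserts on the parent into the matching child table.
CREATE OR REPLACE FUNCTION measurements_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW.created_date >= DATE '2011-03-01'
       AND NEW.created_date < DATE '2011-04-01' THEN
        INSERT INTO measurements_2011_03 VALUES (NEW.*);
    ELSIF NEW.created_date >= DATE '2011-04-01'
       AND NEW.created_date < DATE '2011-05-01' THEN
        INSERT INTO measurements_2011_04 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'created_date out of range: %', NEW.created_date;
    END IF;
    RETURN NULL;  -- suppress the insert into the parent itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_measurements
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert_trigger();

-- Ensure the planner prunes partitions using the CHECK constraints.
SET constraint_exclusion = partition;
```

With this layout, a query such as `SELECT max(id) FROM measurements WHERE created_date >= '2011-04-01'` only touches the April child, and dropping an old month is a cheap `DROP TABLE` rather than a bulk DELETE. Note, as discussed above, that foreign keys cannot reference the inherited parent, which is the data-model change Samba mentions.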