Re: Partition table in 9.0.x?

From: Jeff Janes
Subject: Re: Partition table in 9.0.x?
Date:
Msg-id: CAMkU=1waDCS4d6M1SeRDZq3JP6V7_Mf2PL2di_V4UwyfJYybiw@mail.gmail.com
In reply to: Re: Partition table in 9.0.x?  (AJ Weber <aweber@comcast.net>)
Responses: Re: Partition table in 9.0.x?
List: pgsql-performance
On Sunday, January 6, 2013, AJ Weber wrote:
All fair questions...

Thank you for your detailed response!


On 1/4/2013 11:03 PM, Jeff Janes wrote:
On Friday, January 4, 2013, AJ Weber wrote:
Hi all,

I have a table that has about 73mm rows in it and growing.  

How big is the table in MB?  Its indexes?
Not sure on this.  Will see if pgAdmin tells me.

It probably does, but from the psql command line you can do \d+ and \di+.
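
For example (a sketch; "mytable" is a placeholder for the real table name):

    -- psql meta-commands: relations and indexes with on-disk sizes
    \d+
    \di+

    -- or the same information via the size functions, from any client
    SELECT pg_size_pretty(pg_relation_size('mytable'))       AS table_size,
           pg_size_pretty(pg_indexes_size('mytable'))        AS indexes_size,
           pg_size_pretty(pg_total_relation_size('mytable')) AS total_size;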

 
Anyway, this table is going to continue to grow, and it's used frequently (Read and Write).

Are all rows in the table read and written with equal vigor, or are there hot rows and cold rows that can be recognized based on the row's values?
No, I could probably figure out a way to set up an "archive" or "older" section of the data that is updated much less frequently.

So the data that delineates this does not exist in that table, but it does exist someplace, either just in your head, or in the column of a higher-level table?
 
Deletes are rare.  Inserts/Updates "yes".  Select on existing rows -- very frequent.


If you have little control over your storage and are already IO bound, and the tables are growing rapidly, you may need to rethink that "deletes are rare" bit.  So the inserts and updates do target a hot part, while the selects are evenly spread?

In that case, it is very important to know whether the slow part is the selects, or the inserts and deletes.  If the selects are slow, and the hot rows for selects can't be gathered together into a hot partition, then after clustering they will still be slow, as the disk will still have to seek all over the place (massive data-mining type selects might be an exception to that, but I wouldn't count on it).
 

 
From what I read, this table is a candidate to be partitioned for performance and scalability.  I have tested some scripts to build the "inherits" tables with their constraints and the trigger/function to perform the work.

Am I doing the right thing by partitioning this?

Probably not.  Or at least, you haven't given us the information to know.  Very broadly speaking, well-implemented partitioning makes bulk loading and removal operations take less IO, but makes normal operations take more IO,  or if lucky leaves it unchanged.  There are exceptions, but unless you can identify a very specific reason to think you might have one of those exceptions, then you probably don't.
I know you can't believe everything you read, but I thought I saw some metrics suggesting that when a table's size exceeds some fraction of available RAM, or when it approaches 100mm rows, it becomes a strong candidate for partitioning.

I think it is a matter of semantics.  A small table is a poor candidate for partitioning even if it has an excellent key to use for partitioning.  A large table could be a good candidate up until you realize it doesn't have a good key to use, at which point it stops being a good candidate (in my opinion).
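
For reference, the 9.0-era inheritance scheme described above usually looks something like this (a minimal sketch with made-up table and column names; real scripts would add one child table and one IF branch per partition):

    -- parent table; the children carry the actual rows
    CREATE TABLE orders (
        order_num  bigint NOT NULL,
        created    date   NOT NULL,
        payload    text
    );

    -- one child per range, with a CHECK constraint the planner can use
    -- for constraint exclusion (constraint_exclusion = partition)
    CREATE TABLE orders_2013 (
        CHECK (created >= DATE '2013-01-01' AND created < DATE '2014-01-01')
    ) INHERITS (orders);

    -- routing trigger: redirect inserts on the parent to the right child
    CREATE OR REPLACE FUNCTION orders_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.created >= DATE '2013-01-01' AND NEW.created < DATE '2014-01-01' THEN
            INSERT INTO orders_2013 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'orders_insert_router: no partition for %', NEW.created;
        END IF;
        RETURN NULL;  -- the row is already in the child; skip the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_insert_trig
        BEFORE INSERT ON orders
        FOR EACH ROW EXECUTE PROCEDURE orders_insert_router();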




Should I run a cluster or vacuum full after all is done?

Probably not.  If a cluster after the partitioning would be beneficial, there would be a pretty good chance you could do a cluster *instead* of the partitioning and get the same benefit.  

I did try clustering the table on the PK (which is actually 4 columns), and it appeared to help a bit.  I was hoping partitioning was going to help me even more.

Was the order_num (from the parent table) the leading field of the 4-column PK?  If not, you might want to reorder the PK so that it is the leading field and cluster again.  Or, if reordering the PK columns is not convenient, make a new index on the order_num and cluster on that (perhaps dropping the index after the cluster, if it no longer serves a purpose).
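
Concretely, that might look like this (a sketch; "detail_table" and the index name are made up):

    -- index on the hot leading column, then physically reorder the table;
    -- note that CLUSTER takes an exclusive lock and rewrites the table
    CREATE INDEX detail_order_num_idx ON detail_table (order_num);
    CLUSTER detail_table USING detail_order_num_idx;
    ANALYZE detail_table;

    -- optional: drop the index afterwards if it serves no other purpose
    DROP INDEX detail_order_num_idx;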
  

Is there a major benefit if I can upgrade to 9.2.x in some way that I haven't realized?

If you have specific queries that are misoptimized and so are generating more IO than they need to, then upgrading could help.  On the other hand, it could also make things worse, if a currently well optimized query becomes worse. 

Is there some new feature or optimization you're thinking about with this comment?  If so, could you please just send me a link and/or feature name and I'll google it myself?


The main things I am thinking of are the "fudge factor" for large indexes, which is currently being discussed in both the performance and hackers mailing lists; it was made overly aggressive in 9.2 and so can make the planner choose worse plans.  The other is "allow the planner to generate custom plans for specific parameter values even when using prepared statements" from the 9.2 release notes, which can allow it to choose better plans.  But surely there are other changes as well, which amount to corner cases and so are hard to discuss in the abstract.  Which is why instrumentation is important: there isn't much point in worrying about possible changed plans until you've identified the queries that are important to worry about.
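
To illustrate the prepared-statement change (a sketch; the statement and names are invented): 9.0 always reuses one generic plan for a prepared statement, while 9.2 may build a custom plan for the actual parameter value when that looks cheaper:

    PREPARE order_lookup (bigint) AS
        SELECT * FROM detail_table WHERE order_num = $1;

    -- on 9.2 this may show a plan specialized for the value 42
    EXPLAIN EXECUTE order_lookup(42);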
 

But, instrumentation has improved in 9.2 from 9.0, so upgrading would make it easier to figure out just which queries are really bad and have the most opportunity for improvement.  A little well informed optimization might obviate the need for either partitioning or more hard drives.

This is interesting too.  I obviously would like the best available options to tune the database and the application.  Is this detailed in the release notes somewhere, and what tools could I use to take advantage of this?  (Are there new/improved details included in the EXPLAIN statement or something?)

track_io_timing is new, and it exposes new data in EXPLAIN (ANALYZE, BUFFERS) as well as in other places.  You might not want to turn this on permanently, as it can affect performance (but you can test with pg_test_timing, as outlined in the docs, to see how large the effect is likely to be).  Also, EXPLAIN ANALYZE now displays the number of rows removed by filters, which may or may not be useful to you.
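
For example (a sketch; the query is a placeholder):

    -- enable per session (superuser only), ideally after checking the
    -- clock overhead with the pg_test_timing utility that ships with 9.2
    SET track_io_timing = on;

    EXPLAIN (ANALYZE, BUFFERS)
        SELECT * FROM detail_table WHERE order_num = 42;
    -- plan nodes now report lines like "I/O Timings: read=12.345"
    -- alongside the usual Buffers counts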

Most exciting I think are the improvements to the contrib module pg_stat_statements.  That would be my first recourse, to find out which of your statements are taking the most time (and/or IO).  I try to install and configure this for all of my databases now as a matter of course.  
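
Setting it up on 9.2 looks roughly like this (a sketch; total_time is the 9.2 column name, and the ordering is just one sensible choice):

    -- in postgresql.conf (requires a server restart):
    --   shared_preload_libraries = 'pg_stat_statements'

    -- then, in the database:
    CREATE EXTENSION pg_stat_statements;

    -- the statements consuming the most cumulative time
    SELECT calls, total_time, rows, query
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;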

See the 9.2 release notes (with links therein to the rest of the documentation) for discussion of these.
 
Cheers,

Jeff
