Re: [rfc] overhauling pgstat.stat
From: Satoshi Nagayasu
Subject: Re: [rfc] overhauling pgstat.stat
Date:
Msg-id: 522D7AD2.4060203@uptime.jp
In reply to: Re: [rfc] overhauling pgstat.stat (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
(2013/09/09 8:19), Tomas Vondra wrote:
> On 8.9.2013 23:04, Jeff Janes wrote:
>> On Tue, Sep 3, 2013 at 10:09 PM, Satoshi Nagayasu <snaga@uptime.jp>
>> wrote:
>>> Hi,
>>>
>>> (2013/09/04 13:07), Alvaro Herrera wrote:
>>>> Satoshi Nagayasu wrote:
>>>>
>>>>> As you may know, this file could be hundreds of MB in size,
>>>>> because pgstat.stat holds all access statistics for each
>>>>> database, and it needs to read/write the entire pgstat.stat
>>>>> frequently.
>>>>>
>>>>> As a result, pgstat.stat often generates massive I/O,
>>>>> particularly when there is a large number of tables in the
>>>>> database.
>>>>
>>>> We already changed it:
>>>>
>>>> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
>>>> Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
>>>> Date:   Mon Feb 18 17:56:08 2013 -0300
>>>>
>>>>     Split pgstat file in smaller pieces
>>>
>>> Thanks for the comments. I forgot to mention that.
>>>
>>> Yes, we have already split the single pgstat.stat file into several
>>> pieces.
>>>
>>> However, we still need to read/write a large amount of statistics
>>> data when we have a large number of tables in a single database, or
>>> multiple databases being accessed. Right?
>>
>> Do you have a test case for measuring this? I vaguely remember from
>> when I was testing the split patch that after that improvement the
>> remaining load was so low that there was little point in optimizing
>> it further.
>
> This is actually a pretty good point. Creating a synthetic test case is
> quite simple - just create 1,000,000 tables in a single database - but
> I'm wondering if it's actually realistic. Do we have a real-world
> example where the current "one stat file per db" approach is not enough?

I have several assumptions for such a case:

- A single shared database contains thousands of customers.
- Each customer has hundreds of tables and indexes.
- Customers are separated by schemas (namespaces) within a single database.
- The application server uses connection pooling for performance reasons.
- The workload (locality of table access) cannot be predicted.

Does that look reasonable?

> The reason why I worked on the split patch is that our application is
> slightly crazy and creates a lot of tables (+ indexes) on the fly, and
> as we have up to a thousand databases on each host, we often ended up
> with a huge stat file.
>
> Splitting the stat file improved that considerably, although that's
> partially because we have the stats on a tmpfs, so I/O is not a problem,
> and the CPU overhead is negligible thanks to splitting the stats per
> database.

I agree that splitting a single large database into several pieces, like
thousands of tiny databases, could be an option in some cases.

However, what I intend here is eliminating such limitations on database
design. In fact, when considering connection pooling, splitting a
database is not a good idea, because, AFAIK, many connection poolers
manage connections per database.

So, I'd like to support 100k tables in a single database.

Any comments?

Regards,

> But AFAIK there are operating systems where creating a filesystem in RAM
> is not that simple - e.g. Windows. In such cases even a moderate number
> of objects may be a significant issue I/O-wise. But then again, I can't
> really think of a reasonable system creating that many objects in a
> single database (except for e.g. a shared database using schemas instead
> of databases).
>
> Tomas

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
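
To make the synthetic test case Tomas mentions concrete, here is a minimal
sketch of inflating the per-database stats file. The table names and the
count are placeholders, not from the thread, and because a DO block runs as
a single transaction it holds one heavyweight lock per created table, so
max_locks_per_transaction may need to be raised first (or the tables created
in smaller batches from an external loop):

    -- Minimal sketch: create many tables in one database so the
    -- per-database statistics file grows. A DO block is one transaction,
    -- holding a lock per new table; raise max_locks_per_transaction
    -- beforehand, or create the tables in batches instead.
    DO $$
    BEGIN
        FOR i IN 1..100000 LOOP
            EXECUTE format('CREATE TABLE stress_t%s (id int)', i);
        END LOOP;
    END;
    $$;

    -- Reading the statistics views makes the backend request a fresh
    -- stats snapshot, prompting the collector to rewrite the
    -- per-database file; that rewrite is the I/O under discussion.
    SELECT count(*) FROM pg_stat_user_tables;

The resulting file sizes can then be watched under pg_stat_tmp/ in the data
directory while a workload touches the tables.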
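
Tomas's tmpfs remark maps onto a stock setting: stats_temp_directory
(available since PostgreSQL 8.4) controls where the frequently rewritten
statistics files live, so pointing it at a RAM-backed filesystem takes disk
I/O out of the picture. A postgresql.conf excerpt, with an example path:

    # Keep the frequently rewritten statistics files on a RAM-backed
    # filesystem. The directory must exist and be writable by the server
    # user; the path below is only an example.
    stats_temp_directory = '/run/pg_stat_tmp'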