Re: max_wal_senders must die
From | Robert Haas
Subject | Re: max_wal_senders must die
Date |
Msg-id | AANLkTim-Hna6qiUvZYK=876cNwDxptZ8e7FJ2ns4oQXq@mail.gmail.com
In reply to | Re: max_wal_senders must die (Greg Smith <greg@2ndquadrant.com>)
Responses | Re: max_wal_senders must die; Re: max_wal_senders must die
List | pgsql-hackers
On Wed, Oct 20, 2010 at 1:06 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Josh Berkus wrote:
>>
>> Well, now that you mention it, I also think that "hot standby" should be
>> the default. Yes, I know about the overhead, but I also think that the
>> number of our users who want easy replication *far* outnumber the users
>> who care about an extra 10% WAL overhead.
>
> I think this whole situation is similar to the resistance to increasing
> default_statistics_target. There's additional overhead added by enabling
> both of these settings, in return for making it more likely for the average
> person to see useful behavior by default. If things are rejiggered so the
> advanced user has to turn things off in order to get optimal performance
> when not using these features, in return for the newbie being more likely to
> get things working in basic form, that's probably a net win for PostgreSQL
> advocacy. But much like default_statistics_target, there needs to be some
> more formal work done on quantifying just how bad each of these overheads
> really are first. We lost 3-7% on multiple simple benchmarks between 8.3
> and 8.4 [1] because of that retuning for ease of use on real-world workloads,
> and that's not something you want to repeat too often.

Exactly. It doesn't take many 3-7% slowdowns to add up to being 50% or 100% slower, and that sucks. In fact, I'm still not convinced that we were wise to boost default_statistics_target as much as we did. I argued for a smaller boost at the time.

Actually, I think the best thing for default_statistics_target might be to scale the target based on the number of rows in the table, e.g. given N rows:

10 + (N / 1000), if N < 40,000
46 + (N / 10000), if 50,000 < N < 3,540,000
400, if N > 3,540,000

Consider a table with 2,000 rows. With default_statistics_target = 100, we can store up to 100 MCVs; and we break the remaining ~1,900 values up into 100 buckets with 19 values/bucket. In most cases, that is probably overkill.
Where you tend to run into problems with inadequate statistics is with the values that are not quite common enough to be an MCV, but are still significantly more common than their companions in the same bucket. However, with only 19 values in a bucket, you're probably not going to have that problem. If you scale the table down to 1,000 rows, you now have 9 values in a bucket, which makes it *really* unlikely you're going to have that problem. On the other hand, on a table with 4 million rows, it is entirely likely that there are more than 100 values whose frequencies are worth tracking individually, and odds are good that even if the planning time is a little longer to no purpose, it will still be small relative to the query execution time.

It's unfortunately impractical for the size of the MCV list to track linearly with the size of the table, because there are O(n^2) algorithms in use, but I think some kind of graduated scheme might enable us to buy back some of that lost performance without damaging real workloads very much. It might even help real workloads, because you may very well join large fact tables against small dimension tables, and odds are good that under the present scheme the fact tables have more statistics than they really need.

As to replication, I don't believe the contention that most people will want to use replication. Many will, and that is fine, but many also won't. The world is full of development and test machines where replication is a non-issue, and some people won't run it even in production, because the nature of their application makes the data on that box non-essential, or because they replicate with Bucardo or Slony instead. I completely agree that we should make it easier to get replication set up without (multiple) server restarts, but imposing a performance overhead on untuned systems is not the right way to do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company