Re: max_wal_senders must die
From | Robert Haas
---|---
Subject | Re: max_wal_senders must die
Date |
Msg-id | AANLkTimzaty5a3cs1qabY5B-GQOmof3gOUuuDDyzJUBj@mail.gmail.com
In response to | Re: max_wal_senders must die (Greg Stark <gsstark@mit.edu>)
Responses | Re: max_wal_senders must die (Greg Stark <gsstark@mit.edu>), Re: max_wal_senders must die (Bruce Momjian <bruce@momjian.us>)
List | pgsql-hackers
On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Exactly. It doesn't take many 3-7% slowdowns to add up to being 50%
>> or 100% slower, and that sucks. In fact, I'm still not convinced that
>> we were wise to boost default_statistics_target as much as we did. I
>> argued for a smaller boost at the time.
>
> Well we don't want to let ourselves be paralyzed by FUD so it was
> important to identify specific concerns and then tackle those
> concerns. Once we identified the worst-case planning cases we profiled
> them and found that the inflection point of the curve was fairly
> clearly above 100 but that there were cases where values below 1,000
> caused problems. So I'm pretty happy with the evidence-based approach.

The inflection point of the curve was certainly a good thing for us to
look at, but the fact remains that we took a hit on a trivial
benchmark, and we can't afford to take too many of those.

>> Actually, I think the best thing for default_statistics_target might
>> be to scale the target based on the number of rows in the table, e.g.
>> given N rows:
>
> The number of buckets needed isn't related to the population size --
> it's related to how wide the ranges you'll be estimating selectivity
> for are. That is, with our current code, if you're selecting tuples
> within a range a..b and that range happens to be the same size as the
> bucket size then you'll get an accurate estimate with a fixed 95th
> percentile precision regardless of the size of the table (to an
> approximation).

If you have a WHERE clause of the form WHERE x > some_constant, then
the effects vary depending on how that constant is chosen. If it's the
median value then, as you say, the statistics target doesn't matter
much at all; but that's not necessarily representative of real life.
For example, suppose x is a date and the constant is Monday of the
current week.
As the table grows, the present week's data becomes a smaller and
smaller fraction of the table data. When it gets to be a tiny fraction
of the very last histogram bucket, the estimates start to get
progressively worse. At some point you have to give up and partition
the table for other reasons anyway, but having to do it because the
statistics are off is inexcusable. We've seen people hit this precise
issue on -performance a few times.

> I'm not sure how our selectivity works at all for the degenerate case
> of selecting for specific values. I don't understand how histograms
> are useful for such estimates at all. I think the MCV lists are
> basically an attempt to overcome this problem and as you point out I'm
> not sure the statistics target is really the right thing to control
> them -- but since I don't think there's any real statistics behind
> them I'm not sure there's any right way to control them.

If you have a WHERE clause of the form WHERE x = some_constant, then
you get a much better estimate if some_constant is an MCV. Even if the
constant is not an MCV, you still get better estimates as the MCV list
grows, because the estimation code knows that no non-MCV can occur
more frequently than any MCV, so increasing the number of MCVs pushes
those estimates closer to reality. The benefit is especially large
when the frequency "falls off a cliff" at a certain point in the
distribution: e.g., if there are 243 values that occur much more
frequently than any others, a stats target of 250 will do much better
than 225. But even if that's not an issue, it still helps.

The bottom line here is that I can't remember any message, ever, on
-performance, or any incident within my personal experience, where it
was necessary to increase the statistics target beyond 50-100 on a
table with 10K rows.
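The MCV mechanics described above can be sketched as a toy model. This is an illustration of the estimation logic being discussed, not PostgreSQL's actual implementation, and `eq_selectivity` is a hypothetical helper invented for the sketch:

```python
# Toy sketch of equality-selectivity estimation with an MCV list,
# loosely modeled on the behavior described above; an illustration,
# not PostgreSQL's actual implementation.

def eq_selectivity(constant, mcvs, n_distinct):
    """Estimate the fraction of rows where column = constant.

    mcvs: list of (value, frequency) pairs for the most common values.
    n_distinct: estimated number of distinct values in the column.
    """
    mcv_total = sum(freq for _, freq in mcvs)
    for value, freq in mcvs:
        if value == constant:
            # The constant is an MCV: use its measured frequency.
            return freq
    # Not an MCV: spread the leftover frequency evenly over the
    # remaining distinct values...
    remaining = n_distinct - len(mcvs)
    if remaining <= 0:
        return 0.0
    estimate = (1.0 - mcv_total) / remaining
    # ...but no non-MCV can occur more often than the least common
    # MCV, so cap the estimate there. A longer MCV list both shrinks
    # the leftover frequency and tightens this cap.
    if mcvs:
        estimate = min(estimate, min(freq for _, freq in mcvs))
    return estimate
```

In the "cliff" scenario above, a target of 225 leaves some heavy hitters uncaptured, so their large combined frequency is smeared across all non-MCVs; at 250, each heavy hitter gets its own measured frequency and both the leftover pool and the cap drop sharply.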
However, there are certainly cases where we've recommended that for
big tables, which means there are also people out there who have a
performance problem on a big table but haven't asked for help and
therefore haven't gotten that advice.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
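The histogram behavior discussed in this thread (a "current week" tail occupying a sliver of the last bucket) can be sketched with a toy equi-depth histogram. The data shape and both helper functions are hypothetical constructions for illustration, not PostgreSQL's actual implementation:

```python
# Toy equi-depth histogram, illustrating why a tail slice that
# occupies a tiny fraction of the final bucket is estimated poorly
# at low statistics targets. An illustration only, not PostgreSQL's
# actual implementation.

def make_histogram(sorted_values, num_buckets):
    # Equi-depth bucket boundaries: num_buckets + 1 values taken at
    # equal-row-count steps through the sorted data.
    n = len(sorted_values)
    return [sorted_values[min(i * n // num_buckets, n - 1)]
            for i in range(num_buckets + 1)]

def gt_selectivity(bounds, cutoff):
    # Estimated fraction of rows with value > cutoff, assuming values
    # are uniformly distributed within each bucket.
    num_buckets = len(bounds) - 1
    if cutoff < bounds[0]:
        return 1.0
    if cutoff >= bounds[-1]:
        return 0.0
    for i in range(num_buckets):
        lo, hi = bounds[i], bounds[i + 1]
        if lo <= cutoff < hi:
            frac = (cutoff - lo) / (hi - lo)
            return (num_buckets - i - frac) / num_buckets
    return 0.0

# Hypothetical date column (values are day numbers): 990 quiet days
# at 100 rows/day, then a busy recent stretch at 1,000 rows/day.
rows = [d for d in range(990) for _ in range(100)]
rows += [d for d in range(990, 1000) for _ in range(1000)]
truth = sum(1 for v in rows if v > 992) / len(rows)
coarse = gt_selectivity(make_histogram(rows, 10), 992)
fine = gt_selectivity(make_histogram(rows, 100), 992)
```

With 10 buckets the busy tail is a small slice of the last bucket, so the uniformity assumption underestimates it; 100 buckets resolve the tail much more closely. Real systems add sampling error and stale statistics on top of this resolution limit.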