Re: autovacuum truncate exclusive lock round two
From | Robert Haas
---|---
Subject | Re: autovacuum truncate exclusive lock round two
Date |
Msg-id | CA+TgmobD6XOw1nZSdbFYVGWa=NzAQMuXO-4qQ2yxodwFohE+bA@mail.gmail.com
In reply to | Re: autovacuum truncate exclusive lock round two ("Kevin Grittner" <kgrittn@mail.com>)
Responses | Re: autovacuum truncate exclusive lock round two
List | pgsql-hackers
On Wed, Dec 5, 2012 at 11:24 AM, Kevin Grittner <kgrittn@mail.com> wrote:
> Robert Haas wrote:
>> Since people *already* raise deadlock_timeout to obscenely high
>> values (a minute? an hour???) and then complain that things blow
>> up in their face, I think there's a decent argument to be made
>> that piggybacking anything else on that setting is unwise.
>
> If people are really doing that, then I tend to agree. I wasn't
> aware of that practice.

It's probably not quite common enough to be called a "practice", but I
have encountered it a number of times in support situations. Alas, I no
longer remember the details of exactly what misery it caused, but I do
remember it wasn't good. :-)

>> Against that, FWICT, this problem only affects a small number of
>> users: Jan is the only person I can ever remember reporting this
>> issue. I'm not dumb enough to think he's the only person who it
>> affects; but my current belief is that it's not an enormously
>> common problem. So the main argument I can see against adding a
>> GUC is that the problem is too marginal to justify a setting of
>> its own. What I really see as the key issue is: suppose we
>> hardcode this to say 2 seconds. Is that going to fix the problem
>> effectively for 99% of the people who have this problem, or for
>> 25% of the people who have this problem? In the former case, we
>> probably don't need a GUC; in the latter case, we probably do.
>
> Given the fact that autovacuum will keep throwing workers at it to
> essentially loop indefinitely at an outer level, I don't think the
> exact setting of this interval is all that critical either. My gut
> feel is that anything in the 2 second to 5 second range would be
> sane, so I won't argue over any explicit setting within that range.
> Below that, I think the overhead of autovacuum coming back to the
> table repeatedly would probably start to get too high; below that
> we could be causing some small, heavily-updated table to be
> neglected by autovacuum -- especially if you get multiple
> autovacuum workers tied up in this delay on different tables at the
> same time.

I think that part of what's tricky here is that the dynamics of this
problem depend heavily on table size. I handled one support case where
lowering autovacuum_naptime to 15s was an indispensable part of the
solution, so in that case having an autovacuum worker retry for more
than a few seconds sounds kind of insane. OTOH, that case involved a
small, rapidly changing table. If you've got an enormous table where
vacuum takes an hour to chug through all of it, abandoning the effort
to truncate the table after a handful of seconds might sound equally
insane.

Maybe it'd be sensible to relate the retry time to the time spent
vacuuming the table. Say, if the amount of time spent retrying exceeds
10% of the time spent vacuuming the table, with a minimum of 1s and a
maximum of 1min, give up. That way, big tables will get a little more
leeway than small tables, which is probably appropriate.

> Regarding how many people are affected, I have seen several reports
> of situations where users claim massive impact on performance when
> autovacuum kicks in. The reports have not included enough detail to
> quantify the impact or in most cases to establish a cause, but this
> seems like it could have a noticeable impact, especially if the
> deadlock timeout was set to more than a second.

Yeah, I agree this could be a cause of those types of reports, but I
don't have any concrete evidence that any of the cases I've worked were
actually due to this specific issue. The most recent case of this type
I worked on was due to I/O saturation - which, since it happened to be
EC2, really meant network saturation.
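[For readers following along: the proportional-retry heuristic proposed above could be sketched as below. This is purely illustrative, not actual PostgreSQL code; the function name and millisecond units are invented for the example.]

```c
#include <assert.h>

/*
 * Hypothetical helper: compute how long an autovacuum worker should
 * keep retrying the truncation lock, as 10% of the time already spent
 * vacuuming the table, clamped to the range [1s, 1min].
 */
static long
truncate_retry_budget_ms(long vacuum_ms)
{
	long		budget = vacuum_ms / 10;	/* 10% of vacuum time */

	if (budget < 1000)
		budget = 1000;			/* floor: 1 second */
	if (budget > 60000)
		budget = 60000;			/* ceiling: 1 minute */
	return budget;
}
```

Under this sketch, a 5-second vacuum of a small table yields the 1s floor, while an hour-long vacuum of a huge table hits the 1-minute ceiling, giving big tables more leeway as suggested.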
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company