Re: How to cripple a postgres server
From | Stephen Robert Norris |
---|---|
Subject | Re: How to cripple a postgres server |
Date | |
Msg-id | 1022628439.25604.2.camel@chinstrap |
In reply to | Re: How to cripple a postgres server (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
On Wed, 2002-05-29 at 09:08, Tom Lane wrote:
> Stephen Robert Norris <srn@commsecure.com.au> writes:
> > I've already strace'ed the idle backend, and I can see the SIGUSR2 being
> > delivered just before everything goes bad.
>
> >> Yes, but what happens after that?
>
> > The strace stops until I manually kill the connecting process - the
> > machine stops in general until then (vmstat 1 stops producing output,
> > shells stop responding ...). So who knows what happens :(
>
> Hmm, I hadn't quite understood that you were complaining of a
> system-wide lockup and not just Postgres getting wedged. I think the
> chances are very good that this *is* a kernel bug. In any case, no
> self-respecting kernel hacker would be happy with the notion that
> a completely unprivileged user program can lock up the whole machine.
> So even if Postgres has got a problem, the kernel is clearly failing
> to defend itself adequately.
>
> Are you able to reproduce the problem with fewer than 800 backends?
> How about if you try it on a smaller machine?

Yep, on a PIII-800 with 256MB I can do it with fewer backends (I forget
how many) and only a few vacuums. It's much easier, basically, but
there's much less CPU on that machine. It also locks the machine up for
several minutes...

> Another thing that would be entertaining to try is other ways of
> releasing 800 queries at once. For example, on connection 1 do
> BEGIN; LOCK TABLE foo;
> then issue a "SELECT COUNT(*) FROM foo" on each other connection,
> and finally COMMIT on connection 1. If that creates similar misbehavior
> then I think the SI-overrun mechanism is probably not to be blamed.
>
> > ... Sometimes, the
> > SIGUSR2 does just create a very brief load spike (vmstat shows >500
> > processes on the run queue, but the next second everything is back to
> > normal and no unusual amount of CPU is consumed).
>
> That's the behavior I'd expect. We need to figure out what's different
> between that case and the cases where it locks up.
>
> regards, tom lane

Yeah. I'll try your suggestion above and report back.

Stephen
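Tom's lock-release test (BEGIN; LOCK TABLE foo on connection 1, a blocked "SELECT COUNT(*) FROM foo" on every other connection, then COMMIT on connection 1) can be scripted roughly as below. This is a minimal sketch, not anything posted in the thread: it assumes a DB-API driver such as psycopg2 (whose connections open a transaction implicitly on the first statement, which stands in for the explicit BEGIN) and an existing table `foo`. The helper name `release_herd`, the `settle_seconds` parameter and the fixed sleep before COMMIT are hypothetical choices made for illustration.

```python
import threading
import time


def release_herd(connect, n_waiters=800, table="foo", settle_seconds=5.0):
    """Hypothetical helper: release n_waiters blocked queries at once.

    connect: zero-argument callable returning a DB-API connection.
    Returns (per-waiter COUNT(*) results, seconds from COMMIT until all
    waiters finished).
    """
    # Connection 1: the driver opens a transaction implicitly, so this is
    # equivalent to "BEGIN; LOCK TABLE foo;". LOCK TABLE defaults to
    # ACCESS EXCLUSIVE mode, which blocks plain SELECTs.
    holder = connect()
    holder.cursor().execute(f"LOCK TABLE {table}")  # table name assumed trusted

    waiters = [connect() for _ in range(n_waiters)]
    results = [None] * n_waiters

    def wait_on_lock(i):
        cur = waiters[i].cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")  # blocks until COMMIT
        results[i] = cur.fetchone()[0]

    threads = [threading.Thread(target=wait_on_lock, args=(i,))
               for i in range(n_waiters)]
    for t in threads:
        t.start()

    # Crude assumption: a fixed sleep gives every waiter time to queue on
    # the lock before it is released.
    time.sleep(settle_seconds)

    released_at = time.monotonic()
    holder.commit()  # all queued SELECTs become runnable at the same moment
    for t in threads:
        t.join()
    elapsed = time.monotonic() - released_at

    for conn in waiters:
        conn.close()
    holder.close()
    return results, elapsed
```

For example, `release_herd(functools.partial(psycopg2.connect, dbname="test"), n_waiters=800)` could be run while watching `vmstat 1` in another shell. The server's `max_connections` would need to be at least `n_waiters + 1`.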