Discussion: statement_timeout is not cancelling query
I have a few things to report, so I'm not sure whether one email or several is best, but here goes. We are using PostgreSQL 8.3.8.

We were having a blocking query problem that should have been fixed by statement_timeout = 90000, however this seems to have had zero effect. The query was:

    update articles set views = views + 1 where id = 7223

That's it. Fairly simple, right? Well, we also had a trigger function that updates a full text index on that record whenever any value is updated. We have since changed this function to update the GiST index only for inserts, or for updates where one of the indexed columns changed. However, let's stick with the original for a moment: there should really have been no problem updating the GiST index for a single row in an insert/update trigger.

What happened is that the above update never completed, and the PostgreSQL service consumed all available memory. We had to forcefully reboot the machine. We turned on activity tracking and watched it do the same thing again and again. Luckily we were able to kill the process running the offending query before losing the machine.

The PostgreSQL configuration has a maximum of 255 connections. The machine has 16 gigabytes of RAM and two quad-core Xeons. We run several instances of PostgreSQL on different ports; our reason for doing this was to prevent one customer's database 'instance' from impacting another customer's. A couple of years ago we had a runaway query that brought the whole system down, so I implemented this separate-instance setup, and it has been purring along great ever since, until now.

We contacted a PG expert, who determined that we had a corrupt full text index and recommended rebuilding it and fixing the trigger function. Once we rebuilt the index, things worked (or are working) so far.

So we have a couple of questions:

Why was statement_timeout ignored, allowing the update statement to run for an excessive time?
Why does PostgreSQL NOT have a maximum-memory-allowed setting? We want to allocate resources efficiently and cannot allow one customer to impact others.

That's it for now. Hope someone can provide helpful answers.

Thanks,
Mark W.
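[For reference, the trigger-function fix Mark mentions -- recomputing the full-text index only when an indexed column actually changes -- might look roughly like this in PL/pgSQL. This is a sketch only: the articles schema isn't shown in the thread, so the tsv, title, and body column names are invented.]

```sql
-- Hypothetical sketch: recompute the tsvector only on insert, or when
-- an indexed column changed, so "update articles set views = views + 1"
-- does no full-text work at all.
CREATE OR REPLACE FUNCTION articles_tsv_update() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.tsv := to_tsvector('english',
                       coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));
    ELSIF NEW.title IS DISTINCT FROM OLD.title
          OR NEW.body IS DISTINCT FROM OLD.body THEN
        NEW.tsv := to_tsvector('english',
                       coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_tsv_trig
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE PROCEDURE articles_tsv_update();
```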
On 15/12/2009 12:35 PM, Mark Williamson wrote:

> So what happened is, the above update never completed and the Postgresql
> service consumed all available memory. We had to forcefully reboot the
> machine

That means your server is misconfigured. PostgreSQL should never consume all available memory. If it does, you have work_mem and/or maintenance_work_mem set way too high, and you have VM overcommit enabled in the kernel. You also have too much swap.

http://www.postgresql.org/docs/current/interactive/kernel-resources.html
http://www.network-theory.co.uk/docs/postgresql/vol3/LinuxMemoryOvercommit.html

I wouldn't be surprised if you had shared_buffers set too high as well, and no ulimit set on PostgreSQL's memory usage. All those things add up to "fatal". A properly configured machine should be able to survive memory exhaustion caused by a user process just fine. Disable VM overcommit, set a ulimit on PostgreSQL so it can't consume all memory, use a sane amount of swap, and set sane values for work_mem and maintenance_work_mem.

> Why does Postgresql NOT have a maximum memory allowed setting? We want
> to allocate resources efficiently and cannot allow one customer to
> impact others.

It does: "man ulimit". The operating system can enforce it much better than PostgreSQL can. If a Pg bug were to cause Pg to run away or try to allocate insane amounts of RAM, the ulimit would catch it.

I *do* think it'd be nice to have ulimit values settable via postgresql.conf so that you didn't have to faff about editing init scripts, though. (TODO item?)

--
Craig Ringer
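[Concretely, the checklist Craig describes might look something like the following on a Linux box. The numbers are illustrative assumptions, not recommendations; they would need tuning for the machine in question.]

```shell
# Disable VM overcommit so a runaway allocation fails at malloc() time
# instead of invoking the OOM killer later:
sysctl -w vm.overcommit_memory=2

# In the script that starts the postmaster, cap its address space
# (value in kB; 2097152 kB = 2 GB here, purely illustrative):
ulimit -v 2097152

# postgresql.conf: keep per-operation memory modest. Remember that
# work_mem applies per sort/hash operation, per backend, so it
# multiplies quickly with 255 allowed connections.
#   shared_buffers       = 1GB
#   work_mem             = 16MB
#   maintenance_work_mem = 256MB
```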
On Tue, Dec 15, 2009 at 12:09 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:

> On 15/12/2009 12:35 PM, Mark Williamson wrote:
>
>> So what happened is, the above update never completed and the Postgresql
>> service consumed all available memory. We had to forcefully reboot the
>> machine
>
> That means your server is misconfigured. PostgreSQL should never consume all
> available memory. If it does, you have work_mem and/or maintenance_work_mem
> set way too high, and you have VM overcommit enabled in the kernel. You also
> have too much swap.
>
> http://www.postgresql.org/docs/current/interactive/kernel-resources.html
> http://www.network-theory.co.uk/docs/postgresql/vol3/LinuxMemoryOvercommit.html
>
> I wouldn't be surprised if you had shared_buffers set too high as well, and
> you have no ulimit set on postgresql's memory usage. All those things add up
> to "fatal".
>
> A properly configured machine should be able to survive memory exhaustion
> caused by a user process fine. Disable VM overcommit, set a ulimit on
> postgresql so it can't consume all memory, use a sane amount of swap, and
> set sane values for work_mem and maintenance_work_mem.

I am skeptical that this is the real cause of the problem. Yeah, OK, ulimit might have stopped it, but I don't think any of the rest of this would have mattered. Unfortunately, we don't know how large the table was that the OP attempted to update, or the details of how the trigger was set up, but my guess is that it was the pending-trigger list that sucked up all the available memory on the box. This is an issue that other people have run into in the past, and I don't think we have a good solution. I wonder if we should put some kind of a limit in place so that queries like this will at least fail relatively gracefully with an error message rather than taking down the box.
There is a feature forthcoming in 8.5 which will make it easier to avoid these types of problems, by allowing you to test a condition before the trigger gets added to the pending trigger list. So if your trigger is designed to update some side table whenever a certain column is updated, you can really skip the trigger altogether without incurring any memory overhead. But there will still be people who do it the old way, and it would be nice if we could at least mitigate the impact a little. ...Robert
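[For the record, the forthcoming feature Robert describes is the WHEN clause on CREATE TRIGGER. For an AFTER row trigger, the condition is evaluated at the point where the event would be queued, so rows that fail it never enter the pending-trigger list at all. A sketch, with invented table, column, and function names:]

```sql
-- Hypothetical: this AFTER trigger is queued only when an indexed
-- column actually changed; a bare "views = views + 1" update adds
-- nothing to the pending-trigger list and costs no memory.
CREATE TRIGGER articles_fts_refresh
    AFTER UPDATE ON articles
    FOR EACH ROW
    WHEN (OLD.title IS DISTINCT FROM NEW.title
          OR OLD.body IS DISTINCT FROM NEW.body)
    EXECUTE PROCEDURE refresh_fts_index();
```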
On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> This is an
> issue that other people have run into in the past, and I don't think
> we have a good solution. I wonder if we should put some kind of a
> limit in place so that queries like this will at least fail relatively
> gracefully with an error message rather than taking down the box.

Eh? That's exactly what's supposed to happen now. When malloc returns 0 you're supposed to get a graceful error message and transaction abort.

--
greg
On Tue, Dec 15, 2009 at 11:02 AM, Greg Stark <gsstark@mit.edu> wrote:

> On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> This is an
>> issue that other people have run into in the past, and I don't think
>> we have a good solution. I wonder if we should put some kind of a
>> limit in place so that queries like this will at least fail relatively
>> gracefully with an error message rather than taking down the box.
>
> Eh? That's exactly what's supposed to happen now. When malloc returns
> 0 you're supposed to get a graceful error message and transaction
> abort.

I didn't know that, but I think by the time malloc returns 0, other bad things are usually already happening. I don't think that's really an answer.

...Robert
On Tue, Dec 15, 2009 at 4:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I didn't know that, but I think by the time malloc returns 0,
> other bad things are usually already happening. I don't think that's really
> an answer.

Only if, as Craig said and you disputed, you have overcommit enabled or lots of swap.

There is a problem, though: if you have overcommit disabled, you kind of need lots of swap to avoid running out of virtual memory long before you're actually short on physical memory. And it's true that Linux has had trouble dealing with low-memory situations in the past few years, at least in my experience. But that's not true on all OSes, and we can hope it won't be true for Linux forever, either.

--
greg
On Tue, Dec 15, 2009 at 12:07 PM, Greg Stark <gsstark@mit.edu> wrote:

> On Tue, Dec 15, 2009 at 4:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I didn't know that, but I think by the time malloc returns 0,
>> other bad things are usually already happening. I don't think that's really
>> an answer.
>
> Only if, as Craig said and you disputed, you have overcommit enabled
> or lots of swap.

I definitely dispute that. :-)

I mean, typically what happens when you allocate a lot of memory is that everything else on the system that's not in active use gets pushed out of RAM to make room for the memory hog. All of your file cache goes away, and not-recently-accessed portions of other processes' code and data segments get swapped out. The system becomes crushingly slow and unresponsive. On my laptop, for example, eventually as memory consumption increases you can't use X11 any more, because all of those processes have been pushed out to disk to make room for the memory hog. Every time you move the mouse, it faults all those pages back in to try to redraw the screen; but as soon as you stop, it pushes them all back out again.

It's difficult to reach the point where malloc actually fails, because as swapping increases, the activity of the system (including the runaway process) grinds almost to a halt. I'm not willing to wait that long for my transaction to fail.

I suppose that I could fix this by getting rid of my swap partition altogether, but that seems a rather extreme solution, and it's certainly not the way most UNIX/Linux systems I run across are configured, if for no other reason than that the operating system installer usually recommends creating one.

Am I all wet here? I thought waiting for malloc to fail was usually considered a poor error recovery mechanism.

...Robert
On Tue, Dec 15, 2009 at 5:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I suppose that I could fix this by getting rid of my swap partition
> altogether, but that seems a rather extreme solution, and it's
> certainly not the way most UNIX/Linux systems I run across are
> configured, if for no other reason than that the operating system
> installer usually recommends creating one.

Well, I suppose it's a question of degree. The more swap you have, the more you can do before you run out of memory. But nobody said swap performs as well as RAM....

What's the kernel going to do, though: refuse to allocate memory when it has swap available? The problem is that the OS gives no feedback when you're running low on RAM but haven't run out yet. There were experiments in the 90s with a SIGDANGER signal, but I don't think it ever worked great, and I don't think it's widespread.

If Postgres imposed a limit itself, it would (a) be hard to make server-wide, and (b) not take into account other things running on the system. So that doesn't really help either.

--
greg
Robert Haas wrote:

> On Tue, Dec 15, 2009 at 11:02 AM, Greg Stark <gsstark@mit.edu> wrote:
>
>> On Tue, Dec 15, 2009 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> This is an
>>> issue that other people have run into in the past, and I don't think
>>> we have a good solution. I wonder if we should put some kind of a
>>> limit in place so that queries like this will at least fail relatively
>>> gracefully with an error message rather than taking down the box.
>>
>> Eh? That's exactly what's supposed to happen now. When malloc returns
>> 0 you're supposed to get a graceful error message and transaction
>> abort.
>
> I didn't know that, but I think by the time malloc returns 0,
> other bad things are usually already happening. I don't think that's really
> an answer.

Actually, the real answer is that we can't spill the deferred trigger queue to disk, so it consumes all memory. *That* is very much our own shortcoming. The fact that this drove the operating system into thrashing is just an expected (if undesirable) side effect.

Greg is right that if malloc returns 0 we act sanely, i.e. abort the transaction and release all the memory; Craig is right that having this happen is a good thing. Whether or not this causes a lot of thrashing depends on the situation, but that's an OS-level problem, not Postgres'. If we're to do anything about this, it is spilling the trigger queue so it doesn't eat an unbounded amount of memory.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:

> If we're to do anything about this, it is spilling the trigger queue so
> it doesn't eat an unbounded amount of memory.

Of course, the reason nothing much has been done about that is that by the time your trigger queue is long enough to cause such an issue, you're screwed anyway --- actually executing all those triggers would take longer than you'll want to wait.

I tend to agree with the upthread opinion that if you're on a platform that responds so ungracefully to memory hogs, you'd be best off using ulimit to confine backends to a moderate amount of memory space. But this *is* a platform problem; it's just one more example of how poorly designed and implemented Linux's memory handling is.

regards, tom lane
On Tue, Dec 15, 2009 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
>
>> If we're to do anything about this, it is spilling the trigger queue so
>> it doesn't eat an unbounded amount of memory.
>
> Of course, the reason nothing much has been done about that is that
> by the time your trigger queue is long enough to cause such an issue,
> you're screwed anyway --- actually executing all those triggers would
> take longer than you'll want to wait.

What is the best way to go about doing that, anyway?

...Robert
Robert Haas <robertmhaas@gmail.com> writes:

> On Tue, Dec 15, 2009 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>
>>> If we're to do anything about this, it is spilling the trigger queue so
>>> it doesn't eat an unbounded amount of memory.
>>
>> Of course, the reason nothing much has been done about that is that
>> by the time your trigger queue is long enough to cause such an issue,
>> you're screwed anyway --- actually executing all those triggers would
>> take longer than you'll want to wait.
>
> What is the best way to go about doing that, anyway?

Well, we added conditional triggers, which provide a partial fix. The only other idea I've heard that sounds like it'd really help is having some sort of lossy storage for foreign-key triggers, where we'd fall back to per-block or whole-table rechecking of the constraint instead of trying to track the exact rows that were modified. Not sure how you apply that to non-FK triggers, though.

regards, tom lane
On Tue, Dec 15, 2009 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>
>> What is the best way to go about doing that, anyway?
>
> Well, we added conditional triggers, which provide a partial fix. The
> only other idea I've heard that sounds like it'd really help is having
> some sort of lossy storage for foreign-key triggers, where we'd fall
> back to per-block or whole-table rechecking of the constraint instead of
> trying to track the exact rows that were modified. Not sure how you
> apply that to non-FK triggers, though.

Err, sorry, I quoted the wrong part. I meant, how would you rlimit the server's memory usage?

...Robert
Robert Haas <robertmhaas@gmail.com> writes:

> Err, sorry, I quoted the wrong part. I meant, how would you rlimit
> the server's memory usage?

Put a ulimit command in the server start script? Depending on the details of the start script, you might need to put it in the postgres user's .profile instead, but it's certainly doable.

regards, tom lane
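[As a sketch of this suggestion -- the paths, the limit value, and the pg_ctl invocation are illustrative assumptions; real init scripts vary by distribution, and PAM limit settings can override values set this way:]

```shell
#!/bin/sh
# Start-script fragment: cap the server's virtual memory before
# launching, so a runaway backend gets a malloc failure (reported as
# an ERROR) rather than exhausting the box. Resource limits are
# inherited by child processes, so every backend is covered.
ulimit -v 2097152        # 2 GB address-space cap, in kB (illustrative)
su - postgres -c '/usr/local/pgsql/bin/pg_ctl -D /var/lib/pgsql/data start'
```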
On Tue, Dec 15, 2009 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>
>> Err, sorry, I quoted the wrong part. I meant, how would you rlimit
>> the server's memory usage?
>
> Put a ulimit command in the server start script? Depending on the
> details of the start script, you might need to put it in the postgres
> user's .profile instead, but it's certainly doable.

This may be a stupid question, but when you hit the limit, will it result in an ERROR or a PANIC?

...Robert
Robert Haas <robertmhaas@gmail.com> writes:

> On Tue, Dec 15, 2009 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Put a ulimit command in the server start script? Depending on the
>> details of the start script, you might need to put it in the postgres
>> user's .profile instead, but it's certainly doable.
>
> This may be a stupid question, but when you hit the limit, will it
> result in an ERROR or a PANIC?

Should be an ERROR ... if it isn't, that's probably a bug. The design intention is that malloc failure is just an ERROR.

regards, tom lane
What is the ideal/optimal platform for PostgreSQL? Linux (which distro?), FreeBSD, Windows, etc.? Consider memory management, file system performance, threading model, and so on.
On 16/12/2009 10:31 AM, Mark Williamson wrote:

> What is the most ideal/optimal platform for postgresql? Linux
> (distro?), freebsd, windows, etc.

Pg has been around on UNIX-like platforms for longer than on Windows, and is better tested on those platforms. Its design is also more friendly toward UNIX-like systems, being based on a connection-per-process rather than a connection-per-thread model - though that matters less with the NT kernel used in all modern Windows flavours.

Of the UNIX-like systems, the vast majority of people on the list seem to use a Linux flavour or FreeBSD. I mostly see Red Hat Enterprise, FreeBSD, Ubuntu, Debian, and CentOS users when an OS is mentioned. I'd stick to whatever Linux/BSD you're most familiar with, personally.

Rather than worry about the OS, though, your real issues are with hardware selection. A good-quality RAID controller with battery backup, plenty of disks in RAID 10, and plenty of RAM will make more difference than almost anything else.

--
Craig Ringer