Thread: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
On Sat, Jan 23, 2010 at 4:37 PM, Simon Riggs <sriggs@postgresql.org> wrote:
> max_standby_delay = -1 option removed to prevent deadlock.

This seems unacceptable to me. It means it's impossible to configure a reporting slave so it doesn't spuriously signal errors if your reports run too long.

Recall that I am still of the opinion that the only reasonable default value for this parameter is actually -1. I don't think we should signal errors for correctly working systems unless the user requests them in some way.

Was there any discussion about this change? I don't recall seeing it proposed on hackers.

--
greg
On Sat, 2010-01-23 at 17:35 +0000, Greg Stark wrote:
> On Sat, Jan 23, 2010 at 4:37 PM, Simon Riggs <sriggs@postgresql.org> wrote:
> > max_standby_delay = -1 option removed to prevent deadlock.
>
> This seems unacceptable to me. It means it's impossible to configure a
> reporting slave so it doesn't spuriously signal errors if your reports
> run too long.
>
> Recall that I am still of the opinion that the only reasonable default
> value for this parameter is actually -1. I don't think we should
> signal errors for correctly working systems unless the user requests
> them in some way.

What is your proposed way of handling buffer pin deadlocks? That will be acceptable and working to some extent in the next week?

Wait forever isn't always a good idea, anymore, if it ever was.

Lots of things still on the TODO, if you are looking for a project.
http://wiki.postgresql.org/wiki/Hot_Standby_TODO

--
Simon Riggs www.2ndQuadrant.com
On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> What is your proposed way of handling buffer pin deadlocks? That will be
> acceptable and working to some extent in the next week?
>
> Wait forever isn't always a good idea, anymore, if it ever was.

I've never said it was always a good idea. But killing correctly running queries isn't always a good idea either. I'm interested in using HS for running read-only replicas for load balancing. It would be pretty sad if queries dispatched to a read-only replica received spurious, unpredictable errors for reasons the application programmer cannot control.

I'll look at the buffer pin deadlock problem again, but I didn't realize the situation was so dire. And what were the downsides of the "stop gap"?

--
greg
On Sat, 2010-01-23 at 21:40 +0000, Greg Stark wrote:
> On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > What is your proposed way of handling buffer pin deadlocks? That will be
> > acceptable and working to some extent in the next week?
> >
> > Wait forever isn't always a good idea, anymore, if it ever was.
>
> I've never said it was always a good idea. But killing correctly
> running queries isn't always a good idea either. I'm interested in
> using HS for running read-only replicas for load balancing. It would
> be pretty sad if queries dispatched to a read-only replica received
> spurious, unpredictable errors for reasons the application programmer
> cannot control.

I understand your concern and seek to provide the best way forwards in the time available. Hopefully you have a better way, but we can do little about the time. Your input is welcome, and your code also.

> I'll look at the buffer pin deadlock problem again, but I didn't
> realize the situation was so dire. And what were the downsides of the
> "stop gap"?

Any query that attempted to wait for a lock threw an ERROR.

Since the -1 setting would never resolve a deadlock itself, if we allowed it we would have to either use the stop gap or use a full deadlock detector. Given the stop gap does what -1 says it will never do, ISTM that having -1 would be contradictory. I did not wish to remove it, but it seemed safer to do so. Putting it back is straightforward, if it makes sense.

We would need to detect deadlock from both directions: when Startup begins to wait while users sleep, and when users begin to wait while Startup sleeps. Full deadlock detection is too much code for too small a problem.

--
Simon Riggs www.2ndQuadrant.com
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Heikki Linnakangas
Simon Riggs wrote:
> On Sat, 2010-01-23 at 21:40 +0000, Greg Stark wrote:
>> On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> What is your proposed way of handling buffer pin deadlocks? That will be
>>> acceptable and working to some extent in the next week?
>>>
>>> Wait forever isn't always a good idea, anymore, if it ever was.
>> I've never said it was always a good idea. But killing correctly
>> running queries isn't always a good idea either. I'm interested in
>> using HS for running read-only replicas for load balancing. It would
>> be pretty sad if queries dispatched to a read-only replica received
>> spurious, unpredictable errors for reasons the application programmer
>> cannot control.
>
> I understand your concern and seek to provide the best way forwards in
> the time available. Hopefully you have a better way, but we can do
> little about the time. Your input is welcome, and your code also.

I just woke up to this thread too. I have to agree with Greg, we must think harder.

Can you summarize the problem again? I don't immediately see how the deadlock could happen.

Would this simple scheme work:

When the startup process has waited for a short while (ie deadlock_timeout), it sends the signal "please check if you're holding a pin on buffer X" to all backends. When a backend receives that signal, it checks if it is holding a pin on the given buffer *and* waiting on a lock. If it is, abort the transaction. Assuming that a backend can only block waiting on a lock held by the startup process, deadlock detection is as simple as that.

> Given the stop gap does what -1 says it will never do, ISTM that having
> -1 would be contradictory. I did not wish to remove it, but it seemed
> safer to do so. Putting it back is straightforward, if it makes sense.

For all practical purposes, INT_MAX, which is the upper limit for max_standby_delay, is the same as infinity. So removing -1 doesn't really get you out of jail. And no, let's not make the upper limit smaller, there's no natural upper limit for that setting.

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
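The scheme proposed above can be sketched as a toy Python model. This is only an illustration under stated assumptions, not PostgreSQL code: the `Backend` class, its fields, and `startup_deadlock_check` are hypothetical names, and the model assumes (as the message does) that a backend can only be blocked on a lock held by the startup process.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Backend:
    """Toy stand-in for a standby backend (hypothetical, not a PostgreSQL struct)."""
    pid: int
    pinned_buffers: Set[int] = field(default_factory=set)
    waiting_on_lock: bool = False  # assumed: the lock is held by the startup process
    aborted: bool = False

    def handle_pin_check(self, buffer_id: int) -> bool:
        """Model of receiving "check if you're holding a pin on buffer X"."""
        if buffer_id in self.pinned_buffers and self.waiting_on_lock:
            # Holding the pin while blocked on a lock: abort and release everything.
            self.aborted = True
            self.pinned_buffers.clear()
            self.waiting_on_lock = False
            return True
        return False


def startup_deadlock_check(backends: List[Backend], buffer_id: int) -> List[int]:
    """Run once the startup process has waited deadlock_timeout for buffer_id.

    Returns the pids of the backends aborted by this round of signals.
    """
    return [b.pid for b in backends if b.handle_pin_check(buffer_id)]
```

In this model a backend that merely holds the pin, or is merely waiting on a lock, is left alone; only the combination of the two is treated as a deadlock.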
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Simon Riggs
On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sat, 2010-01-23 at 21:40 +0000, Greg Stark wrote:
> >> On Sat, Jan 23, 2010 at 8:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>> What is your proposed way of handling buffer pin deadlocks? That will be
> >>> acceptable and working to some extent in the next week?
> >>>
> >>> Wait forever isn't always a good idea, anymore, if it ever was.
> >> I've never said it was always a good idea. But killing correctly
> >> running queries isn't always a good idea either. I'm interested in
> >> using HS for running read-only replicas for load balancing. It would
> >> be pretty sad if queries dispatched to a read-only replica received
> >> spurious, unpredictable errors for reasons the application programmer
> >> cannot control.
> >
> > I understand your concern and seek to provide the best way forwards in
> > the time available. Hopefully you have a better way, but we can do
> > little about the time. Your input is welcome, and your code also.
>
> I just woke up to this thread too. I have to agree with Greg, we must
> think harder.

Must is a word I would disagree with. There are other bigger usability issues to resolve at present and I'm not personally going to be distracted away from addressing them. I have no problem with other contributions.

> Can you summarize the problem again? I don't immediately see how the
> deadlock could happen.
>
> Would this simple scheme work:
>
> When the startup process has waited for a short while (ie
> deadlock_timeout), it sends the signal "please check if you're holding a
> pin on buffer X" to all backends. When a backend receives that signal,
> it checks if it is holding a pin on the given buffer *and* waiting on a
> lock. If it is, abort the transaction. Assuming that a backend can only
> block waiting on a lock held by the startup process, deadlock detection
> is as simple as that.

No, it won't work. A deadlock could occur after the startup process has already been waiting for longer than the deadlock timeout.

Better ideas welcome, but solutions may not be forthcoming in the time available.

--
Simon Riggs www.2ndQuadrant.com
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Heikki Linnakangas
Simon Riggs wrote:
> On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
>> Would this simple scheme work:
>>
>> When the startup process has waited for a short while (ie
>> deadlock_timeout), it sends the signal "please check if you're holding a
>> pin on buffer X" to all backends. When a backend receives that signal,
>> it checks if it is holding a pin on the given buffer *and* waiting on a
>> lock. If it is, abort the transaction. Assuming that a backend can only
>> block waiting on a lock held by the startup process, deadlock detection
>> is as simple as that.
>
> No, it won't work. A deadlock could occur after the startup process has
> already been waiting for longer than the deadlock timeout.

Retry every deadlock_timeout seconds?

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
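The objection and the retry suggestion come down to timing, which a small Python model may make concrete. It is a sketch under assumptions, not PostgreSQL behaviour: `detection_time_ms` is a hypothetical helper, the startup process is assumed to start waiting at time 0, and times are in milliseconds (the unit PostgreSQL uses for `deadlock_timeout` by default).

```python
from typing import Optional


def detection_time_ms(deadlock_forms_at_ms: int,
                      deadlock_timeout_ms: int,
                      retry: bool) -> Optional[int]:
    """Time of the first check that can see the deadlock, or None if none does.

    The startup process checks at deadlock_timeout_ms; with retry=True it
    checks again every deadlock_timeout_ms after that.
    """
    if deadlock_timeout_ms <= 0:
        raise ValueError("deadlock_timeout_ms must be positive")
    check_at = deadlock_timeout_ms
    while True:
        if check_at >= deadlock_forms_at_ms:
            return check_at  # the deadlock already exists when this check runs
        if not retry:
            return None      # a single check ran too early and never repeats
        check_at += deadlock_timeout_ms
```

With a timeout of 1000 ms and a deadlock that forms at 2500 ms, the single check returns `None`, while the retrying variant notices it at 3000 ms, so in this model detection lags by at most one timeout interval.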
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Heikki Linnakangas
Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
>>> Would this simple scheme work:
>>>
>>> When the startup process has waited for a short while (ie
>>> deadlock_timeout), it sends the signal "please check if you're holding a
>>> pin on buffer X" to all backends. When a backend receives that signal,
>>> it checks if it is holding a pin on the given buffer *and* waiting on a
>>> lock. If it is, abort the transaction. Assuming that a backend can only
>>> block waiting on a lock held by the startup process, deadlock detection
>>> is as simple as that.
>> No, it won't work. A deadlock could occur after the startup process has
>> already been waiting for longer than the deadlock timeout.
>
> Retry every deadlock_timeout seconds?

Or better yet, also check if the current backend is holding the waited-for pin in CheckDeadLock().

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
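The backend-side variant could look roughly like the following predicate. This is a hedged sketch only: it does not reproduce the actual `CheckDeadLock()` code, and the function and parameter names are hypothetical. It assumes the backend can see which buffer, if any, the startup process is currently waiting to have unpinned.

```python
from typing import Optional, Set


def backend_sees_pin_deadlock(startup_waits_for_buffer: Optional[int],
                              my_pinned_buffers: Set[int],
                              blocked_by_startup_lock: bool) -> bool:
    """Extra test a backend could run when its own deadlock check fires.

    True when the backend waits on a lock held by the startup process while
    it holds the very pin the startup process is waiting for.
    """
    if startup_waits_for_buffer is None:
        return False  # the startup process is not waiting on any pin
    return blocked_by_startup_lock and startup_waits_for_buffer in my_pinned_buffers
```

Because the backend runs this check after it has itself started waiting, it covers the case where the deadlock forms long after the startup process began waiting.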
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Simon Riggs
On Mon, 2010-01-25 at 10:59 +0200, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> >> On Mon, 2010-01-25 at 09:52 +0200, Heikki Linnakangas wrote:
> >>> Would this simple scheme work:
> >>>
> >>> When the startup process has waited for a short while (ie
> >>> deadlock_timeout), it sends the signal "please check if you're holding a
> >>> pin on buffer X" to all backends. When a backend receives that signal,
> >>> it checks if it is holding a pin on the given buffer *and* waiting on a
> >>> lock. If it is, abort the transaction. Assuming that a backend can only
> >>> block waiting on a lock held by the startup process, deadlock detection
> >>> is as simple as that.
> >> No, it won't work. A deadlock could occur after the startup process has
> >> already been waiting for longer than the deadlock timeout.
> >
> > Retry every deadlock_timeout seconds?
>
> Or better yet, also check if the current backend is holding the
> waited-for pin in CheckDeadLock().

The deadlock can be caused by either party. As long as the check occurs in both places, it can be done.

The logic for the startup process must be enhanced to allow for both deadlocks and normal pin buffer checks happening at different times without confusion. The SIGUSR1 message received by backend would need to differ as to whether it was a deadlock check timeout or a normal buffer pin timeout.

It can be done, though will require very careful testing. It's clearly a lower priority than other code based upon feedback from the Hot Standby user group.

My assessment is too much code, too rare a case and too little time, so it is a relative, not absolute judgement. I would not personally argue this is something worth delaying for, though you and Greg may wish to do that. If you insisted it was me that did this, I would not be in a position to start it for about 10 days.

--
Simon Riggs www.2ndQuadrant.com
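The distinction between the two kinds of message can be illustrated with a small Python model. It is a sketch under assumptions rather than the eventual implementation: the `Reason` enum and `should_cancel` are hypothetical names, and "normal buffer pin timeout" is taken to mean that the standby delay limit has expired.

```python
from enum import Enum, auto


class Reason(Enum):
    """Why the startup process signalled the backend (hypothetical names)."""
    BUFFER_PIN_TIMEOUT = auto()  # the delay limit expired while waiting for the pin
    DEADLOCK_CHECK = auto()      # periodic check while still within the limit


def should_cancel(reason: Reason, holds_pin: bool, waiting_on_lock: bool) -> bool:
    """Decide whether a signalled backend cancels its query."""
    if not holds_pin:
        return False  # this backend is not in the startup process's way
    if reason is Reason.BUFFER_PIN_TIMEOUT:
        return True   # timeout reached: every pin holder is cancelled
    # Deadlock check: cancel only a pin holder that is itself blocked on a lock.
    return waiting_on_lock
```

Keeping the reason in the message lets one handler serve both cases, so a pin holder that is still making progress is cancelled only when the timeout, not the deadlock check, fires.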
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Heikki Linnakangas
Simon Riggs wrote:
> It's clearly a
> lower priority than other code based upon feedback from the Hot Standby
> user group.

What's "the Hot Standby user group"?

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Simon Riggs
On Mon, 2010-01-25 at 16:22 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > It's clearly a
> > lower priority than other code based upon feedback from the Hot Standby
> > user group.
>
> What's the "the Hot Standby user group"?

A group of people who have an interest in using Hot Standby, as advertised on postgresql.org and Weekly News.

--
Simon Riggs www.2ndQuadrant.com
Re: Re: pgsql: In HS, Startup process sets SIGALRM when waiting for buffer pin.
From: Josh Berkus
> A group of people who have an interest in using Hot Standby, as
> advertised on postgresql.org and Weekly News.

There are pg users who won't be using HS/SR? ;-)

--Josh