Discussion: ERROR: XLogFlush: request
Hi All,

xlog.c code from the version we use (7.3.2):

    /*
     * If we still haven't flushed to the request point then we have a
     * problem; most likely, the requested flush point is past end of
     * XLOG. This has been seen to occur when a disk page has a corrupted
     * LSN.
     *
     * Formerly we treated this as a PANIC condition, but that hurts the
     * system's robustness rather than helping it: we do not want to take
     * down the whole system due to corruption on one data page. In
     * particular, if the bad page is encountered again during recovery
     * then we would be unable to restart the database at all! (This
     * scenario has actually happened in the field several times with 7.1
     * releases. Note that we cannot get here while InRedo is true, but if
     * the bad page is brought in and marked dirty during recovery then
     * CreateCheckpoint will try to flush it at the end of recovery.)
     *
     * The current approach is to ERROR under normal conditions, but only
     * WARNING during recovery, so that the system can be brought up even
     * if there's a corrupt LSN. Note that for calls from xact.c, the
     * ERROR will be promoted to PANIC since xact.c calls this routine
     * inside a critical section. However, calls from bufmgr.c are not
     * within critical sections and so we will not force a restart for a
     * bad LSN on a data page.
     */
    if (XLByteLT(LogwrtResult.Flush, record))
        elog(InRecovery ? WARNING : ERROR,
             "XLogFlush: request %X/%X is not satisfied --- flushed only to %X/%X",
             record.xlogid, record.xrecoff,
             LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);

A Java process using Postgres 7.3.2 got these errors:

java.sql.SQLException: ERROR: XLogFlush: request 0/240169BC is not satisfied --- flushed only to 0/23FFC01C

While these errors were filling the logs, we were able to connect via psql and see all the data.

> This has been seen to occur when a disk page has a corrupted LSN

I suppose LSN refers to the logical sector number of a WAL. If that was corrupted, how come we were able to access it via psql?

Is it just an isolated phenomenon? Does Postgres have an auto-recovery for this? If yes, did old connections have stale values of LSN?

Coming to safeguards:

1. Is there any use in restarting the Java process when this happens?
2. Is there any use in restarting the postmaster at this time, and is it safe to do so?

What should be done when this happens? Any suggestions?

--
Nitin
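A note for readers on the numbers in that error: in PostgreSQL, LSN stands for log sequence number, a position in the WAL, and the message prints it as two hex fields (xlogid/xrecoff). The following is only a minimal sketch for inspecting the two positions from the log line. `WalPos`, `parse_wal_pos`, `wal_pos_lt` and `wal_gap_same_file` are hypothetical names invented for illustration, not PostgreSQL APIs. The ordering function mirrors what the `XLByteLT` test in the quoted code expresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for a WAL position (hypothetical type, not the
 * server's own struct). */
typedef struct {
    uint32_t xlogid;   /* first hex field of "%X/%X" */
    uint32_t xrecoff;  /* second hex field of "%X/%X" */
} WalPos;

/* Parse text such as "0/240169BC". Returns 1 on success, 0 otherwise;
 * *out is written only on success. */
static int parse_wal_pos(const char *text, WalPos *out)
{
    unsigned int id, off;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%X/%X", &id, &off) != 2)
        return 0;
    out->xlogid = (uint32_t) id;
    out->xrecoff = (uint32_t) off;
    return 1;
}

/* Ordering test: is position a strictly before position b? */
static int wal_pos_lt(WalPos a, WalPos b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

/* Byte gap from `from` up to `to`, computed only when both share the
 * same xlogid. Returns 1 and stores the gap, or 0 when this sketch
 * does not compute it. */
static int wal_gap_same_file(WalPos from, WalPos to, uint32_t *gap)
{
    assert(gap != NULL);
    if (from.xlogid != to.xlogid || to.xrecoff < from.xrecoff)
        return 0;
    *gap = to.xrecoff - from.xrecoff;
    return 1;
}
```

Fed the two values from the logged error, `wal_pos_lt(flushed, request)` holds, and since both share xlogid 0 the request lies 0x1A9A0 (108960) bytes beyond the flushed position. That fits the comment's description of a requested flush point past the end of the log.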
"Nitin Verma" <nitinverma@azulsystems.com> writes: > xlog.c code from version we use (7.3.2) > ... > What all should be done when this happened? Any suggestions. Updating to something newer than 7.3.2 would seem to be a good idea. 7.3.18 is the current release in that branch. regards, tom lane
Thanks Tom, anyway we are moving to 8.1.0 soon.

That aside, moving all our clients to a newer release will take some time. I hope you know how it works in a product. Until then we need to release a patch that recovers from this condition.

That said, do we have some advice or workarounds?

I saw 8.1.0's code; it still ends up handling the same condition.

    /*
     * If we still haven't flushed to the request point then we have a
     * problem; most likely, the requested flush point is past end of XLOG.
     * This has been seen to occur when a disk page has a corrupted LSN.
     *
     * Formerly we treated this as a PANIC condition, but that hurts the system's
     * robustness rather than helping it: we do not want to take down the
     * whole system due to corruption on one data page. In particular, if the
     * bad page is encountered again during recovery then we would be unable
     * to restart the database at all! (This scenario has actually happened
     * in the field several times with 7.1 releases. Note that we cannot get
     * here while InRedo is true, but if the bad page is brought in and marked
     * dirty during recovery then CreateCheckPoint will try to flush it at the
     * end of recovery.)
     *
     * The current approach is to ERROR under normal conditions, but only WARNING
     * during recovery, so that the system can be brought up even if there's a
     * corrupt LSN. Note that for calls from xact.c, the ERROR will be
     * promoted to PANIC since xact.c calls this routine inside a critical
     * section. However, calls from bufmgr.c are not within critical sections
     * and so we will not force a restart for a bad LSN on a data page.
     */
    if (XLByteLT(LogwrtResult.Flush, record))
        elog(InRecovery ? WARNING : ERROR,
             "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
             record.xlogid, record.xrecoff,
             LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);

Thus there is a chance of the same thing happening again, so we will need a way to recover from it.
So I re-quote myself again:

=========
A Java process using Postgres 7.3.2 got these errors:

java.sql.SQLException: ERROR: XLogFlush: request 0/240169BC is not satisfied --- flushed only to 0/23FFC01C

While these errors were filling the logs, we were able to connect via psql and see all the data.

> This has been seen to occur when a disk page has a corrupted LSN

I suppose LSN refers to the logical sector number of a WAL. If that was corrupted, how come we were able to access it via psql?

Is it just an isolated phenomenon? Does Postgres have an auto-recovery for this? If yes, did old connections have stale values of LSN?

Coming to safeguards:

1. Is there any use in restarting the Java process when this happens?
2. Is there any use in restarting the postmaster at this time, and is it safe to do so?

What should be done when this happens? Any suggestions?
=========

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 13, 2007 8:18 PM
To: Nitin Verma
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] ERROR: XLogFlush: request
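The policy spelled out in the quoted xlog.c comment (ERROR normally, only WARNING during recovery, and ERROR promoted to PANIC inside a critical section) can be modelled in a few lines. This is a toy sketch and not the server's code: `Level`, `flush_shortfall_level` and `level_name` are hypothetical names, and the two flags stand in for whatever state the server actually tracks.

```c
#include <assert.h>
#include <stddef.h>

/* Severity levels named in the comment, lowest to highest. */
typedef enum { LVL_WARNING, LVL_ERROR, LVL_PANIC } Level;

/* Model of the described policy: start from WARNING during recovery or
 * ERROR otherwise, then promote ERROR to PANIC when the caller is inside
 * a critical section (the xact.c case in the comment). Assumption of
 * this model: a WARNING is never promoted. */
static Level flush_shortfall_level(int in_recovery, int in_critical_section)
{
    Level level = in_recovery ? LVL_WARNING : LVL_ERROR;

    if (level == LVL_ERROR && in_critical_section)
        level = LVL_PANIC;
    return level;
}

/* Human-readable name for a level. */
static const char *level_name(Level level)
{
    switch (level) {
    case LVL_WARNING: return "WARNING";
    case LVL_ERROR:   return "ERROR";
    case LVL_PANIC:   return "PANIC";
    }
    assert(0 && "unknown level");
    return "?";
}
```

Under this model, a shortfall reached from bufmgr.c outside recovery gives ERROR, so only the failing statement is rejected. The same shortfall reached from xact.c inside a critical section gives PANIC.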
Nitin Verma wrote:
> Thanx Tom, anyway we are moving to 8.1.0 soon.
>
> Leaving that moving all our client to newer release will take sometime. I
> hope you know how it works in a product. Till that time we need to release a
> patch that recovers from this condition.
>
> Said that, do we have some advice or workarounds?

Please read http://www.postgresql.org/support/versioning

--
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> Please read http://www.postgresql.org/support/versioning

Quoting part of the document:

Upgrading to a minor release does not require a dump and restore; merely stop the database server, install the updated binaries, and restart the server. For some releases, manual changes may be required to complete the upgrade, so always read the release notes before upgrading.

=========

Thanks, Alvaro... looks like I can just change the binary and move clients to 7.3.18, without any database recreation or dump/restore. This I can do in the patch itself.

So Alvaro/Tom, do you think 7.3.18 will lower the risk of getting into this situation?

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@commandprompt.com]
Sent: Friday, April 13, 2007 8:59 PM
To: Nitin Verma
Cc: Tom Lane; pgsql-general@postgresql.org
Subject: Re: [GENERAL] ERROR: XLogFlush: request
On Fri, 2007-04-13 at 10:10, Nitin Verma wrote:
> Thanx Tom, anyway we are moving to 8.1.0 soon.
>
> Leaving that moving all our client to newer release will take sometime. I
> hope you know how it works in a product. Till that time we need to release a
> patch that recovers from this condition.
>
> Said that, do we have some advice or workarounds?
>
> I saw 8.1.0's code; it even ends up handling the same condition.

A couple of points.

1: This problem may be fixed in the latest 7.3.18 version. An upgrade from 7.3.x to 7.3.18 is pretty close to painless. Shut down pgsql, update package, start up pgsql. Backup beforehand is a nice option, but you should have backups already anyway...

2: Do not upgrade to 8.1.0. It's been updated many times since then. I think the latest 8.1.x is 8.1.8 or so. Get that.

3: Keep your versions updated and you should avoid most situations like this in the future.
> An upgrade from 7.3.x to 7.3.18 is pretty close to painless

http://www.postgresql.org/docs/7.3/static/release-7-3-13.html
http://www.postgresql.org/docs/7.3/static/release-7-3-10.html

Can't do it blindfolded, but it can still be categorized as painless :)

> 2: Do not upgrade to 8.1.0. It's been updated many times since then. I think the latest 8.1.x is 8.1.8 or so. Get that.

Thanks, understood.

> 3: Keep your versions updated and you should avoid most situations like this in the future.

I hope I do realize that by now :)

I can see a change listed in http://www.postgresql.org/docs/7.3/static/release-7-3-5.html

* Force zero_damaged_pages to be on during recovery from WAL

Is this related to the XLogFlush problem?

-----Original Message-----
From: Scott Marlowe [mailto:smarlowe@g2switchworks.com]
Sent: Friday, April 13, 2007 9:26 PM
To: Nitin Verma
Cc: pgsql general
Subject: Re: [GENERAL] ERROR: XLogFlush: request