Обсуждение: [HACKERS] Crash on promotion when recovery.conf is renamed
I had a system where the recovery.conf file was renamed "out of the way" at some point, and then the system was promoted. This is obviously operator error, but it seems like something we should handle.
			
		What happens now is that the non-existance of recovery.conf is a FATAL error. I wonder if it should just be a WARNING, at least in the case of ENOENT?
What happens is this.
Log output:
2016-12-15 09:36:46.265 CET [25437] LOG:  received promote request
2016-12-15 09:36:46.265 CET [25438] FATAL:  terminating walreceiver process due to administrator command
mha@mha-laptop:~/postgresql/inst/head$ 2016-12-15 09:36:46.265 CET [25437] LOG:  invalid record length at 0/5015168: wanted 24, got 0
2016-12-15 09:36:46.265 CET [25437] LOG:  redo done at 0/5015130
2016-12-15 09:36:46.265 CET [25437] LOG:  last completed transaction was at log time 2016-12-15 09:36:19.27125+01
2016-12-15 09:36:46.276 CET [25437] LOG:  selected new timeline ID: 2
2016-12-15 09:36:46.429 CET [25437] FATAL:  could not open file "recovery.conf": No such file or directory
2016-12-15 09:36:46.429 CET [25436] LOG:  startup process (PID 25437) exited with exit code 1
2016-12-15 09:36:46.429 CET [25436] LOG:  terminating any other active server processes
2016-12-15 09:36:46.429 CET [25456] WARNING:  terminating connection because of crash of another server process
2016-12-15 09:36:46.429 CET [25456] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-12-15 09:36:46.429 CET [25456] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-12-15 09:36:46.431 CET [25436] LOG:  database system is shut down
So we can see it switches to timeline 2. Looking in pg_wal (or pg_xlog -- customer system was on 9.5, but this is reproducible in HEAD):
-rw------- 1 mha mha 16777216 Dec 15 09:36 000000010000000000000004
-rw------- 1 mha mha 16777216 Dec 15 09:36 000000010000000000000005
-rw------- 1 mha mha 16777216 Dec 15 09:36 000000020000000000000005
-rw------- 1 mha mha       41 Dec 15 09:36 00000002.history
However, according to pg_controldata, we are still on timeline 1:
Latest checkpoint location:           0/4000060
Prior checkpoint location:            0/4000060
Latest checkpoint's REDO location:    0/4000028
Latest checkpoint's REDO WAL file:    000000010000000000000004
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
..
Minimum recovery ending location:     0/5015168
Min recovery ending loc's timeline:   1
But since we have a history file for timeline 2 in the data directory (and neatly archived), this data directory isn't consistent with that. Meaning that for example any other standbys that you try to connect to this cluster will simply fail, because they try to join up on timeline 2 which doesn't actually exist.
I wonder if there might be more corner cases like this, but in this particular one it seems easy enough to just say that failing to rename recovery.conf because it didn't exist is safe.
But in the case of failing to rename recovery.conf for example because of permissions errors, we don't want to ignore it. But we also really don't want to end up with this kind of inconsistent data directory IMO. I don't know that code well enough to suggest how to fix it though -- hoping for input for someone who knows it closer?
On 12/15/2016 10:44 AM, Magnus Hagander wrote: > I wonder if there might be more corner cases like this, but in this > particular one it seems easy enough to just say that failing to rename > recovery.conf because it didn't exist is safe. Yeah. It's unexpected though, so I think erroring out is quite reasonable. If the recovery.conf file went missing, who knows what else is wrong. > But in the case of failing to rename recovery.conf for example because of > permissions errors, we don't want to ignore it. But we also really don't > want to end up with this kind of inconsistent data directory IMO. I don't > know that code well enough to suggest how to fix it though -- hoping for > input for someone who knows it closer? Hmm. Perhaps we should write the timeline history file only after renaming recovery.conf. In general, we should keep the window between writing the timeline history file and writing the end-of-recovery record as small as possible. I don't think we can eliminate it completely, but it makes sense to minimize it. Something like the attached (completely untested). - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Thu, Dec 15, 2016 at 11:25:10AM +0200, Heikki Linnakangas wrote: > On 12/15/2016 10:44 AM, Magnus Hagander wrote: > > I wonder if there might be more corner cases like this, but in this > > particular one it seems easy enough to just say that failing to rename > > recovery.conf because it didn't exist is safe. > > Yeah. It's unexpected though, so I think erroring out is quite reasonable. > If the recovery.conf file went missing, who knows what else is wrong. i'd rather not mess up with the exiting behavior and just complain to the pilot to not do that. This enters in the category of not removing the backup_label file after taking a backup... > > But in the case of failing to rename recovery.conf for example because of > > permissions errors, we don't want to ignore it. But we also really don't > > want to end up with this kind of inconsistent data directory IMO. I don't > > know that code well enough to suggest how to fix it though -- hoping for > > input for someone who knows it closer? > > Hmm. Perhaps we should write the timeline history file only after renaming > recovery.conf. In general, we should keep the window between writing the > timeline history file and writing the end-of-recovery record as small as > possible. I don't think we can eliminate it completely, but it makes sense > to minimize it. Something like the attached (completely untested). I did some tests. And after a lookup it looks like a good thing to book the timeline number at an earlier step and let other nodes know about it. So +1 on it. Looking at PrescanPreparedTransactions(), I am thinking as well that it would be better to get a hard failure when bumping on a corrupted 2PC file. Future files are one thing, but corrupted files should be treated more carefully. -- Michael
On Fri, Dec 16, 2016 at 7:08 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
 
 
On Thu, Dec 15, 2016 at 11:25:10AM +0200, Heikki Linnakangas wrote:
> On 12/15/2016 10:44 AM, Magnus Hagander wrote:
> > I wonder if there might be more corner cases like this, but in this
> > particular one it seems easy enough to just say that failing to rename
> > recovery.conf because it didn't exist is safe.
>
> Yeah. It's unexpected though, so I think erroring out is quite reasonable.
> If the recovery.conf file went missing, who knows what else is wrong.
i'd rather not mess up with the exiting behavior and just complain to the
pilot to not do that. This enters in the category of not removing the
backup_label file after taking a backup...
I'm happy to say that people shouldn't do that. However, we have a system that is unnecessarily fragile, by making something like that so easy to break. This could equally happen in other ways. For example, someone could accidentally provision a recovery.conf with the wrong permissions.
Reducing the fragility of such an important part of the system is a big improvement, IMO. Of course, we have to be extra careful when touching those parts.
Bottom line, I'm perfectly fine with failing in such a scenario. I'm not fine with leaving the user with a corrupt cluster if we can avoid it.
As for your comparison, we fixed the backup_label part by adding support for taking backups without using the fragile system. So it's not like we didn't recognize the problem.
> > But in the case of failing to rename recovery.conf for example because of
> > permissions errors, we don't want to ignore it. But we also really don't
> > want to end up with this kind of inconsistent data directory IMO. I don't
> > know that code well enough to suggest how to fix it though -- hoping for
> > input for someone who knows it closer?
>
> Hmm. Perhaps we should write the timeline history file only after renaming
> recovery.conf. In general, we should keep the window between writing the
> timeline history file and writing the end-of-recovery record as small as
> possible. I don't think we can eliminate it completely, but it makes sense
> to minimize it. Something like the attached (completely untested).
I haven't looked at the code either, but the reason is definitely solid - let's keep the time as short as possible. In particular, keeping it recoverable for things that are caused by humans, because they will keep making mistakes. And editing recovery.conf is a normal thing for a DBA to do (sure, not removing it -- but it's one of the few files in the data directory that the DBA is actually supposed to handle manually, so that increases the risk).
And also by doing things in an order that will at least make it less likely to end up corrupt if people manage to do it anyway.  
I did some tests. And after a lookup it looks like a good thing to book
the timeline number at an earlier step and let other nodes know about
it. So +1 on it.
Looking at PrescanPreparedTransactions(), I am thinking as well that it would
be better to get a hard failure when bumping on a corrupted 2PC file. Future
files are one thing, but corrupted files should be treated more carefully.
Again without looking at it, I agree (so much easier that way :P). Ignoring corruption is generally a bad idea. Failing hard makes the user notice the error, and makes it possible to initiate recovery from a standby or from backups or something, or to *intentionally* remove/clear/ignore it. 
On Sat, Dec 17, 2016 at 9:23 PM, Magnus Hagander <magnus@hagander.net> wrote: > On Fri, Dec 16, 2016 at 7:08 AM, Michael Paquier <michael.paquier@gmail.com> > wrote: >> Looking at PrescanPreparedTransactions(), I am thinking as well that it >> would >> be better to get a hard failure when bumping on a corrupted 2PC file. >> Future >> files are one thing, but corrupted files should be treated more carefully. > > > Again without looking at it, I agree (so much easier that way :P). Ignoring > corruption is generally a bad idea. Failing hard makes the user notice the > error, and makes it possible to initiate recovery from a standby or from > backups or something, or to *intentionally* remove/clear/ignore it. And I am finishing with the two patches attached: - 0001 changes the 2PC checks so as corrupted entries are FATAL. PreScanPreparedTransaction is used when a hot standby is initialized. In this case a failure protects the range of XIDs generated, potentially saving from corruption of data. At the end of recovery, this is done before any on-disk actions are taken. - 0002 is the thing that Heikki has sent previously to minimize the window between end-of-recovery record write and timeline history file archiving. I am attaching that to next CF. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Tue, Dec 20, 2016 at 4:54 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > I am attaching that to next CF. Moved to CF 2017-03. Both patches still apply. -- Michael
On 12/20/16 2:54 AM, Michael Paquier wrote: > On Sat, Dec 17, 2016 at 9:23 PM, Magnus Hagander <magnus@hagander.net> wrote: >> On Fri, Dec 16, 2016 at 7:08 AM, Michael Paquier <michael.paquier@gmail.com> >> wrote: >>> Looking at PrescanPreparedTransactions(), I am thinking as well that it >>> would >>> be better to get a hard failure when bumping on a corrupted 2PC file. >>> Future >>> files are one thing, but corrupted files should be treated more carefully. >> >> >> Again without looking at it, I agree (so much easier that way :P). Ignoring >> corruption is generally a bad idea. Failing hard makes the user notice the >> error, and makes it possible to initiate recovery from a standby or from >> backups or something, or to *intentionally* remove/clear/ignore it. > > And I am finishing with the two patches attached: > - 0001 changes the 2PC checks so as corrupted entries are FATAL. > PreScanPreparedTransaction is used when a hot standby is initialized. > In this case a failure protects the range of XIDs generated, > potentially saving from corruption of data. At the end of recovery, > this is done before any on-disk actions are taken. > - 0002 is the thing that Heikki has sent previously to minimize the > window between end-of-recovery record write and timeline history file > archiving. > > I am attaching that to next CF. This patch still applies cleanly and compiles at cccbdde. Any idea when you'll have a chance to review? -- -David david@pgmasters.net
From: David Steele [mailto:david@pgmasters.net] > Any idea when you'll have a chance to review? I'll do it by early next week. Regards Takayuki Tsunakawa
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> Moved to CF 2017-03. Both patches still apply.
Sorry to be late for reviewing this, but done now.  The patch applied, make check passed, and the code looks almost
good. I could successfully perform a simple archive recovery.  Finally, I broke the 2pc state file while the server is
down,and could confirm that the server failed to start as expected, emitting a FATAL message.  Worked nicely.
 
Just two cosmetic points:
(1)
Other places use "two-phase state file", not "two-phase file".
(2)
All other places in twophase.c and most places in other files put ereport() and errmsg() on separate lines.  I think it
wouldbe better to align with surrounding code.
 
+                ereport(FATAL, (errmsg("corrupted two-phase file \"%s\"",
Regards
Takayuki Tsunakawa
			
		"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> All other places in twophase.c and most places in other files put ereport() and errmsg() on separate lines.  I think
itwould be better to align with surrounding code. 
> +                ereport(FATAL, (errmsg("corrupted two-phase file \"%s\"",
Actually, the *real* problem with that coding is it lacks a SQLSTATE
(errcode call).  The only places where it's acceptable to leave that
out are for internal "can't happen" cases, which this surely isn't.
        regards, tom lane
			
		On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes: 
> All other places in twophase.c and most places in other files put ereport() and errmsg() on separate lines. I think it would be better to align with surrounding code.
> + ereport(FATAL, (errmsg("corrupted two-phase file \"%s\"",
Actually, the *real* problem with that coding is it lacks a SQLSTATE
(errcode call). The only places where it's acceptable to leave that
out are for internal "can't happen" cases, which this surely isn't.
Surrounding code also has ereports lacking SQLSTATE.  And that isn't "can't happen" case as well.
if (ControlFile->backupEndRequired)
ereport(FATAL,
(errmsg("WAL ends before end of online backup"),
errhint("All WAL generated while online backup was taken must be available at recovery.")));
else if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
ereport(FATAL,
(errmsg("WAL ends before end of online backup"),
errhint("Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.")));
else
ereport(FATAL,
(errmsg("WAL ends before consistent recovery point")));
Should we consider fixing them?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Actually, the *real* problem with that coding is it lacks a SQLSTATE
>> (errcode call).  The only places where it's acceptable to leave that
>> out are for internal "can't happen" cases, which this surely isn't.
> Surrounding code also has ereports lacking SQLSTATE.  And that isn't "can't
> happen" case as well.
> Should we consider fixing them?
Yup.  Just remember that the default is
XX000    E    ERRCODE_INTERNAL_ERROR              internal_error
If that's not how you want the error case reported, you need an errcode()
call.
We might need more ERRCODEs than are there now, if none of the existing
ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED
and ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
example?
        regards, tom lane
			
		On Mon, Mar 27, 2017 at 11:26 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.
------
Alexander Korotkov
Postgres Professional: http://www.
From: pgsql-hackers-owner@postgresql.org 
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier 
> Moved to CF 2017-03. Both patches still apply.
Sorry to be late for reviewing this, but done now. The patch applied, make check passed, and the code looks almost good. I could successfully perform a simple archive recovery. Finally, I broke the 2pc state file while the server is down, and could confirm that the server failed to start as expected, emitting a FATAL message. Worked nicely.
I've tested moving recovery.conf away case which was originally reported by Magnus.
When I'm trying to promote standby when recovery.conf is moved away, I get FATAL error.
waiting for server to promote....2017-03-27 17:12:38.225 MSK [30307] LOG:  received promote request
2017-03-27 17:12:38.225 MSK [30311] FATAL:  terminating walreceiver process due to administrator command
2017-03-27 17:12:38.225 MSK [30307] LOG:  redo done at 0/3000028
2017-03-27 17:12:38.226 MSK [30307] LOG:  selected new timeline ID: 2
2017-03-27 17:12:38.253 MSK [30307] FATAL:  could not open file "recovery.conf": No such file or directory
2017-03-27 17:12:38.253 MSK [30306] LOG:  startup process (PID 30307) exited with exit code 1
2017-03-27 17:12:38.253 MSK [30306] LOG:  terminating any other active server processes
2017-03-27 17:12:38.256 MSK [30306] LOG:  database system is shut down
..............................
server is still promoting
If I try to start it after crash, it becomes a master on timeline 1.  Just like in case we deleted recovery.conf while server was shut down.
waiting for server to start....2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv6 address "::1", port 5430
2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv6 address "fe80::1%lo0", port 5430
2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv4 address "127.0.0.1", port 5430
2017-03-27 17:16:54.187 MSK [30413] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5430"
2017-03-27 17:16:54.195 MSK [30414] LOG:  database system was interrupted while in recovery at log time 2017-03-27 17:10:23 MSK
2017-03-27 17:16:54.195 MSK [30414] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2017-03-27 17:16:54.218 MSK [30414] LOG:  database system was not properly shut down; automatic recovery in progress
2017-03-27 17:16:54.219 MSK [30414] LOG:  redo starts at 0/2000060
2017-03-27 17:16:54.219 MSK [30414] LOG:  consistent recovery state reached at 0/3000060
2017-03-27 17:16:54.219 MSK [30414] LOG:  invalid record length at 0/3000060: wanted 24, got 0
2017-03-27 17:16:54.219 MSK [30414] LOG:  redo done at 0/3000028
2017-03-27 17:16:54.224 MSK [30413] LOG:  database system is ready to accept connections
 done
server started
# select pg_is_in_recovery();
 pg_is_in_recovery
───────────────────
 f
(1 row)
# select pg_walfile_name(pg_current_
     pg_walfile_name
──────────────────────────
 000000010000000000000003
(1 row)
If instead I put recovery back and start server, then streaming replication continues normally.
waiting for server to start....2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv6 address "::1", port 5430
2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv6 address "fe80::1%lo0", port 5430
2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv4 address "127.0.0.1", port 5430
2017-03-27 17:32:16.888 MSK [30783] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5430"
2017-03-27 17:32:16.897 MSK [30784] LOG:  database system was interrupted while in recovery at log time 2017-03-27 17:28:05 MSK
2017-03-27 17:32:16.897 MSK [30784] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2017-03-27 17:32:16.914 MSK [30784] LOG:  entering standby mode
2017-03-27 17:32:16.916 MSK [30784] LOG:  redo starts at 0/8000028
2017-03-27 17:32:16.916 MSK [30784] LOG:  consistent recovery state reached at 0/9000060
2017-03-27 17:32:16.916 MSK [30784] LOG:  invalid record length at 0/9000060: wanted 24, got 0
2017-03-27 17:32:16.916 MSK [30783] LOG:  database system is ready to accept read only connections
2017-03-27 17:32:16.920 MSK [30788] LOG:  started streaming WAL from primary at 0/9000000 on timeline 1
 done
server started
# select pg_is_in_recovery();
 pg_is_in_recovery
───────────────────
 t
(1 row)
Thus, I think patch is working as expected in this case.
Also, I'd like to notice that it passes check-world without warnings on my laptop.
------
Alexander Korotkov
Postgres Professional: http://www.
The Russian Postgres Company 
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
> "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> > All other places in twophase.c and most places in other files put ereport()
> and errmsg() on separate lines.  I think it would be better to align with
> surrounding code.
> 
> > +                ereport(FATAL, (errmsg("corrupted
> two-phase file \"%s\"",
> 
> Actually, the *real* problem with that coding is it lacks a SQLSTATE (errcode
> call).  The only places where it's acceptable to leave that out are for
> internal "can't happen" cases, which this surely isn't.
Oh, I overlooked it.  But...
> Yup.  Just remember that the default is
> XX000    E    ERRCODE_INTERNAL_ERROR              internal_error
> 
> If that's not how you want the error case reported, you need an errcode()
> call.
> 
> We might need more ERRCODEs than are there now, if none of the existing
> ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED and
> ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
> example?
I'd be always happy if the error code is more specific, but maybe that would be a separate patch.  WAL corruption
messageso far doesn't accompany a specific error code like this in xlog.c:
 
        /*         * We only end up here without a message when XLogPageRead()         * failed - in that case we
alreadylogged something. In         * StandbyMode that only happens if we have been triggered, so we         *
shouldn'tloop anymore in that case.         */        if (errormsg)            ereport(emode_for_corrupt_record(emode,
                                          RecPtr ? RecPtr : EndRecPtr),            (errmsg_internal("%s", errormsg) /*
alreadytranslated */ ));
 
Regards
Takayuki Tsunakawa
			
		On Mon, Mar 27, 2017 at 11:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >> On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Actually, the *real* problem with that coding is it lacks a SQLSTATE >>> (errcode call). The only places where it's acceptable to leave that >>> out are for internal "can't happen" cases, which this surely isn't. > >> Surrounding code also has ereports lacking SQLSTATE. And that isn't "can't >> happen" case as well. >> Should we consider fixing them? > > Yup. Just remember that the default is > XX000 E ERRCODE_INTERNAL_ERROR internal_error > > If that's not how you want the error case reported, you need an errcode() > call. > > We might need more ERRCODEs than are there now, if none of the existing > ones seem to fit these cases. There's already ERRCODE_DATA_CORRUPTED > and ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for > example? While I agree with that in general, we are taking about 2PC files that are on disk in patch 0001, so I would think that in this case ERRCODE_DATA_CORRUPTED is the most adapted (or ERRCODE_TWOPHASE_CORRUPTED?). The other WARNING messages refer to stale files of already committed transactions, which are not actually corrupted. What about in this case having a ERRCODE_TWOPHASE_INVALID? Updated patches are attached, I did not change the WARNING portion though as I am not sure what's the consensus on the matter. -- Michael
Вложения
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier > While I agree with that in general, we are taking about 2PC files that are > on disk in patch 0001, so I would think that in this case > ERRCODE_DATA_CORRUPTED is the most adapted (or > ERRCODE_TWOPHASE_CORRUPTED?). > > The other WARNING messages refer to stale files of already committed > transactions, which are not actually corrupted. What about in this case > having a ERRCODE_TWOPHASE_INVALID? > > Updated patches are attached, I did not change the WARNING portion though > as I am not sure what's the consensus on the matter. > I get the impression that DATA_CORRUPTED means the table data is corrupted, because there's an error code named INDEX_CORRUPTED. Anyway, I don't think this patch needs to attach an error code because: * Currently, other interlal files (not tables or indexes) seem to use INTERNAL_ERROR (XX000). For example, see ReadControlFile()in xlog.c and pgstat_read_statsfiles() in pgstat.c. * It doesn't seem that the user needs distinction. I don't object to providing a specific error code for this case, butif the patch needs a specific error code to be committed, I'd like to know how that's useful (e.g. how it affects therecovery action the user takes.) So, I'd like to mark the patch as ready for committer when ereport() and errmsg() are on separate lines and the message changesto "two-phase state file." Regards Takayuki Tsunakawa
On Tue, Mar 28, 2017 at 1:33 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > I get the impression that DATA_CORRUPTED means the table data is corrupted, because there's an error code named INDEX_CORRUPTED. I have interpreted that as the other way around, aka DATA_CORRUPTED could be used as well to 2PC files :) But grepping around it seems that you are grabbing the meaning better than I do, ERRCODE_DATA_CORRUPTED is only used now for relation pages or large pages. > Anyway, I don't think this patch needs to attach an error code because: > * Currently, other interlal files (not tables or indexes) seem to use INTERNAL_ERROR (XX000). For example, see ReadControlFile()in xlog.c and pgstat_read_statsfiles() in pgstat.c. > * It doesn't seem that the user needs distinction. I don't object to providing a specific error code for this case, butif the patch needs a specific error code to be committed, I'd like to know how that's useful (e.g. how it affects therecovery action the user takes.) > So, I'd like to mark the patch as ready for committer when ereport() and errmsg() are on separate lines and the messagechanges to "two-phase state file." Okay. I got the message, and I agree with what you say here. You are right by the way, the error messages just use "two-phase file" and not "two-phase STATE file", missed that previously. -- Michael
Вложения
From: Michael Paquier [mailto:michael.paquier@gmail.com] > Okay. I got the message, and I agree with what you say here. You are right > by the way, the error messages just use "two-phase file" and not "two-phase > STATE file", missed that previously. > -- Thanks, marked as ready for committer, as the code is good and Alexander reported the test success. Regards Takayuki Tsunakawa
On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote: > From: Michael Paquier [mailto:michael.paquier@gmail.com] >> Okay. I got the message, and I agree with what you say here. You are right >> by the way, the error messages just use "two-phase file" and not "two-phase >> STATE file", missed that previously. >> -- > Thanks, marked as ready for committer, as the code is good and Alexander reported the test success. This bug has been moved to CF 2017-07. -- -David david@pgmasters.net
On Sat, Apr 8, 2017 at 10:05 AM, David Steele <david@pgmasters.net> wrote: > On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote: >> From: Michael Paquier [mailto:michael.paquier@gmail.com] >>> Okay. I got the message, and I agree with what you say here. You are right >>> by the way, the error messages just use "two-phase file" and not "two-phase >>> STATE file", missed that previously. >>> -- >> Thanks, marked as ready for committer, as the code is good and Alexander reported the test success. > > This bug has been moved to CF 2017-07. This bug fix has been pending in "Ready for Committer" state for about 4.5 months. Three committers (Magnus, Heikki, Tom) have contributed to the thread to date. Maybe one of them would like to commit this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Aug 26, 2017 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Apr 8, 2017 at 10:05 AM, David Steele <david@pgmasters.net> wrote: >> On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote: >>> Thanks, marked as ready for committer, as the code is good and Alexander reported the test success. >> >> This bug has been moved to CF 2017-07. > > This bug fix has been pending in "Ready for Committer" state for about > 4.5 months. Three committers (Magnus, Heikki, Tom) have contributed > to the thread to date. Maybe one of them would like to commit this? In the meantime its bits have begun to rot. Michael, could you please rebase? Thanks! -- Thomas Munro http://www.enterprisedb.com
On Fri, Sep 1, 2017 at 7:17 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Sat, Aug 26, 2017 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sat, Apr 8, 2017 at 10:05 AM, David Steele <david@pgmasters.net> wrote: >>> On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote: >>>> Thanks, marked as ready for committer, as the code is good and Alexander reported the test success. >>> >>> This bug has been moved to CF 2017-07. >> >> This bug fix has been pending in "Ready for Committer" state for about >> 4.5 months. Three committers (Magnus, Heikki, Tom) have contributed >> to the thread to date. Maybe one of them would like to commit this? > > In the meantime its bits have begun to rot. Michael, could you please rebase? Thanks for the reminder, Thomas. The 2PC code triggered during recovery has been largely reworked in PG10, explaining the conflicts. As this has been some time since I touched this patch, I had again a look at its logic and could not find any problems around it. So attached are a rebased versions for both 0001 and 0002, I have updated the messages as well to be more in-line with what is in HEAD for corrupted entries. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
> On 02 Sep 2017, at 14:17, Michael Paquier <michael.paquier@gmail.com> wrote: > > On Fri, Sep 1, 2017 at 7:17 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Sat, Aug 26, 2017 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Sat, Apr 8, 2017 at 10:05 AM, David Steele <david@pgmasters.net> wrote: >>>> On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote: >>>>> Thanks, marked as ready for committer, as the code is good and Alexander reported the test success. >>>> >>>> This bug has been moved to CF 2017-07. >>> >>> This bug fix has been pending in "Ready for Committer" state for about >>> 4.5 months. Three committers (Magnus, Heikki, Tom) have contributed >>> to the thread to date. Maybe one of them would like to commit this? >> >> In the meantime its bits have begun to rot. Michael, could you please rebase? > > Thanks for the reminder, Thomas. The 2PC code triggered during > recovery has been largely reworked in PG10, explaining the conflicts. > As this has been some time since I touched this patch, I had again a > look at its logic and could not find any problems around it. So > attached are a rebased versions for both 0001 and 0002, I have updated > the messages as well to be more in-line with what is in HEAD for > corrupted entries. I’ve moved this to the next CF, but since this no longer applies cleanly I’ve reset it to Waiting for author. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 2, 2017 at 8:13 AM, Daniel Gustafsson <daniel@yesql.se> wrote: > I’ve moved this to the next CF, but since this no longer applies cleanly I’ve > reset it to Waiting for author. Thanks Daniel for the reminder. Attached are rebased patches. This thing rots easily... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Mon, Oct 2, 2017 at 3:29 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Oct 2, 2017 at 8:13 AM, Daniel Gustafsson <daniel@yesql.se> wrote: >> I’ve moved this to the next CF, but since this no longer applies cleanly I’ve >> reset it to Waiting for author. > > Thanks Daniel for the reminder. Attached are rebased patches. This > thing rots easily... This set of patches still applies, and is marked as ready for committer. Are any of the committers cited on this thread, aka Magnus, Heikki, Tom interested in this patch set? Or not? We are close to the end of CF 2017-11, so I am bumping it to the next one. -- Michael
Magnus, * Michael Paquier (michael.paquier@gmail.com) wrote: > On Mon, Oct 2, 2017 at 3:29 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > On Mon, Oct 2, 2017 at 8:13 AM, Daniel Gustafsson <daniel@yesql.se> wrote: > >> I’ve moved this to the next CF, but since this no longer applies cleanly I’ve > >> reset it to Waiting for author. > > > > Thanks Daniel for the reminder. Attached are rebased patches. This > > thing rots easily... > > This set of patches still applies, and is marked as ready for > committer. Are any of the committers cited on this thread, aka Magnus, > Heikki, Tom interested in this patch set? Or not? We are close to the > end of CF 2017-11, so I am bumping it to the next one. Magnus, this was your thread to begin with, though I know others have been involved, any chance you'll be able to review this for commit during this CF? I agree that this is certainly a good thing to have too, though I've not looked at the patch itself in depth. Is there anything we can do to help move it along? Appears to compile and pass make check-world still (not sure why Thomas' bot currently thinks the build fails, since it worked here..). Thanks! Stephen
Вложения
On Fri, Jan 12, 2018 at 4:35 PM, Stephen Frost <sfrost@snowman.net> wrote: > Appears to compile and pass make check-world still (not sure why Thomas' > bot currently thinks the build fails, since it worked here..). It looks good now. There was a brief window when all Travis CI builds said this while updating various packages at startup: W: GPG error: http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release: The following signatures were invalid: KEYEXPIRED 1515625755 W: The repository 'http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release' is not signed. That's because the apt.sources happens to have that repository in it. I didn't look into it because it fixed itself at the next rebuild. I speculate that MongoDB's package repository is eventually consistent. -- Thomas Munro http://www.enterprisedb.com
On Thu, Jan 11, 2018 at 10:35:22PM -0500, Stephen Frost wrote: > Magnus, this was your thread to begin with, though I know others have > been involved, any chance you'll be able to review this for commit > during this CF? I agree that this is certainly a good thing to have > too, though I've not looked at the patch itself in depth. Is there > anything we can do to help move it along? As an effort to move on with bug items in the commit fest, attached are two patches with a proposed commit message as well as polished comments Those are proposed for a back-patched. The 2PC issue is particularly bad in my opinion because having any 2PC file on-disk and corrupted means that a transaction is lost. I have been playing a bit with hexedit and changed a couple of bytes in one of them... If trying to use a base backup which includes one, then the standby reading it would be similarly confused. -- Michael
Вложения
On Thu, Jun 28, 2018 at 1:39 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Jan 11, 2018 at 10:35:22PM -0500, Stephen Frost wrote: >> Magnus, this was your thread to begin with, though I know others have >> been involved, any chance you'll be able to review this for commit >> during this CF? I agree that this is certainly a good thing to have >> too, though I've not looked at the patch itself in depth. Is there >> anything we can do to help move it along? > > As an effort to move on with bug items in the commit fest, attached are > two patches with a proposed commit message as well as polished comments > Those are proposed for a back-patched. The 2PC issue is particularly > bad in my opinion because having any 2PC file on-disk and corrupted > means that a transaction is lost. I have been playing a bit with > hexedit and changed a couple of bytes in one of them... If trying to > use a base backup which includes one, then the standby reading it would > be similarly confused. Thanks to Michael for progressing this. Back in August, nearly a year ago, Robert Haas said upthread: > This bug fix has been pending in "Ready for Committer" state for about > 4.5 months. Three committers (Magnus, Heikki, Tom) have contributed > to the thread to date. Maybe one of them would like to commit this? Although the state is now back to "Needs Review", I echo those sentiments. This issue has now been hanging around for about 18 months. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Robert Haas wrote: > Although the state is now back to "Needs Review", I echo those > sentiments. This issue has now been hanging around for about 18 > months. For what it's worth, I volunteer to finish the work :) The 2PC patch is really simple, and fixes a data loss issue. The second patch has been looked up by Heikki, Magnus and me at least once by each, and there is visibly an agreement on having it. Having reviews after a new patch version is sent, by somebody else than the one who sent the patches is of course always nice.. -- Michael
Вложения
On 07/06/2018 07:18 AM, Michael Paquier wrote: > Robert Haas wrote: >> Although the state is now back to "Needs Review", I echo those >> sentiments. This issue has now been hanging around for about 18 >> months. No, those are my words, not Robert's :-) > For what it's worth, I volunteer to finish the work :) > > The 2PC patch is really simple, and fixes a data loss issue. The second > patch has been looked up by Heikki, Magnus and me at least once by each, > and there is visibly an agreement on having it. Having reviews after > a new patch version is sent, by somebody else than the one who sent the > patches is of course always nice.. If you're comfortable committing it then go for it. It will be good to have the CF item resolved. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 06, 2018 at 08:05:50AM -0400, Andrew Dunstan wrote: > On 07/06/2018 07:18 AM, Michael Paquier wrote: > > For what it's worth, I volunteer to finish the work :) > > > > The 2PC patch is really simple, and fixes a data loss issue. The second > > patch has been looked up by Heikki, Magnus and me at least once by each, > > and there is visibly an agreement on having it. Having reviews after > > a new patch version is sent, by somebody else than the one who sent the > > patches is of course always nice.. > > If you're comfortable committing it then go for it. It will be good to have > the CF item resolved. Sure. I think I can finish the set on Monday JST then. -- Michael
Вложения
On Fri, Jul 06, 2018 at 09:09:27PM +0900, Michael Paquier wrote: > Sure. I think I can finish the set on Monday JST then. So, I have been able to back-patch things down to 9.5, but further down I am not really convinced that it is worth meddling with that, particularly in light of 7cbee7c which has reworked the way partial segments on older timelines are handled at the end of promotion. The portability issues I have found is related to the timeline number exitArchiveRecovery uses which comes for the WAL reader hence this gets set to the timeline from the last page read by the startup process. This can actually cause quite a bit of trouble at the end of recovery as we would get a failure when trying to copy the last segment from the old timeline to the new timeline. Well, it could be possible to fix things properly by roughly back-porting 7cbee7c down to REL9_4_STABLE but that's not really worth the risk, and moving exitArchiveRecovery() and PrescanPreparedTransactions() around is a straight-forward move with 9.5~ thanks to this commit. I have also done nothing yet for the detection of corrupted 2PC files which get ignored at recovery. While looking again at the patch I sent upthread, the thing was actually missing some more error handling in ReadTwoPhaseFile(). In particular, with the proposed patch we would still not report an error if a 2PC file cannot be opened because of for example EPERM. In FinishPreparedTransaction, one would example get a simply a *crash* with no hints about what happened. I have a patch for that which still needs some polishing, and I will start a new thread on the matter. -- Michael