Re: Cluster seems broken after pg_basebackup
From | Adrian Klaver |
---|---|
Subject | Re: Cluster seems broken after pg_basebackup |
Date | |
Msg-id | 54D91170.1000809@aklaver.com |
In reply to | Cluster seems broken after pg_basebackup (Guillaume Drolet <droletguillaume@gmail.com>) |
List | pgsql-general |
On 02/09/2015 08:34 AM, Guillaume Drolet wrote:

CCing list so the information stays in the thread.

>
> 2015-02-06 18:44 GMT-05:00 Adrian Klaver <adrian.klaver@aklaver.com>:
>
> On 02/06/2015 09:17 AM, Guillaume Drolet wrote:
>
> Dear Adrian,
>
> Thanks for helping me. Sorry for the lack of details; I had told
> myself not to forget them, but I hit the send button too fast. You
> know how it is...
>
> I added more info in your reply below.
>
> First some questions:
>
> 1) What Postgres version?
>
> 9.3
>
> Windows 7
>
> 3) Where were you backing up from and to?
>
> Backing up from my only cluster (PGDATA) on disk E, to a backup
> directory on another disk (F:) using this command:
>
> pg_basebackup -D "F:\\db_base_backup" -Fp -Xs -R -P
> --label="basebackup20150205" --username=postgres
>
> What's weird is that I did some successful tests last week on the same
> system (backing up, archiving, recovering) using the same procedure.
> The only difference was the cluster, which was much smaller for testing
> purposes, but located at the same place (i.e. E:\data), with PostgreSQL
> installed in C:\Programs\...
>
> 4) Which cluster does not start, the master or the child you created
> with pg_basebackup?
>
> The master. I haven't tried the child yet. But I saw that the message
> about role "208375PT$" is in logs from before the backup too.
>
> This is the local domain of my machine. I log onto my machine with a
> local admin account and using domain name 208375PT (I didn't set this
> part of my machine up, the IT guys here at work did). The thing is: I
> don't understand why it's there in the log file??
>
> Not sure.
>
> What are you using for an authentication method for database login?
>
> At this moment, for my tests I use md5 for user 'postgres' and trust for user 'all'.
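The authentication question matters because pg_hba.conf is read top to bottom and the first record matching the connection type, database, user and client address decides the method; later records are never consulted, and if nothing matches the connection is refused. The sketch below is a simplified model, not PostgreSQL code: it looks only at the user column, and the two orderings are hypothetical arrangements of the "md5 for postgres, trust for all" setup described in the thread, showing that line order alone changes which method the postgres login gets.

```python
# Simplified model of pg_hba.conf first-match semantics, reduced to the
# user column only. Real matching also checks connection type, database
# and client address; those columns are deliberately ignored here.
from typing import List, Tuple

Rule = Tuple[str, str]  # (user pattern, auth method)


def auth_method(rules: List[Rule], user: str) -> str:
    """Return the method of the first rule whose user column matches."""
    for pattern, method in rules:
        # 'all' matches every role; any other pattern must match exactly.
        if pattern == "all" or pattern == user:
            return method
    # No matching record: the server refuses the connection.
    return "reject (no matching pg_hba.conf entry)"


# Hypothetical orderings of the two rules mentioned in the thread.
postgres_first = [("postgres", "md5"), ("all", "trust")]
all_first = [("all", "trust"), ("postgres", "md5")]

print(auth_method(postgres_first, "postgres"))   # md5
print(auth_method(all_first, "postgres"))        # trust: md5 line never reached
print(auth_method(postgres_first, "208375PT$"))  # trust
```

With the catch-all trust line first, the md5 line for postgres is dead configuration, which is worth ruling out when a login behaves differently than expected.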
>
> And after that, I went back to the log file and there's new information
> added:
>
> 2015-02-06 07:51:05 EST LOG: server process (PID 184) was terminated by exception 0x80000004
> 2015-02-06 07:51:05 EST DETAIL: Failed process was running: SELECT version();
> 2015-02-06 07:51:05 EST HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
>
> Well, according to this page:
>
> https://msdn.microsoft.com/en-us/library/cc704588.aspx
>
> 0x80000004
> STATUS_SINGLE_STEP
>
> {EXCEPTION} Single Step A single step or trace operation has just
> been completed.
>
> A developer is going to have to explain what that means.
>
> My suspicion is you copied at least partly over a running server.
>
> How would that be possible? Using the pg_basebackup command I wrote
> above, it is clear that I wrote the backup on disk F and not E.
>
> I was just speculating, I would not put too much stock in it.
>
> While writing this post, I started my backup using:
>
> pg_ctl start -D "F:\db_basebackup"
>
> Similar stuff happened with pgAdmin and the log (the message about the
> symbolic link is related to my post from yesterday; I don't know if it
> could be involved in the current problem):
>
> 2015-02-06 12:13:58 EST LOG: database system was interrupted; last known up at 2015-02-05 14:30:34 EST
> 2015-02-06 12:13:58 EST LOG: creating missing WAL directory "pg_xlog/archive_status"
> 2015-02-06 12:13:58 EST LOG: redo starts at 24B/28000090
> 2015-02-06 12:13:58 EST LOG: could not remove symbolic link "pg_tblspc/940585": No such file or directory
> 2015-02-06 12:13:58 EST CONTEXT: xlog redo drop tablespace: 940585
> 2015-02-06 12:13:58 EST LOG: consistent recovery state reached at 24B/290000B8
> 2015-02-06 12:13:58 EST LOG: redo done at 24B/290000B8
> 2015-02-06 12:13:58 EST LOG: last completed transaction was at log time 2015-02-05 09:06:04.892-05
> 2015-02-06 12:13:59 EST LOG: database system is ready to accept connections
> 2015-02-06 12:13:59 EST LOG: autovacuum launcher started
> 2015-02-06 12:14:42 EST LOG: server process (PID 1784) was terminated by exception 0x80000004
> 2015-02-06 12:14:42 EST DETAIL: Failed process was running: SELECT version();
> 2015-02-06 12:14:42 EST HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
> 2015-02-06 12:14:42 EST LOG: terminating any other active server processes
> 2015-02-06 12:14:42 EST WARNING: terminating connection because of crash of another server process
> 2015-02-06 12:14:42 EST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2015-02-06 12:14:42 EST HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2015-02-06 12:14:42 EST LOG: all server processes terminated; reinitializing
>
> Any ideas where to go from here?
>
> In both cases the database got to the point below, which would seem
> to indicate everything was alright.
>
> 2015-02-06 07:11:38 EST LOG: redo is not required
> 2015-02-06 07:11:38 EST LOG: database system is ready to accept connections
>
> Also from what I can see the server crashed at this point:
>
> 2015-02-06 12:13:59 EST LOG: autovacuum launcher started
> 2015-02-06 12:14:42 EST LOG: server process (PID 1784) was terminated by exception 0x80000004
>
> Now 0x80000004 is supposed to mean:
>
> STATUS_SINGLE_STEP
>
> {EXCEPTION} Single Step A single step or trace operation has just
> been completed.
>
> Some digging indicates this is the result of a debugger command. I have
> no idea how that would be invoked in Postgres running production code.
> This leads to my default question when I see unexplained behavior on
> a Windows machine: do you have anti-virus software running against
> the drives?
>
> Yes I do, and I'm not allowed to turn it off (I don't have such privileges). But the anti-virus software is also running on my other machine (same setup) and I've never had such problems. Even on this machine that's giving me problems, I spent the last two weeks testing point-in-time recovery and everything went fine.
>
> Thanks a lot again.
>
> Thanks a lot for helping!
> Guillaume
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com
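The crash lines carry a raw NTSTATUS value, and the HINT points to ntstatus.h for its meaning. As a sketch only (the regular expression assumes the English-locale wording of the "server process ... was terminated by exception" message with one entry per line; it is not a PostgreSQL or Windows tool), the Python below pulls the PID and code out of such a line and splits the code into its documented NTSTATUS fields: bits 30-31 are the severity, bit 29 the customer flag, bits 16-27 the facility and the low 16 bits the code. For 0x80000004 that gives a warning-severity value with facility 0 and code 4, which is STATUS_SINGLE_STEP.

```python
import re
from typing import Optional, Tuple

# Matches the English-locale crash line, e.g.
# "... LOG: server process (PID 1784) was terminated by exception 0x80000004"
CRASH_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by exception (0x[0-9A-Fa-f]{8})"
)

# NTSTATUS severity values stored in the top two bits.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def parse_crash(line: str) -> Optional[Tuple[int, int]]:
    """Return (pid, ntstatus) if the line is a crash report, else None."""
    m = CRASH_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16)


def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITY[(status >> 30) & 0x3],
        "customer": bool((status >> 29) & 0x1),
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }


line = ("2015-02-06 12:14:42 EST LOG: server process (PID 1784) "
        "was terminated by exception 0x80000004")
pid, status = parse_crash(line)
print(pid, hex(status), decode_ntstatus(status))
# 1784 0x80000004 {'severity': 'warning', 'customer': False, 'facility': 0, 'code': 4}
```

Decoding the value only confirms which exception Windows raised; it does not say what triggered a single-step trap inside the backend, which is the open question in the thread.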