Re: Cluster seems broken after pg_basebackup

From: Adrian Klaver
Subject: Re: Cluster seems broken after pg_basebackup
Msg-id: 54D551EF.3090406@aklaver.com
In reply to: Re: Cluster seems broken after pg_basebackup (Guillaume Drolet <droletguillaume@gmail.com>)
List: pgsql-general
On 02/06/2015 09:17 AM, Guillaume Drolet wrote:
> Dear Adrian,
>
> Thanks for helping me. Sorry for the lack of details, I had said to
> myself I had to not forget to give these details but I hit the send
> button too fast. You know how it is...
>
> I added more info in your reply below.
>
>
>     First some questions:
>
>     1) What Postgres version?
>
>
> 9.3
>

> Windows 7
>
>
>     3) Where were you backing up from and to?
>
>
> Backing up from my only cluster (PGDATA) on disk E, to a backup
> directory on another disk (F:) using this command:
>
> pg_basebackup -D "F:\\db_base_backup" -Fp -Xs  -R -P
> --label="basebackup20150205" --username=postgres
>
> What's weird is that I did some successful tests last week on the same
> system (backing up, archiving, recovering) using the same procedure.
> Only difference was the cluster, which was much smaller for testing
> purposes, but located at the same place (i.e. E:\data) and PostgreSQL
> installed in C:\Programs\...
>
>
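For reference, here is the pg_basebackup invocation quoted above broken out into an argument list, with a note on what each flag does. This is only a minimal Python sketch: the function name basebackup_argv and the example path are illustrative and not part of pg_basebackup, and the flag notes are my reading of the 9.3 pg_basebackup documentation. It builds the list (which could be handed to subprocess.run) but does not run anything.

```python
from typing import List


def basebackup_argv(target_dir: str, label: str, user: str = "postgres") -> List[str]:
    """Hypothetical helper: the pg_basebackup command above, one flag per line."""
    return [
        "pg_basebackup",
        "-D", target_dir,      # output directory; must be empty or not exist yet
        "-Fp",                 # plain format: a copy of the data directory, not tar files
        "-Xs",                 # stream the WAL generated while the backup runs
        "-R",                  # write a minimal recovery.conf into the output (9.3+)
        "-P",                  # show progress
        "--label=" + label,    # label recorded in the backup_label file
        "--username=" + user,  # role used to connect to the source server
    ]


if __name__ == "__main__":
    # Example path is illustrative only.
    print(" ".join(basebackup_argv(r"F:\db_base_backup", "basebackup20150205")))
```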
>     4) Which cluster does not start, the master or the child you created
>     with pg_basebackup?
>
>
>
> The master. I haven't tried the child yet. But I saw that the message
> about role "208375PT$" is in logs from before the backup too.
>

> This is the local domain of my machine. I log onto my machine with a
> local admin account and using domain name 208375PT (I didn't set this
> part of my machine, the IT guys here at work did). The thing is: I don't
> understand why it's there in the log file.

Not sure.

What are you using for an authentication method for database login?

>

>         And after that, I went back to the log file and there's new
>         information
>         added:
>
>         2015-02-06 07:51:05 EST LOG:  server process (PID 184) was
>         terminated by exception 0x80000004
>         2015-02-06 07:51:05 EST DETAIL:  Failed process was running:
>         SELECT version();
>         2015-02-06 07:51:05 EST HINT:  See C include file "ntstatus.h"
>         for a description of the hexadecimal value.
>
>
>     Well, according to this page:
>
>     https://msdn.microsoft.com/en-us/library/cc704588.aspx
>
>     0x80000004
>     STATUS_SINGLE_STEP
>
>
>     {EXCEPTION} Single Step A single step or trace operation has just
>     been completed.
>
>     A developer is going to have to explain what that means.
>

>
>
>     My suspicion is you copied at least partly over a running server.
>
>
> How would that be possible? Using the pg_basebackup command I wrote
> above, it is clear that I wrote the backup on disk F and not E.

I was just speculating, I would not put too much stock in it.

>
> While writing this post, I started my backup using:
>
> pg_ctl start -D "F:\db_basebackup"
>
> Similar stuff happened with pgAdmin and the log (message about symbolic
> link is related to my post from yesterday. I don't know if this could be
> involved in the current problem):
>
> 2015-02-06 12:13:58 EST LOG:  database system was interrupted; last known up at 2015-02-05 14:30:34 EST
> 2015-02-06 12:13:58 EST LOG:  creating missing WAL directory "pg_xlog/archive_status"
> 2015-02-06 12:13:58 EST LOG:  redo starts at 24B/28000090
> 2015-02-06 12:13:58 EST LOG:  could not remove symbolic link "pg_tblspc/940585": No such file or directory
> 2015-02-06 12:13:58 EST CONTEXT:  xlog redo drop tablespace: 940585
> 2015-02-06 12:13:58 EST LOG:  consistent recovery state reached at 24B/290000B8
> 2015-02-06 12:13:58 EST LOG:  redo done at 24B/290000B8
> 2015-02-06 12:13:58 EST LOG:  last completed transaction was at log time 2015-02-05 09:06:04.892-05
> 2015-02-06 12:13:59 EST LOG:  database system is ready to accept connections
> 2015-02-06 12:13:59 EST LOG:  autovacuum launcher started
> 2015-02-06 12:14:42 EST LOG:  server process (PID 1784) was terminated by exception 0x80000004
> 2015-02-06 12:14:42 EST DETAIL:  Failed process was running: SELECT version();
> 2015-02-06 12:14:42 EST HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
> 2015-02-06 12:14:42 EST LOG:  terminating any other active server processes
> 2015-02-06 12:14:42 EST WARNING:  terminating connection because of crash of another server process
> 2015-02-06 12:14:42 EST DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2015-02-06 12:14:42 EST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
> 2015-02-06 12:14:42 EST LOG:  all server processes terminated; reinitializing
>
>
> Any ideas where to go from here?

In both cases the database got to the point below, which would seem to
indicate everything was alright.

2015-02-06 07:11:38 EST LOG:  redo is not required
2015-02-06 07:11:38 EST LOG:  database system is ready to accept connections
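
As a sanity check on the 12:13:58 recovery, the redo start (24B/28000090) and redo end (24B/290000B8) locations can be turned into byte positions: a WAL location printed as X/Y is a 64-bit position whose high 32 bits are X and low 32 bits are Y, both in hex. A small Python sketch (lsn_to_int is just an illustrative helper name):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location printed as "X/Y" (hex) to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


redo_start = lsn_to_int("24B/28000090")  # redo start, 12:13:58 log
redo_done = lsn_to_int("24B/290000B8")   # redo end, same log
replayed = redo_done - redo_start
print(replayed, "bytes", f"(~{replayed / 2**20:.1f} MiB)")  # 16777256 bytes (~16.0 MiB)
```

That is about one default-size (16 MiB) WAL segment, so the replay needed at startup was short and finished cleanly before the crash.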

Also from what I can see the server crashed at this point:

2015-02-06 12:13:59 EST LOG:  autovacuum launcher started
2015-02-06 12:14:42 EST LOG:  server process (PID 1784) was terminated by
exception 0x80000004
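
To see whether every crash hits the same statement, the crash messages and the DETAIL line that follows each one can be pulled out of the server log. A rough Python sketch, assuming the log is written in English (e.g. lc_messages set to an English locale or C) with each message on one line; for the French messages the patterns would need adapting. The function name crashes and the regexes are my own, written against the English PostgreSQL message text:

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical patterns for the English postmaster messages.
CRASH_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by exception (0x[0-9A-Fa-f]+)"
)
DETAIL_RE = re.compile(r"DETAIL:\s+Failed process was running: (.*)")


def crashes(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (pid, exception code, statement) for each backend crash.

    The statement is "" when no "Failed process was running" DETAIL line
    follows the crash message before the next crash or the end of input.
    """
    found: List[Tuple[int, int, str]] = []
    pending: Optional[Tuple[int, int]] = None  # crash seen, DETAIL not yet
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            if pending is not None:
                found.append((pending[0], pending[1], ""))
            pending = (int(m.group(1)), int(m.group(2), 16))
            continue
        d = DETAIL_RE.search(line)
        if d and pending is not None:
            found.append((pending[0], pending[1], d.group(1).strip()))
            pending = None
    if pending is not None:
        found.append((pending[0], pending[1], ""))
    return found


if __name__ == "__main__":
    log = [
        "2015-02-06 12:14:42 EST LOG:  server process (PID 1784) was "
        "terminated by exception 0x80000004",
        "2015-02-06 12:14:42 EST DETAIL:  Failed process was running: SELECT version();",
    ]
    print(crashes(log))  # [(1784, 2147483652, 'SELECT version();')]
```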


Now 0x80000004 is supposed to mean:

STATUS_SINGLE_STEP


{EXCEPTION} Single Step A single step or trace operation has just been
completed.
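
The ntstatus.h hint refers to the NTSTATUS encoding. Per Microsoft's MS-ERREF documentation, an NTSTATUS value packs a 2-bit severity (bits 31-30), a customer bit (29), a reserved bit (28), a 12-bit facility (27-16) and a 16-bit code (15-0). A short Python sketch (decode_ntstatus is an illustrative name) that splits the value accordingly:

```python
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(value: int) -> dict:
    """Split an NTSTATUS value into its documented bit fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),      # bit 29
        "facility": (value >> 16) & 0xFFF,          # bits 27-16
        "code": value & 0xFFFF,                     # bits 15-0
    }


print(decode_ntstatus(0x80000004))
# {'severity': 'warning', 'customer': False, 'facility': 0, 'code': 4}
```

So it is a warning-severity, Microsoft-defined status with facility 0 and code 4, which is the STATUS_SINGLE_STEP entry in the table linked earlier.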

Some digging indicates this is the result of a debugger command. I have no
idea how that would be invoked in Postgres running production code. This
leads to my default question when I see unexplained behavior on a
Windows machine: do you have anti-virus software running against the drives?

>
> Thanks a lot again.
>
>
>         Thanks a lot for helping! Guillaume
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

