Re: [GENERAL] Recovery Assistance

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: [GENERAL] Recovery Assistance
Дата
Msg-id d1ad4bc1-4497-a4ba-0616-3b571bfa8b1b@aklaver.com
обсуждение исходный текст
Ответ на Re: [GENERAL] Recovery Assistance  (Brian Mills <brian@trybooking.com>)
Ответы Re: [GENERAL] Recovery Assistance  (Brian Mills <brian@trybooking.com>)
Список pgsql-general
On 01/27/2017 05:40 PM, Brian Mills wrote:
> First of all, Thank you for your time to assist me learning. I really
> appreciate it.
>
> root# ps ax | grep post
>  1364 ?        Ss     0:00 /usr/lib/postfix/master
>  5198 pts/3    S      0:00 su postgres
>  5221 pts/3    S      0:00 /usr/lib/postgresql/9.3/bin/postgres -D
> /etc/postgresql/9.3/main
>  5222 ?        Ss     0:10 postgres: startup process   recovering
> 0000000100000005000000A3
>  7930 pts/4    S+     0:00 grep --color=auto post

So if you check back does the recovering XXXXXXXX part change?

If so Postgres is walking through the WAL files as it should.

>
> Its a single machine postgres database server, so I'm assuming there is
> no cluster log. If there is one, where would I look for it?
> The only log in /var/log/postgres is postgresql-9.3-main.log

That would be it. A single Postgres instance has multiple databases in
it, by default it starts with template0, template1 and postgres
databases. Then add whatever databases you created and you have a
cluster of databases.

> which shows (tail):
>
> 2017-01-27 20:27:01 AEDT LOG:  database system was shut down at
> 2017-01-27 20:26:29 AEDT
> 2017-01-27 20:27:01 AEDT LOG:  MultiXact member wraparound protections
> are now enabled
> 2017-01-27 20:27:01 AEDT LOG:  autovacuum launcher started
> 2017-01-27 20:27:01 AEDT LOG:  database system is ready to accept
> connections
> 2017-01-27 20:27:02 AEDT LOG:  incomplete startup packet
> 2017-01-27 20:28:54 AEDT ERROR:  unexpected chunk size 104 (expected
> 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
> 2017-01-27 20:28:54 AEDT STATEMENT:  COPY public.bodycontent
> (bodycontentid, body, contentid, bodytypeid) TO stdout;
> 2017-01-27 20:30:13 AEDT LOG:  received fast shutdown request
> 2017-01-27 20:30:13 AEDT LOG:  aborting any active transactions
> 2017-01-27 20:30:13 AEDT LOG:  autovacuum launcher shutting down
> 2017-01-27 20:30:13 AEDT LOG:  shutting down
> 2017-01-27 20:30:13 AEDT LOG:  database system is shut down

That would be your Attempt 1 log.
>
> I ran the screen utility so I could leave the db started using
> the pg_ctl command. The later logs in that session have not progressed,
> its last entry is still
> 2017-01-27 23:00:01 AEDT FATAL:  the database system is starting up
> Which is still later datetime than the /var/log/postgres... log.

So it is just logging to stdout and not to the log file.

>
> Connection attempt shows the same log
> postgres@atlassian:/home/myuser$ psql
> psql: FATAL:  the database system is starting up
>
> Nothing in the syslog that seems connected.
>
> *Brian Mills*
> CTO
>
>
> *Mob: *0410660003 <tel:0410660003>
> *Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
> <tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
> <tel:02%208064%203600> *|*  *Brisbane* 07 3173 1570 <tel:07%203173%201570>
> Level 1 *|*  600 Chapel Street *|* South
> Yarra*|*  VIC *|*  3141 *|*  Australia
>
> <https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
>
> On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 01/27/2017 01:31 PM, Brian Mills wrote:
>
>         Hi,
>
>         I have a Atlassian Confluence Wiki that depends on postgres, but I
>         haven't much experience with postgres other than for this purpose.
>
>         A few days ago, the hard disk filled, so all services stopped
>         working.
>         When the admin realised this he increased the disk size (its in
>         a cloud,
>         so that was easy to do) however I think the way it shutdown left
>         Postgres in an inconsistent state for some reason.
>         Now when I start it up I get this message in the log over and
>         over again:
>         "FATAL:  the database system is starting up"
>
>         I have a backup, which I have successfully recovered, but it is
>         24 hours
>         old, the next backup was the cause of the disk filling. So I'm using
>         this as exercise in learning a bit more about postgres.
>
>         I did some research and found a number of options. I took a file
>         level
>         backup with the service not running then tried 2 things. I
>         haven't found
>         much else on what to do though.
>
>         *Attempt 1 - Reset Log *
>
>         It sounded like this shouldn't be my first option (it wasn't)
>         but it did
>         sound like what I needed to do.
>         I ran this command
>         ./pg_resetxlog /var/lib/postgresql/9.3/main -f
>         It worked a treat, the database did startup ok.
>         However when I tried to dump the DB:
>         root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
>         $now-confluencedb.sql
>         pg_dump: Dumping the contents of table "bodycontent" failed:
>         PQgetResult() failed.
>         pg_dump: Error message from server: ERROR:  unexpected chunk
>         size 104
>         (expected 1996) in chunk 3 of 22 for toast value 48862 in
>         pg_toast_20028
>         pg_dump: The command was: COPY public.bodycontent
>         (bodycontentid, body,
>         contentid, bodytypeid) TO stdout;
>         The dump failed, so I assume this did leave my database in an
>         inconsistent state.
>
>
>         *Attempt 2 -  startup manually and let it try recovery*
>
>         I restored my file level backup and started again.
>         This time I tried to startup manually on the command line to see the
>         output (I'd done it as a service startup a number of times to
>         nearly the
>         same effect too)
>
>         postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
>         /etc/postgresql/9.3/main start
>         server starting
>         postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27
>         20:36:08 AEDT
>         LOG:  database system was interrupted while in recovery at
>         2017-01-27
>         20:13:26 AEDT
>         2017-01-27 20:36:08 AEDT HINT:  This probably means that some
>         data is
>         corrupted and you will have to use the last backup for recovery.
>         2017-01-27 20:36:08 AEDT LOG:  database system was not properly shut
>         down; automatic recovery in progress
>         2017-01-27 20:36:08 AEDT LOG:  redo starts at 5/528B4558
>         2017-01-27 20:36:18 AEDT LOG:  redo done at 5/A3FFF9E8
>         2017-01-27 20:36:18 AEDT LOG:  last completed transaction was at log
>         time 2017-01-24 02:08:00.023064+11
>         2017-01-27 23:00:01 AEDT FATAL:  the database system is starting up
>         2017-01-27 23:00:01 AEDT FATAL:  the database system is starting up
>
>
>     What does ps ax | grep post show?
>
>     Is the cluster set up to log to a file, if so what does it show?
>
>     Does the system log show anything relevant?
>
>
>         I've left it that way for 12 hours, and its still not allowing
>         connections.
>         I assume its doing some kind of consistency check?
>
>
>     What does it say when you attempt a connection?
>
>
>
>         Does anyone have any suggestions on what I should be doing to
>         try and
>         restore this database?
>
>         - The amount of change is minimal in the DB after 6pm it should be
>         basically no change overnight.
>         - The log above seems to suggest it has completed a redo ok, is
>         that right?
>         - The last completed transaction time 2017-01-24
>         02:08:00.023064+11 the
>         log mentions would be fine to use, so loosing even a few hours
>         before
>         that would be more than adequate.
>
>         I'm just not clear what the database is doing now, or how I
>         should be
>         trying to recover it.
>
>         Any help anyone can suggest would be great! I've given myself this
>         weekend to attempt to recover more than the last backup, after
>         that I
>         need to restore the service for my team to use and suck up the
>         lost last
>         day of updates.
>
>         Thanks,
>         Brian
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


В списке pgsql-general по дате отправления:

Предыдущее
От: bym@byeh.net
Дата:
Сообщение: [GENERAL] WindowAgg optimized out of subquery but not out of CTE, generating different results.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [GENERAL] WindowAgg optimized out of subquery but not out of CTE, generating different results.