Re: postgres crash SOS

From Felde Norbert
Subject Re: postgres crash SOS
Msg-id AANLkTikAGQWmafXx8Jl8WZgj4dLOFzL4jnld22Ta4vbQ@mail.gmail.com
In reply to Re: postgres crash SOS  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: postgres crash SOS  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
Hi,

This is the information I could collect:


We use Cobian to create the backups.
There are two volumes in use. C is the volume where everything is
installed, and the postgres data dir is there too.
The postgres backup that runs every night places its dump file on
this volume too; it runs before the daily backup is started.
There is another volume where Cobian places the daily backups.

So to be precise:
C:
    postgres
    postgres\data
    postgres dump before daily backup is started
D:
    daily backups including postgres dump from C

The D volume became full on 06-06 and stayed full for 5 days.

The first virtual memory log entry happened on 06-09 05:41 and the
last came on 06-10 16:18.
The log entries are all about the same:
Windows successfully diagnosed a low virtual memory condition.
The following programs consumed the most virtual memory:
cbService.exe (2348) consumed 2058158080 bytes,
explorer.exe (7136) consumed 245456896 bytes,
and McScript_InUse.exe (1908) consumed 218529792 bytes
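
To put the byte counts from the event log into perspective, here is a small sketch that converts them to GiB. The figures are copied from the entry above; the comparison against 2 GiB assumes a 32-bit process with the default user address space, which is not stated anywhere in this message.

```python
# Byte counts copied from the Windows event log entry above.
consumed = {
    "cbService.exe": 2058158080,
    "explorer.exe": 245456896,
    "McScript_InUse.exe": 218529792,
}

GIB = 1024 ** 3

# Assumption: default user-mode address space of a 32-bit Windows process.
ASSUMED_LIMIT = 2 * GIB


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


for name, n_bytes in consumed.items():
    share = n_bytes / ASSUMED_LIMIT
    print(f"{name}: {to_gib(n_bytes):.2f} GiB ({share:.0%} of the assumed 2 GiB limit)")
```

Under that assumption the backup service (cbService.exe) alone was sitting at roughly 1.92 GiB, close to the limit.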


The postgres log from that time contains this:
2010-06-10 16:58:14 LOG:  database system was interrupted at 2010-06-10 16:16:36
2010-06-10 16:58:14 LOG:  checkpoint record is at 0/9FBE5158
2010-06-10 16:58:14 LOG:  redo record is at 0/9FBE5158; undo record is
at 0/0; shutdown FALSE
2010-06-10 16:58:14 LOG:  next transaction ID: 0/3620193; next OID: 6744703
2010-06-10 16:58:14 LOG:  next MultiXactId: 2; next MultiXactOffset: 3
2010-06-10 16:58:14 LOG:  database system was not properly shut down;
automatic recovery in progress
2010-06-10 16:58:14 LOG:  redo starts at 0/9FBE51A8
2010-06-10 16:58:14 FATAL:  the database system is starting up
2010-06-10 16:58:14 LOG:  record with zero length at 0/9FEEDF60
2010-06-10 16:58:14 LOG:  redo done at 0/9FEEDF30
2010-06-10 16:58:15 FATAL:  the database system is starting up
2010-06-10 16:58:16 FATAL:  the database system is starting up
2010-06-10 16:58:17 FATAL:  the database system is starting up
2010-06-10 16:58:17 LOG:  database system is ready
I cannot find any interesting entries in the postgres log before this.
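
A rough timeline can be worked out from the timestamps in the log. The sketch below assumes the event log entry "06-10 16:18" belongs to the year 2010 and happened at second :00 (the excerpt gives neither), and it treats the "interrupted at" stamp only as the last time the server is known to have been running, not as the exact moment of the crash.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the postgres log above.
interrupted = datetime.strptime("2010-06-10 16:16:36", FMT)
ready = datetime.strptime("2010-06-10 16:58:17", FMT)

# Last low-virtual-memory event. Year and seconds are assumptions:
# the event log excerpt only gives "06-10 16:18".
last_low_memory = datetime.strptime("2010-06-10 16:18:00", FMT)

downtime = ready - interrupted
gap = last_low_memory - interrupted

print("not running for about:", downtime)
print("low-memory event relative to the 'interrupted at' stamp:", gap)
```

So the database was down for roughly 41 minutes, and the last low-memory event falls within about a minute and a half of the point where the server stopped.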


The first postgres backup that failed was on 06-11 at 00:30. The log is
filled with this message:
2010-06-11 00:31:19 ERROR:  xlog flush request 0/9FF74848 is not
satisfied --- flushed only to 0/9FEEDFB0
2010-06-11 00:31:19 CONTEXT:  writing block 17942 of relation
1663/4192208/4192534
2010-06-11 00:31:19 STATEMENT:  FETCH 100 FROM _pg_dump_cursor.
This message appears at 1-second intervals, and only the block number
in the "writing block" line changes.
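
The two positions in the error can be compared directly. The sketch below parses the "xlogid/xrecoff" strings from the log as hexadecimal and subtracts them; `parse_lsn` is a hypothetical helper written for this message, not a postgres function. Reading the result as "the page being written refers to WAL that lies past the end of the log that was recovered" is my interpretation and has not been confirmed by anyone on the list.

```python
def parse_lsn(text):
    """Split an 'xlogid/xrecoff' string into two integers (both are hex)."""
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)


# Values copied from the log excerpts in this message.
requested = parse_lsn("0/9FF74848")   # "xlog flush request"
flushed = parse_lsn("0/9FEEDFB0")     # "flushed only to"
redo_done = parse_lsn("0/9FEEDF30")   # "redo done at" during recovery

# All positions share log id 0, so the offsets can be subtracted directly.
assert requested[0] == flushed[0] == redo_done[0]

missing = requested[1] - flushed[1]
past_redo = flushed[1] - redo_done[1]

print(f"requested position is {missing} bytes ahead of the flushed position")
print(f"flushed position is {past_redo} bytes past the end of redo")
```

The requested position is about 538 KiB beyond what has been flushed, while the flushed position is only 128 bytes past the point where recovery ended on 06-10.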



About the information you asked for:
There are 2 SCSI drives and they are mirrored using Windows mirroring.
As far as I could find out, the mirroring is done with default settings.
The fsync setting is the default.

fenor



2010/6/17 Merlin Moncure <mmoncure@gmail.com>:
> On Thu, Jun 17, 2010 at 4:51 PM, Felde Norbert <fenor77@gmail.com> wrote:
>> The first error message was what I got after postgres crashed and I
>> tried to make a dump, run vacuum or tried something else.
>> The second message I got when I tried to repair the problem, so it
>> does not matter, because I see I did something wrong.
>>
>> If I could choose I would use a linux server too, but if the partner
>> says there is a windows server and you have to use that, then there is
>> no discussion.
>>
>> The reason I was not specific about how this state came about is that
>> I do not know.
>> I could not find anything about a power failure, and disk space seemed
>> to be more than needed. There were entries in the log for full virtual
>> memory.
>
> This came before the crash?  Are you sure the server didn't reset
> following the virtual memory full?
>
> Memory full is a very dangerous condition for a database server and
> may have contributed to your problem or been a symptom of another
> problem.  The main things we need to know (any data corruption issue
> is worth trying to diagnose after the fact) are:
>
> *) what is the setting for fsync?
> *) Are you using a raid controller?  how is the cache configured?
> *) If not, is your drive configured to buffer writes?
> *) How much free space is left on your various volumes on the computer?
>
> Did you check the system event log for interesting events at or around
> the time you saw virtual memory full.  Can we see the log message
> reporting memory full condition as well as surrounding messages?
>
> merlin
>
