Re: [BUGS] Crash observed during the start of the Postgres process

Поиск
Список
Период
Сортировка
От K S, Sandhya (Nokia - IN/Bangalore)
Тема Re: [BUGS] Crash observed during the start of the Postgres process
Дата
Msg-id AM5PR0701MB26425A2C55EA46164D5EDEDFD6E20@AM5PR0701MB2642.eurprd07.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [BUGS] Crash observed during the start of the Postgres process  ("K S, Sandhya (Nokia - IN/Bangalore)" <sandhya.k_s@nokia.com>)
Ответы Re: [BUGS] Crash observed during the start of the Postgres process
Список pgsql-bugs

Hi Merlin,

 

I have filtered the logs based on PID (19825) to see if this helps to debug the issue further. Here are some observations:

 

  1. Though the restoration is complete (0000001C000000000000012F.done file created), we see some error logs related to restoration failure as below:

 

                1: [19825-58f473ff.4d71-1] 2017-04-17 07:51:27.919 GMT < > LOG:  00000: database system was interrupted while in recovery at log time 2017-04-16 20:09:38 GMT

                  2: [19825-58f473ff.4d71-2] 2017-04-17 07:51:27.919 GMT < > HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

                  3: [19825-58f473ff.4d71-3] 2017-04-17 07:51:27.919 GMT < > LOCATION:  StartupXLOG, xlog.c:4988

                  4: [19825-58f473ff.4d71-4] 2017-04-17 07:51:28.111 GMT < > DEBUG:  00000: standby_mode = 'true'

                  5: [19825-58f473ff.4d71-5] 2017-04-17 07:51:28.111 GMT < > LOCATION:  readRecoveryCommandFile, xlog.c:4312

                  6: [19825-58f473ff.4d71-6] 2017-04-17 07:51:28.111 GMT < > DEBUG:  00000: primary_conninfo = 'host=CFPU-1 port=5433 user=_qnrnwdbman'

                  7: [19825-58f473ff.4d71-7] 2017-04-17 07:51:28.111 GMT < > LOCATION:  readRecoveryCommandFile, xlog.c:4319

                  8: [19825-58f473ff.4d71-8] 2017-04-17 07:51:28.112 GMT < > DEBUG:  00000: trigger_file = '/mnt/DB_name/db_data/trigger_file'

                  9: [19825-58f473ff.4d71-9] 2017-04-17 07:51:28.112 GMT < > LOCATION:  readRecoveryCommandFile, xlog.c:4326

                  10: [19825-58f473ff.4d71-10] 2017-04-17 07:51:28.112 GMT < > DEBUG:  00000: restore_command = 'exit 1'

                  11: [19825-58f473ff.4d71-11] 2017-04-17 07:51:28.112 GMT < > LOCATION:  readRecoveryCommandFile, xlog.c:4186

                  12: [19825-58f473ff.4d71-12] 2017-04-17 07:51:28.112 GMT < > LOG:  00000: entering standby mode

                  13: [19825-58f473ff.4d71-13] 2017-04-17 07:51:28.112 GMT < > LOCATION:  StartupXLOG, xlog.c:5057

                  14: [19825-58f473ff.4d71-14] 2017-04-17 07:51:28.112 GMT < > DEBUG:  00000: executing restore command "exit 1"

                  15: [19825-58f473ff.4d71-15] 2017-04-17 07:51:28.112 GMT < > LOCATION:  RestoreArchivedFile, xlogarchive.c:201

                  16: [19825-58f473ff.4d71-16] 2017-04-17 07:51:28.133 GMT < > DEBUG:  00000: could not restore file "0000001C.history" from archive: return code 256

                  17: [19825-58f473ff.4d71-17] 2017-04-17 07:51:28.133 GMT < > LOCATION:  RestoreArchivedFile, xlogarchive.c:304

                  18: [19825-58f473ff.4d71-18] 2017-04-17 07:51:28.134 GMT < > DEBUG:  00000: executing restore command "exit 1"

                  19: [19825-58f473ff.4d71-19] 2017-04-17 07:51:28.134 GMT < > LOCATION:  RestoreArchivedFile, xlogarchive.c:201

                  20: [19825-58f473ff.4d71-20] 2017-04-17 07:51:28.156 GMT < > DEBUG:  00000: could not restore file "0000001C000000000000012F" from archive: return code 256

                  21: [19825-58f473ff.4d71-21] 2017-04-17 07:51:28.156 GMT < > LOCATION:  RestoreArchivedFile, xlogarchive.c:304

                  22: [19825-58f473ff.4d71-22] 2017-04-17 07:51:28.156 GMT < > DEBUG:  00000: checkpoint record is at 0/4BDE6A88

                  23: [19825-58f473ff.4d71-23] 2017-04-17 07:51:28.156 GMT < > LOCATION:  StartupXLOG, xlog.c:5182

                  24: [19825-58f473ff.4d71-24] 2017-04-17 07:51:28.156 GMT < > DEBUG:  00000: redo record is at 0/4BC2A998; shutdown FALSE

                  25: [19825-58f473ff.4d71-25] 2017-04-17 07:51:28.156 GMT < > LOCATION:  StartupXLOG, xlog.c:5259

                  26: [19825-58f473ff.4d71-26] 2017-04-17 07:51:28.156 GMT < > DEBUG:  00000: next transaction ID: 0/6252; next OID: 71601

                  27: [19825-58f473ff.4d71-27] 2017-04-17 07:51:28.156 GMT < > LOCATION:  StartupXLOG, xlog.c:526

 

  1. Also, we see the below highlighted logs repeated multiple (count=7984) times during the DB operation. Can you please explain what these logs mean from Postgres perspective?

 

                  46: [19825-58f473ff.4d71-46] 2017-04-17 07:51:28.189 GMT < > DEBUG:  00000: record known xact 6252 latestObservedXid 6251

                  47: [19825-58f473ff.4d71-47] 2017-04-17 07:51:28.189 GMT < > CONTEXT:  xlog redo AccessExclusive locks: xid 6252 db 16386 rel 16899

                  48: [19825-58f473ff.4d71-48] 2017-04-17 07:51:28.189 GMT < > LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2673

                  49: [19825-58f473ff.4d71-49] 2017-04-17 07:51:28.190 GMT < > DEBUG:  00000: record known xact 6252 latestObservedXid 6252

                  50: [19825-58f473ff.4d71-50] 2017-04-17 07:51:28.190 GMT < > CONTEXT:  xlog redo AccessExclusive locks: xid 6252 db 16386 rel 16941

                  51: [19825-58f473ff.4d71-51] 2017-04-17 07:51:28.190 GMT < > LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2673

 

  1. After these repeated messages, crash is observed with the below log message:

 

                  24491: [19825-58f473ff.4d71-24036] 2017-04-17 07:51:30.196 GMT < > CONTEXT:  xlog redo insert: rel 1663/16386/11691; tid 13/266

                  24492: [19825-58f473ff.4d71-24037] 2017-04-17 07:51:30.196 GMT < > LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2673

                  24493: [19825-58f473ff.4d71-24038] 2017-04-17 07:51:30.196 GMT < > DEBUG:  00000: record known xact 6271 latestObservedXid 6271

                  24494: [19825-58f473ff.4d71-24039] 2017-04-17 07:51:30.196 GMT < > CONTEXT:  xlog redo delete: index 1663/16386/11692; iblk 9, heap 1663/16386/11689;

                  24495: [19825-58f473ff.4d71-24040] 2017-04-17 07:51:30.196 GMT < > LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2673

                  24496: [19825-58f473ff.4d71-24041] 2017-04-17 07:51:30.196 GMT < > PANIC:  XX000: btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data

                  24497: [19825-58f473ff.4d71-24042] 2017-04-17 07:51:30.196 GMT < > CONTEXT:  xlog redo delete: index 1663/16386/11692; iblk 9, heap 1663/16386/11689;

                  24498: [19825-58f473ff.4d71-24043] 2017-04-17 07:51:30.196 GMT < > LOCATION:  btree_xlog_delete_get_latestRemovedXid, nbtxlog.c:634

                  24765: [19810-58f473ff.4d62-191] 2017-04-17 07:51:33.348 GMT < > LOG:  00000: startup process (PID 19825) was terminated by signal 6: Aborted

 

I have attached the logfiles related to PID 19825, which can provide more insight on the observations explained as above.

 

 

Regards,

Sandhya

 

-----Original Message-----
From: K S, Sandhya (Nokia - IN/Bangalore)
Sent: Tuesday, April 25, 2017 8:24 PM
To: Merlin Moncure <mmoncure@gmail.com>
Cc: pgsql-hackers@postgresql.org; tgl@sss.pgh.pa.us; pgsql-bugs@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) <prakash.itnal@nokia.com>; T, Rasna (Nokia - IN/Bangalore) <rasna.t@nokia.com>
Subject: RE: [BUGS] Crash observed during the start of the Postgres process

 

Hi Merlin,

 

Below is the log captured when the crash was encountered.

 

STATEMENT:  select count(1) from pg_ls_dir(current_setting('data_directory')) where pg_ls_dir = 'backup_label'

LOG:  00000: duration: 4.083 ms

LOCATION:  exec_simple_query, postgres.c:1145

DEBUG:  00000: shmem_exit(0): 7 callbacks to make

LOCATION:  shmem_exit, ipc.c:212

DEBUG:  00000: proc_exit(0): 3 callbacks to make

LOCATION:  proc_exit_prepare, ipc.c:184

DEBUG:  00000: exit(0)

LOCATION:  proc_exit, ipc.c:135

DEBUG:  00000: shmem_exit(-1): 0 callbacks to make

LOCATION:  shmem_exit, ipc.c:212

DEBUG:  00000: proc_exit(-1): 0 callbacks to make

LOCATION:  proc_exit_prepare, ipc.c:184

DEBUG:  00000: reaping dead processes

LOCATION:  reaper, postmaster.c:2669

DEBUG:  00000: server process (PID 11104) exited with exit code 0

LOCATION:  LogChildExit, postmaster.c:3385

DEBUG:  00000: reaping dead processes

LOCATION:  reaper, postmaster.c:2669

LOG:  00000: startup process (PID 10265) was terminated by signal 6: Aborted

LOCATION:  LogChildExit, postmaster.c:3407

LOG:  00000: terminating any other active server processes

LOCATION:  HandleChildCrash, postmaster.c:3134

DEBUG:  00000: sending SIGQUIT to process 10994

LOCATION:  HandleChildCrash, postmaster.c:3233

DEBUG:  00000: sending SIGQUIT to process 10269

LOCATION:  HandleChildCrash, postmaster.c:3263

DEBUG:  00000: sending SIGQUIT to process 10268

LOCATION:  HandleChildCrash, postmaster.c:3275

WARNING:  57P02: terminating connection because of crash of another server process

DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

HINT:  In a moment you should be able to reconnect to the database and repeat your command.

 

 

Backtrace of the core generated:

(gdb) bt

#0  0x0000005563bcf9c0 in raise () from /lib64/libc.so.6

#1  0x0000005563bd42bc in abort () from /lib64/libc.so.6

#2  0x000000012039e228 in errfinish ()

#3  0x000000012039ef08 in elog_finish ()

#4  0x000000012009eb08 in btree_redo ()

#5  0x00000001200caff8 in StartupXLOG ()

#6  0x0000000120259958 in StartupProcessMain ()

#7  0x00000001200d590c in AuxiliaryProcessMain ()

#8  0x0000000120253434 in ?? ()

 

Let me know if any further clarification/information is needed.

 

Regards,

Sandhya

 

-----Original Message-----

From: Merlin Moncure [mailto:mmoncure@gmail.com]

Sent: Tuesday, April 25, 2017 8:01 PM

To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya.k_s@nokia.com>

Cc: pgsql-hackers@postgresql.org; tgl@sss.pgh.pa.us; pgsql-bugs@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) <prakash.itnal@nokia.com>; T, Rasna (Nokia - IN/Bangalore) <rasna.t@nokia.com>

Subject: Re: [BUGS] Crash observed during the start of the Postgres process

 

On Tue, Apr 25, 2017 at 8:44 AM, K S, Sandhya (Nokia - IN/Bangalore)

<sandhya.k_s@nokia.com> wrote:

> Hello,

> Did you get a chance to take a look into the issue?

> Please consider it with high priority. We will be awaiting your inputs.

 

This email is heavily cross posted, which is obnoxious.  Can you paste

the relevant log snippet?

 

merlin

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: sebastian.calbaza@hgdata.com
Дата:
Сообщение: [BUGS] BUG #14648: counts for queries using array unnesting is incorrect
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression