Discussion: 7.0.2 crash (maybe linux kernel bug??)

7.0.2 crash (maybe linux kernel bug??)

From: Michael J Schout
Date:
Hi.

I've had a crash in postgresql 7.0.2.  Looking at what happened, I actually
suspect that this is a filesystem bug, and not necessarily a postgresql bug,
but I wanted to report it here and see if anyone else had any opinions.

The platform this happened on was linux (redhat 6.2), kernel 2.2.16 (SMP), dual
pentium III 500MHz cpus, Mylex DAC960 raid controller running in raid5 mode.

During regular activity, I got a kernel oops.  Looking at the call trace from
the kernel, as well as the EIP, I think maybe there is a bug here in the fs
buffer code, and that this is a linux kernel problem (not a postgresql
problem).

But I'm no expert here.  Does this sound correct looking at the kernel errors
below?

Sorry if this is off topic.  I just want to make sure this is a kernel bug and
not a postgresql bug.

Mike

The oopses:

kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000134 
kernel: current->tss.cr3 = 1a325000, %%cr3 = 1a325000 
kernel: *pde = 00000000 
kernel: Oops: 0002 
kernel: CPU:    0 
kernel: EIP:    0010:[remove_from_queues+169/328] 
kernel: EFLAGS: 00010206 
kernel: eax: 00000100   ebx: 00000002   ecx: df022e40   edx: efba76b8 
kernel: esi: df022e40   edi: 00000000   ebp: 00000000   esp: da327ea4 
kernel: ds: 0018   es: 0018   ss: 0018 
kernel: Process postmaster (pid: 11527, process nr: 51, stackpage=da327000) 
kernel: Stack: df022e40 c012be79 df022e40 df022e40 00001000 c0142cb8 c0142cc7 df022e40  
kernel:        ec247140 ffffffea ec0b026c da326000 df022e40 df022e40 df022e40 000a4000  
kernel:        00000000 da327f08 00000000 00000000 eff29200 00001000 000000a5 000a5000  
kernel: Call Trace: [refile_buffer+77/184] [ext2_file_write+996/1584] [ext2_file_write+1011/1584] [kfree_skbmem+51/64] [__kfree_skb+162/168] [lockd:__insmod_lockd_O/lib/modules/2.2.16-3smp/fs/lockd.o_M394EA7+-76392/76] [handle_IRQ_event+90/140]
kernel:        [sys_write+240/292] [ext2_file_write+0/1584] [system_call+52/56] [startup_32+43/164]  
kernel: Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d  
kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000100 
kernel: current->tss.cr3 = 1ba46000, %%cr3 = 1ba46000 
kernel: *pde = 00000000 
kernel: Oops: 0000 
kernel: CPU:    1 
kernel: EIP:    0010:[find_buffer+104/144] 
kernel: EFLAGS: 00010206 
kernel: eax: 00000100   ebx: 00000007   ecx: 00069dae   edx: 00000100 
kernel: esi: 0000000d   edi: 00003006   ebp: 0005ce4b   esp: e53a19f4 
kernel: ds: 0018   es: 0018   ss: 0018 
kernel: Process postmaster (pid: 5545, process nr: 37, stackpage=e53a1000) 
kernel: Stack: 0005ce4b 00003006 00069dae c012b953 00003006 0005ce4b 00001000 c012bcc6  
kernel:        00003006 0005ce4b 00001000 00003006 eff29200 00003006 00004e4b ef18c960  
kernel:        c0141ee7 00003006 0005ce4b 00001000 0005ce4b e53a1bb0 edc3c660 edc3c660  
kernel: Call Trace: [get_hash_table+23/36] [getblk+30/324] [ext2_new_block+2291/2756] [getblk+271/324] [ext2_alloc_block+344/356] [block_getblk+305/624] [ext2_getblk+256/524]
kernel:        [ext2_file_write+1308/1584] [__brelse+19/84] [permission+36/248] [dump_seek+53/104] [dump_seek+53/104] [dump_write+48/84] [elf_core_dump+3104/3216] [do_IRQ+82/92]
kernel:        [tcp_write_xmit+407/472] [__release_sock+36/124] [tcp_do_sendmsg+2125/2144] [inet_sendmsg+0/144] [cprt+1553/20096] [cprt+1553/20096] [cprt+1553/20096] [do_signal+458/724]
kernel:        [force_sig_info+168/180] [force_sig+17/24] [do_general_protection+54/160] [error_code+45/52] [signal_return+20/24]
kernel: Code: 8b 00 39 6a 04 75 15 8b 4c 24 20 39 4a 08 75 0c 66 39 7a 0c  
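
As a rough cross-check of the two oopses above, the sketch below treats the value printed after "Oops:" as an i386 page-fault error code and recomputes each faulting address from the register dump and the first bytes of the "Code:" line. The function names are hypothetical, and the instruction decoding (89 50 34 as mov %edx,0x34(%eax); 8b 00 as mov (%eax),%eax) was done by hand, not with a disassembler. It only suggests that both dumps are consistent with a bad pointer value of 0x100 in eax; it does not identify the root cause.

```python
# Values are copied from the two oopses in the report; helper names are
# illustrative and not part of any kernel tool.

def decode_oops(code: int) -> dict:
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "page_present": bool(code & 0x1),  # 0 = page not present, 1 = protection fault
        "write": bool(code & 0x2),         # 0 = read access, 1 = write access
        "user_mode": bool(code & 0x4),     # 0 = fault in kernel mode, 1 = user mode
    }

def effective_address(base: int, displacement: int = 0) -> int:
    """Base register plus displacement, wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF

# First oops: "Oops: 0002", eax = 0x100, code starts with 89 50 34
# (hand-decoded as mov %edx,0x34(%eax), a store through eax).
first = decode_oops(0x0002)
addr1 = effective_address(0x100, 0x34)

# Second oops: "Oops: 0000", eax = 0x100, code starts with 8b 00
# (hand-decoded as mov (%eax),%eax, a load through eax).
second = decode_oops(0x0000)
addr2 = effective_address(0x100)

print(first, hex(addr1))
print(second, hex(addr2))
```

Under these assumptions the recomputed addresses match the reported fault addresses 00000134 and 00000100, and both faults are kernel-mode accesses to a page that is not present: a write in the first oops and a read in the second.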



Re: 7.0.2 crash (maybe linux kernel bug??)

From: Alfred Perlstein
Date:
* Michael J Schout <mschout@gkg.net> [001031 11:22] wrote:
> Hi.
> 
> I've had a crash in postgresql 7.0.2.  Looking at what happened, I actually
> suspect that this is a filesystem bug, and not necessarily a postgresql bug,
> but I wanted to report it here and see if anyone else had any opinions.
> 
> The platform this happened on was linux (redhat 6.2), kernel 2.2.16 (SMP), dual
> pentium III 500MHz cpus, Mylex DAC960 raid controller running in raid5 mode.
> 
> During regular activity, I got a kernel oops.  Looking at the call trace from
> the kernel, as well as the EIP, I think maybe there is a bug here in the fs
> buffer code, and that this is a linux kernel problem (not a postgresql
> problem).
> 
> But I'm no expert here.  Does this sound correct looking at the kernel errors
> below?
> 
> Sorry if this is off topic.  I just want to make sure this is a kernel bug and
> not a postgresql bug.
> 
> Mike
> 
> The oopses:
> 
> kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000134 
> kernel: current->tss.cr3 = 1a325000, %%cr3 = 1a325000 
> kernel: *pde = 00000000 
> kernel: Oops: 0002 
> kernel: CPU:    0 
> kernel: EIP:    0010:[remove_from_queues+169/328] 
> kernel: EFLAGS: 00010206 
> kernel: eax: 00000100   ebx: 00000002   ecx: df022e40   edx: efba76b8 
> kernel: esi: df022e40   edi: 00000000   ebp: 00000000   esp: da327ea4 
> kernel: ds: 0018   es: 0018   ss: 0018 
> kernel: Process postmaster (pid: 11527, process nr: 51, stackpage=da327000) 
> kernel: Stack: df022e40 c012be79 df022e40 df022e40 00001000 c0142cb8 c0142cc7 df022e40  
> kernel:        ec247140 ffffffea ec0b026c da326000 df022e40 df022e40 df022e40 000a4000  
> kernel:        00000000 da327f08 00000000 00000000 eff29200 00001000 000000a5 000a5000  
> kernel: Call Trace: [refile_buffer+77/184] [ext2_file_write+996/1584] [ext2_file_write+1011/1584] [kfree_skbmem+51/64] [__kfree_skb+162/168] [lockd:__insmod_lockd_O/lib/modules/2.2.16-3smp/fs/lockd.o_M394EA7+-76392/76] [handle_IRQ_event+90/140]
> kernel:        [sys_write+240/292] [ext2_file_write+0/1584] [system_call+52/56] [startup_32+43/164]  
> kernel: Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d  
> kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000100 

Yes, your kernel basically segfaulted.  I would get a traceback from your
crashdump and discuss it with the kernel developers.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

