Обсуждение: BUG #4772: The server is hanging

Поиск
Список
Период
Сортировка

BUG #4772: The server is hanging

От
"gs"
Дата:
The following bug has been logged online:

Bug reference:      4772
Logged by:          gs
Email address:      gs@freenet.am
PostgreSQL version: 8.3.1
Operating system:   Fedora 9
Description:        The server is hanging
Details:

Hello, the Linux server is going down, we don't know what is the problem -
the kernel or PostgreSQL, here is some part from /var/log/messages, more
detailed file can be downloaded from here:
http://www.sendspace.com/file/7q4zwc


Log:

Apr 22 12:28:05 icon kernel: VFS: brelse: Trying to free free buffer
Apr 22 12:28:05 icon kernel: ------------[ cut here ]------------
Apr 22 12:28:05 icon kernel: WARNING: at fs/buffer.c:1215
__brelse+0x28/0x2b() (Not tainted)
Apr 22 12:28:05 icon kernel: Modules linked in: sunrpc ipv6 ipt_REJECT
xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables
x_tables cpufreq_ondemand acpi_cpufreq loop dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt
firewire_ohci iTCO_vendor_support snd_page_alloc firewire_core snd_hwdep
crc_itu_t button snd pcspkr i2c_i801 i2c_core r8169 soundcore sg floppy
ata_generic pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Apr 22 12:28:05 icon kernel: Pid: 1359, comm: kjournald Not tainted
2.6.25-14.fc9.i686 #1
Apr 22 12:28:05 icon kernel:  [warn_on_slowpath+71/115]
warn_on_slowpath+0x47/0x73
Apr 22 12:28:05 icon kernel:  [__wake_up+49/59] ? __wake_up+0x31/0x3b
Apr 22 12:28:05 icon kernel:  [wake_up_klogd+46/49] ?
wake_up_klogd+0x2e/0x31
Apr 22 12:28:05 icon kernel:  [release_console_sem+405/413] ?
release_console_sem+0x195/0x19d
Apr 22 12:28:05 icon kernel:  [__brelse+40/43] __brelse+0x28/0x2b
Apr 22 12:28:05 icon kernel:  [__find_get_block+358/376]
__find_get_block+0x166/0x178
Apr 22 12:28:05 icon kernel:  [__getblk+202/728] __getblk+0xca/0x2d8
Apr 22 12:28:05 icon kernel:  [<f885bc04>] ?
__journal_remove_journal_head+0xe3/0xea [jbd]
Apr 22 12:28:05 icon kernel:  [kmem_cache_free+125/147] ?
kmem_cache_free+0x7d/0x93
Apr 22 12:28:05 icon kernel:  [<f885bc04>] ?
__journal_remove_journal_head+0xe3/0xea [jbd]
Apr 22 12:28:05 icon kernel:  [<f885cdd1>]
journal_get_descriptor_buffer+0x2e/0x94 [jbd]
Apr 22 12:28:05 icon kernel:  [<f8859cfb>]
journal_commit_transaction+0x5a9/0xc2a [jbd]
Apr 22 12:28:05 icon kernel:  [lock_timer_base+31/62] ?
lock_timer_base+0x1f/0x3e
Apr 22 12:28:05 icon kernel:  [try_to_del_timer_sync+72/79] ?
try_to_del_timer_sync+0x48/0x4f
Apr 22 12:28:05 icon kernel:  [<f885c800>] kjournald+0xb2/0x1c7 [jbd]
Apr 22 12:28:05 icon kernel:  [autoremove_wake_function+0/51] ?
autoremove_wake_function+0x0/0x33
Apr 22 12:28:05 icon kernel:  [<f885c74e>] ? kjournald+0x0/0x1c7 [jbd]
Apr 22 12:28:05 icon kernel:  [kthread+59/97] kthread+0x3b/0x61
Apr 22 12:28:05 icon kernel:  [kthread+0/97] ? kthread+0x0/0x61
Apr 22 12:28:05 icon kernel:  [kernel_thread_helper+7/16]
kernel_thread_helper+0x7/0x10
Apr 22 12:28:05 icon kernel:  =======================
Apr 22 12:28:05 icon kernel: ---[ end trace 8bc5c17b43483d60 ]---
Apr 22 12:28:06 icon kernel: VFS: brelse: Trying to free free buffer
Apr 22 12:28:06 icon kernel: ------------[ cut here ]------------
Apr 22 12:28:06 icon kernel: WARNING: at fs/buffer.c:1215
__brelse+0x28/0x2b() (Not tainted)
Apr 22 12:28:06 icon kernel: Modules linked in: sunrpc ipv6 ipt_REJECT
xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables
x_tables cpufreq_ondemand acpi_cpufreq loop dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt
firewire_ohci iTCO_vendor_support snd_page_alloc firewire_core snd_hwdep
crc_itu_t button snd pcspkr i2c_i801 i2c_core r8169 soundcore sg floppy
ata_generic pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Apr 22 12:28:06 icon kernel: Pid: 1359, comm: kjournald Not tainted
2.6.25-14.fc9.i686 #1
Apr 22 12:28:06 icon kernel:  [warn_on_slowpath+71/115]
warn_on_slowpath+0x47/0x73
Apr 22 12:28:06 icon kernel:  [__wake_up+49/59] ? __wake_up+0x31/0x3b
Apr 22 12:28:06 icon kernel:  [wake_up_klogd+46/49] ?
wake_up_klogd+0x2e/0x31
Apr 22 12:28:06 icon kernel:  [release_console_sem+405/413] ?
release_console_sem+0x195/0x19d
Apr 22 12:28:06 icon kernel:  [__brelse+40/43] __brelse+0x28/0x2b
Apr 22 12:28:06 icon kernel:  [__find_get_block+358/376]
__find_get_block+0x166/0x178
Apr 22 12:28:06 icon kernel:  [__getblk+202/728] __getblk+0xca/0x2d8
Apr 22 12:28:06 icon kernel:  [<f885bc04>] ?
__journal_remove_journal_head+0xe3/0xea [jbd]
Apr 22 12:28:06 icon kernel:  [kmem_cache_free+125/147] ?
kmem_cache_free+0x7d/0x93
Apr 22 12:28:06 icon kernel:  [<f885bc04>] ?
__journal_remove_journal_head+0xe3/0xea [jbd]
Apr 22 12:28:06 icon kernel:  [<f885cdd1>]
journal_get_descriptor_buffer+0x2e/0x94 [jbd]
Apr 22 12:28:06 icon kernel:  [<f8859cfb>]
journal_commit_transaction+0x5a9/0xc2a [jbd]
Apr 22 12:28:06 icon kernel:  [lock_timer_base+31/62] ?
lock_timer_base+0x1f/0x3e
Apr 22 12:28:06 icon kernel:  [try_to_del_timer_sync+72/79] ?
try_to_del_timer_sync+0x48/0x4f
Apr 22 12:28:06 icon kernel:  [<f885c800>] kjournald+0xb2/0x1c7 [jbd]
Apr 22 12:28:06 icon kernel:  [autoremove_wake_function+0/51] ?
autoremove_wake_function+0x0/0x33
Apr 22 12:28:06 icon kernel:  [<f885c74e>] ? kjournald+0x0/0x1c7 [jbd]
Apr 22 12:28:06 icon kernel:  [kthread+59/97] kthread+0x3b/0x61
Apr 22 12:28:06 icon kernel:  [kthread+0/97] ? kthread+0x0/0x61
Apr 22 12:28:06 icon kernel:  [kernel_thread_helper+7/16]
kernel_thread_helper+0x7/0x10
Apr 22 12:28:06 icon kernel:  =======================
Apr 22 12:28:06 icon kernel: ---[ end trace 8bc5c17b43483d60 ]---
Apr 22 13:22:20 icon kernel: Eeek! page_mapcount(page) went negative! (-68)
Apr 22 13:22:20 icon kernel:   page pfn = 393d7
Apr 22 13:22:20 icon kernel:   page->flags = 8000007c
Apr 22 13:22:20 icon kernel:   page->count = 2
Apr 22 13:22:20 icon kernel:   page->mapping = f701180c
Apr 22 13:22:20 icon kernel:   vma->vm_ops = shm_vm_ops+0x0/0x18
Apr 22 13:22:20 icon kernel:   vma->vm_ops->nopage = 0x0
Apr 22 13:22:20 icon kernel:   vma->vm_ops->fault = shm_fault+0x0/0x11
Apr 22 13:22:20 icon kernel:   vma->vm_file->f_op->mmap = shm_mmap+0x0/0x3f
Apr 22 13:22:20 icon kernel: ------------[ cut here ]------------
Apr 22 13:22:20 icon kernel: kernel BUG at mm/rmap.c:669!
Apr 22 13:22:20 icon kernel: invalid opcode: 0000 [#1] SMP
Apr 22 13:22:20 icon kernel: Modules linked in: sunrpc ipv6 ipt_REJECT
xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables
x_tables cpufreq_ondemand acpi_cpufreq loop dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt
firewire_ohci iTCO_vendor_support snd_page_alloc firewire_core snd_hwdep
crc_itu_t button snd pcspkr i2c_i801 i2c_core r8169 soundcore sg floppy
ata_generic pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Apr 22 13:22:20 icon kernel:
Apr 22 13:22:20 icon kernel: Pid: 4612, comm: postmaster Not tainted
(2.6.25-14.fc9.i686 #1)
Apr 22 13:22:20 icon kernel: EIP: 0060:[page_remove_rmap+207/236] EFLAGS:
00210246 CPU: 0
Apr 22 13:22:20 icon kernel: EIP is at page_remove_rmap+0xcf/0xec
Apr 22 13:22:20 icon kernel: EAX: 00000032 EBX: c1727ae0 ECX: c06cf434 EDX:
00200046
Apr 22 13:22:20 icon kernel: ESI: f58f13c8 EDI: b5ab7000 EBP: f5a22e9c ESP:
f5a22e94
Apr 22 13:22:20 icon kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Apr 22 13:22:20 icon kernel: Process postmaster (pid: 4612, ti=f5a22000
task=f2e0ae90 task.ti=f5a22000)
Apr 22 13:22:20 icon kernel: Stack: 393d7027 c1727ae0 f5a22f1c c046e70f
00000008 c0791250 b7d2cfff 00000000
Apr 22 13:22:20 icon kernel:        f58f13c8 f5a22f34 393d7027 00000001
00362fce 00000000 b7d2d000 f6ddab58
Apr 22 13:22:20 icon kernel:        f6ddab58 c043b6d6 013106d1 c17c8260
f5a22f00 c043b6ed 00000000 ffffff63
Apr 22 13:22:20 icon kernel: Call Trace:
Apr 22 13:22:20 icon kernel:  [unmap_vmas+684/1337] ?
unmap_vmas+0x2ac/0x539
Apr 22 13:22:20 icon kernel:  [ktime_get_ts+69/73] ? ktime_get_ts+0x45/0x49
Apr 22 13:22:20 icon kernel:  [ktime_get+19/47] ? ktime_get+0x13/0x2f
Apr 22 13:22:20 icon kernel:  [exit_mmap+99/214] ? exit_mmap+0x63/0xd6
Apr 22 13:22:20 icon kernel:  [mmput+55/121] ? mmput+0x37/0x79
Apr 22 13:22:20 icon kernel:  [exit_mm+218/223] ? exit_mm+0xda/0xdf
Apr 22 13:22:20 icon kernel:  [do_exit+494/1364] ? do_exit+0x1ee/0x554
Apr 22 13:22:20 icon kernel:  [do_group_exit+110/133] ?
do_group_exit+0x6e/0x85
Apr 22 13:22:20 icon kernel:  [sys_exit_group+19/21] ?
sys_exit_group+0x13/0x15
Apr 22 13:22:20 icon kernel:  [syscall_call+7/11] ? syscall_call+0x7/0xb
Apr 22 13:22:20 icon kernel:  [acpi_pci_root_add+559/672] ?
acpi_pci_root_add+0x22f/0x2a0
Apr 22 13:22:20 icon kernel:  =======================
Apr 22 13:22:20 icon kernel: Code: 8b 46 40 8b 50 08 b8 16 f4 6c c0 e8 aa 31
fd ff 8b 46 48 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 34 f4 6c c0 e8
8f 31 fd ff <0f> 0b eb fe 8b 53 10 89 d8 83 e2 01 f7 da 83 c2 04 e8 c8 76 ff

Apr 22 13:22:20 icon kernel: EIP: [page_remove_rmap+207/236]
page_remove_rmap+0xcf/0xec SS:ESP 0068:f5a22e94
Apr 22 13:22:20 icon kernel: ---[ end trace 8bc5c17b43483d60 ]---
Apr 22 13:22:20 icon kernel: Fixing recursive fault but reboot is needed!
Apr 22 13:22:20 icon kernel: BUG: scheduling while atomic:
postmaster/4612/0x00000001
Apr 22 13:22:20 icon kernel: Pid: 4612, comm: postmaster Tainted: G      D
2.6.25-14.fc9.i686 #1
Apr 22 13:22:20 icon kernel:  [__schedule_bug+75/80]
__schedule_bug+0x4b/0x50
Apr 22 13:22:20 icon kernel:  [schedule+151/1755] schedule+0x97/0x6db
Apr 22 13:22:20 icon kernel:  [put_io_context+76/86] ?
put_io_context+0x4c/0x56
Apr 22 13:22:20 icon kernel:  [kmem_cache_free+125/147] ?
kmem_cache_free+0x7d/0x93
Apr 22 13:22:20 icon kernel:  [put_io_context+76/86] ?
put_io_context+0x4c/0x56
Apr 22 13:22:20 icon kernel:  [put_io_context+76/86] ?
put_io_context+0x4c/0x56
Apr 22 13:22:20 icon kernel:  [do_exit+168/1364] do_exit+0xa8/0x554
Apr 22 13:22:20 icon kernel:  [die+348/356] die+0x15c/0x164
Apr 22 13:22:20 icon kernel:  [do_trap+138/163] do_trap+0x8a/0xa3
Apr 22 13:22:20 icon kernel:  [do_invalid_op+0/118] ?
do_invalid_op+0x0/0x76
Apr 22 13:22:20 icon kernel:  [do_invalid_op+108/118]
do_invalid_op+0x6c/0x76
Apr 22 13:22:20 icon kernel:  [page_remove_rmap+207/236] ?
page_remove_rmap+0xcf/0xec
Apr 22 13:22:20 icon kernel:  [_call_console_drivers+87/91] ?
_call_console_drivers+0x57/0x5b
Apr 22 13:22:20 icon kernel:  [release_console_sem+405/413] ?
release_console_sem+0x195/0x19d
Apr 22 13:22:20 icon kernel:  [error_code+114/120] error_code+0x72/0x78
Apr 22 13:22:20 icon kernel:  [page_remove_rmap+207/236] ?
page_remove_rmap+0xcf/0xec
Apr 22 13:22:20 icon kernel:  [unmap_vmas+684/1337] unmap_vmas+0x2ac/0x539
Apr 22 13:22:20 icon kernel:  [ktime_get_ts+69/73] ? ktime_get_ts+0x45/0x49
Apr 22 13:22:20 icon kernel:  [ktime_get+19/47] ? ktime_get+0x13/0x2f
Apr 22 13:22:20 icon kernel:  [exit_mmap+99/214] exit_mmap+0x63/0xd6
Apr 22 13:22:20 icon kernel:  [mmput+55/121] mmput+0x37/0x79
Apr 22 13:22:20 icon kernel:  [exit_mm+218/223] exit_mm+0xda/0xdf
Apr 22 13:22:20 icon kernel:  [do_exit+494/1364] do_exit+0x1ee/0x554
Apr 22 13:22:20 icon kernel:  [do_group_exit+110/133]
do_group_exit+0x6e/0x85
Apr 22 13:22:20 icon kernel:  [sys_exit_group+19/21]
sys_exit_group+0x13/0x15
Apr 22 13:22:20 icon kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Apr 22 13:22:20 icon kernel:  [acpi_pci_root_add+559/672] ?
acpi_pci_root_add+0x22f/0x2a0
Apr 22 13:22:20 icon kernel:  =======================
Apr 22 13:22:20 icon kernel: Eeek! page_mapcount(page) went negative! (-68)
Apr 22 13:22:20 icon kernel:   page pfn = 393d7
Apr 22 13:22:20 icon kernel:   page->flags = 8000007c
Apr 22 13:22:20 icon kernel:   page->count = 3
Apr 22 13:22:20 icon kernel:   page->mapping = f701180c
Apr 22 13:22:20 icon kernel:   vma->vm_ops = shm_vm_ops+0x0/0x18
Apr 22 13:22:20 icon kernel:   vma->vm_ops->nopage = 0x0
Apr 22 13:22:20 icon kernel:   vma->vm_ops->fault = shm_fault+0x0/0x11
Apr 22 13:22:20 icon kernel:   vma->vm_file->f_op->mmap = shm_mmap+0x0/0x3f
Apr 22 13:22:20 icon kernel: ------------[ cut here ]------------
Apr 22 13:22:20 icon kernel: kernel BUG at mm/rmap.c:669!
Apr 22 13:22:20 icon kernel: invalid opcode: 0000 [#2] SMP
Apr 22 13:22:20 icon kernel: Modules linked in: sunrpc ipv6 ipt_REJECT
xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables
x_tables cpufreq_ondemand acpi_cpufreq loop dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt
firewire_ohci iTCO_vendor_support snd_page_alloc firewire_core snd_hwdep
crc_itu_t button snd pcspkr i2c_i801 i2c_core r8169 soundcore sg floppy
ata_generic pata_acpi ata_piix libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]

Re: BUG #4772: The server is hanging

От
John R Pierce
Дата:
gs wrote:
> The following bug has been logged online:
>
> Bug reference:      4772
> Logged by:          gs
> Email address:      gs@freenet.am
> PostgreSQL version: 8.3.1
> Operating system:   Fedora 9
> Description:        The server is hanging
> Details:
>
> Hello, the Linux server is going down, we don't know what is the problem -
> the kernel or PostgreSQL, here is some part from /var/log/messages, more
> detailed file can be downloaded from here:
> http://www.sendspace.com/file/7q4zwc
>
>
> Log:
>
> Apr 22 12:28:05 icon kernel: VFS: brelse: Trying to free free buffer
> Apr 22 12:28:05 icon kernel: ------------[ cut here ]------------
> Apr 22 12:28:05 icon kernel: WARNING: at fs/buffer.c:1215
> __brelse+0x28/0x2b() (Not tainted)
> Apr 22 12:28:05 icon kernel: Modules linked in: sunrpc ipv6 ipt_REJECT
>
....

that looks like a kernel fault to me,  nothing Postgres is doing should
cause the kernel to try to free an already free IO buffer.  this could
be triggered by a hardware problem, or a buggy kernel or kernel driver.

now, 8.3.1 is a early version in the 8.3 series (they are up to
8.3.7).     before persuing too many other avenues, I think I'd
temporarily disable postgres (boot up in safe mode, and disable the
postgres service via `chkconfig` or whatever), and make sure your fedora
9 is updated (yum update), and update postgres to 8.3.7.

Re: BUG #4772: The server is hanging

От
Devrim GÜNDÜZ
Дата:
On Wed, 2009-04-22 at 04:41 -0700, John R Pierce wrote:
> now, 8.3.1 is a early version in the 8.3 series (they are up to
> 8.3.7).     before persuing too many other avenues, I think I'd
> temporarily disable postgres (boot up in safe mode, and disable the
> postgres service via `chkconfig` or whatever), and make sure your
> fedora
> 9 is updated (yum update), and update postgres to 8.3.7.

I bet gs won't be able to update his box if (s)he does not know about
the newkey update for Fedora 9. That might be the reason of using 8.3.1
on Fedora 9.

gs,please take a look at here:
https://fedoraproject.org/wiki/Enabling_new_signing_key#Install_new_fedora-release

Regards,
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr                  http://www.gunduz.org