Re: BUG #16990: Random PANIC in qemu user context

From Paul Guyot
Subject Re: BUG #16990: Random PANIC in qemu user context
Date
Msg-id 86C24765-95F7-464F-9677-B09A396A5F69@kallisys.net
In response to Re: BUG #16990: Random PANIC in qemu user context  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #16990: Random PANIC in qemu user context  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
> Not sure what to tell you, other than "make sure qemu and your
> build toolchain are up-to-date".

In this scenario, I use PostgreSQL 11.11 as compiled by the Raspbian folks. I also used the qemu binary provided by
Ubuntu for focal, which happens to be 4.2 (not the latest).

I found out the corresponding function using readelf to locate the string constant.

For the record, the C function is here:
https://github.com/postgres/postgres/blob/REL_11_STABLE/src/backend/storage/lmgr/lwlock.c#L811

The tight read loop is as follows:
  32b548:    e28d0004     add        r0, sp, #4
  32b54c:    eb000679     bl        32cf38 <perform_spin_delay@@Base>
  32b550:    e5943004     ldr        r3, [r4, #4]
  32b554:    e3130201     tst        r3, #268435456    ; 0x10000000
  32b558:    1afffffa         bne        32b548 <RememberSimpleDeadLock@@Base+0xc4>

At address 32b550, it does perform a read, honoring the volatile pointer.
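
For illustration, the loop above corresponds roughly to the following C sketch. This is my own reconstruction, not the PostgreSQL source: the names FLAG_LOCKED and wait_until_unlocked are hypothetical, and the max_polls bound only exists so the sketch terminates; it stands in for the call to perform_spin_delay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 28, i.e. the 0x10000000 mask tested by the tst instruction. */
#define FLAG_LOCKED ((uint32_t) 1 << 28)

/*
 * Hypothetical helper: poll the state word until the lock bit is clear,
 * or until max_polls reads have been made. Returns the number of polls
 * that saw the bit still set.
 */
static unsigned wait_until_unlocked(const volatile uint32_t *state,
                                    unsigned max_polls)
{
    unsigned polls = 0;

    assert(state != NULL);

    /* The volatile qualifier forces a fresh load (the ldr) on each pass. */
    while ((*state & FLAG_LOCKED) != 0 && polls < max_polls)
        polls++;            /* stand-in for perform_spin_delay() */

    return polls;
}
```

Note that the loop only reads; nothing in it can clear the bit, so it depends entirely on the holder's store becoming visible.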

I guess the lock is acquired by the same function:
https://github.com/postgres/postgres/blob/REL_11_STABLE/src/backend/storage/lmgr/lwlock.c#L824

The corresponding code is the following:
  32b508:    ee070fba     mcr    15, 0, r0, cr7, cr10, {5}
  32b50c:    e1953f9f     ldrex    r3, [r5]
  32b510:    e3832201     orr        r2, r3, #268435456    ; 0x10000000
  32b514:    e1851f92     strex    r1, r2, [r5]
  32b518:    e3510000     cmp    r1, #0
  32b51c:    1afffffa         bne        32b50c <RememberSimpleDeadLock@@Base+0x88>
  32b520:    e3130201     tst        r3, #268435456    ; 0x10000000
  32b524:    ee070fba     mcr    15, 0, r0, cr7, cr10, {5}
  32b528:    0a00000e     beq        32b568 <RememberSimpleDeadLock@@Base+0xe4>

mcr    15, 0, r0, cr7, cr10, {5} is __sync_synchronize() and, based on the previous instructions, r5 is equal to r4+4 as
used in the tight loop.
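
As a rough sketch of what this sequence computes, here is an equivalent written with C11 atomics. This is an assumption-laden stand-in, not the PostgreSQL code (which uses its own atomics layer); try_lock_flag and FLAG_LOCKED are hypothetical names. I would expect a sequentially consistent fetch-or on this target to compile to the same shape as above: barrier, ldrex/orr/strex retry loop, barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 28, i.e. the 0x10000000 mask used by orr and tst. */
#define FLAG_LOCKED ((uint32_t) 1 << 28)

/*
 * Hypothetical helper: atomically set the lock bit and report whether
 * this caller is the one that flipped it from clear to set.
 */
static bool try_lock_flag(_Atomic uint32_t *state)
{
    uint32_t old;

    assert(state != NULL);

    /* Returns the previous value (r3 in the disassembly). */
    old = atomic_fetch_or(state, FLAG_LOCKED);

    /* The lock is ours only if the bit was clear before the fetch-or. */
    return (old & FLAG_LOCKED) == 0;
}
```

When this returns false, the caller falls into the read-only wait loop and retries once the bit is seen clear.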

I also guess the corresponding unlock function just follows, and disassembling it reveals the same use of
__sync_synchronize().
  32b644:    ee070fba     mcr    15, 0, r0, cr7, cr10, {5}
  32b648:    e1932f9f     ldrex    r2, [r3]
  32b64c:    e3c22201     bic        r2, r2, #268435456    ; 0x10000000
  32b650:    e1831f92     strex    r1, r2, [r3]
  32b654:    e3510000     cmp    r1, #0
  32b658:    1afffffa         bne        32b648 <RememberSimpleDeadLock@@Base+0x1c4>
  32b65c:    ee070fba     mcr    15, 0, r0, cr7, cr10, {5}
  32b660:    e8bd8070     pop        {r4, r5, r6, pc}
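
The unlock side, again as a hedged C11 sketch rather than the actual source (unlock_flag and FLAG_LOCKED are hypothetical names): the bic instruction clears bit 28, which corresponds to an atomic fetch-and with the inverted mask.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 28, i.e. the 0x10000000 mask cleared by bic. */
#define FLAG_LOCKED ((uint32_t) 1 << 28)

/*
 * Hypothetical helper: atomically clear the lock bit, leaving all other
 * bits of the state word untouched. Returns the previous value.
 */
static uint32_t unlock_flag(_Atomic uint32_t *state)
{
    uint32_t old;

    assert(state != NULL);

    old = atomic_fetch_and(state, ~FLAG_LOCKED);

    /* Releasing a lock that was not held would be a caller bug. */
    assert((old & FLAG_LOCKED) != 0);

    return old;
}
```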

The QEMU user emulation documentation mentions something specific to threading on ARM:
https://qemu.readthedocs.io/en/latest/user/main.html
> Threading:
> On Linux, QEMU can emulate the clone syscall and create a real host thread (with a separate virtual CPU) for each
> emulated thread. Note that not all targets currently emulate atomic operations correctly. x86 and Arm use a global lock
> in order to preserve their semantics.

I have yet to determine what impact it could have here. Can we imagine a situation where the memory barrier was not
honored and an unlock would be overwritten with a lock?
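
To make that hypothesis concrete, here is a single-threaded replay of one interleaving that would produce it. This is purely illustrative: it assumes the emulated strex can succeed even though another CPU wrote the word after the ldrex, which I have not confirmed for qemu 4.2. The function name replay_lost_unlock is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 28, i.e. the 0x10000000 lock bit. */
#define FLAG_LOCKED ((uint32_t) 1 << 28)

/*
 * Replays the hypothetical bad interleaving between a waiter A and the
 * current holder B, and returns the final state word.
 */
static uint32_t replay_lost_unlock(uint32_t initial)
{
    uint32_t state = initial;
    uint32_t a_loaded;

    /* The scenario starts with the lock held by B. */
    assert((initial & FLAG_LOCKED) != 0);

    a_loaded = state;                  /* A: ldrex, sees the bit set    */
    state &= ~FLAG_LOCKED;             /* B: unlock completes meanwhile */
    state = a_loaded | FLAG_LOCKED;    /* A: strex wrongly succeeds     */

    /*
     * A saw the bit set in a_loaded, so it concludes it did not get the
     * lock. B has released it. The bit is set, yet nobody owns the lock.
     */
    return state;
}
```

In that end state every waiter would keep spinning in the read loop, since no process is left to clear the bit.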

Finally, I tried running the whole script with taskset -c 0 (which is fine for the tests, as the target system,
a Raspberry Pi Zero, is single core, while GitHub Linux runners have 2 vCPUs).
https://github.com/pguyot/pynab/commit/91011e68e446c69e317fd1198c58f85ff0cd5fb1
https://github.com/pguyot/pynab/runs/2486051700?check_suite_focus=true

I have run it four times so far, and no PostgreSQL PANIC has occurred. So your hypothesis of a bug (limitation) in qemu 4.2
seems probable…
FYI, newer ARM architectures, starting with armv7l, have a dedicated instruction for memory barriers, which is not used
here as it is not recognized by the Raspberry Pi Zero CPU.

Paul



