Discussion: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible


Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Peter
Date:
Hi, 
after upgrading from 10.5 to 10.6.2, database now says this:

Mar  7 13:55:25 <local0.info> edge postgres[1820]: [7-1] :[] LOG: database system was shut down at 2019-03-07 13:43:29 CET
Mar  7 13:55:25 <local0.info> edge postgres[1816]: [7-1] :[] LOG: database system is ready to accept connections
Mar  7 13:58:43 <local0.info> edge postgres[1816]: [8-1] :[] LOG:  worker process: parallel worker for PID 3526 (PID 3527) was terminated by signal 10: Bus error
Mar  7 13:58:43 <local0.info> edge postgres[1816]: [9-1] :[] LOG:  terminating any other active server processes
Mar  7 13:58:43 <local0.info> edge postgres[1816]: [10-1] :[] LOG:  archiver process (PID 1824) exited with exit code 1
Mar  7 13:58:43 <local0.info> edge postgres[1816]: [11-1] :[] LOG:  all server processes terminated; reinitializing
Mar  7 13:58:45 <local0.info> edge postgres[3531]: [12-1] :[] LOG:  database system was interrupted; last known up at 2019-03-07 13:55:25 CET
Mar  7 13:58:51 <local0.info> edge postgres[3534]: [12-1] [unknown]:[unknown][192.168.98.3(45111)] LOG:  connection received: host=192.168.98.3 port=45111
Mar  7 13:58:51 <local0.err> edge postgres[3534]: [13-1] rapppmcf:fin[192.168.98.3(45111)] FATAL:  the database system is in recovery mode
Mar  7 13:58:51 <local0.info> edge postgres[3535]: [12-1] [unknown]:[unknown][192.168.98.3(45112)] LOG:  connection received: host=192.168.98.3 port=45112
Mar  7 13:58:51 <local0.err> edge postgres[3535]: [13-1] rapppmcf:fin[192.168.98.3(45112)] FATAL:  the database system is in recovery mode
Mar  7 13:58:57 <local0.info> edge postgres[3531]: [13-1] :[] LOG:  database system was not properly shut down; automatic recovery in progress
Mar  7 13:58:58 <local0.info> edge postgres[3531]: [14-1] :[] LOG:  redo starts at 2C/C600008C
Mar  7 13:59:10 <local0.info> edge postgres[1816]: [12-1] :[] LOG:  startup process (PID 3531) was terminated by signal 10: Bus error
Mar  7 13:59:10 <local0.info> edge postgres[1816]: [13-1] :[] LOG:  aborting startup due to startup process failure
Mar  7 13:59:11 <local0.info> edge postgres[1816]: [14-1] :[] LOG:  database system is shut down

This is repeatable. DB starts normally, I start first application fine,
start second application fine, start third application: KABOOM!

Reinstalled 10.5 for now, so it runs again.

Time to read the relnotes: 
> When building on i386 with the clang compiler, require -msse2 to be 
> used (Andres Freund)
> This avoids problems with missed floating point overflow checks.

What the hell does that mean? Does it concern the build process? Or 
the operation? Why does it only concern the Clang? And what is SSE2
concerned with?

Or, is this a strangely cryptic statement, which, after proper decryption,
should actually read: 
// "Beginning with the upgrade from 10.5 to 10.6, PostgreSQL can no 
// longer run on platforms that do not provide SSE2" ???

The point here is, my third application works with lots of floating
point stuff. The other two do not. 

Further investigation to follow ASAP.


PMc


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Andrew Gierth
Date:
>>>>> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:

 Peter> Hi, 
 Peter> after upgrading from 10.5 to 10.6.2, database now says this:

 Peter> Mar 7 13:58:43 <local0.info> edge postgres[1816]: [8-1] :[] LOG:
 Peter> worker process: parallel worker for PID 3526 (PID 3527) was
 Peter> terminated by signal 10: Bus error

I'm assuming from the CC that this is on FreeBSD, but on what
architecture?

Did it drop a core file (look in the data dir for postgres.core) and if
so can you get a backtrace?

 Peter> Time to read the relnotes: 

 >> When building on i386 with the clang compiler, require -msse2 to be
 >> used (Andres Freund) This avoids problems with missed floating point
 >> overflow checks.

 Peter> What the hell does that mean? Does it concern the build process?
 Peter> Or the operation? Why does it only concern the Clang? And what
 Peter> is SSE2 concerned with?

It concerns only overflow checks in floating-point computations.

Clang's __builtin_isinf(x) function, which is supposed to test if x is
infinite, does not work reliably on i386 when the x87 registers are used
for floating point. It does work if the SSE2 registers are used instead,
which clang will do if the -msse2 option is used. The downside of course
is that the code will no longer run on CPUs that are old enough to vote.

This is only a problem on clang because gcc has other options we can use
to force working infinity tests.
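A minimal C sketch of the kind of floating-point overflow check being discussed: an infinite product of two finite inputs signals overflow. The name `mul_overflowed` is hypothetical; this is not PostgreSQL's code. The thread does not spell out why the test misfires with x87 registers. Our assumption is excess precision: an x87 register holds 80-bit extended values, so a product such as 1e308 * 10 can still be finite there and only becomes infinity once rounded to a 64-bit double.

```c
#include <math.h>
#include <stdbool.h>

/*
 * Hypothetical overflow check: multiply two doubles and report overflow
 * when the result is infinite although neither input was.  Storing the
 * product in a volatile double is meant to force rounding to 64-bit
 * double precision before isinf() inspects it (this addresses our
 * assumed x87 excess-precision failure mode, not a documented fix).
 */
static bool
mul_overflowed(double a, double b, double *result)
{
    volatile double product = a * b;

    *result = product;
    return isinf(product) && !isinf(a) && !isinf(b);
}
```

On a build with working infinity tests, `mul_overflowed(1e308, 10.0, &r)` reports overflow and `mul_overflowed(2.0, 3.0, &r)` does not.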

You can check whether your CPU supports SSE2 by looking at the Features=
line in /var/run/dmesg.boot. It seems unlikely that it does not, because
SSE2 was introduced in 2000 with the Pentium 4.
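The check Andrew suggests can be automated. Here is a minimal C sketch, assuming only the `Features=0x...<FLAG,FLAG,...>` layout of that dmesg.boot line. It looks for a whole token such as `SSE2` in the angle-bracketed list, so `SSE` alone does not count. The function name is hypothetical. Pass it the `Features=` line itself, since `Features2=` and other lines carry different flag sets.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a complete
 * comma-separated token inside the <...> list of a dmesg "Features=" line.
 */
static bool
features_line_has(const char *line, const char *flag)
{
    const char *p = strchr(line, '<');
    const char *end = p ? strchr(p, '>') : NULL;
    size_t flen = strlen(flag);

    if (p == NULL || end == NULL)
        return false;                    /* no flag list on this line */

    p++;                                 /* first token starts after '<' */
    while (p < end) {
        const char *comma = memchr(p, ',', (size_t)(end - p));
        const char *tok_end = comma ? comma : end;

        if ((size_t)(tok_end - p) == flen && strncmp(p, flag, flen) == 0)
            return true;
        p = tok_end + 1;                 /* skip the comma (or step past '>') */
    }
    return false;
}
```

On a CPU that has SSE but not SSE2, such as a Pentium III, the list contains `SSE` but no `SSE2` token, so the call with `"SSE2"` returns false.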

-- 
Andrew (irc:RhodiumToad)


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Palle Girgensohn
Date:


On 7 March 2019 at 18:20, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:
>
> Peter> Hi,
> Peter> after upgrading from 10.5 to 10.6.2, database now says this:
>
> Peter> Mar 7 13:58:43 <local0.info> edge postgres[1816]: [8-1] :[] LOG:
> Peter> worker process: parallel worker for PID 3526 (PID 3527) was
> Peter> terminated by signal 10: Bus error
>
> I'm assuming from the CC that this is on FreeBSD, but on what
> architecture?
>
> Did it drop a core file (look in the data dir for postgres.core) and if
> so can you get a backtrace?
>
> Peter> Time to read the relnotes:
>
> >> When building on i386 with the clang compiler, require -msse2 to be
> >> used (Andres Freund) This avoids problems with missed floating point
> >> overflow checks.
>
> Peter> What the hell does that mean? Does it concern the build process?
> Peter> Or the operation? Why does it only concern the Clang? And what
> Peter> is SSE2 concerned with?
>
> It concerns only overflow checks in floating-point computations.
>
> Clang's __builtin_isinf(x) function, which is supposed to test if x is
> infinite, does not work reliably on i386 when the x87 registers are used
> for floating point. It does work if the SSE2 registers are used instead,
> which clang will do if the -msse2 option is used. The downside of course
> is that the code will no longer run on CPUs that are old enough to vote.


For this reason, we build i386 with gcc starting with version 10.6. Setting CFLAGS+=-msse2 was not reliable for all CPUs, since not all i386 CPUs support SSE2.

We had one report of a user who still had the SSE2 flag on (in /etc/make.conf) when building, and got the same problem. [https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236025]


> This is only a problem on clang because gcc has other options we can use
> to force working infinity tests.
>
> You can check whether your CPU supports SSE2 by looking at the Features=
> line in /var/run/dmesg.boot. It seems unlikely that it does not, because
> SSE2 was introduced in 2000 with the Pentium 4.
>
> --
> Andrew (irc:RhodiumToad)

Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:
>  Peter> What the hell does that mean? Does it concern the build process?
>  Peter> Or the operation? Why does it only concern the Clang? And what
>  Peter> is SSE2 concerned with?

> It concerns only overflow checks in floating-point computations.

It seems pretty unlikely that that'd have anything to do with a
bus-error failure, anyway.  But this report contains far too little
information to let anyone do anything but speculate.

            regards, tom lane


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Andrew Gierth
Date:
>>>>> "Palle" == Palle Girgensohn <girgen@FreeBSD.org> writes:

 >> Clang's __builtin_isinf(x) function, which is supposed to test if x is
 >> infinite, does not work reliably on i386 when the x87 registers are used
 >> for floating point. It does work if the SSE2 registers are used instead,
 >> which clang will do if the -msse2 option is used. The downside of course
 >> is that the code will no longer run on CPUs that are old enough to vote.

 Palle> For this reason, we build i386 with gcc starting with version
 Palle> 10.6. Setting CFLAGS+=-msse2 was not reliable for all CPUs, since
 Palle> not all i386 CPUs support SSE2.

 Palle> We had one report of a user who still had the SSE2 flag on (in
 Palle> /etc/make.conf) when building, and got the same problem.
 Palle> [https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236025
 Palle> <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236025>]

The user in that bug report was using a Pentium 4, which supports SSE2,
so it's not the lack of SSE2 that caused the problem.

Furthermore, the crash was in XLogReadRecord, which does not use floats.

So I'm going to guess that your bug 236025 is actually an alignment
problem, with the compiler making some assumption about alignment that
we're violating. I'll investigate and see what I can find.
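Here is a generic illustration of what "an assumption about alignment that we're violating" can mean in C. It is not the actual cause, which has not been identified at this point in the thread. Dereferencing a `uint32_t *` lets the compiler assume the address is 4-byte aligned, while copying through `memcpy` makes no such assumption. The helper name `read_u32` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: load a 32-bit value from an arbitrary byte offset.
 * Writing *(const uint32_t *)(buf + off) instead is undefined behaviour for
 * a misaligned address and lets the compiler assume 4-byte alignment.
 * Plain x86 loads tolerate misalignment, but instructions chosen under a
 * stronger assumption may not (SSE's movaps, for instance, requires 16-byte
 * alignment).  memcpy carries no alignment assumption.
 */
static uint32_t
read_u32(const unsigned char *buf, size_t off)
{
    uint32_t value;

    memcpy(&value, buf + off, sizeof value);
    return value;
}
```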

-- 
Andrew.


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Ron
Date:
On 3/7/19 5:30 PM, Andrew Gierth wrote:
>>>>>> "Palle" == Palle Girgensohn <girgen@FreeBSD.org> writes:
>   >> Clang's __builtin_isinf(x) function, which is supposed to test if x is
>   >> infinite, does not work reliably on i386 when the x87 registers are used
>   >> for floating point. It does work if the SSE2 registers are used instead,
>   >> which clang will do if the -msse2 option is used. The downside of course
>   >> is that the code will no longer run on CPUs that are old enough to vote.
>
>   Palle> For this reason, we build i386 with gcc starting with version
>   Palle> 10.6. Setting CFLAGS+=-msse2 was not reliable for all CPUs, since
>   Palle> not all i386 CPUs support SSE2.
>
>   Palle> We had one report of a user who still had the SSE2 flag on (in
>   Palle> /etc/make.conf) when building, and got the same problem.
>   Palle> [https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236025
>   Palle> <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236025>]
>
> The user in that bug report was using a Pentium 4, which supports SSE2,
> so it's not the lack of SSE2 that caused the problem.

But if it's compiled for i386 instead of i686?

> Furthermore, the crash was in XLogReadRecord, which does not use floats.
>
> So I'm going to guess that your bug 236025 is actually an alignment
> problem, with the compiler making some assumption about alignment that
> we're violating. I'll investigate and see what I can find.


-- 
Angular momentum makes the world go 'round.


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Peter
Date:
Hi Tom, Andrew,

Many thanks for the replies! Alright, let's fill in some concrete
data:

> I'm assuming from the CC that this is on FreeBSD, but on what
> architecture?

While out on my evening errands I realized that I should have mentioned
this: FreeBSD is correct; it is built on amd64 for i386, and runs on
i386.

Version: 
  FreeBSD 11.2-RELEASE-p9 #0 r343946M#C51:82
Build-Options:
  OPTIONS_FILE_UNSET+=DEBUG
  OPTIONS_FILE_UNSET+=DOCS
  OPTIONS_FILE_UNSET+=DTRACE
  OPTIONS_FILE_SET+=GSSAPI
  OPTIONS_FILE_SET+=INTDATE
  OPTIONS_FILE_UNSET+=LDAP
  OPTIONS_FILE_SET+=NLS
  OPTIONS_FILE_UNSET+=OPTIMIZED_CFLAGS
  OPTIONS_FILE_UNSET+=PAM
  OPTIONS_FILE_SET+=SSL
  OPTIONS_FILE_SET+=TZDATA
  OPTIONS_FILE_SET+=XML
Extra Compiler-Options:
  -march=pentium3
Init-Options:
  --data-checksums --encoding=utf-8 --lc-collate=de_DE.UTF-8
  --lc-ctype=de_DE.UTF-8 --lc-messages=en_US.UTF-8
  --lc-monetary=en_US.UTF-8 --lc-numeric=en_US.UTF-8
  --lc-time=en_US.UTF-8
Run-Options:
  -w -m fast -o --config_file=/usr/local/etc/postgresql/postgresql.conf

Furthermore, FreeBSD imposed a change for release 10.6: it forces the
use of gcc on i386 (gcc-8 in this case). Earlier versions were built
with the system compiler, Clang. The commit log says this about the matter:

! r484807 | girgen | 2018-11-12 16:54:19 +0100 (Mon, 12 Nov 2018) | 5 lines
!
! Fix build problems on i386
!
! Use GCC seems to be proper way to do it. SSE2 would not be available
! for all CPU:s.

> Did it drop a core file (look in the data dir for postgres.core) and if
> so can you get a backtrace?

Looking... yes, there is a core. Let's grab the first-fault core,
as the later one is obviously from the failed recovery:

! (gdb) core postgres.core.1st
! Core was generated by `postgres: bgworker: parallel worker for PID 68755 '.
! Program terminated with signal 10, Bus error.
! Reading symbols from <etc etc>
! #0  0x0838bdf2 in pg_checksum_page ()
! (gdb) bt
! #0  0x0838bdf2 in pg_checksum_page ()
! #1  0x0838a2b8 in PageIsVerified ()
! #2  0x5a824500 in ?? ()
! #3  0x00000000 in ?? ()

The second one looks like this:

! (gdb) core postgres.core 
! Core was generated by `postgres: startup process recovering 000000010000002C000000C6'.
! Program terminated with signal 10, Bus error.
! Reading symbols from <lots of files>
! #0  0x0838bdf2 in pg_checksum_page ()
! (gdb) bt
! #0  0x0838bdf2 in pg_checksum_page ()
! #1  0x0838a2b8 in PageIsVerified ()
! #2  0x59e14500 in ?? ()
! #3  0x00000000 in ?? ()

Anything more I can do here? (Advice on how to build with debugging
support is appreciated.)

> You can check whether your CPU supports SSE2 by looking at the Features=
> line in /var/run/dmesg.boot. It seems unlikely that it does not, because
> SSE2 was introduced in 2000 with the Pentium 4.

No need to check; I am absolutely certain that it does NOT.
https://www.asus.com/supportonly/CUV4X-DLS/HelpDesk_CPU/

But your explanation does not seem to answer the fundamental question: is
the database at 10.6 still supposed to be able to run without SSE2?

> It seems pretty unlikely that that'd have anything to do with a
> bus-error failure, anyway.  But this report contains far too little
> information to let anyone do anything but speculate.

Whatever information you would like to have, just ask and I will gladly do
my best to obtain it as I get around to it. (This is a reproducible bug in a
very well maintained piece of software; this is rather fun.)


Some more experiments & observations:

The crash happens at a specific query: I get parse and bind timing, but no
execute timing.
Furthermore, when I try to set

! max_parallel_workers_per_gather = 0

then the query goes through and delivers proper results. But then, after
a few minutes, I get this one:

! postgres[71256]: [8-1] :[] LOG: 00000: checkpointer process (PID 71258) 
! was terminated by signal 10: Bus error


Different approach, same result:

! dynamic_shared_memory_type = posix   -> crash immediate
! dynamic_shared_memory_type = sysv    -> crash immediate
! dynamic_shared_memory_type = mmap    -> crash immediate
! dynamic_shared_memory_type = none    -> crash later in checkpointer


regards,
PMc


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Andrew Gierth
Date:
>>>>> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:

 >> You can check whether your CPU supports SSE2 by looking at the Features=
 >> line in /var/run/dmesg.boot. It seems unlikely that it does not, because
 >> SSE2 was introduced in 2000 with the Pentium 4.

 Peter> No need to check; I am absolutely certain that it does NOT.
 Peter> https://www.asus.com/supportonly/CUV4X-DLS/HelpDesk_CPU/

 Peter> But, Your explanation seems not to answer the fundamental
 Peter> question: if the database at 10.6 is still supposed to be able
 Peter> to run without SSE2?

Yes, the database is supposed to be able to run without SSE2, as long as
it is built with gcc and not clang, and without any architecture flags
that imply SSE2 support.

I'm pretty sure nothing in our buildfarm is i386 without SSE2 though.
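Andrew's condition (gcc rather than clang, and no architecture flags implying SSE2) can be checked from inside a build. Below is a minimal C sketch. It relies on the predefined macros `__i386__`, `__clang__`, `__SSE__` and `__SSE2__`, which GCC and clang set according to the target and to flags such as `-march=` or `-msse2`. The function names are hypothetical.

```c
#include <stdbool.h>

/* True if the compiler was allowed to emit SSE2 instructions. */
static bool
built_assuming_sse2(void)
{
#ifdef __SSE2__
    return true;
#else
    return false;
#endif
}

/* True if the compiler was allowed to emit (first-generation) SSE instructions. */
static bool
built_assuming_sse(void)
{
#ifdef __SSE__
    return true;
#else
    return false;
#endif
}

/*
 * True for the combination the 10.6 release note no longer accepts:
 * clang targeting 32-bit x86 without SSE2 code generation.
 */
static bool
clang_i386_without_sse2(void)
{
#if defined(__clang__) && defined(__i386__) && !defined(__SSE2__)
    return true;
#else
    return false;
#endif
}
```

A GCC build with `-march=pentium3`, as in Peter's extra compiler options, is expected to report SSE but not SSE2. A binary that reports SSE2 would not be safe on a CPU without it.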

 Peter> Whateever information You like to have, just ask and I will
 Peter> gladly do my best to obtain it, as I get around. (This is a
 Peter> reproducible on a very well maintained piece of software - this
 Peter> is rather fun.)

Your backtrace implies that you are running with checksums enabled;
true?

You should be able to build the port with debugging enabled by setting
WITH_DEBUG=1 in the environment or on the make command line. (I have not
yet tried this myself - I rarely build from the port.)

 Peter> The crash happens at a specific query - I get parse,bind, but no
 Peter> execute timing.

What is the exact query?

I'm doing some investigations of my own, I may have more questions
later.

-- 
Andrew (irc:RhodiumToad)


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Andrew Gierth
Date:
>>>>> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:

 Peter> Looking... yes, there is a core. Lets grab a first-fault core,
 Peter> as that one obviousely is from the failed recover:

 Peter> ! (gdb) core postgres.core.1st
 Peter> ! Core was generated by `postgres: bgworker: parallel worker for PID 68755 '.
 Peter> ! Program terminated with signal 10, Bus error.
 Peter> ! Reading symbols from <etc etc>
 Peter> ! #0  0x0838bdf2 in pg_checksum_page ()
 Peter> ! (gdb) bt
 Peter> ! #0  0x0838bdf2 in pg_checksum_page ()
 Peter> ! #1  0x0838a2b8 in PageIsVerified ()
 Peter> ! #2  0x5a824500 in ?? ()
 Peter> ! #3  0x00000000 in ?? ()

Can you do the command 'info reg' on this core, and also the command
'disass pg_checksum_page'

-- 
Andrew (irc:RhodiumToad)


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Peter
Date:
On Fri, Mar 08, 2019 at 02:48:12AM +0000, Andrew Gierth wrote:
! >>>>> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:
! 
!  Peter> Looking... yes, there is a core. Lets grab a first-fault core,
!  Peter> as that one obviousely is from the failed recover:
! 
!  Peter> ! (gdb) core postgres.core.1st
!  Peter> ! Core was generated by `postgres: bgworker: parallel worker for PID 68755 '.
!  Peter> ! Program terminated with signal 10, Bus error.
!  Peter> ! Reading symbols from <etc etc>
!  Peter> ! #0  0x0838bdf2 in pg_checksum_page ()
!  Peter> ! (gdb) bt
!  Peter> ! #0  0x0838bdf2 in pg_checksum_page ()
!  Peter> ! #1  0x0838a2b8 in PageIsVerified ()
!  Peter> ! #2  0x5a824500 in ?? ()
!  Peter> ! #3  0x00000000 in ?? ()
! 
! Can you do the command 'info reg' on this core, and also the command
! 'disass pg_checksum_page'

Okay, I made a new one; slightly different backtrace:

#0  0x0838bdf2 in pg_checksum_page ()
(gdb) bt
#0  0x0838bdf2 in pg_checksum_page ()
#1  0x0838a2b8 in PageIsVerified ()
#2  0x5a538500 in ?? ()
#3  0x00000000 in ?? ()

(gdb) info reg
eax            0x7fbfc1ec       2143273452
ecx            0x7f347fc4       2134147012
edx            0x2b5d7d81       727547265
ebx            0x2b5d3ca2       727530658
esp            0x7fbfc15c       0x7fbfc15c
ebp            0x7f344cd0       0x7f344cd0
esi            0x5a538500       1515422976
edi            0x7fbfc1ec       2143273452
eip            0x838bdf2        0x838bdf2
eflags         0x210246 2163270
cs             0x33     51
ss             0x3b     59
ds             0x3b     59
es             0x3b     59
fs             0x3b     59
gs             0x1b     27
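The `info reg` values above can be read with alignment in mind. This small C helper (hypothetical name) computes how far a 32-bit address sits past the previous multiple of a power-of-two alignment. Applied to the values shown:

- `esp` = 0x7fbfc15c is 4-byte aligned but 12 bytes past a 16-byte boundary.
- `eax` and `edi` = 0x7fbfc1ec (which equals esp + 0x90) are likewise 12 bytes past one.
- `esi` = 0x5a538500, loaded from the first argument slot at +12 and presumably the page pointer, is 16-byte aligned.

Whether any of this matters depends on the faulting instruction at eip 0x838bdf2. That instruction is not among those listed in the dump up to +787.

```c
#include <stdint.h>

/* Bytes by which addr lies past the previous multiple of align (a power of two). */
static uint32_t
misalignment(uint32_t addr, uint32_t align)
{
    return addr & (align - 1u);
}
```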

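The dump that follows is easier to read next to a C rendering of what it computes. The sketch below is read off the instructions, and all names in it are ours (hypothetical):

- The loop at +33..+122 copies a 32-word constant table into a stack array at 0x10(%esp).
- The loops at +144..+362 walk the 8192-byte (0x2000) page in 128-byte rows. Word j of each row is mixed into running sum j with an xor, a multiply by 0x1000193 (the 32-bit FNV prime, 16777619), and an xor with the value shifted right by 17.
- The loops at +379 and +720 apply two more rounds with zero input.

Before any of this, the 16-bit field at byte offset 8 of the page is saved and cleared (+19..+23); the sketch leaves that to the caller. The table contents and whatever happens to the sums after +787 are not visible here, so `offsets` is a caller-supplied stand-in. The faulting address 0x838bdf2 (+866) is the target of the `je` at +713, that is, code reached once the zero-input rounds are done.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N_SUMS 32            /* inner loop bound: cmp $0x20,%eax */
#define SKETCH_PRIME  0x1000193u    /* multiplier in every imul */

/* One mixing step as compiled: tmp = sum ^ value; (tmp * prime) ^ (tmp >> 17). */
static uint32_t
mix_step(uint32_t sum, uint32_t value)
{
    uint32_t tmp = sum ^ value;

    return (tmp * SKETCH_PRIME) ^ (tmp >> 17);
}

/*
 * Hypothetical reconstruction of the part of the routine visible in the dump.
 * words:   the page viewed as 32-bit words (8192 bytes = 2048 words in the
 *          dump); nwords is assumed to be a multiple of 32
 * offsets: stand-in for the constant table copied at +33..+122
 * sums:    receives the 32 running sums
 */
static void
sketch_checksum_sums(const uint32_t *words, size_t nwords,
                     const uint32_t offsets[SKETCH_N_SUMS],
                     uint32_t sums[SKETCH_N_SUMS])
{
    for (size_t j = 0; j < SKETCH_N_SUMS; j++)
        sums[j] = offsets[j];

    /* one 128-byte row (32 words) per outer iteration */
    for (size_t row = 0; row < nwords / SKETCH_N_SUMS; row++)
        for (size_t j = 0; j < SKETCH_N_SUMS; j++)
            sums[j] = mix_step(sums[j], words[row * SKETCH_N_SUMS + j]);

    /* two extra rounds with zero input (loops at +379 and +720) */
    for (int round = 0; round < 2; round++)
        for (size_t j = 0; j < SKETCH_N_SUMS; j++)
            sums[j] = mix_step(sums[j], 0);
}
```

With zero input the xor with the value drops out, which is why the loops at +379 and +720 contain only the imul/shr/xor triple.
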
Dump of assembler code for function pg_checksum_page:
0x0838ba90 <pg_checksum_page+0>:        push   %ebp
0x0838ba91 <pg_checksum_page+1>:        push   %edi
0x0838ba92 <pg_checksum_page+2>:        push   %esi
0x0838ba93 <pg_checksum_page+3>:        push   %ebx
0x0838ba94 <pg_checksum_page+4>:        xor    %ebx,%ebx
0x0838ba96 <pg_checksum_page+6>:        sub    $0x9c,%esp
0x0838ba9c <pg_checksum_page+12>:       mov    0xb0(%esp),%esi
0x0838baa3 <pg_checksum_page+19>:       movzwl 0x8(%esi),%eax
0x0838baa7 <pg_checksum_page+23>:       movw   $0x0,0x8(%esi)
0x0838baad <pg_checksum_page+29>:       mov    %eax,0x8(%esp)
0x0838bab1 <pg_checksum_page+33>:       mov    0x86122a0(%ebx),%edx
0x0838bab7 <pg_checksum_page+39>:       mov    0x86122a4(%ebx),%ecx
0x0838babd <pg_checksum_page+45>:       mov    0x86122a8(%ebx),%edi
0x0838bac3 <pg_checksum_page+51>:       mov    0x86122ac(%ebx),%ebp
0x0838bac9 <pg_checksum_page+57>:       mov    %edx,0x10(%esp,%ebx,1)
0x0838bacd <pg_checksum_page+61>:       mov    0x86122b0(%ebx),%eax
0x0838bad3 <pg_checksum_page+67>:       mov    %ecx,0x14(%esp,%ebx,1)
0x0838bad7 <pg_checksum_page+71>:       mov    0x86122b4(%ebx),%edx
0x0838badd <pg_checksum_page+77>:       mov    %edi,0x18(%esp,%ebx,1)
0x0838bae1 <pg_checksum_page+81>:       mov    0x86122b8(%ebx),%ecx
0x0838bae7 <pg_checksum_page+87>:       mov    %ebp,0x1c(%esp,%ebx,1)
0x0838baeb <pg_checksum_page+91>:       mov    0x86122bc(%ebx),%edi
0x0838baf1 <pg_checksum_page+97>:       mov    %eax,0x20(%esp,%ebx,1)
0x0838baf5 <pg_checksum_page+101>:      mov    %edx,0x24(%esp,%ebx,1)
0x0838baf9 <pg_checksum_page+105>:      mov    %ecx,0x28(%esp,%ebx,1)
0x0838bafd <pg_checksum_page+109>:      mov    %edi,0x2c(%esp,%ebx,1)
0x0838bb01 <pg_checksum_page+113>:      add    $0x20,%ebx
0x0838bb04 <pg_checksum_page+116>:      cmp    $0x80,%ebx
0x0838bb0a <pg_checksum_page+122>:      jb     0x838bab1 <pg_checksum_page+33>
0x0838bb0c <pg_checksum_page+124>:      lea    0x2000(%esi),%ebx
0x0838bb12 <pg_checksum_page+130>:      mov    %esi,%edx
0x0838bb14 <pg_checksum_page+132>:      mov    %ebx,0xc(%esp)
0x0838bb18 <pg_checksum_page+136>:      lea    0x10(%esp),%ecx
0x0838bb1c <pg_checksum_page+140>:      lea    0x0(%esi),%esi
0x0838bb20 <pg_checksum_page+144>:      xor    %eax,%eax
0x0838bb22 <pg_checksum_page+146>:      mov    (%ecx,%eax,4),%ebp
0x0838bb25 <pg_checksum_page+149>:      mov    (%edx,%eax,4),%edi
0x0838bb28 <pg_checksum_page+152>:      xor    %edi,%ebp
0x0838bb2a <pg_checksum_page+154>:      imul   $0x1000193,%ebp,%ebx
0x0838bb30 <pg_checksum_page+160>:      shr    $0x11,%ebp
0x0838bb33 <pg_checksum_page+163>:      lea    0x1(%eax),%edi
0x0838bb36 <pg_checksum_page+166>:      xor    %ebx,%ebp
0x0838bb38 <pg_checksum_page+168>:      mov    %ebp,(%ecx,%eax,4)
0x0838bb3b <pg_checksum_page+171>:      mov    (%edx,%edi,4),%ebp
0x0838bb3e <pg_checksum_page+174>:      mov    (%ecx,%edi,4),%ebx
0x0838bb41 <pg_checksum_page+177>:      xor    %ebp,%ebx
0x0838bb43 <pg_checksum_page+179>:      imul   $0x1000193,%ebx,%ebp
0x0838bb49 <pg_checksum_page+185>:      shr    $0x11,%ebx
0x0838bb4c <pg_checksum_page+188>:      xor    %ebp,%ebx
0x0838bb4e <pg_checksum_page+190>:      mov    %ebx,(%ecx,%edi,4)
0x0838bb51 <pg_checksum_page+193>:      lea    0x2(%eax),%edi
0x0838bb54 <pg_checksum_page+196>:      mov    (%ecx,%edi,4),%ebx
0x0838bb57 <pg_checksum_page+199>:      mov    (%edx,%edi,4),%ebp
0x0838bb5a <pg_checksum_page+202>:      xor    %ebp,%ebx
0x0838bb5c <pg_checksum_page+204>:      imul   $0x1000193,%ebx,%ebp
0x0838bb62 <pg_checksum_page+210>:      shr    $0x11,%ebx
0x0838bb65 <pg_checksum_page+213>:      xor    %ebp,%ebx
0x0838bb67 <pg_checksum_page+215>:      mov    %ebx,(%ecx,%edi,4)
0x0838bb6a <pg_checksum_page+218>:      lea    0x3(%eax),%edi
0x0838bb6d <pg_checksum_page+221>:      mov    (%ecx,%edi,4),%ebx
0x0838bb70 <pg_checksum_page+224>:      mov    (%edx,%edi,4),%ebp
0x0838bb73 <pg_checksum_page+227>:      xor    %ebp,%ebx
0x0838bb75 <pg_checksum_page+229>:      imul   $0x1000193,%ebx,%ebp
0x0838bb7b <pg_checksum_page+235>:      shr    $0x11,%ebx
0x0838bb7e <pg_checksum_page+238>:      xor    %ebp,%ebx
0x0838bb80 <pg_checksum_page+240>:      mov    %ebx,(%ecx,%edi,4)
0x0838bb83 <pg_checksum_page+243>:      lea    0x4(%eax),%edi
0x0838bb86 <pg_checksum_page+246>:      mov    (%ecx,%edi,4),%ebx
0x0838bb89 <pg_checksum_page+249>:      mov    (%edx,%edi,4),%ebp
0x0838bb8c <pg_checksum_page+252>:      xor    %ebp,%ebx
0x0838bb8e <pg_checksum_page+254>:      imul   $0x1000193,%ebx,%ebp
0x0838bb94 <pg_checksum_page+260>:      shr    $0x11,%ebx
0x0838bb97 <pg_checksum_page+263>:      xor    %ebp,%ebx
0x0838bb99 <pg_checksum_page+265>:      mov    %ebx,(%ecx,%edi,4)
0x0838bb9c <pg_checksum_page+268>:      lea    0x5(%eax),%edi
0x0838bb9f <pg_checksum_page+271>:      mov    (%ecx,%edi,4),%ebx
0x0838bba2 <pg_checksum_page+274>:      mov    (%edx,%edi,4),%ebp
0x0838bba5 <pg_checksum_page+277>:      xor    %ebp,%ebx
0x0838bba7 <pg_checksum_page+279>:      imul   $0x1000193,%ebx,%ebp
0x0838bbad <pg_checksum_page+285>:      shr    $0x11,%ebx
0x0838bbb0 <pg_checksum_page+288>:      xor    %ebp,%ebx
0x0838bbb2 <pg_checksum_page+290>:      mov    %ebx,(%ecx,%edi,4)
0x0838bbb5 <pg_checksum_page+293>:      lea    0x6(%eax),%edi
0x0838bbb8 <pg_checksum_page+296>:      mov    (%ecx,%edi,4),%ebx
0x0838bbbb <pg_checksum_page+299>:      mov    (%edx,%edi,4),%ebp
0x0838bbbe <pg_checksum_page+302>:      xor    %ebp,%ebx
0x0838bbc0 <pg_checksum_page+304>:      imul   $0x1000193,%ebx,%ebp
0x0838bbc6 <pg_checksum_page+310>:      shr    $0x11,%ebx
0x0838bbc9 <pg_checksum_page+313>:      xor    %ebp,%ebx
0x0838bbcb <pg_checksum_page+315>:      mov    %ebx,(%ecx,%edi,4)
0x0838bbce <pg_checksum_page+318>:      lea    0x7(%eax),%edi
0x0838bbd1 <pg_checksum_page+321>:      mov    (%ecx,%edi,4),%ebx
0x0838bbd4 <pg_checksum_page+324>:      mov    (%edx,%edi,4),%ebp
0x0838bbd7 <pg_checksum_page+327>:      xor    %ebp,%ebx
0x0838bbd9 <pg_checksum_page+329>:      imul   $0x1000193,%ebx,%ebp
0x0838bbdf <pg_checksum_page+335>:      shr    $0x11,%ebx
0x0838bbe2 <pg_checksum_page+338>:      xor    %ebp,%ebx
0x0838bbe4 <pg_checksum_page+340>:      add    $0x8,%eax
0x0838bbe7 <pg_checksum_page+343>:      mov    %ebx,(%ecx,%edi,4)
0x0838bbea <pg_checksum_page+346>:      cmp    $0x20,%eax
0x0838bbed <pg_checksum_page+349>:      jne    0x838bb22 <pg_checksum_page+146>
0x0838bbf3 <pg_checksum_page+355>:      sub    $0xffffff80,%edx
0x0838bbf6 <pg_checksum_page+358>:      cmp    0xc(%esp),%edx
0x0838bbfa <pg_checksum_page+362>:      jne    0x838bb20 <pg_checksum_page+144>
0x0838bc00 <pg_checksum_page+368>:      mov    %ecx,%eax
0x0838bc02 <pg_checksum_page+370>:      mov    %ecx,%edx
0x0838bc04 <pg_checksum_page+372>:      lea    0x90(%esp),%edi
0x0838bc0b <pg_checksum_page+379>:      mov    (%edx),%ebx
0x0838bc0d <pg_checksum_page+381>:      add    $0x20,%edx
0x0838bc10 <pg_checksum_page+384>:      imul   $0x1000193,%ebx,%ebp
0x0838bc16 <pg_checksum_page+390>:      shr    $0x11,%ebx
0x0838bc19 <pg_checksum_page+393>:      xor    %ebp,%ebx
0x0838bc1b <pg_checksum_page+395>:      mov    %ebx,-0x20(%edx)
0x0838bc1e <pg_checksum_page+398>:      mov    -0x1c(%edx),%ebx
0x0838bc21 <pg_checksum_page+401>:      imul   $0x1000193,%ebx,%ebp
0x0838bc27 <pg_checksum_page+407>:      shr    $0x11,%ebx
0x0838bc2a <pg_checksum_page+410>:      xor    %ebp,%ebx
0x0838bc2c <pg_checksum_page+412>:      mov    %ebx,-0x1c(%edx)
0x0838bc2f <pg_checksum_page+415>:      mov    -0x18(%edx),%ebx
0x0838bc32 <pg_checksum_page+418>:      imul   $0x1000193,%ebx,%ebp
0x0838bc38 <pg_checksum_page+424>:      shr    $0x11,%ebx
0x0838bc3b <pg_checksum_page+427>:      xor    %ebp,%ebx
0x0838bc3d <pg_checksum_page+429>:      mov    %ebx,-0x18(%edx)
0x0838bc40 <pg_checksum_page+432>:      mov    -0x14(%edx),%ebx
0x0838bc43 <pg_checksum_page+435>:      imul   $0x1000193,%ebx,%ebp
0x0838bc49 <pg_checksum_page+441>:      shr    $0x11,%ebx
0x0838bc4c <pg_checksum_page+444>:      xor    %ebp,%ebx
0x0838bc4e <pg_checksum_page+446>:      mov    %ebx,-0x14(%edx)
0x0838bc51 <pg_checksum_page+449>:      mov    -0x10(%edx),%ebx
0x0838bc54 <pg_checksum_page+452>:      imul   $0x1000193,%ebx,%ebp
0x0838bc5a <pg_checksum_page+458>:      shr    $0x11,%ebx
0x0838bc5d <pg_checksum_page+461>:      xor    %ebp,%ebx
0x0838bc5f <pg_checksum_page+463>:      mov    %ebx,-0x10(%edx)
0x0838bc62 <pg_checksum_page+466>:      mov    -0xc(%edx),%ebx
0x0838bc65 <pg_checksum_page+469>:      imul   $0x1000193,%ebx,%ebp
0x0838bc6b <pg_checksum_page+475>:      shr    $0x11,%ebx
0x0838bc6e <pg_checksum_page+478>:      xor    %ebp,%ebx
0x0838bc70 <pg_checksum_page+480>:      mov    %ebx,-0xc(%edx)
0x0838bc73 <pg_checksum_page+483>:      mov    -0x8(%edx),%ebx
0x0838bc76 <pg_checksum_page+486>:      imul   $0x1000193,%ebx,%ebp
0x0838bc7c <pg_checksum_page+492>:      shr    $0x11,%ebx
0x0838bc7f <pg_checksum_page+495>:      xor    %ebp,%ebx
0x0838bc81 <pg_checksum_page+497>:      mov    %ebx,-0x8(%edx)
0x0838bc84 <pg_checksum_page+500>:      mov    -0x4(%edx),%ebx
0x0838bc87 <pg_checksum_page+503>:      imul   $0x1000193,%ebx,%ebp
0x0838bc8d <pg_checksum_page+509>:      shr    $0x11,%ebx
0x0838bc90 <pg_checksum_page+512>:      xor    %ebp,%ebx
0x0838bc92 <pg_checksum_page+514>:      mov    %ebx,-0x4(%edx)
0x0838bc95 <pg_checksum_page+517>:      cmp    %edx,%edi
0x0838bc97 <pg_checksum_page+519>:      jne    0x838bc0b <pg_checksum_page+379>
0x0838bc9d <pg_checksum_page+525>:      mov    %edi,%edx
0x0838bc9f <pg_checksum_page+527>:      sub    %ecx,%edx
0x0838bca1 <pg_checksum_page+529>:      sub    $0x4,%edx
0x0838bca4 <pg_checksum_page+532>:      shr    $0x2,%edx
0x0838bca7 <pg_checksum_page+535>:      inc    %edx
0x0838bca8 <pg_checksum_page+536>:      and    $0x7,%edx
0x0838bcab <pg_checksum_page+539>:      je     0x838bd60 <pg_checksum_page+720>
0x0838bcb1 <pg_checksum_page+545>:      cmp    $0x1,%edx
0x0838bcb4 <pg_checksum_page+548>:      je     0x838bd44 <pg_checksum_page+692>
0x0838bcba <pg_checksum_page+554>:      cmp    $0x2,%edx
0x0838bcbd <pg_checksum_page+557>:      je     0x838bd31 <pg_checksum_page+673>
0x0838bcbf <pg_checksum_page+559>:      cmp    $0x3,%edx
0x0838bcc2 <pg_checksum_page+562>:      je     0x838bd1e <pg_checksum_page+654>
0x0838bcc4 <pg_checksum_page+564>:      cmp    $0x4,%edx
0x0838bcc7 <pg_checksum_page+567>:      je     0x838bd0b <pg_checksum_page+635>
0x0838bcc9 <pg_checksum_page+569>:      cmp    $0x5,%edx
0x0838bccc <pg_checksum_page+572>:      je     0x838bcf8 <pg_checksum_page+616>
0x0838bcce <pg_checksum_page+574>:      cmp    $0x6,%edx
0x0838bcd1 <pg_checksum_page+577>:      je     0x838bce5 <pg_checksum_page+597>
0x0838bcd3 <pg_checksum_page+579>:      mov    (%ecx),%eax
0x0838bcd5 <pg_checksum_page+581>:      imul   $0x1000193,%eax,%ebx
0x0838bcdb <pg_checksum_page+587>:      shr    $0x11,%eax
0x0838bcde <pg_checksum_page+590>:      xor    %ebx,%eax
0x0838bce0 <pg_checksum_page+592>:      mov    %eax,(%ecx)
0x0838bce2 <pg_checksum_page+594>:      lea    0x4(%ecx),%eax
0x0838bce5 <pg_checksum_page+597>:      mov    (%eax),%ebp
0x0838bce7 <pg_checksum_page+599>:      add    $0x4,%eax
0x0838bcea <pg_checksum_page+602>:      imul   $0x1000193,%ebp,%ecx
0x0838bcf0 <pg_checksum_page+608>:      shr    $0x11,%ebp
0x0838bcf3 <pg_checksum_page+611>:      xor    %ecx,%ebp
0x0838bcf5 <pg_checksum_page+613>:      mov    %ebp,-0x4(%eax)
0x0838bcf8 <pg_checksum_page+616>:      mov    (%eax),%edx
0x0838bcfa <pg_checksum_page+618>:      add    $0x4,%eax
0x0838bcfd <pg_checksum_page+621>:      imul   $0x1000193,%edx,%ebx
0x0838bd03 <pg_checksum_page+627>:      shr    $0x11,%edx
0x0838bd06 <pg_checksum_page+630>:      xor    %ebx,%edx
0x0838bd08 <pg_checksum_page+632>:      mov    %edx,-0x4(%eax)
0x0838bd0b <pg_checksum_page+635>:      mov    (%eax),%ebp
0x0838bd0d <pg_checksum_page+637>:      add    $0x4,%eax
0x0838bd10 <pg_checksum_page+640>:      imul   $0x1000193,%ebp,%ecx
0x0838bd16 <pg_checksum_page+646>:      shr    $0x11,%ebp
0x0838bd19 <pg_checksum_page+649>:      xor    %ecx,%ebp
0x0838bd1b <pg_checksum_page+651>:      mov    %ebp,-0x4(%eax)
0x0838bd1e <pg_checksum_page+654>:      mov    (%eax),%edx
0x0838bd20 <pg_checksum_page+656>:      add    $0x4,%eax
0x0838bd23 <pg_checksum_page+659>:      imul   $0x1000193,%edx,%ebx
0x0838bd29 <pg_checksum_page+665>:      shr    $0x11,%edx
0x0838bd2c <pg_checksum_page+668>:      xor    %ebx,%edx
0x0838bd2e <pg_checksum_page+670>:      mov    %edx,-0x4(%eax)
0x0838bd31 <pg_checksum_page+673>:      mov    (%eax),%ebp
0x0838bd33 <pg_checksum_page+675>:      add    $0x4,%eax
0x0838bd36 <pg_checksum_page+678>:      imul   $0x1000193,%ebp,%ecx
0x0838bd3c <pg_checksum_page+684>:      shr    $0x11,%ebp
0x0838bd3f <pg_checksum_page+687>:      xor    %ecx,%ebp
0x0838bd41 <pg_checksum_page+689>:      mov    %ebp,-0x4(%eax)
0x0838bd44 <pg_checksum_page+692>:      mov    (%eax),%edx
0x0838bd46 <pg_checksum_page+694>:      add    $0x4,%eax
0x0838bd49 <pg_checksum_page+697>:      imul   $0x1000193,%edx,%ebx
0x0838bd4f <pg_checksum_page+703>:      shr    $0x11,%edx
0x0838bd52 <pg_checksum_page+706>:      xor    %ebx,%edx
0x0838bd54 <pg_checksum_page+708>:      mov    %edx,-0x4(%eax)
0x0838bd57 <pg_checksum_page+711>:      cmp    %edi,%eax
0x0838bd59 <pg_checksum_page+713>:      je     0x838bdf2 <pg_checksum_page+866>
0x0838bd5f <pg_checksum_page+719>:      nop    
0x0838bd60 <pg_checksum_page+720>:      mov    (%eax),%ebp
0x0838bd62 <pg_checksum_page+722>:      add    $0x20,%eax
0x0838bd65 <pg_checksum_page+725>:      mov    -0x1c(%eax),%edx
0x0838bd68 <pg_checksum_page+728>:      imul   $0x1000193,%ebp,%ecx
0x0838bd6e <pg_checksum_page+734>:      imul   $0x1000193,%edx,%ebx
0x0838bd74 <pg_checksum_page+740>:      shr    $0x11,%ebp
0x0838bd77 <pg_checksum_page+743>:      shr    $0x11,%edx
0x0838bd7a <pg_checksum_page+746>:      xor    %ecx,%ebp
0x0838bd7c <pg_checksum_page+748>:      mov    %ebp,-0x20(%eax)
0x0838bd7f <pg_checksum_page+751>:      xor    %ebx,%edx
0x0838bd81 <pg_checksum_page+753>:      mov    -0x18(%eax),%ebp
0x0838bd84 <pg_checksum_page+756>:      mov    %edx,-0x1c(%eax)
0x0838bd87 <pg_checksum_page+759>:      mov    -0x14(%eax),%edx
0x0838bd8a <pg_checksum_page+762>:      imul   $0x1000193,%ebp,%ecx
0x0838bd90 <pg_checksum_page+768>:      imul   $0x1000193,%edx,%ebx
0x0838bd96 <pg_checksum_page+774>:      shr    $0x11,%ebp
0x0838bd99 <pg_checksum_page+777>:      shr    $0x11,%edx
0x0838bd9c <pg_checksum_page+780>:      xor    %ecx,%ebp
0x0838bd9e <pg_checksum_page+782>:      mov    %ebp,-0x18(%eax)
0x0838bda1 <pg_checksum_page+785>:      xor    %ebx,%edx
0x0838bda3 <pg_checksum_page+787>:      mov    -0x10(%eax),%ebp
0x0838bda6 <pg_checksum_page+790>:      mov    %edx,-0x14(%eax)
0x0838bda9 <pg_checksum_page+793>:      mov    -0xc(%eax),%edx
0x0838bdac <pg_checksum_page+796>:      imul   $0x1000193,%ebp,%ecx
0x0838bdb2 <pg_checksum_page+802>:      imul   $0x1000193,%edx,%ebx
0x0838bdb8 <pg_checksum_page+808>:      shr    $0x11,%ebp
0x0838bdbb <pg_checksum_page+811>:      shr    $0x11,%edx
0x0838bdbe <pg_checksum_page+814>:      xor    %ecx,%ebp
0x0838bdc0 <pg_checksum_page+816>:      mov    %ebp,-0x10(%eax)
0x0838bdc3 <pg_checksum_page+819>:      xor    %ebx,%edx
0x0838bdc5 <pg_checksum_page+821>:      mov    -0x8(%eax),%ebp
0x0838bdc8 <pg_checksum_page+824>:      mov    %edx,-0xc(%eax)
0x0838bdcb <pg_checksum_page+827>:      mov    -0x4(%eax),%edx
0x0838bdce <pg_checksum_page+830>:      imul   $0x1000193,%ebp,%ecx
0x0838bdd4 <pg_checksum_page+836>:      imul   $0x1000193,%edx,%ebx
0x0838bdda <pg_checksum_page+842>:      shr    $0x11,%ebp
0x0838bddd <pg_checksum_page+845>:      shr    $0x11,%edx
0x0838bde0 <pg_checksum_page+848>:      xor    %ecx,%ebp
0x0838bde2 <pg_checksum_page+850>:      mov    %ebp,-0x8(%eax)
0x0838bde5 <pg_checksum_page+853>:      xor    %ebx,%edx
0x0838bde7 <pg_checksum_page+855>:      mov    %edx,-0x4(%eax)
0x0838bdea <pg_checksum_page+858>:      cmp    %edi,%eax
0x0838bdec <pg_checksum_page+860>:      jne    0x838bd60 <pg_checksum_page+720>
0x0838bdf2 <pg_checksum_page+866>:      movaps 0x20(%esp),%xmm0
0x0838bdf7 <pg_checksum_page+871>:      mov    $0x80008001,%ebx
0x0838bdfc <pg_checksum_page+876>:      xorps  0x10(%esp),%xmm0
0x0838be01 <pg_checksum_page+881>:      xorps  0x30(%esp),%xmm0
0x0838be06 <pg_checksum_page+886>:      xorps  0x40(%esp),%xmm0
0x0838be0b <pg_checksum_page+891>:      xorps  0x50(%esp),%xmm0
0x0838be10 <pg_checksum_page+896>:      xorps  0x60(%esp),%xmm0
0x0838be15 <pg_checksum_page+901>:      xorps  0x70(%esp),%xmm0
0x0838be1a <pg_checksum_page+906>:      xorps  0x80(%esp),%xmm0
0x0838be22 <pg_checksum_page+914>:      mov    0x8(%esp),%edi
0x0838be26 <pg_checksum_page+918>:      movaps %xmm0,%xmm1
0x0838be29 <pg_checksum_page+921>:      mov    0xb4(%esp),%ebp
0x0838be30 <pg_checksum_page+928>:      shufps $0x55,%xmm0,%xmm1
0x0838be34 <pg_checksum_page+932>:      movaps %xmm0,%xmm2
0x0838be37 <pg_checksum_page+935>:      movss  %xmm1,0x8(%esp)
0x0838be3d <pg_checksum_page+941>:      unpckhps %xmm0,%xmm2
0x0838be40 <pg_checksum_page+944>:      mov    %di,0x8(%esi)
0x0838be44 <pg_checksum_page+948>:      mov    0x8(%esp),%esi
0x0838be48 <pg_checksum_page+952>:      movss  %xmm0,0x8(%esp)
0x0838be4e <pg_checksum_page+958>:      shufps $0xff,%xmm0,%xmm0
0x0838be52 <pg_checksum_page+962>:      mov    0x8(%esp),%eax
0x0838be56 <pg_checksum_page+966>:      movss  %xmm2,0x8(%esp)
0x0838be5c <pg_checksum_page+972>:      mov    0x8(%esp),%ecx
0x0838be60 <pg_checksum_page+976>:      movss  %xmm0,0x8(%esp)
0x0838be66 <pg_checksum_page+982>:      xor    %eax,%esi
0x0838be68 <pg_checksum_page+984>:      xor    %ebp,%esi
0x0838be6a <pg_checksum_page+986>:      mov    0x8(%esp),%edx
0x0838be6e <pg_checksum_page+990>:      add    $0x9c,%esp
0x0838be74 <pg_checksum_page+996>:      xor    %ecx,%esi
0x0838be76 <pg_checksum_page+998>:      xor    %edx,%esi
0x0838be78 <pg_checksum_page+1000>:     mov    %esi,%eax
0x0838be7a <pg_checksum_page+1002>:     mul    %ebx
0x0838be7c <pg_checksum_page+1004>:     pop    %ebx
0x0838be7d <pg_checksum_page+1005>:     shr    $0xf,%edx
0x0838be80 <pg_checksum_page+1008>:     mov    %edx,%edi
0x0838be82 <pg_checksum_page+1010>:     shl    $0x10,%edi
0x0838be85 <pg_checksum_page+1013>:     sub    %edx,%edi
0x0838be87 <pg_checksum_page+1015>:     sub    %edi,%esi
0x0838be89 <pg_checksum_page+1017>:     mov    %esi,%eax
0x0838be8b <pg_checksum_page+1019>:     pop    %esi
0x0838be8c <pg_checksum_page+1020>:     inc    %eax
0x0838be8d <pg_checksum_page+1021>:     pop    %edi
0x0838be8e <pg_checksum_page+1022>:     pop    %ebp
0x0838be8f <pg_checksum_page+1023>:     ret    
End of assembler dump.


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Peter
Date:
On Fri, Mar 08, 2019 at 02:35:33AM +0000, Andrew Gierth wrote:

! Yes, the database is supposed to be able to run without SSE2, as long as
! it is built with gcc and not clang, and without any architecture flags
! that imply SSE2 support.

Okay, thank you, that's what I was worried about, as some developers
make a strategic decision here.
 
! I'm pretty sure nothing in our buildfarm is i386 without SSE2 though.

*laugh* no problem with that. There probably wouldn't be any reason
to have such. 
Here I have a couple of good reasons: that machine does the job for
which other people buy a little plastic box off the shelf, called an
internet access router - and besides being stupid, those boxes are full
of bugs and get hacked (I doubt anybody would bother to write a Spectre
exploit for a Pentium 3, although it should be possible).
On the other side, this is a server board built for 24/365 operation,
running on registered ECC memory. A new one of that class would
inevitably carry quite a big Xeon, and so would do nothing but idle here.
I see no point in such an investment, at least not until Intel comes up
with a really nice new design that gets rid of the crap.
https://www.techradar.com/news/spoiler-flaw-in-intel-cpus-is-similar-to-spectre-yet-dangerously-different

!  Peter> Whatever information you would like to have, just ask and I will
!  Peter> gladly do my best to obtain it, as I get around. (This is a
!  Peter> reproducible crash in a very well maintained piece of software -
!  Peter> this is rather fun.)
! 
! Your backtrace implies that you are running with checksums enabled;
! true?

Correct.

!  Peter> The crash happens at a specific query - I get parse, bind, but no
!  Peter> execute timing.
! 
! What is the exact query?

This will be some work. It involves about six different SQL functions
(and some of these are probably also old enough to vote).

! I'm doing some investigations of my own, I may have more questions
! later.

You're welcome.

rgds,
PMc


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Andrew Gierth
Date:
>>>>> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:

 Peter> esp            0x7fbfc15c       0x7fbfc15c

And there we go; the stack is misaligned (only 4-byte alignment where
16 is expected).

 Peter> eip            0x838bdf2        0x838bdf2

 Peter> 0x0838bdf2 <pg_checksum_page+866>:      movaps 0x20(%esp),%xmm0

MOVAPS is an SSE (not SSE2) instruction; it's enabled by virtue of the
fact that you used -march=pentium3 (the pentium3 supports SSE but not
SSE2). The "A" stands for "aligned"; an unaligned source address causes
an exception. %esp+0x20 is not correctly aligned for the instruction.
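To make the arithmetic concrete, here is a small C check of the register value quoted above (esp = 0x7fbfc15c) against the faulting instruction movaps 0x20(%esp),%xmm0. The helper names are hypothetical; the sketch only tests whether the operand address is a multiple of 16, which is what MOVAPS requires.

```c
#include <assert.h>
#include <stdint.h>

/* esp as printed by gdb for the crashed process. */
static const uint32_t faulting_esp = 0x7fbfc15cu;

/* Nonzero if addr is a multiple of align; align must be a power of two. */
static int is_aligned(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Effective address of the memory operand in "movaps 0x20(%esp),%xmm0". */
static uint32_t movaps_source(uint32_t esp)
{
    return esp + 0x20u;
}
```

Worked by hand: 0x7fbfc15c + 0x20 = 0x7fbfc17c, which leaves a remainder of 0xc (12) modulo 16. The address is aligned to 4 bytes but not to 8 or 16, which matches the diagnosis.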

GCC defaults to using a 16-byte stack alignment, but it relies on the
caller to align the stack too, so if a GCC-compiled function is called
from code that doesn't align the stack, then this kind of error can
result. I do not know offhand (but I plan to find out) what clang's
default stack alignment on i386 is.

You can tell GCC to realign the stack itself using the -mstackrealign
option.

This problem shows up only with GCC and not with clang because clang
does not attempt to use SSE to vectorize this particular piece of code.
The non-vectorized implementation generated by clang has no special
requirements for stack alignment. But at the end of the day this is not
a problem with PostgreSQL - it would show up with any code compiled with
GCC where the compiler had elected to use SSE instructions for
optimization.
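For orientation, the disassembly above reads like the tail of a page checksum: an FNV-style mixing step (multiply by 0x1000193 = 16777619, shift right by 0x11 = 17, xor), then an xor-fold of 32 partial sums held in a stack array, then a reduction into the range 1..65535. The C sketch below reconstructs those steps from the constants in the dump. As far as I can tell they match PostgreSQL's src/include/storage/checksum_impl.h, but the function names here are hypothetical and this is a simplified illustration, not that code (the dump also xors in one more value loaded from the stack before reducing; that step is omitted). The faulting MOVAPS at pg_checksum_page+866 appears to be the first load of the vectorized xor-fold.

```c
#include <assert.h>
#include <stdint.h>

/* Constants read off the disassembly: imul $0x1000193 and shr $0x11. */
#define FNV_PRIME 16777619u /* 0x1000193 */
#define N_SUMS 32           /* 8 XORPS operands x 4 lanes of 32 bits */

/* One mixing step: the imul/shr/xor pattern repeated throughout the dump. */
static uint32_t checksum_comp(uint32_t sum, uint32_t value)
{
    uint32_t tmp = sum ^ value;
    return (tmp * FNV_PRIME) ^ (tmp >> 17);
}

/* Xor-fold of the partial sums. With SSE enabled, GCC may vectorize this
 * loop over an on-stack array using MOVAPS/XORPS, i.e. 16-byte aligned loads. */
static uint32_t xor_fold(const uint32_t sums[N_SUMS])
{
    uint32_t result = 0;
    for (int i = 0; i < N_SUMS; i++)
        result ^= sums[i];
    return result;
}

/* Reduction into 1..65535, written the plain way. */
static uint16_t reduce_checksum(uint32_t x)
{
    return (uint16_t)(x % 65535u + 1u);
}

/* The same reduction as the compiler emitted it: mul $0x80008001, then
 * shr $0xf on the high half (a shift by 47 overall) gives x / 65535; the
 * dump then forms q * 65535 as (q << 16) - q, subtracts it, and adds 1. */
static uint16_t reduce_checksum_magic(uint32_t x)
{
    uint32_t q = (uint32_t)(((uint64_t)x * 0x80008001u) >> 47);
    uint32_t r = x - q * 65535u; /* x % 65535 */
    return (uint16_t)(r + 1u);
}
```

Because the fold reads a local array, the alignment it relies on is that of the function's own stack frame, which GCC lays out assuming the stack was already 16-byte aligned on entry.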

-- 
Andrew (irc:RhodiumToad)


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Alvaro Herrera
Date:
On 2019-Mar-08, Peter wrote:

> On Fri, Mar 08, 2019 at 02:35:33AM +0000, Andrew Gierth wrote:

> ! I'm pretty sure nothing in our buildfarm is i386 without SSE2 though.
> 
> *laugh* no problem with that. There probably wouldn't be any reason
> to have such. 

Actually there *is* a very good reason to have one, which is that we
would have discovered this bug right away.  (Just ask Tom Lane for a
tally of bugs that have been discovered due to his old HPUX 10.20
dinosaur he keeps running just for that purpose).  It seems difficult to
grab hold of such hardware, however.  I don't suppose you have many
spare cycles on that machine of yours to run a buildfarm animal?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Mar-08, Peter wrote:
>> On Fri, Mar 08, 2019 at 02:35:33AM +0000, Andrew Gierth wrote:
>>> ! I'm pretty sure nothing in our buildfarm is i386 without SSE2 though.

>> *laugh* no problem with that. There probably wouldn't be any reason
>> to have such. 

> Actually there *is* a very good reason to have one, which is that we
> would have discovered this bug right away.  (Just ask Tom Lane for a
> tally of bugs that have been discovered due to his old HPUX 10.20
> dinosaur he keeps running just for that purpose).  It seems difficult to
> grab hold of such hardware, however.  I don't suppose you have many
> spare cycles on that machine of yours to run a buildfarm animal?

IIUC, this bug isn't actually down to the old hardware: any SSE-capable
chip ought to exhibit the same problem.  The bug is in the toolchain
somewhere, in that some compiler or run-time infrastructure is failing
to maintain 16-byte stack alignment as required by the ABI.  Or,
possibly, there is disagreement among relevant toolchain elements as
to exactly what ABI they're using.

            regards, tom lane


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Andrew Gierth
Date:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> So I'm going to guess that your bug 236025 is actually an
 Andrew> alignment problem, with the compiler making some assumption
 Andrew> about alignment that we're violating. I'll investigate and see
 Andrew> what I can find.

OK, I have completed my analysis of both reports.

The bottom line is that this is a disagreement between gcc and the
(clang-compiled) system libraries over what the stack alignment should
be; GCC wants and assumes 16 byte alignment, but clang won't provide
that. It's not any kind of bug in PostgreSQL.

For most applications there is no issue because GCC aligns the stack
itself on entry into main(), so the only time it becomes an issue is if
two conditions are met: (1) the application must call into an outside
(non-GCC-compiled) library which then calls _back_ into the application,
AND (2) the subsequent code executes instructions that rely on the stack
alignment for correctness (and not just performance).

PostgreSQL compiled by GCC on i386 without architecture options will not
rely on the alignment of the stack so condition (2) is not met. Only if
you specify an architecture such as -march=pentium3 (which enables SSE)
will any instructions be used which require strict alignment.
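A quick way to see which of these instruction sets a given set of compiler flags turns on is to test the predefined macros __SSE__ and __SSE2__, which GCC and clang define when the target allows those instructions. A minimal sketch; the function names are hypothetical, and the second function simply encodes the combination described in this thread (32-bit x86, GCC rather than clang, SSE enabled). Built for i386 with -march=pentium3, compiled_sse_level() should report SSE but not SSE2.

```c
#include <assert.h>

/* Flag bits describing which SSE levels the current compiler flags enable. */
enum { HAVE_SSE = 1, HAVE_SSE2 = 2 };

static int compiled_sse_level(void)
{
    int flags = 0;
#ifdef __SSE__
    flags |= HAVE_SSE;  /* e.g. -msse or -march=pentium3 */
#endif
#ifdef __SSE2__
    flags |= HAVE_SSE2; /* e.g. -msse2 or -march=pentium4 */
#endif
    return flags;
}

/* Nonzero for the risky combination: a 32-bit x86 build compiled by GCC
 * (clang also defines __GNUC__, so it is excluded explicitly) with SSE
 * instructions enabled. */
static int gcc_i386_with_sse(void)
{
#if defined(__i386__) && defined(__GNUC__) && !defined(__clang__)
    return (compiled_sse_level() & HAVE_SSE) != 0;
#else
    return 0;
#endif
}
```

This only reports what the flags enable; it does not detect whether -mstackrealign is also set.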

It may not be obvious how condition (1) is met, but notice that the
report from Peter has the crash happening in either a background worker
or the checkpointer process; this is significant because those are
spawned from postmaster while in a signal handler, and the signal
handler's stack frame has disturbed the stack alignment (and with the
system libraries compiled with clang and not gcc, no attempt is made to
adjust that).

So the implications for the postgresql port on freebsd/i386 are:

1. If you compile with GCC and no architecture options you should have
no problems on any cpu.

This presumably covers the case of the packaged binaries.

2. If you compile with GCC and any of -msse, -msse2, -march=pentium3 or
later, or any similar flag that enables use of SSE or later (I believe
that no MMX instructions require special alignment), then you will also
need -mstackrealign (or patch the source to add the equivalent attribute
to every signal handler function or other callback, which I don't really
recommend). (Maybe the port should add this option defensively?)

The crash in (freebsd) bug #236025 is explained by the fact that the
user had -msse2 set when compiling with GCC. Peter's crash is explained
by the use of -march=pentium3 when compiling with GCC.

3. If you compile with clang and -msse2 then there should be no stack
alignment issues (since clang doesn't assume the stack is aligned) but
obviously you then can't run the binary on a pre-pentium4 cpu.
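The per-function alternative mentioned in item 2 can be sketched with GCC's force_align_arg_pointer function attribute, which makes a single function realign the stack in its prologue. Assuming that is the "equivalent attribute" meant here, the sketch applies it to a signal handler and to a qsort() comparator, the two kinds of entry point through which non-GCC code calls back into the program. The macro REALIGN_STACK and all function names are hypothetical, the attribute is applied only for 32-bit x86 builds, and patching every callback this way is, as stated above, not the recommended fix.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical helper: ask the compiler to realign the stack on entry,
 * only on 32-bit x86, where the caller may not keep 16-byte alignment. */
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

static volatile sig_atomic_t got_sigterm = 0;

/* Entered via the kernel's signal delivery path, not from GCC-compiled
 * code, so it cannot assume the stack is 16-byte aligned. */
REALIGN_STACK
static void handle_sigterm(int signo)
{
    (void)signo;
    got_sigterm = 1;
}

/* Entered from the C library's qsort(), which may be built by another
 * compiler with a different stack-alignment convention. */
REALIGN_STACK
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Installs the handler with ISO C signal(); returns 0 on success. */
static int install_sigterm_handler(void)
{
    return signal(SIGTERM, handle_sigterm) == SIG_ERR ? -1 : 0;
}

static void sort_ints(int *values, size_t n)
{
    qsort(values, n, sizeof values[0], compare_ints);
}
```

Compiling the whole program with -mstackrealign achieves the same effect without touching each entry point, which is why the option is the simpler fix.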

-- 
Andrew (irc:RhodiumToad)


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Andrew Gierth
Date:
>>>>> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:

 Peter> Extra Compiler-Options:
 Peter>   -march=pentium3

I CC'd you on my response to Palle, but to spell it out, what you need
to do to fix this is either:

1. Remove the -march=pentium3 option.

2. Add the -mstackrealign option as well.

Either way should work. Could you try (one or both, your preference) and
report back?

-- 
Andrew (irc:RhodiumToad)


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Peter
Date:
Hi Andrew,

   many thanks for Your efforts!

Let's see what I get out of this. First, it seems I can reproduce the
fault on my build machine (Ivy Bridge Core i5) in the i386 chroot as
well - which is not a surprise given your explanations.

On Fri, Mar 08, 2019 at 04:51:47PM +0000, Andrew Gierth wrote:

! MOVAPS is an SSE (not SSE2) instruction; it's enabled by virtue of the
! fact that you used -march=pentium3 (the pentium3 supports SSE but not
! SSE2). The "A" stands for "aligned"; an unaligned source address causes
! an exception. %esp+0x20 is not correctly aligned for the instruction.

Okay so far. I was occasionally wondering whether that pentium3 option
would affect anything at all. Now we see, it does. ;)

! GCC defaults to using a 16-byte stack alignment, but it relies on the
! caller to align the stack too, so if a GCC-compiled function is called
! from code that doesn't align the stack, then this kind of error can
! result. I do not know offhand (but I plan to find out) what clang's
! default stack alignment on i386 is.

Well, what caused me a headache this evening is: who would be the
caller in this case, as, from my understanding, it is just PostgreSQL
running?
Now, from your newer mail, this riddle clears up well.

In my build environment, I can now create and start a new db cluster,
issue only the single command "CREATE ROLE bacula;", and it will
crash - but then again I have to wait for the next checkpoint.

! You can tell GCC to realign the stack itself using the -mstackrealign
! option.

Yep, that appears to solve it.

So, as there is a fix now, I'm pondering who should be responsible
for applying it:
 * the system owner (along with the CPU definition)
 * the port maintainer (along with the compiler choice)
 * the postgres configure script

! This problem shows up only with GCC and not with clang because clang
! does not attempt to use SSE to vectorize this particular piece of code.
! The non-vectorized implementation generated by clang has no special
! requirements for stack alignment. But at the end of the day this is not
! a problem with PostgreSQL - it would show up with any code compiled with
! GCC where the compiler had elected to use SSE instructions for
! optimization.

Well, it's clearly my fault, coming up with that pentium3 option. *gg*


rgds, P.


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Peter
Date:
On Fri, Mar 08, 2019 at 08:22:33PM +0000, Andrew Gierth wrote:
! >>>>> "Peter" == Peter  <pmc@citylink.dinoex.sub.org> writes:
! 
!  Peter> Extra Compiler-Options:
!  Peter>   -march=pentium3
! 
! I CC'd you on my response to Palle, but to spell it out, what you need
! to do to fix this is either:
! 
! 1. Remove the -march=pentium3 option.
! 
! 2. Add the -mstackrealign option as well.

Yes, thank you, this became clear to me in the afternoon.
 
! Either way should work. Could you try (one or both, your preference) and
! report back?

Yes, -mstackrealign helps. So, as my local problem is now solved, 
I am thinking about that "greater good for mankind" thing. ;)

At least one other person did run into the issue, and from Your
explanation I understand that this is not a postgres issue, but 
could happen to any piece of software that decides to use gcc.

Now it seems easy to put this compiler option into the port's
specific makefile for postgreSQL, but it should actually be fixed
systemwide.

But, while it is recommended to set the CPU type systemwide on
FreeBSD, it is not recommended for the system owner to set CFLAGS
systemwide (even less so gcc-specific ones - and I didn't find
a proper way to do that, anyway).

So my proposal is that this one belongs in /usr/ports/Mk/bsd.gcc.mk

+ .if ${MACHINE_CPU:Msse} && ${MACHINE_ARCH} == i386
+ CFLAGS+= -mstackrealign
+ .endif

That seems to work. Palle, Your opinion?

rgds, 
PMc


Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Palle Girgensohn
Date:


On 9 Mar 2019 at 01:17, Peter <pmc@citylink.dinoex.sub.org> wrote:

> So my proposal is that this one belongs in /usr/ports/Mk/bsd.gcc.mk
>
> + .if ${MACHINE_CPU:Msse} && ${MACHINE_ARCH} == i386
> + CFLAGS+= -mstackrealign
> + .endif
>
> That seems to work. Palle, your opinion?

Well, we ideally want the binary packages built with the ports framework, for a specific platform, in this case i386, to work on all i386 machines. This makes it a bit more complicated. We would have to check if the user has any of -msse, -msse2, -march=pentium3 or later, or any similar flag that enables use of SSE or later, and if the underlying userland is built with clang, and in that case add -mstackrealign as well.

Perhaps, as Andrew suggests, just adding -mstackrealign defensively in the port for i386 is a reasonable tradeoff? It would not help for other ports though.

I'm cc:ing gerald@FreeBSD.org, maintainer of Mk/bsd.gcc.mk, to see if he has any thoughts about this. I will forward Andrew's analysis to Gerald as well.

Palle



Re: Upgrade 10.5->10.6 : db crash BUS ERROR (sig 10), reproducible

From
Peter
Date:
Hi Palle!

On Sat, Mar 09, 2019 at 01:42:55PM +0100, Palle Girgensohn wrote:
! > On 9 Mar 2019 at 01:17, Peter <pmc@citylink.dinoex.sub.org> wrote:
! > Yes, -mstackrealign helps. So, as my local problem is now solved, 
! > I am thinking about that "greater good for mankind" thing. ;)
! > 
! > At least one other person did run into the issue, and from Your
! > explanation I understand that this is not a postgres issue, but 
! > could happen to any piece of software that decides to use gcc.
! > 
! > Now it seems easy to put this compiler option into the port's
! > specific makefile for postgreSQL, but it should actually be fixed
! > systemwide.
! > 
! > But, while it is recommended to set the CPU type systemwide on
! > FreeBSD, it is not recommended for the system owner to set CFLAGS
! > systemwide (even less so gcc-specific ones - and I didn't find
! > a proper way to do that, anyway).
! > 
! > So my proposal is that this one belongs in /usr/ports/Mk/bsd.gcc.mk
! > 
! > + .if ${MACHINE_CPU:Msse} && ${MACHINE_ARCH} == i386
! > + CFLAGS+= -mstackrealign
! > + .endif
! > 
! > That seems to work. Palle, Your opinion?
! 
! Well, we ideally want the binary packages built with the ports
! framework, for a specific platform, in this case i386, to work on all
! i386 machines. This makes it a bit more complicated.

Yes, that was my intention! The MACHINE_CPU variable already gets
populated with "sse" (from somewhere in /usr/share/mk).

What I usually do is set only the proper CPU type in
/etc/make.conf, and then "-march=pentium3" (or whatever the CPU is)
appears magically in all builds. I didn't configure this!

/etc/make.conf:
! # Machine is a Pentium-3
! CPUTYPE?=       p3

Nothing more is needed, and now in any ports' directory, we get

! >postgresql10-server$ make -V MACHINE_CPU
! sse i686 mmx i586 i486

Contrarily, if I remove the CPUTYPE from /etc/make.conf, then it says:

! >postgresql10-server$ make -V MACHINE_CPU
! i486

And I suppose these strings are what gets tested in such cases.

! We would have to
! check if the user has any of -msse, -msse2, -march=pentium3 or later,
! or any similar flag that enables use of SSE or later, 

If the user explicitly drops in those options, then it does indeed get
complicated. I don't do that - I do NOT set any -march or similar
options, I only set CPUTYPE in make.conf, as is recommended
by developers, e.g. here:
https://forums.freebsd.org/threads/whats-in-your-make-conf.36150/post-199595

And that's why I am thinking of a global fix - because this fault
happens when following recommended practice!

! and if the
! underlying userland is built with clang

Ouch, that one I did neglect. One could have built the whole system
with gcc, and then no action is needed. Hmm...

! Perhaps, as Andrew suggests, just adding -mstackrealign defensively
! in the port for i386 is a reasonable tradeoff? It would not help for
! other ports though.

That is reasonable, certainly. :)

PMc