Thread: Old Postgresql version on i7-1165g7

Old Postgresql version on i7-1165g7

From
Yura Sokolov
Date:
Good day, hackers.

I've got an HP ProBook 640g8 with an i7-1165g7. I've installed Ubuntu 20.04 LTS
on it and started to play with the PostgreSQL sources.

Occasionally I found I'm not able to `make check` old PostgreSQL versions,
at least 9.6 and 10. They fail at the initdb stage, in the call to postgres.

Stock PostgreSQL versions 9.6.8 and 10.0 fail in the bootstrap stage:

     running bootstrap script ... 2021-04-09 12:33:26.424 MSK [161121] FATAL:  could not find tuple for opclass 1
     2021-04-09 12:33:26.424 MSK [161121] PANIC:  cannot abort transaction 1, it was already committed
     Aborted (core dumped)
     child process exited with exit code 134

Our modified custom version of 9.6 fails with a segmentation fault inside
libc's __strncmp_avx2 during the post-bootstrap stage:

     Program terminated with signal SIGSEGV, Segmentation fault.
     #0  __strncmp_avx2 ()
     #1  0x0000557168a7eeda in nameeq
     #2  0x0000557168b4c4a0 in FunctionCall2Coll
     #3  0x0000557168659555 in heapgettup_pagemode
     #4  0x000055716865a617 in heap_getnext
     #5  0x0000557168678cf1 in systable_getnext
     #6  0x0000557168b5651c in GetDatabaseTuple
     #7  0x0000557168b574a4 in InitPostgres
     #8  0x00005571689dcb7d in PostgresMain
     #9  0x00005571688844d5 in main

I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()" and
found 372728b0d49552641f0ea83d9d2e08817de038fa
> Replace our traditional initial-catalog-data format with a better design.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=372728b0d49552641f0ea83d9d2e08817de038fa

This is the first commit where `make check` doesn't fail during initdb on my
machine. Therefore 02f3e558f21c0fbec9f94d5de9ad34f321eb0e57 is the last one
where `make check` fails.

I've tried gcc9, gcc10 and clang10.
I've configured either without parameters or with
`CFLAGS=-O0 ./configure --enable-debug`.

This doesn't happen on 10th-generation Intel CPUs (i7-10510U and i9-10900K).
Unfortunately, I have no friends or colleagues with an 11th-generation Intel
CPU, so I can't tell whether this is a bug in the 11th generation or in the
particular CPU installed in this notebook.

It would be great if someone with an i7-11* could try `make check` and report
whether it also fails.

With regards,
Yura Sokolov
PostgresPro



Re: Old Postgresql version on i7-1165g7

From
Yura Sokolov
Date:
Yura Sokolov wrote on 2021-04-09 16:28:
> Good day, hackers.
> 
> I've got HP ProBook 640g8 with i7-1165g7. I've installed Ubuntu 20.04 
> LTS on it
> and started to play with PostgreSQL sources.
> 
> Occasionally I found I'm not able to `make check` old Postgresql 
> versions.
> At least 9.6 and 10. They are failed at the initdb stage in the call
> to postgresql.
> 
> Raw postgresql version 9.6.8 and 10.0 fails in bootstrap stage:
> 
>     running bootstrap script ... 2021-04-09 12:33:26.424 MSK [161121]
> FATAL:  could not find tuple for opclass 1
>     2021-04-09 12:33:26.424 MSK [161121] PANIC:  cannot abort
> transaction 1, it was already committed
>     Aborted (core dumped)
>     child process exited with exit code 134
> 
> Our modified custom version 9.6 fails inside of libc __strncmp_avx2
> during post-bootstrap
> with segmentation fault:
> 
>     Program terminated with signal SIGSEGV, Segmentation fault.
>     #0  __strncmp_avx2 ()
>     #1  0x0000557168a7eeda in nameeq
>     #2  0x0000557168b4c4a0 in FunctionCall2Coll
>     #3  0x0000557168659555 in heapgettup_pagemode
>     #4  0x000055716865a617 in heap_getnext
>     #5  0x0000557168678cf1 in systable_getnext
>     #6  0x0000557168b5651c in GetDatabaseTuple
>     #7  0x0000557168b574a4 in InitPostgres
>     #8  0x00005571689dcb7d in PostgresMain
>     #9  0x00005571688844d5 in main
> 
> I've bisected between REL_11_0 and "Rename pg_rewind's 
> copy_file_range()" and
> found 372728b0d49552641f0ea83d9d2e08817de038fa
>> Replace our traditional initial-catalog-data format with a better 
>> design.
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=372728b0d49552641f0ea83d9d2e08817de038fa
> 
> This is first commit where `make check` doesn't fail during initdb on
> my machine.
> Therefore 02f3e558f21c0fbec9f94d5de9ad34f321eb0e57 is the last one
> where `make check` fails.
> 
> I've tried with gcc9, gcc10 and clang10.
> I've configured either without parameters or with `CFLAGS=-O0
> ./configure --enable-debug`.
> 
> Thing doesn't happen on Intel CPU of 10th series (i7-10510U and 
> i9-10900K).
> Unfortunately, I have no fellows or colleagues with Intel CPU  11 
> series,
> therefore I couldn't tell if this bug of 11 series or bug of concrete
> CPU installed
> in the notebook.
> 
> It will be great if some with i7-11* could try to make check and report
> if it also fails or not.

BTW, the problem persists in Debian stable (10.4) inside Docker on the same
machine.

> 
> With regards,
> Yura Sokolov
> PostgresPro



Re: Old Postgresql version on i7-1165g7

From
Justin Pryzby
Date:
On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote:
> Good day, hackers.
> 
> I've got HP ProBook 640g8 with i7-1165g7. I've installed Ubuntu 20.04 LTS on
> it
> and started to play with PostgreSQL sources.
> 
> Occasionally I found I'm not able to `make check` old Postgresql versions.

Do you mean that HEAD works consistently, but v9.6 and v10 sometimes work but
sometimes fail ?

>     #5  0x0000557168678cf1 in systable_getnext
>     #6  0x0000557168b5651c in GetDatabaseTuple
>     #7  0x0000557168b574a4 in InitPostgres
>     #8  0x00005571689dcb7d in PostgresMain
>     #9  0x00005571688844d5 in main
> 
> I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()"
> and
> found 372728b0d49552641f0ea83d9d2e08817de038fa
> > Replace our traditional initial-catalog-data format with a better
> > design.
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=372728b0d49552641f0ea83d9d2e08817de038fa
> 
> This is first commit where `make check` doesn't fail during initdb on my
> machine. Therefore 02f3e558f21c0fbec9f94d5de9ad34f321eb0e57 is the last one where
> `make check` fails.

This doesn't make much sense or help much, since 372728b doesn't actually
change the catalogs, or any .c file.

> I've tried with gcc9, gcc10 and clang10.
> I've configured either without parameters or with `CFLAGS=-O0 ./configure
> --enable-debug`.

You used make clean too, right ?

I would also use --enable-cassert, since it might catch problems you'd otherwise
miss.

If that doesn't expose anything, maybe try to #define USE_VALGRIND in
src/include/pg_config_manual.h, and run with valgrind --trace-children=yes

-- 
Justin
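
The two suggestions above could be scripted roughly as follows. This is only a sketch: the function names `configure_for_debug` and `run_under_valgrind` are made up for illustration, and the source directory is passed in by the caller. It assumes an in-tree build; the USE_VALGRIND define in src/include/pg_config_manual.h still has to be enabled by hand before building.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative only.

# Configure a PostgreSQL source tree for debugging: no optimisation,
# debug symbols, and assertion checks enabled.
# $1 = path to the source tree (assumption: in-tree build).
configure_for_debug() {
    ( cd "$1" && CFLAGS="-O0" ./configure --enable-debug --enable-cassert )
}

# Run any command under Valgrind and follow the processes it starts,
# since initdb launches the postgres backend as a child.
run_under_valgrind() {
    valgrind --trace-children=yes "$@"
}
```

For example, `configure_for_debug ~/src/postgresql` followed by a rebuild, and then `run_under_valgrind initdb -D /tmp/pgdata` (both paths are placeholders). Expect the Valgrind run to be much slower than a normal initdb.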



Re: Old Postgresql version on i7-1165g7

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote:
>> Occasionally I found I'm not able to `make check` old Postgresql versions.

>> I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()"
>> and
>> found 372728b0d49552641f0ea83d9d2e08817de038fa
>>> Replace our traditional initial-catalog-data format with a better
>>> design.
>> This is first commit where `make check` doesn't fail during initdb on my
>> machine.

> This doesn't make much sense or help much, since 372728b doesn't actually
> change the catalogs, or any .c file.

It could make sense if some part of the toolchain that was previously
used to generate postgres.bki doesn't work right on that machine.
Overall though I'd have thought that 372728b would increase not
decrease our toolchain footprint.  It also seems unlikely that a
recent Ubuntu release would contain toolchain bugs that we hadn't
already heard about.

> You used make clean too, right ?

Really, when bisecting, you need to use "make distclean" or even
"git clean -dfx" between steps, or you may get bogus results,
because our makefiles aren't that great about tracking dependencies,
especially when you move backwards in the history.

So perhaps a more plausible theory is that this bisection result
is wrong because you weren't careful enough.

            regards, tom lane
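
A bisection step that scrubs the tree before every build might look like the sketch below. The function names `scrub_tree` and `bisect_step`, the `-j4` setting, and the choice of configure flags are assumptions made for illustration; the exit codes follow the `git bisect run` convention (0 marks the revision good, 125 skips it, any other code from 1 to 127 marks it bad).

```shell
#!/bin/sh
# Hypothetical helpers for `git bisect run`; names are illustrative only.

# Remove every untracked and ignored file and directory, so that no
# generated file from another revision survives into this build.
# $1 = path to the git work tree.
scrub_tree() {
    git -C "$1" clean -dfxq
}

# Build and test the currently checked-out revision (in-tree build).
# Returns 125 (skip) if the tree cannot be cleaned, configured or built;
# otherwise returns the exit status of `make check`.
bisect_step() {
    src=$1
    scrub_tree "$src" || return 125
    (
        cd "$src" || exit 125
        ./configure --enable-debug --enable-cassert >/dev/null || exit 125
        make -s -j4 >/dev/null || exit 125
        make -s check >/dev/null
    )
}
```

Keep the script outside the work tree, because `git clean -dfx` would delete an untracked copy inside it. With the file saved as, say, /path/to/bisect_step.sh (a placeholder), a run could be started with `git bisect run sh -c '. /path/to/bisect_step.sh && bisect_step .'`.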



Re: Old Postgresql version on i7-1165g7

From
Yura Sokolov
Date:
Tom Lane wrote on 2021-04-13 17:45:
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote:
>>> Occasionally I found I'm not able to `make check` old Postgresql 
>>> versions.
> 
>>> I've bisected between REL_11_0 and "Rename pg_rewind's 
>>> copy_file_range()"
>>> and
>>> found 372728b0d49552641f0ea83d9d2e08817de038fa
>>>> Replace our traditional initial-catalog-data format with a better
>>>> design.
>>> This is first commit where `make check` doesn't fail during initdb on 
>>> my
>>> machine.
> 
>> This doesn't make much sense or help much, since 372728b doesn't 
>> actually
>> change the catalogs, or any .c file.
> 
> It could make sense if some part of the toolchain that was previously
> used to generate postgres.bki doesn't work right on that machine.
> Overall though I'd have thought that 372728b would increase not
> decrease our toolchain footprint.  It also seems unlikely that a
> recent Ubuntu release would contain toolchain bugs that we hadn't
> already heard about.
> 
>> You used make clean too, right ?
> 
> Really, when bisecting, you need to use "make distclean" or even
> "git clean -dfx" between steps, or you may get bogus results,
> because our makefiles aren't that great about tracking dependencies,
> especially when you move backwards in the history.
> 
> So perhaps a more plausible theory is that this bisection result
> is wrong because you weren't careful enough.
> 
>             regards, tom lane

Sorry for missing this mail for a week.

I believe I cleaned before each step, since I'm building in an external
directory and cleanup is just `rm * -r`.

But I'll repeat the bisection tomorrow to be sure.

I don't think it is really a PostgreSQL or toolchain bug. I believe it is some
corner case that was changed in the new Intel CPUs.

With regards,
Yura Sokolov.



Re: Old Postgresql version on i7-1165g7

From
Yura Sokolov
Date:
Yura Sokolov wrote on 2021-04-18 23:29:
> Tom Lane wrote on 2021-04-13 17:45:
>> Justin Pryzby <pryzby@telsasoft.com> writes:
>>> On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote:
>>>> Occasionally I found I'm not able to `make check` old Postgresql 
>>>> versions.
>> 
>>>> I've bisected between REL_11_0 and "Rename pg_rewind's 
>>>> copy_file_range()"
>>>> and
>>>> found 372728b0d49552641f0ea83d9d2e08817de038fa
>>>>> Replace our traditional initial-catalog-data format with a better
>>>>> design.
>>>> This is first commit where `make check` doesn't fail during initdb 
>>>> on my
>>>> machine.
>> 
>>> This doesn't make much sense or help much, since 372728b doesn't 
>>> actually
>>> change the catalogs, or any .c file.
>> 
>> It could make sense if some part of the toolchain that was previously
>> used to generate postgres.bki doesn't work right on that machine.
>> Overall though I'd have thought that 372728b would increase not
>> decrease our toolchain footprint.  It also seems unlikely that a
>> recent Ubuntu release would contain toolchain bugs that we hadn't
>> already heard about.
>> 
>>> You used make clean too, right ?
>> 
>> Really, when bisecting, you need to use "make distclean" or even
>> "git clean -dfx" between steps, or you may get bogus results,
>> because our makefiles aren't that great about tracking dependencies,
>> especially when you move backwards in the history.

Yep, "git clean -dfx" did the job. "make distclean" didn't, btw.
I had a "src/backend/catalog/schemapg.h" file in the source tree,
generated by "make submake-generated-headers" on REL_13_0.
It was not shown by "git status", so I didn't notice it.
It was deleted neither by "make distclean" nor by the
"git clean -dx" I had tried before. Only "git clean -dfx" deletes it.

Thank you for the suggestion, Tom. You've saved my sanity.

Regards,
Yura Sokolov.
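
A stale file of this kind can be spotted before it causes trouble, because git will list ignored files when asked. The sketch below is illustrative only (`list_ignored` and `preview_clean` are made-up names). Note that in a default configuration `git clean` refuses to delete anything unless `-f` is given, while the `-n` dry run works without it.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# Print the ignored files that exist in the work tree, one per line.
# Plain `git status` hides these; `--ignored` reports them with a
# leading "!! " marker, which sed strips.
# $1 = path to the git work tree.
list_ignored() {
    git -C "$1" status --porcelain --ignored | sed -n 's/^!! //p'
}

# Dry run: show what `git clean -dfx` would remove, deleting nothing.
preview_clean() {
    git -C "$1" clean -dnx
}
```

Running `list_ignored .` in the source tree before starting a bisection shows any leftover generated headers, and `preview_clean .` confirms what a full clean would remove.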