Обсуждение: [BUGS] BUG #14897: Segfault on statitics SQL request

Поиск
Список
Период
Сортировка

[BUGS] BUG #14897: Segfault on statitics SQL request

От
vincent.lachenal@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14897
Logged by:          Vincent Lachenal
Email address:      vincent.lachenal@gmail.com
PostgreSQL version: 10.1
Operating system:   Linux (Archlinux)
Description:

Hi,

First of all, thanks for your work. PostgreSQL is my favorite SQL
database.

I contact you because I just upgrade PostgreSQL from 9.6.5 to 10.1.
Migration is OK. My databases seems to work.

But one of the resquest I use causes segmentation fault to server process.

The request is :
SELECTs.protocol,s.mapper,c.method,s.nb_threads,avg(c.client_end - c.client_start) / 1000000 AS total,avg(c.server_end
-c.server_start) / 1000000 AS server,avg(c.server_start - c.client_start) / 1000000 AS
client_to_server,avg(c.client_end- c.server_end) / 1000000 AS server_to_client
 
FROM testsuite s
INNER JOIN testcall c ON s.id = c.test_suite_id
GROUP BY (s.protocol, s.mapper, c.method, s.nb_threads)
ORDER BY s.nb_threads, c.method, s.mapper, s.protocol;


In systemctl logs, I have (sorry it is in french):
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.705 CET [5462] LOG:  processus serveur (PID 5601) a été arrêté par
le signal 11 : Segmentation fault
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.705 CET [5462] DÉTAIL:  Le processus qui a échoué exécutait :
SELECT s.protocol,s.mapper,c.method,s.nb_threads,avg(c.client_end -
c.client_start) / 1000000 AS total,avg(c.server_end - c.server_start) /
1000000 AS server,avg(c.server_start - c.client_start) / 1000000 AS
client_to_server,avg(c.client_end - c.server_end) / 1000000 AS
server_to_client FROM testsuite s INNER JOIN testcall c ON s.id =
c.test_suite_id GROUP BY (s.protocol, s.mapper, c.method, s.nb_threads)
ORDER BY s.nb_threads, c.method, s.mapper, s.protocol;
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.705 CET [5462] LOG:  arrêt des autres processus serveur actifs
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.707 CET [5551] ATTENTION:  arrêt de la connexion à cause de l'arrêt
brutal d'un autre processus serveur
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.707 CET [5551] DÉTAIL:  Le postmaster a commandé à ce processus
serveur d'annuler la transaction
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]:         courante et
de quitter car un autre processus serveur a quitté anormalement
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]:         et qu'il
existe probablement de la mémoire partagée corrompue.
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.707 CET [5551] ASTUCE :  Dans un moment, vous devriez être capable
de vous reconnecter à la base de
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]:         données et de
relancer votre commande.
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.708 CET [5462] LOG:  tous les processus serveur se sont arrêtés,
réinitialisation
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.729 CET [5605] LOG:  le système de bases de données a été
interrompu ; dernier lancement connu à 2017-11-10 19:26:23 CET
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.810 CET [5605] LOG:  le système de bases de données n'a pas été
arrêté proprement ; restauration
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]:         automatique
en cours
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.813 CET [5605] LOG:  la ré-exécution commence à 0/3160FD0
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.813 CET [5605] LOG:  longueur invalide de l'enregistrement à
0/3161008 : voulait 24, a eu 0
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.813 CET [5605] LOG:  ré-exécution faite à 0/3160FD0
nov. 10 19:27:18 daidoji.rokugan.local postgres[5460]: 2017-11-10
19:27:18.830 CET [5462] LOG:  le système de bases de données est prêt pour
accepter les connexions


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
vincent.lachenal@gmail.com writes:
> But one of the resquest I use causes segmentation fault to server process.

> The request is :
> SELECT
>     s.protocol,
>     s.mapper,
>     c.method,
>     s.nb_threads,
>     avg(c.client_end - c.client_start) / 1000000 AS total,
>     avg(c.server_end - c.server_start) / 1000000 AS server,
>     avg(c.server_start - c.client_start) / 1000000 AS client_to_server,
>     avg(c.client_end - c.server_end) / 1000000 AS server_to_client
> FROM testsuite s
> INNER JOIN testcall c ON s.id = c.test_suite_id
> GROUP BY (s.protocol, s.mapper, c.method, s.nb_threads)
> ORDER BY s.nb_threads, c.method, s.mapper, s.protocol;

The query alone doesn't help us much.  Can you put together a
self-contained test case, including table declarations and some
sample data?  Also, are you using any non-default configuration
settings?

Alternatively, if you can get a stack trace from the crash,
that might be enough info to diagnose it (no promises though).

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Vincent Lachenal
Дата:
Sorry. I didn't find the way to create an attachment in the interface.
You can find the database dump as attachment.

I didn't tweak database. I use default Archlinux configuration (which should be PostgreSQL default configuration).

I will try to get a stack trace.

Regards.

Vincent

Le ven. 10 nov. 2017 à 20:08, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
vincent.lachenal@gmail.com writes:
> But one of the resquest I use causes segmentation fault to server process.

> The request is :
> SELECT
>       s.protocol,
>       s.mapper,
>       c.method,
>       s.nb_threads,
>       avg(c.client_end - c.client_start) / 1000000 AS total,
>       avg(c.server_end - c.server_start) / 1000000 AS server,
>       avg(c.server_start - c.client_start) / 1000000 AS client_to_server,
>       avg(c.client_end - c.server_end) / 1000000 AS server_to_client
> FROM testsuite s
> INNER JOIN testcall c ON s.id = c.test_suite_id
> GROUP BY (s.protocol, s.mapper, c.method, s.nb_threads)
> ORDER BY s.nb_threads, c.method, s.mapper, s.protocol;

The query alone doesn't help us much.  Can you put together a
self-contained test case, including table declarations and some
sample data?  Also, are you using any non-default configuration
settings?

Alternatively, if you can get a stack trace from the crash,
that might be enough info to diagnose it (no promises though).

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

                        regards, tom lane
Вложения

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Vincent Lachenal
Дата:
Sorry. I didn't find the way to create an attachment in the interface.
You can find the database's dump here : https://github.com/vlachenal/webservices-bench . It is the dump.tar.xz file.

I didn't tweak database. I use default Archlinux configuration (which should be PostgreSQL default configuration).

I will try to get a stack trace.

Regards.

Vincent

Le ven. 10 nov. 2017 à 20:47, Vincent Lachenal <vincent.lachenal@gmail.com> a écrit :
Sorry. I didn't find the way to create an attachment in the interface.
You can find the database dump as attachment.

I didn't tweak database. I use default Archlinux configuration (which should be PostgreSQL default configuration).

I will try to get a stack trace.

Regards.

Vincent

Le ven. 10 nov. 2017 à 20:08, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
vincent.lachenal@gmail.com writes:
> But one of the resquest I use causes segmentation fault to server process.

> The request is :
> SELECT
>       s.protocol,
>       s.mapper,
>       c.method,
>       s.nb_threads,
>       avg(c.client_end - c.client_start) / 1000000 AS total,
>       avg(c.server_end - c.server_start) / 1000000 AS server,
>       avg(c.server_start - c.client_start) / 1000000 AS client_to_server,
>       avg(c.client_end - c.server_end) / 1000000 AS server_to_client
> FROM testsuite s
> INNER JOIN testcall c ON s.id = c.test_suite_id
> GROUP BY (s.protocol, s.mapper, c.method, s.nb_threads)
> ORDER BY s.nb_threads, c.method, s.mapper, s.protocol;

The query alone doesn't help us much.  Can you put together a
self-contained test case, including table declarations and some
sample data?  Also, are you using any non-default configuration
settings?

Alternatively, if you can get a stack trace from the crash,
that might be enough info to diagnose it (no promises though).

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

                        regards, tom lane

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Vincent Lachenal <vincent.lachenal@gmail.com> writes:
> Sorry. I didn't find the way to create an attachment in the interface.
> You can find the database dump as attachment.

Thanks for sending the data!  However, the given query doesn't crash for
me, so there must be something non-default about your parameter settings.
Could you send the result of 

select name,setting,source from pg_settings where source != 'default';
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Vincent Lachenal
Дата:
The request returns this result on this database :
select name,setting,source from pg_settings where source != 'default';
           name            |      setting      |        source         
----------------------------+-------------------+----------------------
application_name           | psql              | client
client_encoding            | UTF8              | client
data_checksums             | off               | override
DateStyle                  | ISO, DMY          | configuration file
default_text_search_config | pg_catalog.french | configuration file
dynamic_shared_memory_type | posix             | configuration file
lc_collate                 | fr_FR.utf8        | override
lc_ctype                   | fr_FR.utf8        | override
lc_messages                | fr_FR.utf8        | configuration file
lc_monetary                | fr_FR.utf8        | configuration file
lc_numeric                 | fr_FR.utf8        | configuration file
lc_time                    | fr_FR.utf8        | configuration file
log_timezone               | Europe/Paris      | configuration file
max_connections            | 100               | configuration file
max_stack_depth            | 2048              | environment variable
server_encoding            | UTF8              | override
shared_buffers             | 16384             | configuration file
TimeZone                   | Europe/Paris      | configuration file
transaction_deferrable     | off               | override
transaction_isolation      | read committed    | override
transaction_read_only      | off               | override
wal_buffers                | 512               | override

On postgres database, it returns :
select name,setting,source from pg_settings where source != 'default';
           name            |                setting                 |        source         
----------------------------+----------------------------------------+----------------------
application_name           | psql                                   | client
client_encoding            | UTF8                                   | client
config_file                | /var/lib/postgres/data/postgresql.conf | override
data_checksums             | off                                    | override
data_directory             | /var/lib/postgres/data                 | override
DateStyle                  | ISO, DMY                               | configuration file
default_text_search_config | pg_catalog.french                      | configuration file
dynamic_shared_memory_type | posix                                  | configuration file
hba_file                   | /var/lib/postgres/data/pg_hba.conf     | override
ident_file                 | /var/lib/postgres/data/pg_ident.conf   | override
lc_collate                 | fr_FR.utf8                             | override
lc_ctype                   | fr_FR.utf8                             | override
lc_messages                | fr_FR.utf8                             | configuration file
lc_monetary                | fr_FR.utf8                             | configuration file
lc_numeric                 | fr_FR.utf8                             | configuration file
lc_time                    | fr_FR.utf8                             | configuration file
log_timezone               | Europe/Paris                           | configuration file
max_connections            | 100                                    | configuration file
max_stack_depth            | 2048                                   | environment variable
server_encoding            | UTF8                                   | override
shared_buffers             | 16384                                  | configuration file
TimeZone                   | Europe/Paris                           | configuration file
transaction_deferrable     | off                                    | override
transaction_isolation      | read committed                         | override
transaction_read_only      | off                                    | override
wal_buffers                | 512                                    | override

Regards.

Vincent

Le ven. 10 nov. 2017 à 21:12, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Vincent Lachenal <vincent.lachenal@gmail.com> writes:
> Sorry. I didn't find the way to create an attachment in the interface.
> You can find the database dump as attachment.

Thanks for sending the data!  However, the given query doesn't crash for
me, so there must be something non-default about your parameter settings.
Could you send the result of

select name,setting,source from pg_settings where source != 'default';

                        regards, tom lane

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Vincent Lachenal <vincent.lachenal@gmail.com> writes:
> The request returns this result on this database :
> select name,setting,source from pg_settings where source != 'default';

Hmm .... not much help there.  I wonder if this is specific to ArchLinux.
We do have an Arch machine in the buildfarm, but it's an ARM, so that
might not prove much about Arch on other hardware.

One other thing that might be useful: could you send the output of
"pg_config" (that's a command-line program, not a SQL command)?
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Vincent Lachenal
Дата:
Here is the resut of pg_config:
$ pg_config  
BINDIR = /usr/bin
DOCDIR = /usr/share/doc/postgresql
HTMLDIR = /usr/share/doc/postgresql
INCLUDEDIR = /usr/include
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/server
LIBDIR = /usr/lib
PKGLIBDIR = /usr/lib/postgresql
LOCALEDIR = /usr/share/locale
MANDIR = /usr/share/man
SHAREDIR = /usr/share/postgresql
SYSCONFDIR = /etc/postgresql
PGXS = /usr/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr' '--mandir=/usr/share/man' '--datadir=/usr/share/postgresql' '--sysconfdir=/etc' '--with-gssapi' '--with-libxml' '--with-openssl' '--with-pe
rl' '--with-python' 'PYTHON=/usr/bin/python2' '--with-tcl' '--with-pam' '--with-system-tzdata=/usr/share/zoneinfo' '--with-uuid=e2fs' '--enable-nls' '--enable-thread-s
afety' 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now' 'CPPFLAGS=-D_FORTI
FY_SOURCE=2'
CC = gcc
CPPFLAGS = -DFRONTEND -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwr
apv -fexcess-precision=standard -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt
CFLAGS_SL = -fPIC
LDFLAGS = -L../../src/common -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Wl,--as-needed -Wl,-rpath,'/usr/lib',--enable-new-dtags
LDFLAGS_EX =  
LDFLAGS_SL =  
LIBS = -lpgcommon -lpgport -lpthread -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm   
VERSION = PostgreSQL 10.1

If it is not help, I will recreate database and inject dump. It could be a migration bug.
Archlinux does not provide debugging symbol package and my other computer is under Gentoo (and it will take a long time to have a stable release). If bug persists, I will open a bug on Archlinux bugtracker.

Regards.

Vincent

Le ven. 10 nov. 2017 à 21:46, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Vincent Lachenal <vincent.lachenal@gmail.com> writes:
> The request returns this result on this database :
> select name,setting,source from pg_settings where source != 'default';

Hmm .... not much help there.  I wonder if this is specific to ArchLinux.
We do have an Arch machine in the buildfarm, but it's an ARM, so that
might not prove much about Arch on other hardware.

One other thing that might be useful: could you send the output of
"pg_config" (that's a command-line program, not a SQL command)?

                        regards, tom lane

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Vincent Lachenal <vincent.lachenal@gmail.com> writes:
> Here is the resut of pg_config:

Hmm ... there are some nondefault compiler flags that they're sticking in,
but nothing that isn't used by other distros AFAIK.

> If it is not help, I will recreate database and inject dump. It could be a
> migration bug.

It would certainly be good to see if you can reproduce the crash from a
freshly-loaded copy of that dump file.

> Archlinux does not provide debugging symbol package and my other computer
> is under Gentoo (and it will take a long time to have a stable release). If
> bug persists, I will open a bug on Archlinux bugtracker.

Ugh.  With no symbols we couldn't learn anything very useful from a core
dump.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Vincent Lachenal
Дата:
Tom,

With a fresh database, bug is still here ...
I will try to compile PostgreSQL on my computer before open a bug on Archlinux (I will also try the testing Gentoo package). If I find something interesting, I will reply in in this thread.

Thanks for your help and your reactivity. The bug seems to be limited to my specific environment which is a good news (Arch has been installed for almost 4 years and is not known for its stablility).

Regards.

Vincent

Le ven. 10 nov. 2017 à 22:25, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Vincent Lachenal <vincent.lachenal@gmail.com> writes:
> Here is the resut of pg_config:

Hmm ... there are some nondefault compiler flags that they're sticking in,
but nothing that isn't used by other distros AFAIK.

> If it is not help, I will recreate database and inject dump. It could be a
> migration bug.

It would certainly be good to see if you can reproduce the crash from a
freshly-loaded copy of that dump file.

> Archlinux does not provide debugging symbol package and my other computer
> is under Gentoo (and it will take a long time to have a stable release). If
> bug persists, I will open a bug on Archlinux bugtracker.

Ugh.  With no symbols we couldn't learn anything very useful from a core
dump.

                        regards, tom lane

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Vincent Lachenal <vincent.lachenal@gmail.com> writes:
> With a fresh database, bug is still here ...
> I will try to compile PostgreSQL on my computer before open a bug on
> Archlinux (I will also try the testing Gentoo package). If I find something
> interesting, I will reply in in this thread.

> Thanks for your help and your reactivity. The bug seems to be limited to my
> specific environment which is a good news (Arch has been installed for
> almost 4 years and is not known for its stablility).

FWIW, I tried a build on Fedora 25 using the same compiler flags you
showed.  Works OK there too.  So this suggests possibly an Archlinux-
specific compiler bug, though I wouldn't want to pin blame without
more evidence.

It'd definitely be interesting to see a stack trace, if you manage
to reproduce the problem in a debug-enabled build.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Dmitry Dolgov
Дата:
> On 10 November 2017 at 23:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It'd definitely be interesting to see a stack trace, if you manage
> to reproduce the problem in a debug-enabled build.

Looks like I can reproduce something close to this issue on my Gentoo
installation using the provided dataset, but it looks quite weird for me:

at `numeric.c:4468` we generate `PolyNumAggState *result` as `Int128AggState`

4468    result = makePolyNumAggStateCurrentContext(false);

3972 static Int128AggState *
3973 makeInt128AggStateCurrentContext(bool calcSumX2)
3974 {
3975    Int128AggState *state;
3976
3977    state = (Int128AggState *) palloc0(sizeof(Int128AggState));
3978    state->calcSumX2 = calcSumX2;
3979
3980    return state;
3981 }

And as the result of this function we've got:

>>> p *state
>>> $2 = {
>>>   calcSumX2 = 0 '\000',
>>>   N = 0,
>>>   sumX = 0x00000000000000000000000000000000,
>>>   sumX2 = 0x00000000000000000000000000000000
>>> }

And after that `result->sumX` is passed to `numericvar_to_int128`

4479 #ifdef HAVE_INT128
4480    numericvar_to_int128(&num, &result->sumX);
4481 #else
4482    accum_sum_add(&result->sumX, &num);
4483 #endif

At `numeric.c:6348` we've got a segmentation fault

6348    *result = neg ? -val : val;
>>> p *result
$3 = 0x00000000000000000000000000000000

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> Looks like I can reproduce something close to this issue on my Gentoo
> installation using the provided dataset, but it looks quite weird for me:

Interesting.  I wonder whether __int128 has an alignment requirement that
is more than MAXALIGN.  Intel chips generally don't enforce alignment
requirements, but maybe there's an exception here?

My Fedora box thinks __alignof__(__int128) is 16, which is suspicious,
but it's not crashing.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Andres Freund
Дата:
On 2017-11-10 18:06:55 -0500, Tom Lane wrote:
> Interesting.  I wonder whether __int128 has an alignment requirement that
> is more than MAXALIGN.  Intel chips generally don't enforce alignment
> requirements, but maybe there's an exception here?

As long as no SIMD instructions are used... Which'd be compiler version
and flag specific.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Michael Paquier
Дата:
On Sat, Nov 11, 2017 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> Looks like I can reproduce something close to this issue on my Gentoo
>> installation using the provided dataset, but it looks quite weird for me:
>
> Interesting.  I wonder whether __int128 has an alignment requirement that
> is more than MAXALIGN.  Intel chips generally don't enforce alignment
> requirements, but maybe there's an exception here?
>
> My Fedora box thinks __alignof__(__int128) is 16, which is suspicious,
> but it's not crashing.

My laptop uses Arch, and I can see the crash easily when compiling
with gcc 7.2 which is the one bundled in the core package set:
#0  int8_avg_combine (fcinfo=0x55e290767d50) at numeric.c:4355
#1  0x000055e28e571ae3 in advance_combine_function
(pergroupstate=0x55e290764ac0, pertrans=0x55e290767c28,
aggstate=0x55e290756d78) at nodeAgg.c:1264
#2  combine_aggregates (aggstate=0x55e290756d78, pergroup=<optimized
out>) at nodeAgg.c:1198
#3  0x000055e28e5727ad in agg_retrieve_direct
(aggstate=0x55e290756d78) at nodeAgg.c:2438
#4  ExecAgg (pstate=0x55e290756d78) at nodeAgg.c:2155
#5  0x000055e28e5649da in ExecProcNode (node=0x55e290756d78) at
../../../src/include/executor/executor.h:251
#6  ExecutePlan (execute_once=<optimized out>, dest=0x7ff0cec60d98,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT,   use_parallel_mode=<optimized out>, planstate=0x55e290756d78,
estate=0x55e290756b38) at execMain.c:1720
#7  standard_ExecutorRun (queryDesc=0x55e290756728,
direction=<optimized out>, count=0, execute_once=<optimized out>) at
execMain.c:363
#8  0x000055e28e69166d in PortalRunSelect
(portal=portal@entry=0x55e290754718, forward=forward@entry=1 '\001',
count=0, count@entry=9223372036854775807,   dest=dest@entry=0x7ff0cec60d98) at pquery.c:932
#9  0x000055e28e692b4e in PortalRun
(portal=portal@entry=0x55e290754718,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001',   run_once=run_once@entry=1 '\001', dest=dest@entry=0x7ff0cec60d98,
altdest=altdest@entry=0x7ff0cec60d98, completionTag=0x7ffd6d2dfd50 "")
at pquery.c:773
#10 0x000055e28e68e882 in exec_simple_query (   query_string=0x55e290686398 "SELECT\n        s.protocol,\n
s.mapper,\n        c.method,\n        s.nb_threads,\n
avg(c.client_end - c.client_start) / 1000000 AS total,\n
avg(c.server_end - c.server_start) / 1000000"...) at postgres.c:1120
#11 0x000055e28e6907f0 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x55e290698dd8, dbname=<optimized out>,
username=<optimized out>) at postgres.c:4139
#12 0x000055e28e3e531c in BackendRun (port=0x55e290690500) at postmaster.c:4364
#13 BackendStartup (port=0x55e290690500) at postmaster.c:4036
#14 ServerLoop () at postmaster.c:1755
#15 0x000055e28e622fe4 in PostmasterMain (argc=3, argv=0x55e290666760)
at postmaster.c:1363
#16 0x000055e28e3e68e8 in main (argc=3, argv=0x55e290666760) at main.c:228
(gdb) p state1->sumX
$1 = 0x00000000000000000000000000000000
(gdb) p state2->sumX
$2 = 0x0000000000000000000000004c170e30
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Dmitry Dolgov
Дата:
> On 11 November 2017 at 09:39, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> My laptop uses Arch, and I can see the crash easily when compiling
> with gcc 7.2 which is the one bundled in the core package set

Yes, I forgot to mention, that I also used quite recent version of gcc (GCC) 8.0.0 20170805 (experimental).


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Vincent Lachenal
Дата:
I retried database migration using another ... and the bug does not happen anymore.
The first time I use "Manual dump and reload" migration method from Archlinux wiki (https://wiki.archlinux.org/index.php/PostgreSQL#Upgrading_PostgreSQL):
# systemctl stop postgresql.service
# mv /var/lib/postgres/data /var/lib/postgres/olddata
# mkdir /var/lib/postgres/data
# chown postgres:postgres /var/lib/postgres/data
[postgres]$ initdb --locale $LANG -E UTF8 -D '/var/lib/postgres/data'
# /opt/pgsql-9.6/bin/pg_ctl -D /var/lib/postgres/olddata/ start
# /opt/pgsql-9.6/bin/pg_dumpall >> old_backup.sql
# /opt/pgsql-9.6/bin/pg_ctl -D /var/lib/postgres/olddata/ stop
# systemctl start postgresql.service
# psql -f old_backup.sql postgres
The second time I use the other way:
# systemctl stop postgresql.service
# mv /var/lib/postgres/data /var/lib/postgres/olddata
# mkdir /var/lib/postgres/data
# chown postgres:postgres /var/lib/postgres/data
[postgres]$ initdb --locale $LANG -E UTF8 -D '/var/lib/postgres/data'
[postgres]$ cd /tmp
[postgres]$ pg_upgrade -b /opt/pgsql-9.6/bin -B /usr/bin -d /var/lib/postgres/olddata -D /var/lib/postgres/data

I have kept the migration data (both of them). So I will compile PostgreSQL with debugging symbol to have a workable stacktrace.

Regards.

Vincent


Le sam. 11 nov. 2017 à 11:34, Dmitry Dolgov <9erthalion6@gmail.com> a écrit :
> On 11 November 2017 at 09:39, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> My laptop uses Arch, and I can see the crash easily when compiling
> with gcc 7.2 which is the one bundled in the core package set

Yes, I forgot to mention, that I also used quite recent version of gcc (GCC) 8.0.0 20170805 (experimental).


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On 11 November 2017 at 09:39, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>> My laptop uses Arch, and I can see the crash easily when compiling
>> with gcc 7.2 which is the one bundled in the core package set

> Yes, I forgot to mention, that I also used quite recent version of gcc
> (GCC) 8.0.0 20170805 (experimental).

Would you guys who are seeing the problem note whether the address of
the int128 field is 16-aligned, or only 8-aligned?

Also, it'd be real useful to see some disassembly around the point of
the crash, so that we can check whether the compiler is using SIMD
instructions.  (Or just get the compiler to generate a numeric.s file
with -S.)

I'm pretty suspicious that this is where the problem is, but it would
be good to have confirmation before we put effort into a fix.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Dmitry Dolgov
Дата:
> On 11 November 2017 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Would you guys who are seeing the problem note whether the address of
> the int128 field is 16-aligned, or only 8-aligned?

__alignof__(__int128) returns 16 on my machine.

> Also, it'd be real useful to see some disassembly around the point of
> the crash, so that we can check whether the compiler is using SIMD
> instructions.  (Or just get the compiler to generate a numeric.s file
> with -S.)

Here is the disassembly section I've got in my case:

    0x00000000007fc474 numericvar_to_int128+458 je     0x7fc48c <numericvar_to_int128+482>
    0x00000000007fc476 numericvar_to_int128+460 mov    -0x88(%rbp),%rax
    0x00000000007fc47d numericvar_to_int128+467 movdqa -0x60(%rbp),%xmm0
--> 0x00000000007fc482 numericvar_to_int128+472 movaps %xmm0,(%rax)
    0x00000000007fc485 numericvar_to_int128+475 mov    $0x1,%eax
    0x00000000007fc48a numericvar_to_int128+480 jmp    0x7fc455 <numericvar_to_int128+427>
    0x00000000007fc48c numericvar_to_int128+482 negq   -0x60(%rbp)

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Andres Freund
Дата:
On 2017-11-11 17:52:34 +0100, Dmitry Dolgov wrote:
> > On 11 November 2017 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Would you guys who are seeing the problem note whether the address of
> > the int128 field is 16-aligned, or only 8-aligned?
> 
> __alignof__(__int128) returns 16 on my machine.
> 
> > Also, it'd be real useful to see some disassembly around the point of
> > the crash, so that we can check whether the compiler is using SIMD
> > instructions.  (Or just get the compiler to generate a numeric.s file
> > with -S.)
> 
> Here is the disassembly section I've got in my case:
> 
>     0x00000000007fc474 numericvar_to_int128+458 je     0x7fc48c
> <numericvar_to_int128+482>
>     0x00000000007fc476 numericvar_to_int128+460 mov    -0x88(%rbp),%rax
>     0x00000000007fc47d numericvar_to_int128+467 movdqa -0x60(%rbp),%xmm0
> --> 0x00000000007fc482 numericvar_to_int128+472 movaps %xmm0,(%rax)
>     0x00000000007fc485 numericvar_to_int128+475 mov    $0x1,%eax
>     0x00000000007fc48a numericvar_to_int128+480 jmp    0x7fc455
> <numericvar_to_int128+427>
>     0x00000000007fc48c numericvar_to_int128+482 negq   -0x60(%rbp)

That's using SSE, which requires 16byte alignment IIRC.  I think we need
a function that properly allocate int128 vars with the right alignment -
don't think we want to go for full 16byte alignment for everything.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On 11 November 2017 at 17:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Would you guys who are seeing the problem note whether the address of
>> the int128 field is 16-aligned, or only 8-aligned?

> __alignof__(__int128) returns 16 on my machine.

The question was about the actual value of the pointer being used at
the time of the crash --- %rax in the example below.

> Here is the disassembly section I've got in my case:

>     0x00000000007fc47d numericvar_to_int128+467 movdqa -0x60(%rbp),%xmm0
> --> 0x00000000007fc482 numericvar_to_int128+472 movaps %xmm0,(%rax)

suspicious ...
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> That's using SSE, which requires 16byte alignment IIRC.  I think we need
> a function that properly allocate int128 vars with the right alignment -
> don't think we want to go for full 16byte alignment for everything.

Yeah, changing MAXALIGN is out of the question.  I'm thinking about
another flag bit for MemoryContextAllocExtended.  Do we need to think
about other use-cases besides int128?  Should we just force 16-byte
alignment on all architectures, or does it need to be platform-specific?
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Andres Freund
Дата:
On 2017-11-11 12:41:40 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > That's using SSE, which requires 16byte alignment IIRC.  I think we need
> > a function that properly allocate int128 vars with the right alignment -
> > don't think we want to go for full 16byte alignment for everything.
> 
> Yeah, changing MAXALIGN is out of the question.  I'm thinking about
> another flag bit for MemoryContextAllocExtended.  Do we need to think
> about other use-cases besides int128?  Should we just force 16-byte
> alignment on all architectures, or does it need to be platform-specific?

I'm not sure we want to
a) Rely on one alignment being enough for everybody.
b) Additionally burden already hot code paths with a growing number of  alignment flag tests, and the necessary math.

How about a MemoryContextAllocAligned(context, size, alignto, flags)
that passes on most flags but adds the necessary overhead to size, and
padds the result appropriately?

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-11 12:41:40 -0500, Tom Lane wrote:
>> Yeah, changing MAXALIGN is out of the question.  I'm thinking about
>> another flag bit for MemoryContextAllocExtended.  Do we need to think
>> about other use-cases besides int128?  Should we just force 16-byte
>> alignment on all architectures, or does it need to be platform-specific?

> I'm not sure we want to
> a) Rely on one alignment being enough for everybody.
> b) Additionally burden already hot code paths with a growing number of
>    alignment flag tests, and the necessary math.

Well, (a) we could have more flag bits later if there are other use-cases
that require even stricter alignment, and (b) I do not believe that
MemoryContextAllocExtended is a hot code path at present; there are not
enough call sites.

> How about a MemoryContextAllocAligned(context, size, alignto, flags)
> that passes on most flags but adds the necessary overhead to size, and
> padds the result appropriately?

This'd result in hard-wiring the alignment requirement at call sites,
which I think might not be a great idea.  For example, one plausible
future use-case is "align on cacheline boundaries".  I think that would
be better served by a flag like MCXT_ALLOC_ALIGN_CACHELINE than by
having the callers demand a specific numeric alignment value --- it'd
be a lot easier to make the alignment match actual hardware requirements
if it were being inserted at one specific place.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Andres Freund
Дата:
On 2017-11-11 12:54:29 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-11-11 12:41:40 -0500, Tom Lane wrote:
> >> Yeah, changing MAXALIGN is out of the question.  I'm thinking about
> >> another flag bit for MemoryContextAllocExtended.  Do we need to think
> >> about other use-cases besides int128?  Should we just force 16-byte
> >> alignment on all architectures, or does it need to be platform-specific?
> 
> > I'm not sure we want to
> > a) Rely on one alignment being enough for everybody.
> > b) Additionally burden already hot code paths with a growing number of
> >    alignment flag tests, and the necessary math.
> 
> Well, (a) we could have more flag bits later if there are other use-cases
> that require even stricter alignment

Right, that's my point. It's not a scalable interface.


> and (b) I do not believe that MemoryContextAllocExtended is a hot code
> path at present; there are not enough call sites.

It's not that hot, true.


> > How about a MemoryContextAllocAligned(context, size, alignto, flags)
> > that passes on most flags but adds the necessary overhead to size, and
> > padds the result appropriately?
> 
> This'd result in hard-wiring the alignment requirement at call sites,
> which I think might not be a great idea.

Well, it'd require using a macro specifying the alignment (or if we were
using a modern C dialect alignof()), not the alignment spelled out.


> For example, one plausible future use-case is "align on cacheline
> boundaries".

Unless we go to threads that doesn't seem *that* likely, given what
palloc's used for.


> I think that would be better served by a flag like
> MCXT_ALLOC_ALIGN_CACHELINE than by having the callers demand a
> specific numeric alignment value --- it'd be a lot easier to make the
> alignment match actual hardware requirements if it were being inserted
> at one specific place.

Specifying MCXT_ALLOC_ALIGN_CACHELINE rather than CACHELINE_ALIGNMENT
doesn't seem to make it meaningfully harder to adjust. Or are you
thinking of probing the hardware?

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-11-11 12:54:29 -0500, Tom Lane wrote:
>> I think that would be better served by a flag like
>> MCXT_ALLOC_ALIGN_CACHELINE than by having the callers demand a
>> specific numeric alignment value --- it'd be a lot easier to make the
>> alignment match actual hardware requirements if it were being inserted
>> at one specific place.

> Specifying MCXT_ALLOC_ALIGN_CACHELINE rather than CACHELINE_ALIGNMENT
> doesn't seem to make it meaningfully harder to adjust. Or are you
> thinking of probing the hardware?

Yeah, the latter.  The two approaches seem pretty much equivalent if
you're assuming compile-time decisions, but if we ever wanted a run-
time decision, I think having a flag bit that's interpreted inside
MemoryContextAllocExtended would be easier to deal with.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Nov 11, 2017 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My Fedora box thinks __alignof__(__int128) is 16, which is suspicious,
>> but it's not crashing.

> My laptop uses Arch, and I can see the crash easily when compiling
> with gcc 7.2 which is the one bundled in the core package set:

In hopes of reproducing this bug locally, I updated my Fedora laptop
to F26, which has gcc 7.2.1.  But I still can't get it to generate any
SIMD instructions for numeric.c, even after considerable fooling about
with -march and related compiler flags.  I wonder whether Arch is
enabling some bleeding-edge gcc build flags that Red Hat doesn't trust
yet.  It might be interesting to compare outputs of "gcc -v".  Mine is

$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr
--mandir=/usr/share/man--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix--enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions--enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only
--with-linker-hash-style=gnu--enable-plugin --enable-initfini-array --with-isl --enable-libmpx
--enable-offload-targets=nvptx-none--without-cuda-driver --enable-gnu-indirect-function --with-tune=generic
--with-arch_32=i686--build=x86_64-redhat-linux 
Thread model: posix
gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Specifying MCXT_ALLOC_ALIGN_CACHELINE rather than CACHELINE_ALIGNMENT
>> doesn't seem to make it meaningfully harder to adjust. Or are you
>> thinking of probing the hardware?

> Yeah, the latter.  The two approaches seem pretty much equivalent if
> you're assuming compile-time decisions, but if we ever wanted a run-
> time decision, I think having a flag bit that's interpreted inside
> MemoryContextAllocExtended would be easier to deal with.

Actually, independently of which way we do that, it seems like we can't
easily hide this inside a palloc wrapper.  If we palloc some extra space
and then adjust the return value to be 16-aligned, what happens when the
caller tries to pfree the block?
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Michael Paquier
Дата:
On Mon, Nov 13, 2017 at 2:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Sat, Nov 11, 2017 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My Fedora box thinks __alignof__(__int128) is 16, which is suspicious,
>>> but it's not crashing.
>
>> My laptop uses Arch, and I can see the crash easily when compiling
>> with gcc 7.2 which is the one bundled in the core package set:
>
> In hopes of reproducing this bug locally, I updated my Fedora laptop
> to F26, which has gcc 7.2.1.  But I still can't get it to generate any
> SIMD instructions for numeric.c, even after considerable fooling about
> with -march and related compiler flags.  I wonder whether Arch is
> enabling some bleeding-edge gcc build flags that Red Hat doesn't trust
> yet.  It might be interesting to compare outputs of "gcc -v".  Mine is
>
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=/usr/bin/gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
> OFFLOAD_TARGET_NAMES=nvptx-none
> OFFLOAD_TARGET_DEFAULT=1
> Target: x86_64-redhat-linux
> Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto
--prefix=/usr--mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-shared--enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib
--enable-__cxa_atexit--disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id
--with-gcc-major-version-only--with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl
--enable-libmpx--enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function
--with-tune=generic--with-arch_32=i686 --build=x86_64-redhat-linux 
> Thread model: posix
> gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)

Here is what my version on Arch tells:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++
--enable-shared --enable-threads=posix --enable-libmpx
--with-system-zlib --with-isl --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu
--disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object
--enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu
--enable-gnu-indirect-function --disable-multilib --disable-werror
--enable-checking=release --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 7.2.0 (GCC)
--
Michael


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Dmitry Dolgov
Дата:
> On 13 November 2017 at 01:09, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> Here is what my version on Arch tells:
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
> Target: x86_64-pc-linux-gnu
> Configured with: /build/gcc/src/gcc/configure --prefix=/usr
> --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
> --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
> --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++
> --enable-shared --enable-threads=posix --enable-libmpx
> --with-system-zlib --with-isl --enable-__cxa_atexit
> --disable-libunwind-exceptions --enable-clocale=gnu
> --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object
> --enable-linker-build-id --enable-lto --enable-plugin
> --enable-install-libiberty --with-linker-hash-style=gnu
> --enable-gnu-indirect-function --disable-multilib --disable-werror
> --enable-checking=release --enable-default-pie --enable-default-ssp
> Thread model: posix
> gcc version 7.2.0 (GCC)

I've managed to reproduce this on Ubuntu. I tried multiple gcc versions,
and for gcc-5.4 everything is fine, for gcc-6.4 there is an issue. But I don't
really see any significant difference in the configurations (and even more,
on my Gentoo setup I built gcc manually without anything specific).

$ gcc-5 -v
Using built-in specs.
COLLECT_GCC=gcc-5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
5.5.0-3~16.04.york0' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-5 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.1 20171010 (Ubuntu 5.5.0-3~16.04.york0)


$ gcc-6 -v
Using built-in specs.
COLLECT_GCC=gcc-6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
6.4.0-9ubuntu1~16.04.york0'
--with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--with-target-system-zlib --enable-objc-gc=auto --enable-multiarch
--disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 6.4.0 20171026 (Ubuntu 6.4.0-9ubuntu1~16.04.york0)

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> I've managed to reproduce this on Ubuntu. I tried multiple gcc versions,
> and for gcc-5.4 everything is fine, for gcc-6.4 there is an issue.

I found out I can reproduce this on Fedora 26 as well; I'd been
fat-fingering the test somehow.

The place where it is actually crashing for me is in int8_avg_combine(),
and *the struct that is misaligned is one that has been passed back from
a worker process*.  This means that the problem is even worse than we
thought before, because there is no reasonable way that the code that
palloc'd that struct within the current process would know that it needs
more than MAXALIGN alignment.  It is not hard to think of other cases
where an int128-containing struct might get passed through code that
doesn't know what is in it.

So what I am now thinking is that the only practical answer is to stop
gcc from believing that it is safe to use 16-aligned instructions on
int128's.  (Some reading on the net suggests that the actual performance
penalty for that is minimal anyway on modern Intel chips.)

It looks like we can do that with very little code impact by attaching
alignment pragmas to the typedefs for [u]int128:

typedef PG_INT128_TYPE int128
#if defined(pg_attribute_aligned)
pg_attribute_aligned(8)
#endif
;
typedef unsigned PG_INT128_TYPE uint128
#if defined(pg_attribute_aligned)
pg_attribute_aligned(8)
#endif
;

When I do that, the coredump goes away for me.  Moreover, this seems
like an easily back-patchable fix, and we do need to do something
back to 9.5.

Now, one small problem is that c.h defines pg_attribute_aligned further
down than where these typedefs currently appear.  This seems to me to
indicate that we ought to rethink the contents, or at least the ordering,
of the "sections" in c.h.  I wonder if anyone has any strong opinions
about exactly how to do that.
        regards, tom lane


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
I wrote:
> So what I am now thinking is that the only practical answer is to stop
> gcc from believing that it is safe to use 16-aligned instructions on
> int128's.  (Some reading on the net suggests that the actual performance
> penalty for that is minimal anyway on modern Intel chips.)

Concretely, the attached patch fixes it for me.  I've verified by
examining the assembly code that this stops gcc from using movdqa or
movaps in numeric.c, except for one place where it apparently can
prove that it's dealing with a sufficiently-aligned local variable.

As I said before, I don't like moving the int128 typedefs into a section
where they don't belong, but that's just cosmetic --- this is good enough
for testing.

BTW, for me, gcc only seems to want to generate 16-aligned instructions
in functions used for parallel aggregation.  That may explain why we've
not seen a report before: the vulnerable code exists in 9.6, but it's not
used by default, since parallelization isn't on by default.

            regards, tom lane

diff --git a/src/include/c.h b/src/include/c.h
index 852551c121..19574a1a16 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -301,17 +301,6 @@ typedef unsigned long long int uint64;
 #define UINT64_FORMAT "%" INT64_MODIFIER "u"

 /*
- * 128-bit signed and unsigned integers
- *        There currently is only a limited support for the type. E.g. 128bit
- *        literals and snprintf are not supported; but math is.
- */
-#if defined(PG_INT128_TYPE)
-#define HAVE_INT128
-typedef PG_INT128_TYPE int128;
-typedef unsigned PG_INT128_TYPE uint128;
-#endif
-
-/*
  * stdint.h limits aren't guaranteed to be present and aren't guaranteed to
  * have compatible types with our fixed width types. So just define our own.
  */
@@ -642,6 +631,25 @@ typedef NameData *Name;
 #define pg_attribute_noreturn()
 #endif

+/*
+ * 128-bit signed and unsigned integers
+ *        There currently is only a limited support for the type. E.g. 128bit
+ *        literals and snprintf are not supported; but math is.
+ */
+#if defined(PG_INT128_TYPE)
+#define HAVE_INT128
+typedef PG_INT128_TYPE int128
+#if defined(pg_attribute_aligned)
+pg_attribute_aligned(MAXIMUM_ALIGNOF)
+#endif
+;
+typedef unsigned PG_INT128_TYPE uint128
+#if defined(pg_attribute_aligned)
+pg_attribute_aligned(MAXIMUM_ALIGNOF)
+#endif
+;
+#endif
+

 /*
  * Forcing a function not to be inlined can be useful if it's the slow path of

Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Michael Paquier
Дата:
On Tue, Nov 14, 2017 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> So what I am now thinking is that the only practical answer is to stop
>> gcc from believing that it is safe to use 16-aligned instructions on
>> int128's.  (Some reading on the net suggests that the actual performance
>> penalty for that is minimal anyway on modern Intel chips.)
>
> Concretely, the attached patch fixes it for me.  I've verified by
> examining the assembly code that this stops gcc from using movdqa or
> movaps in numeric.c, except for one place where it apparently can
> prove that it's dealing with a sufficiently-aligned local variable.

I am not seeing any difference in the assembly code generated by gcc
-S with and without your patch. Perhaps I am missing something? What
are you actually seeing?

> As I said before, I don't like moving the int128 typedefs into a section
> where they don't belong, but that's just cosmetic --- this is good enough
> for testing.numeric.s.HEAD

Section 3 could be moved after the section 4 listing the alignment
macros. It seems that it won't hurt to back-patch the refactoring as
well.
-- 
Michael


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Nov 14, 2017 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Concretely, the attached patch fixes it for me.  I've verified by
>> examining the assembly code that this stops gcc from using movdqa or
>> movaps in numeric.c, except for one place where it apparently can
>> prove that it's dealing with a sufficiently-aligned local variable.

> I am not seeing any difference in the assembly code generated by gcc
> -S with and without your patch. Perhaps I am missing something? What
> are you actually seeing?

Hm, double-check your procedure.  I get this:

$ diff numeric.s.std numeric.s.patch
2746c2746
<       movaps  %xmm0, (%r14)
---
>       movups  %xmm0, (%r14)
13948,13951c13948,13951
<       movdqa  16(%rbp), %xmm0
<       movaps  %xmm0, 16(%rax)
<       movdqa  32(%rbp), %xmm0
<       movaps  %xmm0, 32(%rax)
---
>       movdqu  16(%rbp), %xmm0
>       movups  %xmm0, 16(%rax)
>       movdqu  32(%rbp), %xmm0
>       movups  %xmm0, 32(%rax)
14434,14435c14434,14435
<       movdqa  16(%rbp), %xmm0
<       movaps  %xmm0, 16(%rax)
---
>       movdqu  16(%rbp), %xmm0
>       movups  %xmm0, 16(%rax)

(If using -g, there are a bunch more diffs in the debug tables.)
        regards, tom lane


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Michael Paquier
Дата:
On Tue, Nov 14, 2017 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm, double-check your procedure.  I get this:
>
> $ diff numeric.s.std numeric.s.patch
> 2746c2746
> <       movaps  %xmm0, (%r14)
> ---
>>       movups  %xmm0, (%r14)
> 13948,13951c13948,13951
> <       movdqa  16(%rbp), %xmm0
> <       movaps  %xmm0, 16(%rax)
> <       movdqa  32(%rbp), %xmm0
> <       movaps  %xmm0, 32(%rax)
> ---
>>       movdqu  16(%rbp), %xmm0
>>       movups  %xmm0, 16(%rax)
>>       movdqu  32(%rbp), %xmm0
>>       movups  %xmm0, 32(%rax)
> 14434,14435c14434,14435
> <       movdqa  16(%rbp), %xmm0
> <       movaps  %xmm0, 16(%rax)
> ---
>>       movdqu  16(%rbp), %xmm0
>>       movups  %xmm0, 16(%rax)
>
> (If using -g, there are a bunch more diffs in the debug tables.)

Oh, well. I am not sure what went wrong. But I can see the difference
now, which is what you have:
@@ -2652,7 +2652,7 @@ numericvar_to_int128:    pxor    %xmm0, %xmm0    movl    $1, %ebx    testq    %rdi, %rdi
-    movups    %xmm0, (%r14)
+    movaps    %xmm0, (%r14)    je    .L452    call    pfree@PLT.L452:
@@ -13856,10 +13856,10 @@ numeric_poly_combine:    movq    8(%rbp), %rdx    movq    %r12, CurrentMemoryContext(%rip)
movq   %rdx, 8(%rax)
 
-    movdqu    16(%rbp), %xmm0
-    movups    %xmm0, 16(%rax)
-    movdqu    32(%rbp), %xmm0
-    movups    %xmm0, 32(%rax)
+    movdqa    16(%rbp), %xmm0
+    movaps    %xmm0, 16(%rax)
+    movdqa    32(%rbp), %xmm0
+    movaps    %xmm0, 32(%rax)    jmp    .L2110.L2124:    leaq    .LC0(%rip), %rdi
@@ -14338,8 +14338,8 @@ int8_avg_combine:    movq    8(%rbp), %rdx    movq    %r12, CurrentMemoryContext(%rip)    movq
 %rdx, 8(%rax)
 
-    movdqu    16(%rbp), %xmm0
-    movups    %xmm0, 16(%rax)
+    movdqa    16(%rbp), %xmm0
+    movaps    %xmm0, 16(%rax)    jmp    .L2188.L2202:    leaq    .LC0(%rip), %rdi

The patch addresses the crash here as well by the way.
-- 
Michael


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Andres Freund
Дата:
On 2017-11-14 15:16:34 +0900, Michael Paquier wrote:
> Oh, well. I am not sure what went wrong. But I can see the difference
> now, which is what you have:
> @@ -2652,7 +2652,7 @@ numericvar_to_int128:
>      pxor    %xmm0, %xmm0
>      movl    $1, %ebx
>      testq    %rdi, %rdi
> -    movups    %xmm0, (%r14)
> +    movaps    %xmm0, (%r14)
>      je    .L452
>      call    pfree@PLT
>  .L452:
> @@ -13856,10 +13856,10 @@ numeric_poly_combine:
>      movq    8(%rbp), %rdx
>      movq    %r12, CurrentMemoryContext(%rip)
>      movq    %rdx, 8(%rax)
> -    movdqu    16(%rbp), %xmm0
> -    movups    %xmm0, 16(%rax)
> -    movdqu    32(%rbp), %xmm0
> -    movups    %xmm0, 32(%rax)
> +    movdqa    16(%rbp), %xmm0
> +    movaps    %xmm0, 16(%rax)
> +    movdqa    32(%rbp), %xmm0
> +    movaps    %xmm0, 32(%rax)
>      jmp    .L2110
>  .L2124:
>      leaq    .LC0(%rip), %rdi
> @@ -14338,8 +14338,8 @@ int8_avg_combine:
>      movq    8(%rbp), %rdx
>      movq    %r12, CurrentMemoryContext(%rip)
>      movq    %rdx, 8(%rax)
> -    movdqu    16(%rbp), %xmm0
> -    movups    %xmm0, 16(%rax)
> +    movdqa    16(%rbp), %xmm0
> +    movaps    %xmm0, 16(%rax)

Just to be sure: This is the diff you're getting when you *back out* the
fix, right? Because movdqa etc are the alignment requiring instructions,
whereas movdqu is unaligned...

Greetings,

Andres Freund


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Michael Paquier
Дата:
On Tue, Nov 14, 2017 at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:
> Just to be sure: This is the diff you're getting when you *back out* the
> fix, right? Because movdqa etc are the alignment requiring instructions,
> whereas movdqu is unaligned...

Yep :)
-- 
Michael


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Nov 14, 2017 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As I said before, I don't like moving the int128 typedefs into a section
>> where they don't belong, but that's just cosmetic --- this is good enough
>> for testing.

> Section 3 could be moved after the section 4 listing the alignment
> macros. It seems that it won't hurt to back-patch the refactoring as
> well.

After some further consideration of what's where in c.h, I propose that
we redefine section 1 (hacks to cope with non-ANSI C compilers) as
"compiler characteristics", and then move all of these items into it:

"#ifdef PG_FORCE_DISABLE_INLINE" stanza

all the pg_attribute_xxx() macros, also pg_noinline

PG_USED_FOR_ASSERTS_ONLY

maybe pg_unreachable(), although it's not insane to keep it in section 7

ditto for likely()/unlikely()

If anyone's really hot to keep the "non-ANSI hacks" segregated from those,
we could invent a "section 1.1" for that ... but I'd say that at least one
of the stanzas that are in there now is unrelated to ANSI compatibility
already.

Having gotten pg_attribute_aligned in front of the int128 stanza,
my vision for a full fix for the bug is:

* Explicitly document somewhere that MAXALIGN ignores int128.

* Have configure test for and define ALIGNOF_PG_INT128_TYPE
if it defines PG_INT128_TYPE.

* Set up the int128 stanza like this:

#if defined(PG_INT128_TYPE)
#if defined(pg_attribute_aligned) || ALIGNOF_PG_INT128_TYPE <= MAXIMUM_ALIGNOF
#define HAVE_INT128 1
typedef PG_INT128_TYPE int128
#if defined(pg_attribute_aligned)
pg_attribute_aligned(MAXIMUM_ALIGNOF)
#endif
;
typedef unsigned PG_INT128_TYPE uint128
#if defined(pg_attribute_aligned)
pg_attribute_aligned(MAXIMUM_ALIGNOF)
#endif
;
#endif
#endif

That is, even if configure finds an int128 type, we'll only use it if
it fits or can be made to fit into our system-wide MAXALIGN assumption.

* It'd be kind of nice to insert aStaticAssertStmt(alignof(int128) <= MAXIMUM_ALIGNOF)
someplace to cross-check this, but as far as I can find we aren't
relying on alignof() to exist, so I don't see a good way to do it.


BTW, looking at the existing uses of pg_attribute_aligned
along with this example, I can't help but think that
this decision:

/** NB: aligned and packed are not given default definitions because they* affect code functionality; they *must* be
implementedby the compiler* if they are to be used.*/
 

was just useless pedantry.  If we're going to ifdef around it everywhere
the macro is of use, why not just #define it to empty?  That would
certainly make the above a lot easier to read.  We could add a
HAVE_PG_ATTRIBUTE_ALIGNED symbol for use in #if tests.

Barring objections, I'll get on with making this happen.
        regards, tom lane


Re: [BUGS] BUG #14897: Segfault on statitics SQL request

От
Tom Lane
Дата:
I wrote:
> That is, even if configure finds an int128 type, we'll only use it if
> it fits or can be made to fit into our system-wide MAXALIGN assumption.
> ...
> Barring objections, I'll get on with making this happen.

I've pushed fixes along this line into the affected branches.

Anybody who's looking for a quick-hack fix might be better advised
to use the short patch I posted upthread, though.
        regards, tom lane