[BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Stefan Tzeggai
Date: October 25, 2017, 18:16:39

Hi

I can reproduce a segfault by executing a query.

I run Postgresql 10.0-1.pgdg16.04+1 on Ubuntu 16.04.3

The machine has hyperthreading enabled and 48 virtual cores: 2xE5-2690v3

I have a materialized view:

refresh materialized view concurrently ;
--works

results.as_20171025_20170930_ut78777;
--works

set max_parallel_workers_per_gather to 0;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE
(((oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));
--works: 129587

set max_parallel_workers_per_gather to 3;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE
(((oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));
--works: 129587

set max_parallel_workers_per_gather to 4;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE
(((oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));
--SEGFAULT!

set max_parallel_workers_per_gather to 4;
explain SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT
WHERE ((((oart_zwangsversteigerung_janein IS NULL)) AND (oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR (((oart_zwangsversteigerung_janein IS
NULL)) AND (oadr_gkz IN (2000000))))

"Finalize Aggregate  (cost=186411.37..186411.38 rows=1 width=8)"
"  ->  Gather  (cost=186410.95..186411.36 rows=4 width=8)"
"        Workers Planned: 4"
"        ->  Partial Aggregate  (cost=185410.95..185410.96 rows=1 width=8)"
"              ->  Parallel Bitmap Heap Scan on as_20171025_20170930_ut78777 rt  (cost=12058.69..185353.14 rows=23121 width=0)"
"                    Recheck Cond: (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1)) OR (oadr_gkz = 2000000))"
"                    Filter: ((oart_zwangsversteigerung_janein IS NULL) AND (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1 (...)"
"                    ->  BitmapOr  (cost=12058.69..12058.69 rows=94046 width=0)"
"                          ->  BitmapAnd  (cost=11726.20..11726.20 rows=76321 width=0)"
"                                ->  Bitmap Index Scan on as_20171025_20170930_ut78777_oadr_gkz_wnnidx  (cost=0.00..3129.41 rows=185997 width=0)"
"                                      Index Cond: (oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[]))"
"                                ->  Bitmap Index Scan on as_20171025_20170930_ut78777_objekttyp_grob_idx  (cost=0.00..8550.30 rows=491449 width=0)"
"                                      Index Cond: (objekttyp_grob = 1)"
"                          ->  Bitmap Index Scan on as_20171025_20170930_ut78777_oadr_gkz_wnnidx  (cost=0.00..309.37 rows=17726 width=0)"
"                                Index Cond: (oadr_gkz = 2000000)"


And the postgresql-10.log says:

>2017-10-25 13:45:35.149 CEST [6345] LOG:  server process (PID 25637) was terminated by signal 11: Segmentation fault
>2017-10-25 13:45:35.149 CEST [6345] DETAIL:  Failed process was running:
...
>2017-10-25 13:42:14.332 CEST [25629] LOG:  redo starts at 108/449A9D98
>2017-10-25 13:42:14.396 CEST [25629] LOG:  unexpected pageaddr 107/6F8CC000 in log segment 000000010000010800000045, offset 9224192
>2017-10-25 13:42:14.396 CEST [25629] LOG:  redo done at 108/458CA968

I upgraded Postgresql using pg_upgrade with hard links a few days ago.
This view has not been upgraded from PG9.6 to 10, but has been created
freshly on PG10 this morning.

Other related settings in postgresql.conf are:
>max_worker_processes = 12
>max_parallel_workers_per_gather = 4
>max_parallel_workers = 12

So what I figured out is that it only crashes when I increase
max_parallel_workers_per_gather to more than 3.

Probably I misunderstood some of the max_parallel_* settings and am doing
something bogus, but the database should probably not segfault...

How can I help you with more information?

Steve




-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Tom Lane
Date: October 25, 2017, 20:34:35

Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
> I can reproduce a segfault by executing a query.

That sounds like a bug, all right, but you've not provided enough
detail for anyone else to reproduce it.  A self-contained test
case would be the best thing.

If you can't provide that, it's possible that a stack trace from
the core dump would be enough info to diagnose the problem, but
no promises ...

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
        regards, tom lane



Re: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Stefan Tzeggai
Date: October 26, 2017, 02:33:13

Hi

To be precise, I can only reproduce the bug about 20% of the times I
execute the query. I have to run the query four or five times, and then it
crashes. I have reproduced that many times.
I have a feeling that it has to do with the number of parallel workers
that the planner starts. I found no way to force it to any number.

I have seen this segfault on at least two machines (running the same
application with the same data). I have not seen it since I lowered
max_parallel_workers_per_gather to 2.


I tried to generate a table+matview+indexes etc. to reproduce the crash
from scratch, but I have had no success so far.


I also tried to get a sensible stack trace. I attached 9 gdb instances to
all postgres PIDs, and when I triggered the crash, two of them had some
output and produced something on 'bt'. Attached..


If I were able to dump the relevant data from my db and reproduce the
crash with it on a fresh PG10 install, would anyone have time to look at
it? I guess it would be no more than 50 MB...

I am happy to help as well as I can,

Steve



Program received signal SIGUSR1, User defined signal 1.
0x00007f12334039b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
84    in ../sysdeps/unix/syscall-template.S
Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x00007f12334039b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
84    in ../sysdeps/unix/syscall-template.S
#0  0x00007f12334039b3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00005564bcaccd01 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffce2d47e90, cur_timeout=200, set=0x5564beab53a8) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x5564beab53a8,
timeout=timeout@entry=200,
occurred_events=occurred_events@entry=0x7ffce2d47e90,
nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=83886093)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:1000
#3  0x00005564bcacd174 in WaitLatchOrSocket (latch=0x7f1227241be4,
wakeEvents=wakeEvents@entry=25, sock=sock@entry=-1, timeout=200,
wait_event_info=wait_event_info@entry=83886093) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:385
#4  0x00005564bcacd225 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents@entry=25, timeout=<optimized out>,
wait_event_info=wait_event_info@entry=83886093) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:339
#5  0x00005564bca8193f in WalWriterMain () at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/walwriter.c:293
#6  0x00005564bc8c0401 in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffce2d48070) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/bootstrap/bootstrap.c:442
#7  0x00005564bca7cd83 in StartChildProcess (type=WalWriterProcess) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:5313
#8  0x00005564bca7e11a in reaper (postgres_signal_arg=<optimized out>)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:2871
#9  <signal handler called>
#10 0x00007f12333f9573 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#11 0x00005564bc82a489 in ServerLoop () at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1717
#12 0x00005564bca7fa6b in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1361
#13 0x00005564bc82c2d5 in main (argc=5, argv=0x5564bea7a850) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/main/main.c:228


########## second one:


Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x00007f12333f9573 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
84      in ../sysdeps/unix/syscall-template.S
#0  0x00007f12333f9573 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00005564bc82a489 in ServerLoop () at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1717
#2  0x00005564bca7fa6b in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1361
#3  0x00005564bc82c2d5 in main (argc=5, argv=0x5564bea7a850) at
/build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/main/main.c:228






Re: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Tom Lane
Date: October 26, 2017, 02:41:57

Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
> I also tried to get a sensible stack trace. I attached 9 gdb to all
> postgres-pids and when I triggered the crash, two of the gdb had some
> output and produced something on 'bt'. Attached..

Those look like normal operation --- SIGUSR1 isn't a crash condition,
it's what PG normally uses to wake up a sleeping process.  If you
want to attach gdb before provoking the crash, you need to tell it
to ignore SIGUSR1 (I think "handle SIGUSR1 pass nostop noprint"
is the right incantation).

It might be easier to enable core files ("ulimit -c unlimited" before
starting the postmaster) and then gdb the core files.

> If I would be able to dump the relevant data from my db and I would be
> able to reproduce the crash with it on a fresh PG10 install - Would
> anyone have time to look at it? I guess its would no more than 50Mb...

Sure, I or somebody else would look at it.
        regards, tom lane
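
The two approaches described above can be sketched roughly as follows.
This is a minimal sketch, not a tested procedure: the function name
write_gdb_commands, the file /tmp/pg.gdb, $BACKEND_PID and the binary
and core paths are placeholders chosen for illustration. Note that with
a parallel query the crash may happen in a worker process that does not
exist yet when gdb is attached, which is one more reason the core-file
route can be easier.

```shell
#!/bin/sh
# Sketch only: PID, binary path and core path below are placeholders.

# Write a gdb command file that passes SIGUSR1 through silently, resumes
# the process, and prints a backtrace once gdb stops again (for example
# on SIGSEGV).
write_gdb_commands() {
  cat > "$1" <<'EOF'
handle SIGUSR1 pass nostop noprint
continue
bt
EOF
}

# Option 1: attach to a live backend before provoking the crash.
#   write_gdb_commands /tmp/pg.gdb
#   gdb -p "$BACKEND_PID" -x /tmp/pg.gdb
#
# Option 2: enable core files in the shell that starts the postmaster,
# then inspect the core file afterwards.
#   ulimit -c unlimited
#   gdb /path/to/postgres /path/to/core
```

The backend PID for option 1 can be obtained in the session that will
run the query with "SELECT pg_backend_pid();".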


Re: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Stefan Tzeggai
Date: October 31, 2017, 18:09:05

Just to keep you updated: I am on holiday this week. I will put more
time into this next week.

Steve

On 25.10.2017 at 22:41, Tom Lane wrote:
>> If I would be able to dump the relevant data from my db and I would be
>> able to reproduce the crash with it on a fresh PG10 install - Would
>> anyone have time to look at it? I guess its would no more than 50Mb...
> 
> Sure, I or somebody else would look at it.
> 
>             regards, tom lane
> 
> 



Re: [BUGS] PG10 Segfault 11 on default Ubuntu 16.04 installs

From: Stefan Tzeggai
Date: November 8, 2017, 17:50:20

Hi

The segfaults thrown when I run my application on PG10 got worse. I have
found more segfaults even when max_parallel_workers_per_gather is left
at its default.

I have been able to create a 6 MB PG10 dump that can be imported into a
virgin PG10 install on Ubuntu 16.04, and I have a query that segfaults
PG10 100% of the time (on my machines at least).

The dataset has been wiped of sensitive data, but it still should not go
public. I will send a password by email to all PG-hackers interested.

https://oc.empirica-systeme.de/index.php/s/0XLKObTrUjRlCV7

The dump contains
  * a table with lots of columns called "basedata", 56k rows
  * a mat view created as select * from basedata called "mv", 56k rows
  * lots of btree indexes on most of the mv columns

I do the following on my laptop running the latest Ubuntu 16.04 with the
PG-APT-Repository:

#!PURGING ALL PG STUFF HERE TO GET A CLEAN START!!
sudo apt-get purge postgresql-9.6 postgresql-10 \
    postgresql-10-postgis-2.4-scripts postgresql-10-postgis-2.4
sudo rm -rf /etc/postgresql

# Installing 10.0-1.pgdg16.04+1
sudo apt install postgresql-10
sudo su postgres
dropdb analyst
createdb analyst

SELECT count(1) FROM mv WHERE
((nachfrageart IN (1)) AND (oadr_gkz IN (6611000) OR oadr_kkz IN
(3152,5111,5113,5158,5162,5314,5315,5362,5378,5382,5515,5711,6411,6412,6413,6414,6431,6432,6433,6434,6435,6436,6438,6439,6440,6531,6532,6534,6535,6631,6632,6633,6634,6635,6636,7315,8125,8215,8221,8222,9663))
AND (objekttyp_grob IN (1,2)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01'))
 OR
((nachfrageart IN (0)) AND (nutzungsart IN (0)) AND (oadr_gkz IN
(6611000,8121000,8212000) OR oadr_kkz IN
(3152,5111,5113,5158,5162,5314,5315,5362,5378,5382,5515,5711,6411,6412,6413,6414,6431,6432,6433,6434,6435,6436,6438,6439,6440,6531,6532,6534,6631,6633,7315,8125,8221,8222,9663))
AND (objekttyp_grob IN (1,2,3)) AND (startdate>='2012-01-01' OR enddate
IS NULL OR enddate>='2012-01-01'));

The above query segfaults 100% of the time. Tested on three
Ubuntu 16.04 servers and one Ubuntu 16.04 desktop.

Where the data comes from: the data is created on a PG9.6 machine daily,
dumped, and imported into PG10. The whole dataflow is stable with PG9.6.
I have seen the problem with every fresh dataset.

I hope this finally makes the bug reproducible for you. If it does not
segfault on your machine, please try increasing
max_parallel_workers_per_gather to 5.

I am very sorry that I didn't test PG10 earlier when it was in beta. I
guess the current bug hunt makes it more likely that I will test the PG11
beta with my application. Promised!

Greetings,
Steve

-- 
Stefan Tzeggai


Re: [BUGS] PG10 Segfault 11 on default Ubuntu 16.04 installs

From: Amit Kapila
Date: November 8, 2017, 18:59:45

On Wed, Nov 8, 2017 at 5:20 PM, Stefan Tzeggai
<tzeggai@empirica-systeme.de> wrote:
> Hi
>
> The segfaults thrown when I run my application on PG10 got worse. I have
> found more segfaults even when max_parallel_workers_per_gather is left
> default.
>
> I have been able to create a 6Mb PG10 dump that I can be imported into a
> vigin PG10 install on Ubuntu 16.04 and I have a query that segfaults
> PG10 100% (on my mashines at least).
>
> The dataset has been wiped of sensitive data, but it still should not go
> public. I will send a password by email to all PG-hackers interested.
>

Can you share the password with me?  My guess is that this is related
to Parallel Bitmap Heap Scan as that is a newly introduced feature in
PG10 and the previous email shows that in the plan.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Dilip Kumar
Date: November 8, 2017, 19:24:00

On Wed, Oct 25, 2017 at 5:46 PM, Stefan Tzeggai
<tzeggai@empirica-systeme.de> wrote:

> "Finalize Aggregate  (cost=186411.37..186411.38 rows=1 width=8)"
> "  ->  Gather  (cost=186410.95..186411.36 rows=4 width=8)"
> "        Workers Planned: 4"
> "        ->  Partial Aggregate  (cost=185410.95..185410.96 rows=1 width=8)"
> "              ->  Parallel Bitmap Heap Scan on
> as_20171025_20170930_ut78777 rt  (cost=12058.69..185353.14 rows=23121
> width=0)"
> "                    Recheck Cond: (((oadr_gkz = ANY
>
('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[]))
> AND (objekttyp_grob = 1)) OR (oadr_gkz = 2000000))"
> "                    Filter: ((oart_zwangsversteigerung_janein IS NULL)
> AND (((oadr_gkz = ANY
>
('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[]))
> AND (objekttyp_grob = 1 (...)"
> "                    ->  BitmapOr  (cost=12058.69..12058.69 rows=94046
> width=0)"
> "                          ->  BitmapAnd  (cost=11726.20..11726.20
> rows=76321 width=0)"
> "                                ->  Bitmap Index Scan on

Looking at the plan, this seems to be the issue that was fixed in the
commit below.

Author: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:23:28
Committer: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:35:14
Parent: d48bf6a94d295c3779c6af4df118d95a6606192f (Fix AggGetAggref()
so it won't lie to aggregate final functions.)
Child:  cb591fcbfbba1df6fda1839ece53665e85e491e3 (Restore nodeAgg.c's
ability to check for improperly-nested aggregates.)
Branch: remotes/origin/REL_10_STABLE
Follows: REL_10_0
Precedes: REL_10_1

    Fix possible crash with Parallel Bitmap Heap Scan.

    If a Parallel Bitmap Heap scan's chain of leftmost descendents
    includes a BitmapOr whose first child is a BitmapAnd, the prior coding
    would mistakenly create a non-shared TIDBitmap and then try to perform
    shared iteration.

    Report by Tomas Vondra.  Patch by Dilip Kumar.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
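
The plan shape named in the commit message (a Parallel Bitmap Heap Scan
fed by a BitmapOr whose first child is a BitmapAnd) can be spotted in
EXPLAIN text output with a rough filter. The following is a minimal
sketch, not a PostgreSQL tool: plan_matches_crash_shape is a
hypothetical helper name, it covers only the direct case where the
BitmapOr sits immediately below the scan, and it assumes one plan node
per line (output that has not been re-wrapped by a mail client).

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): reads EXPLAIN text output on
# stdin and succeeds if a Parallel Bitmap Heap Scan is fed by a BitmapOr whose
# first child is a BitmapAnd. Only the direct case is covered.
plan_matches_crash_shape() {
  awk '
    # Entering the parallel scan node: wait for its first child node.
    /Parallel Bitmap Heap Scan/ { stage = 1; next }
    # First child of the scan: continue only if it is a BitmapOr.
    stage == 1 && /^[ "]*->/ { stage = ($0 ~ /BitmapOr/) ? 2 : 0; next }
    # First child of the BitmapOr: a BitmapAnd here is the reported shape.
    stage == 2 && /^[ "]*->/ { if ($0 ~ /BitmapAnd/) found = 1; stage = 0 }
    END { exit (found ? 0 : 1) }
  '
}
```

A possible use is to pipe the output of psql -c "EXPLAIN ..." into the
function and check its exit status. A match only says the plan has the
shape; it does not prove that a given crash has this cause.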


Re: [BUGS] PG10 Segfault 11 on default Ubuntu 16.04 installs

From: Stefan Tzeggai
Date: November 8, 2017, 19:30:14

Sorry, I missed a line:

> # Installing 10.0-1.pgdg16.04+1
> sudo apt install postgresql-10
> sudo su postgres
> dropdb analyst
> createdb analyst
AND THEN pg_restore -U postgres -v -d analyst -Fc dumpSegfaultSmaller.backup

Sorry


-- 
empirica-systeme GmbH
Stefan Tzeggai
Brunsstr. 31
72074 Tübingen
email tzeggai@empirica-systeme.de
phone  +49 7071 6392922
mobile +49 176 40 38 9559

"If you have nothing to hide, you don't need trousers either!"




Re: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Stefan Tzeggai
Date: November 8, 2017, 20:02:10

Hi

On 08.11.2017 at 14:24, Dilip Kumar wrote:
> By looking at the plan it seems like the issue what got fixed in below commit.
>
> Author: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:23:28
> Committer: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:35:14
> Parent: d48bf6a94d295c3779c6af4df118d95a6606192f (Fix AggGetAggref()
> so it won't lie to aggregate final functions.)
> Child:  cb591fcbfbba1df6fda1839ece53665e85e491e3 (Restore nodeAgg.c's
> ability to check for improperly-nested aggregates.)
> Branch: remotes/origin/REL_10_STABLE
> Follows: REL_10_0
> Precedes: REL_10_1
>
>     Fix possible crash with Parallel Bitmap Heap Scan.
>
>     If a Parallel Bitmap Heap scan's chain of leftmost descendents
>     includes a BitmapOr whose first child is a BitmapAnd, the prior coding
>     would mistakenly create a non-shared TIDBitmap and then try to perform
>     shared iteration.
>
>     Report by Tomas Vondra.  Patch by Dilip Kumar.

Do I understand correctly that this fix will be released with 10.1
tomorrow (according to https://www.postgresql.org/developer/roadmap/)
and that I could then test it?

Steve


-- 
empirica-systeme GmbH
Stefan Tzeggai
Brunsstr. 31
72074 Tübingen
email tzeggai@empirica-systeme.de
phone  +49 7071 6392922
mobile +49 176 40 38 9559

"If you have nothing to hide, you don't need trousers either!"




Re: [BUGS] PG10 Segfault 11 on default Ubuntu 16.04 installs

From: Dilip Kumar
Date: November 8, 2017, 21:15:37

On Wed, Nov 8, 2017 at 7:00 PM, Stefan Tzeggai
<tzeggai@empirica-systeme.de> wrote:
> Sorry, I missed a line:
>
>> # Installing 10.0-1.pgdg16.04+1
>> sudo apt install postgresql-10
>> sudo su postgres
>> dropdb analyst
>> createdb analyst
>
>  AND THEN pg_restore -U postgres -v -d analyst -Fc
> dumpSegfaultSmaller.backup
>

Thanks for the information. I could reproduce the issue at the v10 stamp
commit [1], and it is fixed at [2]. I have also verified from the core
dump that the issue is the same one that was fixed at [2].

[1]
commit 5df0e99bea1c3e5fbffa7fbd0982da88ea149bb6
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Oct 2 17:09:15 2017 -0400
   Stamp 10.0.

[2]
commit a3b1c221893f739950e9232b4b789750f247cee5
Author: Robert Haas <rhaas@postgresql.org>
Date:   Fri Oct 13 14:53:28 2017 -0400
   Fix possible crash with Parallel Bitmap Heap Scan.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [BUGS] PG10 Segfault 11 on default Ubuntu 16.04 installs

From: Stefan Tzeggai
Date: November 8, 2017, 21:46:58

Dear Dilip

On 08.11.2017 at 16:15, Dilip Kumar wrote:
> Thanks for the information, I could reproduce the issue at v10 stamp
> commit[1] and
> it's fixed at [2], I have also verified from the core dump that issue
> is same what
> got fixed at [2]
> 
> [1]
> commit 5df0e99bea1c3e5fbffa7fbd0982da88ea149bb6
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Mon Oct 2 17:09:15 2017 -0400
> 
>     Stamp 10.0.
> 
> [2]
> commit a3b1c221893f739950e9232b4b789750f247cee5
> Author: Robert Haas <rhaas@postgresql.org>
> Date:   Fri Oct 13 14:53:28 2017 -0400
> 
>     Fix possible crash with Parallel Bitmap Heap Scan.
> 

That's great news!

It's just a pity that I spent all afternoon yesterday fiddling together
that tiny dataset to reproduce the bug. But that's how life goes ;-)

It was still a great experience to see how professionally and quickly you
guys look at bugs! +1 And I will test the 11 RC next year!

Last question before I downgrade my PG 10 installations... Will the fix
be released this week with 10.1?

https://www.postgresql.org/developer/roadmap/

Thanks again
Steve

-- 
Stefan Tzeggai

"If you have nothing to hide, you don't need trousers either!"




Re: [BUGS] PG10 Segfault 11 on default Ubuntu 16.04 installs

From: Tom Lane
Date: November 8, 2017, 22:30:29

Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
> Last question before I downgrade my PG 10 installations... The fix will
> be released this week with 10.1 ?

Yes, it's in 10.1.
        regards, tom lane
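
Since the fix is in 10.1, one way to tell whether a server still lacks
it is to look at server_version_num, which is 100000 for 10.0 and
100001 for 10.1. The following is a minimal sketch; lacks_bitmap_fix is
a hypothetical helper name, and the psql invocation in the comment
assumes default connection settings.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given server_version_num belongs to
# a 10.0 build, i.e. PostgreSQL 10 without the 10.1 fix from this thread.
# Versions before 10 have no Parallel Bitmap Heap Scan and are not flagged.
lacks_bitmap_fix() {
  [ "$1" -ge 100000 ] && [ "$1" -lt 100001 ]
}

# Usage (-A and -t make psql print just the bare value):
#   v=$(psql -At -c 'SHOW server_version_num')
#   if lacks_bitmap_fix "$v"; then echo "upgrade to 10.1 or later"; fi
```

Until an upgrade is possible, setting max_parallel_workers_per_gather
to 0 disables parallel query plans altogether, which matches the
setting that worked in the first message of this thread.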



Re: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Michael Paquier
Date: November 9, 2017, 06:01:51

On Wed, Nov 8, 2017 at 11:02 PM, Stefan Tzeggai
<tzeggai@empirica-systeme.de> wrote:
> On 08.11.2017 at 14:24, Dilip Kumar wrote:
>> By looking at the plan it seems like the issue what got fixed in below commit.
>>
>> Author: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:23:28
>> Committer: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:35:14
>> Parent: d48bf6a94d295c3779c6af4df118d95a6606192f (Fix AggGetAggref()
>> so it won't lie to aggregate final functions.)
>> Child:  cb591fcbfbba1df6fda1839ece53665e85e491e3 (Restore nodeAgg.c's
>> ability to check for improperly-nested aggregates.)
>> Branch: remotes/origin/REL_10_STABLE
>> Follows: REL_10_0
>> Precedes: REL_10_1
>>
>>     Fix possible crash with Parallel Bitmap Heap Scan.
>>
>>     If a Parallel Bitmap Heap scan's chain of leftmost descendents
>>     includes a BitmapOr whose first child is a BitmapAnd, the prior coding
>>     would mistakenly create a non-shared TIDBitmap and then try to perform
>>     shared iteration.
>>
>>     Report by Tomas Vondra.  Patch by Dilip Kumar.
>
> Do I understand it correctly, that that fix would be released with 10.1
> tomorrow (according to https://www.postgresql.org/developer/roadmap/)
> and I could then test it?

Yes, this commit is included in 10.1.
-- 
Michael

