Re: pg11.1: dsa_area could not attach to segment

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема Re: pg11.1: dsa_area could not attach to segment
Дата
Msg-id 20190211040132.GV31721@telsasoft.com
обсуждение исходный текст
Ответ на Re: pg11.1: dsa_area could not attach to segment  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: pg11.1: dsa_area could not attach to segment  (Sergei Kornilov <sk@zsrv.org>)
Список pgsql-hackers
Hi,

On Mon, Feb 11, 2019 at 11:11:32AM +1100, Thomas Munro wrote:
> I haven't ever managed to reproduce that one yet.  It's great you have
> a reliable repro...  Let's discuss it on the #15585 thread.

I realized that I gave bad information (at least to Thomas).  On the server
where I've been reproducing this, it wasn't in an empty DB cluster, but one
where I'd restored our DB schema.  I think that's totally irrelevant, except
that pg_attribute needs to be big enough to get parallel scan.

Here's confirmed steps to reproduce

initdb -D /var/lib/pgsql/test
pg_ctl -c start -D /var/lib/pgsql/test -o '-c operator_precedence_warning=on -c maintenance_work_mem=1GB -c
max_wal_size=16GB-c full_page_writes=off -c autovacuum=off -c fsync=off -c port=5678 -c unix_socket_directories=/tmp'
 
PGPORT=5678 PGHOST=/tmp psql postgres -c 'CREATE TABLE queued_alters(child text,parent text); CREATE TABLE
queued_alters_child()INHERITS(queued_alters);ANALYZE queued_alters, pg_attribute'
 

# Inflate pg_attribute to nontrivial size:
echo "CREATE TABLE t(`for c in $(seq 1 222); do echo "c$c int,"; done |xargs |sed 's/,$//'`)" |PGHOST=/tmp PGPORT=5678
psqlpostgres 
 
for a in `seq 1 999`; do echo "CREATE TABLE t$a() INHERITS(t);"; done |PGHOST=/tmp PGPORT=5678 psql -q postgres

while PGOPTIONS='-cmin_parallel_table_scan_size=0' PGPORT=5678 PGHOST=/tmp psql postgres -c "explain analyze SELECT
colcld.childc, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols,
array_agg(format_type(colpar.atttypid,colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN
pg_attributecolpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN
(SELECT*, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND
colcld.attnum>0AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
GROUPBY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$','\3\5') DESC, regexp_replace(colcld.child, '.*_', '')
DESCLIMIT 1"; do :; done >/dev/null &
 

# Verify this is planning parallel workers, then repeat 10-20x.

Typically fails on this server in under 10min.

Sorry for the error.

Justin

On Wed, Feb 06, 2019 at 07:47:19PM -0600, Justin Pryzby wrote:
> FYI, I wasn't yet able to make this work yet.
> (gdb) print *segment_map->header
> Cannot access memory at address 0x7f347e554000
> 
> However I *did* reproduce the error in an isolated, non-production postgres
> instance.  It's a total empty, untuned v11.1 initdb just for this, running ONLY
> a few simultaneous loops around just one query It looks like the simultaneous
> loops sometimes (but not always) fail together.  This has happened a couple
> times.  
> 
> It looks like one query failed due to "could not attach" in leader, one failed
> due to same in worker, and one failed with "not pinned", which I hadn't seen
> before and appears to be related to DSM, not DSA...
> 
> |ERROR:  dsa_area could not attach to segment
> |ERROR:  cannot unpin a segment that is not pinned
> |ERROR:  dsa_area could not attach to segment
> |CONTEXT:  parallel worker
> |
> |[2]   Done                    while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p,
array_agg(colpar.attname::textORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod)
ORDERBY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON
to_regclass(qa.parent)=colpar.attrelidAND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *,
attrelid::regclass::textAS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND
colcld.attnum>0AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
GROUPBY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$','\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '')
DESCLIMIT 1"; do
 
> |    :;
> |done > /dev/null
> |[5]-  Done                    while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p,
array_agg(colpar.attname::textORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod)
ORDERBY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON
to_regclass(qa.parent)=colpar.attrelidAND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *,
attrelid::regclass::textAS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND
colcld.attnum>0AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
GROUPBY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$','\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '')
DESCLIMIT 1"; do
 
> |    :;
> |done > /dev/null
> |[6]+  Done                    while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p,
array_agg(colpar.attname::textORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod)
ORDERBY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON
to_regclass(qa.parent)=colpar.attrelidAND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *,
attrelid::regclass::textAS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND
colcld.attnum>0AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
GROUPBY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$','\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '')
DESCLIMIT 1"; do
 
> 
> I'm also trying to reproduce on other production servers.  But so far nothing
> else has shown the bug, including the other server which hit our original
> (other) DSA error with the queued_alters query.  So I tentatively think there
> really may be something specific to the server (not the hypervisor so maybe the
> OS, libraries, kernel, scheduler, ??).
> 
> Find the schema for that table here:
> https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com
> 
> Note, for unrelated reasons, that query was also previously discussed here:
> https://www.postgresql.org/message-id/20171110204043.GS8563%40telsasoft.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: anole's failed timeouts test
Следующее
От: Tom Lane
Дата:
Сообщение: Re: anole's failed timeouts test