Re: pg11.1: dsa_area could not attach to segment

From Justin Pryzby
Subject Re: pg11.1: dsa_area could not attach to segment
Date
Msg-id 20190207014719.GJ29720@telsasoft.com
In response to pg11.1: dsa_area could not attach to segment  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: pg11.1: dsa_area could not attach to segment  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: pg11.1: dsa_area could not attach to segment  (Justin Pryzby <pryzby@telsasoft.com>)
Re: pg11.1: dsa_area could not attach to segment  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
FYI, I wasn't able to make this work yet.
(gdb) print *segment_map->header
Cannot access memory at address 0x7f347e554000

However I *did* reproduce the error in an isolated, non-production postgres
instance.  It's a totally empty, untuned v11.1 initdb just for this, running ONLY
a few simultaneous loops around just one query.  It looks like the simultaneous
loops sometimes (but not always) fail together.  This has happened a couple of
times.
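
For anyone wanting to try the same thing, the setup described above can be
sketched roughly as follows.  This is only an illustration, not the exact
commands used: the helper names loop_until_failure and run_concurrent_loops
are hypothetical, and it relies on the command (here psql -c) returning a
non-zero exit status when the query fails, which is what ends each loop.

```shell
# Hypothetical helper: rerun the given command until it exits non-zero,
# discarding its standard output.
loop_until_failure() {
    while "$@"; do
        :
    done > /dev/null
}

# Hypothetical helper: start N background copies of the loop and wait
# until every one of them has stopped.
run_concurrent_loops() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        loop_until_failure "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Intentionally unquoted so each PID becomes a separate argument.
    wait $pids
}

# Example use (QUERY is an assumed variable holding the query text):
#   export PGHOST=/tmp PGPORT=5678
#   run_concurrent_loops 3 psql postgres -c "$QUERY"
```

Each loop stops at its first error, so loops that all finish at about the same
moment are a hint that they failed together.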

It looks like one query failed due to "could not attach" in leader, one failed
due to same in worker, and one failed with "not pinned", which I hadn't seen
before and appears to be related to DSM, not DSA...

|ERROR:  dsa_area could not attach to segment
|ERROR:  cannot unpin a segment that is not pinned
|ERROR:  dsa_area could not attach to segment
|CONTEXT:  parallel worker
|
|[2]   Done                    while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p,
array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod)
ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON
to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *,
attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND
colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$','\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '')
DESC LIMIT 1"; do
|    :;
|done > /dev/null
|[5]-  Done                    while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p,
array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod)
ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON
to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *,
attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND
colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$','\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '')
DESC LIMIT 1"; do
|    :;
|done > /dev/null
|[6]+  Done                    while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p,
array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod)
ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON
to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *,
attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND
colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child,
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$','\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '')
DESC LIMIT 1"; do

I'm also trying to reproduce on other production servers.  But so far nothing
else has shown the bug, including the other server which hit our original
(other) DSA error with the queued_alters query.  So I tentatively think there
really may be something specific to the server (not the hypervisor, so maybe the
OS, libraries, kernel, scheduler, ??).

Find the schema for that table here:
https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com

Note, for unrelated reasons, that query was also previously discussed here:
https://www.postgresql.org/message-id/20171110204043.GS8563%40telsasoft.com

Justin

