Discussion: pg_upgrade of 11 -> 13: free(): invalid pointer


pg_upgrade of 11 -> 13: free(): invalid pointer

From: Jeremy Wilson
Date:
I’m continuing my upgrade journey, this time from 11 to 13, and the process is dying in the copy phase, always on the same DB:

—
Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
  messages
*failure*

Consult the last few lines of "pg_upgrade_dump_16387.log" for
the probable cause of the failure.
Failure, exiting
—

The log contains (which is different each time):

—
pg_restore: WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
pg_restore: creating COMMENT "public.FUNCTION "st_isempty"("rast" "public"."raster")"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 5338; 0 0 COMMENT FUNCTION "st_isempty"("rast" "public"."raster") postgres
pg_restore: error: could not execute query: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
Command was: COMMENT ON FUNCTION "public"."st_isempty"("rast" "public"."raster") IS 'args: rast - Returns true if the raster is empty (width = 0 and height = 0). Otherwise, returns false.';
—

And the pgsql13 server log contains:

—
2020-11-17 11:51:40.953 EST [96545] LOG:  database system is ready to accept connections
free(): invalid pointer
2020-11-17 11:51:42.880 EST [96545] LOG:  server process (PID 96575) was terminated by signal 6: Aborted
2020-11-17 11:51:42.880 EST [96545] LOG:  terminating any other active server processes
2020-11-17 11:51:42.880 EST [96582] WARNING:  terminating connection because of crash of another server process
2020-11-17 11:51:42.880 EST [96582] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-17 11:51:42.880 EST [96582] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-11-17 11:51:42.884 EST [96545] LOG:  all server processes terminated; reinitializing
2020-11-17 11:51:42.904 EST [96545] LOG:  received fast shutdown request
2020-11-17 11:51:42.905 EST [96585] LOG:  database system was interrupted; last known up at 2020-11-17 11:51:42 EST
2020-11-17 11:51:42.906 EST [96585] LOG:  database system was not properly shut down; automatic recovery in progress
2020-11-17 11:51:42.906 EST [96585] LOG:  redo starts at E0/DB6B2960
2020-11-17 11:51:42.907 EST [96545] LOG:  abnormal database system shutdown
2020-11-17 11:51:42.909 EST [96545] LOG:  database system is shut down
—

So I’m assuming it’s that free() call.  Servers have PostGIS 3.0 on them, all installed from repo, and running CentOS 8.




Re: pg_upgrade of 11 -> 13: free(): invalid pointer

From: Adrian Klaver
Date:
On 11/17/20 8:59 AM, Jeremy Wilson wrote:
> So I’m assuming it’s that free() call.  Servers have PostGIS 3.0 on them, all installed from repo, and running CentOS 8.

Was this after a clean install of the corrected RPMs?


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_upgrade of 11 -> 13: free(): invalid pointer

From: Jeremy Wilson
Date:

> On Nov 17, 2020, at 12:18 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 11/17/20 8:59 AM, Jeremy Wilson wrote:
>
> Was this after a clean install of the corrected RPMs?

Yes, this is a fresh install of CentOS 8, with everything installed from the updated repo and RPMs.




Re: pg_upgrade of 11 -> 13: free(): invalid pointer

From: Bruce Momjian
Date:
On Tue, Nov 17, 2020 at 11:59:10AM -0500, Jeremy Wilson wrote:
> pg_restore: WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
> pg_restore: creating COMMENT "public.FUNCTION "st_isempty"("rast" "public"."raster")"
> pg_restore: while PROCESSING TOC:
> pg_restore: from TOC entry 5338; 0 0 COMMENT FUNCTION "st_isempty"("rast" "public"."raster") postgres
> pg_restore: error: could not execute query: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> Command was: COMMENT ON FUNCTION "public"."st_isempty"("rast" "public"."raster") IS 'args: rast - Returns true if the raster is empty (width = 0 and height = 0). Otherwise, returns false.';

My guess is that this is a crash in the PostGIS shared library.  I would
ask the PostGIS team if they know of any crash cases, and if not, I
think you need to do a pg_dump of the database and test-load it into a
new database to see what query makes it fail, and then load debug
symbols and do a backtrace of the stack at the point of the crash. 
Yeah, not fun.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: pg_upgrade of 11 -> 13: free(): invalid pointer

From: Bruce Momjian
Date:
On Tue, Nov 17, 2020 at 02:44:47PM -0500, Bruce Momjian wrote:
> My guess is that this is a crash in the PostGIS shared library.  I would
> ask the PostGIS team if they know of any crash cases, and if not, I
> think you need to do a pg_dump of the database and test-load it into a
> new database to see what query makes it fail, and then load debug
> symbols and do a backtrace of the stack at the point of the crash. 
> Yeah, not fun.

Actually, pg_dump --schema-only is what you want to dump and load into a
separate database.  No need to dump the data.
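The schema-only test load can be scripted so it is easy to repeat. Below is a minimal Python sketch, assuming the old 11 cluster and the new 13 cluster run side by side on different ports (the port numbers and database names in the example are placeholders), and that the load should go into the 13 cluster, since that is where the crash was logged. The helper names are hypothetical; the `pg_dump`, `createdb`, and `psql` options used are standard ones.

```python
import shlex
import subprocess
from typing import List


def schema_test_commands(source_db: str, scratch_db: str, old_port: int, new_port: int,
                         dump_file: str = "schema.sql") -> List[List[str]]:
    """Build argv lists for a schema-only dump from the old cluster and a test load into the new one."""
    return [
        # Object definitions only, no table data, written as a plain SQL script.
        ["pg_dump", "--port", str(old_port), "--schema-only", "--file", dump_file, source_db],
        # Empty scratch database in the new cluster to replay the schema into.
        ["createdb", "--port", str(new_port), scratch_db],
        # Echo each statement as it is sent, so the last one printed before the
        # backend dies is the one that triggered the crash; stop at the first error.
        ["psql", "--port", str(new_port), "--echo-queries", "--set", "ON_ERROR_STOP=1",
         "--dbname", scratch_db, "--file", dump_file],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, printing it first; raise on the first non-zero exit."""
    for argv in commands:
        print("$", shlex.join(argv))
        subprocess.run(argv, check=True)
```

A run might look like `run(schema_test_commands("yourdb", "yourdb_schema_test", old_port=5432, new_port=5433))`. Once the crashing statement is known, it can be replayed by hand in a session with a debugger attached to its backend to get the backtrace mentioned above.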





Re: pg_upgrade of 11 -> 13: free(): invalid pointer

From: Paul Ramsey
Date:
> On Nov 17, 2020, at 11:44 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> My guess is that this is a crash in the PostGIS shared library.  I would
> ask the PostGIS team if they know of any crash cases, and if not, I
> think you need to do a pg_dump of the database and test-load it into a
> new database to see what query makes it fail, and then load debug
> symbols and do a backtrace of the stack at the point of the crash.
> Yeah, not fun.

These kinds of problems have almost always been due to multiple versions of dependencies installed simultaneously. So,
packaging fun. You'll get some version of PostGIS compiled against one train of dependencies and another against another
train, and for the upgrade both trains will end up installed simultaneously, and things will break.
P
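One way to look for the situation described here is to run `ldd` on the PostGIS modules of the new installation (for example `postgis-3.so` and `postgis_raster-3.so` in the directory reported by `pg_config --pkglibdir`; the exact file names depend on the packaging) and check whether any library is resolved to more than one file. A minimal Python sketch, assuming `ldd`'s usual `name => /path (address)` output; the helper names and the paths in the test data are hypothetical, not taken from this system.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# A resolved dependency line from ldd, e.g.
#   "libproj.so.15 => /usr/proj62/lib/libproj.so.15 (0x00007f...)"
# Lines without "=> /path" (vdso, the dynamic loader, "not found") are ignored.
LDD_RE = re.compile(r"^\s*(\S+?\.so[.\d]*)\s+=>\s+(/\S+)")


def library_base(soname: str) -> str:
    """Reduce a soname to a version-free base: libproj.so.15 -> libproj, libgeos-3.8.1.so -> libgeos."""
    base = soname.split(".so")[0]
    return re.sub(r"-[\d.]+$", "", base)


def duplicate_libraries(ldd_output: str) -> Dict[str, Set[str]]:
    """Return {base name: resolved paths} for libraries that resolve to more than one file."""
    paths: Dict[str, Set[str]] = defaultdict(set)
    for line in ldd_output.splitlines():
        m = LDD_RE.match(line)
        if m:
            soname, path = m.groups()
            paths[library_base(soname)].add(path)
    return {base: found for base, found in paths.items() if len(found) > 1}
```

Concatenate the `ldd` output for each module and pass it in; a non-empty result, such as two different `libproj` files, is the kind of mixed dependency train described above. An empty result does not fully rule the problem out, since libraries can also be loaded at run time with `dlopen`.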
