Re: v12.0: segfault in reindex CONCURRENTLY

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема Re: v12.0: segfault in reindex CONCURRENTLY
Дата
Msg-id 20191013162426.GF4475@telsasoft.com
обсуждение исходный текст
Ответ на Re: v12.0: segfault in reindex CONCURRENTLY  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: v12.0: segfault in reindex CONCURRENTLY  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
Resending this message, which didn't make it to the list when I sent it
earlier.  (And, notified -www).

On Sun, Oct 13, 2019 at 06:06:43PM +0900, Michael Paquier wrote:
> On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote:
> > Unfortunately, there was no core file, and I'm still trying to reproduce it.
> 
> Forgot to set ulimit -c?  Having a backtrace would surely help.

Fortunately (?) another server hit crashed last night.
(Doesn't appear to be relevant, but this table has no inheritence/partition-ness).

Looks like it's a race condition and dereferencing *holder=NULL.  The first
crash was probably the same bug, due to report query running during "reindex
CONCURRENTLY", and probably finished at nearly the same time as another locker.

Relevant code introduced here:

commit ab0dfc961b6a821f23d9c40c723d11380ce195a6
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Tue Apr 2 15:18:08 2019 -0300

    Report progress of CREATE INDEX operations

Needs to be conditionalized (as anticipated by the comment)

+            if (holder)
                                pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
                                                                                         holder->pid);


Core was generated by `postgres: postgres ts [local] REINDEX                  '.
Program terminated with signal 11, Segmentation fault.

#0  WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true)
atlmgr.c:911
 
#1  0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) at
indexcmds.c:3090
#2  0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at
indexcmds.c:2352
#3  0x00000000007657fe in standard_ProcessUtility (pstmt=pstmt@entry=0x1d05468, queryString=queryString@entry=0x1d046e0
"REINDEXINDEX CONCURRENTLY loaded_cdr_files_filename",
 
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
dest=dest@entry=0x1d05548,
    completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at utility.c:787
#4  0x00007f21517204ef in pgss_ProcessUtility (pstmt=0x1d05468, queryString=0x1d046e0 "REINDEX INDEX CONCURRENTLY
loaded_cdr_files_filename",context=PROCESS_UTILITY_TOPLEVEL,
 
    params=0x0, queryEnv=0x0, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pg_stat_statements.c:1006
#5  0x0000000000762816 in PortalRunUtility (portal=0x1d7a4e0, pstmt=0x1d05468, isTopLevel=<optimized out>,
setHoldSnapshot=<optimizedout>, dest=0x1d05548,
 
    completionTag=0x7ffc05e6c0a0 "") at pquery.c:1175
#6  0x0000000000763267 in PortalRunMulti (portal=portal@entry=0x1d7a4e0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false,dest=dest@entry=0x1d05548,
 
    altdest=altdest@entry=0x1d05548, completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at pquery.c:1328
#7  0x0000000000763e45 in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimizedout>, dest=0x1d05548, altdest=0x1d05548,
 
    completionTag=0x7ffc05e6c0a0 "") at pquery.c:796
#8  0x000000000075ff45 in exec_simple_query (query_string=<optimized out>) at postgres.c:1215
#9  0x0000000000761212 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>,
username=<optimizedout>) at postgres.c:4236
 
#10 0x0000000000483d02 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4431
#11 BackendStartup (port=0x1d2b340) at postmaster.c:4122
#12 ServerLoop () at postmaster.c:1704
#13 0x00000000006f0b1f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cff280) at postmaster.c:1377
#14 0x0000000000484c93 in main (argc=3, argv=0x1cff280) at main.c:228

bt f

#0  WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true)
atlmgr.c:911
 
        holder = 0x0
        lockholders = 0x1d9b778
        holders = <optimized out>
        lc = 0x1d9bf80
        total = <optimized out>
        done = 1
#1  0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) at
indexcmds.c:3090
        heapRelationIds = 0x1d30360
        indexIds = 0x1d303b0
        newIndexIds = <optimized out>
        relationLocks = <optimized out>
        lockTags = <optimized out>
        lc = 0x0
        lc2 = 0x0
        private_context = <optimized out>
        oldcontext = <optimized out>
        relkind = 105 'i'
        relationName = 0x0
        relationNamespace = 0x0
        ru0 = {tv = {tv_sec = 30592544, tv_usec = 7232025}, ru = {ru_utime = {tv_sec = 281483566645394, tv_usec =
75668733820930},ru_stime = {tv_sec = 0, tv_usec = 30592272}, {
 
              ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 105,
__ru_idrss_word= 105}, {ru_isrss = -926385342574214656, 
 
              __ru_isrss_word = -926385342574214656}, {ru_minflt = 8924839, __ru_minflt_word = 8924839}, {ru_majflt =
0,__ru_majflt_word = 0}, {ru_nswap = 17618, 
 
              __ru_nswap_word = 17618}, {ru_inblock = 139781327898864, __ru_inblock_word = 139781327898864},
{ru_oublock= 30430312, __ru_oublock_word = 30430312}, {
 
              ru_msgsnd = 139781327898864, __ru_msgsnd_word = 139781327898864}, {ru_msgrcv = 139781327898864,
__ru_msgrcv_word= 139781327898864}, {ru_nsignals = 139781327898864, 
 
              __ru_nsignals_word = 139781327898864}, {ru_nvcsw = 139781327898864, __ru_nvcsw_word = 139781327898864},
{ru_nivcsw= 139781327898864, 
 
              __ru_nivcsw_word = 139781327898864}}}
        __func__ = "ReindexRelationConcurrently"
#2  0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at
indexcmds.c:2352
        state = {concurrent = true, locked_table_oid = 17608}
        indOid = 17618
        irel = <optimized out>
        persistence = 112 'p'
...



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: stress test for parallel workers
Следующее
От: Tom Lane
Дата:
Сообщение: Re: v12.0: ERROR: could not find pathkey item to sort