Обсуждение: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 16722 Logged by: Sergey Koposov Email address: skoposov@ed.ac.uk PostgreSQL version: 11.9 Operating system: debian Description: Hi, When ingesting a billion or so rows in the table that has some array columns (see schema below), at some point all the backends doing the ingestion hang with 100% CPU usage. When investigating, I think I can trace this to the limit of 2^32 toasted records per table. See the gdb full backtrace of hanged backends in the bottom. When the problem occurs, it has 272 mill records. Basically the hanging happens in GetNewOidWithIndex called by toast_save_datum. While I understand the limit for the toast number is there to stay, but the behaviour of PG next to the limit is IMO a bug (or at least non-trivial) . I would rather prefer to see an error-message as opposed to backends hanging. I see a similar report BUG #14137, but it didn't seem to receive any reply. If it matters, the ingestion inside the table is done by 10 parallel backends doing COPY in the table. (I don't know if it affects the way the objids are allocated for toast objects) Thanks, Sergey Koposov GDB output ***************** #0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85 No locals. #1 0x00000000006b3f42 in PGSemaphoreLock (sema=0x7f607fe1cbb8) at pg_sema.c:316 errStatus = <optimized out> __func__ = "PGSemaphoreLock" #2 0x000000000071ea14 in LWLockAcquire (lock=0x7f689257bf20, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1244 mustwait = <optimized out> proc = 0x7f68adeb46b0 result = true extraWaits = 0 __func__ = "LWLockAcquire" #3 0x0000000000700440 in StartBufferIO (buf=buf@entry=0x7f6083a2e640, forInput=forInput@entry=true) at bufmgr.c:3887 buf_state = <optimized out> #4 0x000000000070204d in BufferAlloc (foundPtr=0x7fff36ee18db, strategy=0x0, blockNum=7605075, forkNum=MAIN_FORKNUM, relpersistence=<optimized out>, smgr=0x130f380) at bufmgr.c:1251 newPartitionLock = 0x7f607fe24080 oldTag = {rnode = {spcNode = 20206528, dbNode = 0, relNode = 8688143}, forkNum = MAIN_FORKNUM, blockNum = 8469712} oldPartitionLock = <optimized out> buf = 0x7f6083a2e640 oldHash = <optimized out> oldFlags = <optimized out> buf_id = <optimized out> newTag = {rnode = {spcNode = 1663, dbNode = 16818, relNode = 2229524624}, forkNum = MAIN_FORKNUM, blockNum = 7605075} newHash = 4182877260 valid = false buf_state = 0 #5 ReadBuffer_common (smgr=0x130f380, relpersistence=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=7605075, mode=RBM_NORMAL, strategy=0x0, hit=0x7fff36ee198f) at bufmgr.c:745 bufHdr = <optimized out> bufBlock = <optimized out> found = true isExtend = false isLocalBuf = <optimized out> __func__ = "ReadBuffer_common" #6 0x00000000007023e1 in ReadBufferExtended (reln=0x7f607fd9bc08, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=7605075, mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:664 hit = false buf = <optimized out> __func__ = "ReadBufferExtended" #7 0x00000000007024eb in ReadBuffer (blockNum=<optimized out>, reln=<optimized out>) at bufmgr.c:596 No locals. #8 ReleaseAndReadBuffer (buffer=buffer@entry=20373756, relation=<optimized out>, blockNum=<optimized out>) at bufmgr.c:1545 bufHdr = <optimized out> #9 0x00000000004c6db2 in index_fetch_heap (scan=0x0, scan@entry=0x136e098) at indexam.c:595 prev_buf = 20373756 tid = 0x136e0fc all_dead = false got_heap_tuple = <optimized out> #10 0x00000000004c6e00 in index_getnext (scan=0x136e098, direction=direction@entry=ForwardScanDirection) at indexam.c:691 heapTuple = <optimized out> tid = <optimized out> #11 0x00000000004c6299 in systable_getnext (sysscan=sysscan@entry=0x136dec8) at genam.c:425 htup = <optimized out> __func__ = "systable_getnext" #12 0x000000000050cd09 in GetNewOidWithIndex (relation=0x7f607fd9bc08, indexId=310933857, oidcolumn=1) at catalog.c:371 newOid = 2744075972 scan = 0x136dec8 key = {sk_flags = 0, sk_attno = 1, sk_strategy = 3, sk_subtype = 0, sk_collation = 100, sk_func = {fn_addr = 0x7b6320 <oideq>, fn_oid = 184, fn_nargs = 2, fn_strict = true, fn_retset = false, fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x13453c0, fn_expr = 0x0}, sk_argument = 2744075972} collides = <optimized out> #13 0x00000000004c3063 in toast_save_datum (value=20557421, oldexternal=0x0, options=0, rel=<optimized out>, rel=<optimized out>) at tuptoaster.c:1574 toastrel = 0x7f607fd9bc08 toastidxs = 0x136bba0 toasttup = 0x7f607fd98540 t_values = {300189817, 0, 140734114962192} t_isnull = {false, false, false} result = 0xfffffffffffffe00 chunk_data = {hdr = {vl_len_ = "\340\000\000", vl_dat = 0x7fff36ee1b14 "\001"}, data = "\340\000\000\000\001\000\000\000\000\000\000\000\274\002\000\000\b\000\000\000\001\000\000\000̥\203@\225\002\365@z\210,A/<gA\a\233}AYf\210Aɀ\215AД\226A.U)?\320\325.?\242aA?~\311:?i\215\211?\334\020\203?\000\000\000\000\272 \354@\222\202ي\321\070\354@\356\274\001ԀM\354@\"\272o\016nf\354@Ӯ\327N\275z\354@\242\250\"K\365\223\354@\372k\252\305D\250\354@", '\000' <repeats 1696 times>..., align_it = 224} chunk_size = 0 num_indexes = 1 validIndex = 0 #14 0x00000000004c45a2 in toast_insert_or_update (rel=0x7f607fd983a8, newtup=0x0, oldtup=0xb5, options=-1) at tuptoaster.c:867 old_value = 20557421 tupleDesc = 0x12beb88 numAttrs = 181 need_change = 69 need_free = 109 toast_action = "pp pppp ppppppxppp ", 'p' <repeats 51 times>, "xxxxxxxxxx ", 'x' <repeats 12 times>, 'p' <repeats 61 times>, " ppppppxxppppppxxpppppppp", '\000' <repeats 1418 times> toast_isnull = {false <repeats 1600 times>} toast_oldisnull = {false <repeats 1600 times>} toast_values = {9011, 508863, 20557040, 2663, 1, 2048, 130, 20557062, 4642831835879822414, 4629766747987641402, 1478912696, 1475380501, 1151322629, 1152363744, 20557104, 1018397949, 4678179332087438334, 4678191406000876033, 20557168, 1325489961962492288, 18446744072652356428, 18446744072655684425, 1052577735, 1094195074, 1088406363, 1106636035, 1157361664, 1100416689, 1139960709, 319, 1100668600, 1108163953, 29, 1100071362, 1111486228, 33, 0, 0, 0, 300, 297, 1051787629, 0, 2143289344, 2143289344, 1067470643, 1052143608, 31, 1099325834, 1107053637, 1109951334, 1096537497, 1083978235, 1077223218, 18446744072661571125, 1115025818, 1101955695, 1117711694, 1081872323, 1066791225, 987449489, 930850946, 1095769653, 1103322737, 1107459414, 1095769653, 1103322737, 1107459414, 20374504, 20557421, 20557474, 20557527, 20557580, 20557633, 20557686, 20557739, 20557792, 20557845, 20557898, 20557951, 20373112, 20373032, 20573080, 20558163, 20558204, 20558245, 20558286, 20558327, 20558368, 20558409, 20558450, 20558491, 20558532, 20558573, 20558614, 1064299422, 1064635873, 1064948113, 1065291023, 1065314998, 1065345062, 1065350129, 1, 1, 4, 278, 278, 31, 16, 1074988052, 1076692416, 1076469703, 1076226148, 1066023726, 1057373083, 1041813708, 910775196, 897988541, 958014467, 1073709436, 1080820995, 1087871082, 1105555510, 996563946, 990516632, 1007439877, 1065342546, 1065343854, 1065326389, 0, 0, 0, 0, 0, 0, 0, 0, 1070595106, 1067663254, 1068637012, 1132005523, 1120866882, 1124400175, 1126997579, 1115152811, 1117702755, 1086250728, 1080808387, 1082966116, 1086250960, 1080808777, 1082966483, 1089896366, 1069203394, 987853215, 932092729, 20558872, 1123232006, 1124468680, 20374216, 20374168, 20374120, 20374072, 20559216, 20559267, 20374024, 20373976, 20373928, 20373464, 20373416, 20313024, 20559936, 20559987, 0 <repeats 1427 times>} toast_oldvalues = {0 <repeats 1245 times>, 8599425, 0, 256, 140734115001632, 0, 0, 0, 0, 8602692, 0, 19089440, 140734115001632, 19356882, 140734115001634, 13444600, 3, 19356880, 13444600, 8796705, 16967, 8796705, 16967, 0 <repeats 23 times>, 140734115003392, 140734115003376, 10204964, 10204968, 0, 140087678209897, 0, 0, 140734115002240, 0, 0, 0, 0, 0, 140734115002192, 0 <repeats 11 times>, 32, 0, 0, 0, 0, 4294967295, 140087678216602, 0, 0, 4294967296, 18446744073709551615, 5, 0, 140087678216602, 0, 10204964, 0, 6, 10204968, 0, 0, 0, 0, 0, 0, 0, 0, 206158430232, 140734115003968, 140734115003776, 0 <repeats 125 times>, 140087698075212, 0, 1037797632, 7, 9, 140087700155240, 140087698075212, 10204964, 2090588023, 1, 9, 140087700155240, 140087698077630, 140734115003664, 140734115003456, 140087695826704, 140087695833544, 140734115003728, 32665437, 140734115003712, 4277816, 0, 140087700155384, 140087700264280, 4607254, 140087695834744, 4274192, 4294967296, 4294967581, 1, 140087700155384, 140734115003872, 140087700276488, 140734115003912, 140087700275632, 1, 140087698078061, 0, 140087700155384, 1, 0, 1, 140087700275632, 140734115003680, 140734115003664, 1037797632, 4495133, 0, 140087700276488, 140734115003728, 140734115003712, 2090588023, 4607254, 4294967295, 140087678425001, 140087695833544, 140087700264280, 165776, 140087681730168, 131072, 140087678425001, 18446744073709547520, 140087678400681, 0, 19492864, 140734115005392, 19327088, 165776, 13428184, 0, 27, 27, 0, 19327104, 140087698096119, 4294967297, 0, 18446744073709551504, 140087695833544, 28, 19327104, 140734115004064, 140087698122005, 0, 0, 0, 0, 140734115004064, 19328986, 140087678216602, 140087700275632, 108, 8796505...} toast_oldexternal = {0x0 <repeats 181 times>, 0x833781 <hash_search_with_hash_value+193>, 0x3c, 0x100, 0x7fff36eec5e0, 0x0, 0x0, 0x0, 0x0, 0x834444 <string_hash+36>, 0x7b, 0x1234820, 0x7fff36eec5e0, 0x124eee3, 0x7fff36000000, 0xcd25f8 <ConfigureNamesString+4408>, 0x0, 0x0, 0x6aa00000000, 0x1100000009, 0xffffffff00540100, 0xffffffff1a5d09cb, 0xffffffff9b26ada0, 0xffffffff9bd60520, 0xffffffff9ccf30a0, 0xffffffff9da4c3a0, 0xffffffff9e9c9da0, 0xffffffff9f971aa0, 0xffffffffa085ba20, 0xffffffffa176fca0, 0xffffffffa2659c20, 0xffffffffa37bc8a0, 0xffffffffa44eb8a0, 0xffffffffa53ffb20, 0xffffffffa6256020, 0xffffffffa727c620, 0xffffffffa82a2c20, 0xffffffffa8ebf8a0, 0xffffffffaa00d3a0, 0xffffffffaad51520, 0xffffffffabe9f020, 0xffffffffacc76c20, 0xffffffffadc9d220, 0xffffffffaea74e20, 0xffffffffafa079a0, 0xffffffffb0873020, 0xffffffffb192d0a0, 0xffffffffb2704ca0, 0xffffffffb372b2a0, 0xffffffffb4502ea0, 0xffffffffb5495a20, 0xffffffffb63010a0, 0xffffffffb73276a0, 0xffffffffb80ff2a0, 0xffffffffb91258a0, 0xffffffffb9efd4a0, 0xffffffffbae90020, 0xffffffffbbd8f120, 0xffffffffbcdb5720, 0xffffffffbdb8d320, 0xffffffffbeb1fea0, 0xffffffffbf98b520, 0xffffffffc09b1b20, 0xffffffffc1789720, 0xffffffffc27afd20, 0xffffffffc3587920, 0xffffffffc451a4a0, 0xffffffffc5385b20, 0xffffffffc63ac120, 0xffffffffc758d6a0, 0xffffffffc7da09a0, 0xffffffffca162690, 0xffffffffca975990, 0xffffffffcbd11e90, 0xffffffffcc773b90, 0xffffffffcdb10090, 0xffffffffce605810, 0xffffffffcf90e290, 0xffffffffd06e5e90, 0xffffffffd1721610, 0xffffffffd1fb3210, 0xffffffffd269fe20, 0xffffffffd36329a0, 0xffffffffd449e020, 0xffffffffd51e21a0, 0xffffffffd542fd90, 0xffffffffd5dfe010, 0xffffffffd64eac20, 0xffffffffd6fe03a0, 0xffffffffd82e8e20, 0xffffffffd8f99520, 0xffffffffda0e7020, 0xffffffffdaebec20, 0xffffffffdbe517a0, 0xffffffffdccbce20, 0xffffffffddc4f9a0, 0xffffffffdeb4eaa0, 0xffffffffdfae1620, 0xffffffffe094cca0, 0xffffffffe17248a0, 0xffffffffe26b7420, 0xffffffffe3522aa0, 0xffffffffe45490a0, 0xffffffffe5320ca0, 0xffffffffe63dad20, 0xffffffffe71b2920, 0xffffffffe81454a0, 0xffffffffe8fb0b20, 0xffffffffe9fd7120, 0xffffffffeadaed20, 0xffffffffebdd5320, 0xffffffffecbacf20, 0xffffffffedb3faa0, 0xffffffffee9ab120, 0xffffffffef8167a0, 0xfffffffff09f7d20, 0xfffffffff16149a0, 0xfffffffff27f5f20, 0xfffffffff34a6620, 0xfffffffff45f4120, 0xfffffffff5210da0, 0xfffffffff63f2320, 0xfffffffff700efa0, 0xfffffffff81f0520, 0xfffffffff8e0d1a0, 0xfffffffff9fee720, 0xfffffffffac0b3a0, 0xfffffffffbe803a0, 0xfffffffffc7baba0, 0xfffffffffdc7bb70, 0x370c620, 0x4295820, 0x550a820, 0x6093a20, 0x7308a20, 0x7e91c20, 0x9106c20, 0x9c8fe20, 0xaf04e20, 0xbb21aa0, 0xcd03020, 0xd91fca0, 0xeb01220, 0xf71dea0, 0x10992ea0, 0x1151c0a0, 0x127910a0, 0x1331a2a0, 0x1458f2a0, 0x1523eb90, 0x1638c690, 0x1703cd90, 0x1818a890, 0x18e3af90, 0x19f88a90, 0x1ac39190, 0x1be1a710, 0x1cacae10, 0x1dc18910, 0x1e8c9010, 0x1fa16b10, 0x206c7210, 0x21814d10, 0x224c5410, 0x23612f10, 0x242c3610, 0x254a4b90, 0x260c1810, 0x272a2d90, 0x27f53490, 0x290a0f90, 0x29d51690, 0x2ae9f190, 0x2bb4f890, 0x2cc9d390, 0x2d94da90, 0x2ea9b590, 0x2f74bc90, 0x30899790, 0x315dd910, 0x3272b410, 0x333dbb10, 0x34529610, 0x351d9d10, 0x36327810, 0x36fd7f10, 0x381b9490, 0x38dd6110, 0x39fb7690, 0x3abd4310, 0x3bdb5890...} toast_sizes = {0, 0, 9, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 3, 0 <repeats 49 times>, 53 <repeats 12 times>, 27, 27, 27, 41 <repeats 12 times>, 0 <repeats 61 times>, 9, 0, 0, 81, 81, 81, 81, 51, 51, 81, 81, 81, 81, 144, 144, 51, 51, 0 <repeats 1427 times>} toast_free = {false <repeats 68 times>, true, false <repeats 11 times>, true, true, true, false <repeats 76 times>, true, true, true, true, false, false, true, true, true, true, true, true, false <repeats 1429 times>} toast_delold = {false <repeats 1600 times>} #15 0x00000000004b8a87 in heap_multi_insert (relation=0x7f607fe1cbb8, tuples=0x0, tuples@entry=0x1341428, ntuples=0, ntuples@entry=22, cid=0, options=0, bistate=0x7fffffff, bistate@entry=0x12c9710) at heapam.c:2730 xid = 1708296130 heaptuples = 0x136dc08 scratch = { data = "\002/\004\000\311\a\265\000\006\b\030\000\063#\000\000\277\303\a\000\023\062\064\065\061p325\000\000\000b\n\000\000\001\000\000\000\000\000\tPSF\000\000\000\000\000\000g\022\240\276\361\246n@\342\273\327E^1@@\372\036\231N\a\"\256N7A\240DF\355dC\226\000\000\000(\000\000\000\004\001\000\003\001\274\002\000\000\005\b\000\000\000\001\020L\264CB0V\313>B\004\033\002\001\000\000\000J\n\254<\000\000\000\000\230a\311\357p;\354@\001\252\367\061LF\354@\003", '\000' <repeats 119 times>..., force_align_d = 2.9948299028864589e-305, force_align_i64 = 50955530654723842} page = 0x1 <error: Cannot access memory at address 0x1> #16 0x0000000000586edf in CopyFromInsertBatch (cstate=cstate@entry=0x12c92d0, estate=estate@entry=0x12d0e80, mycid=mycid@entry=0, hi_options=hi_options@entry=0, resultRelInfo=resultRelInfo@entry=0x128fde8, myslot=myslot@entry=0x12d2db8, bistate=0x12c9710, nBufferedTuples=22, bufferedTuples=0x1341428, firstBufferedLineNo=2663) at copy.c:2973 oldcontext = 0x128c110 i = <optimized out> save_cur_lineno = 2684 #17 0x000000000058c77e in CopyFrom (cstate=cstate@entry=0x12c92d0) at copy.c:2815 slot = 0x12d2db8 skip_tuple = false loaded_oid = 0 tuple = 0x136cbf8 tupDesc = 0x12beb88 values = 0x12d34b8 nulls = 0x1329c78 resultRelInfo = <optimized out> saved_resultRelInfo = 0x0 estate = 0x12d0e80 mtstate = 0x12c9500 econtext = 0x12d1090 myslot = 0x12d2db8 oldcontext = 0x128c110 errcallback = {previous = 0x0, callback = 0x586d00 <CopyFromErrorCallback>, arg = 0x12c92d0} mycid = 0 hi_options = 0 bistate = 0x12c9710 processed = 2683 useHeapMultiInsert = true nBufferedTuples = <optimized out> prev_leaf_part_index = -1 bufferedTuples = 0x1341428 bufferedTuplesSize = 67408 firstBufferedLineNo = 2663 __func__ = "CopyFrom" #18 0x000000000058cb53 in DoCopy (pstate=0x128c220, stmt=<optimized out>, stmt_location=0, stmt_len=2354, processed=0x7fff36ef14b8) at copy.c:996 cstate = 0x12c92d0 is_from = true pipe = <optimized out> rel = 0x7f607fd983a8 relid = 310933852 query = 0x0 __func__ = "DoCopy" #19 0x0000000000731258 in standard_ProcessUtility (pstmt=0x1234560, queryString=0x1233030 "COPY decals9 (release,brickid,brickname,objid,brick_primary,maskbits,fitbits,type,ra,dec,ra_ivar,dec_ivar,bx,by,dchisq,ebv,mjd_min,mjd_max,ref_cat,ref_id,pmra,pmdec,parallax,pmra_ivar,pmdec_ivar,paral"..., context=PROCESS_UTILITY_TOPLEVEL, params=0xffffffffffffffff, queryEnv=0x0, dest=0x7fffffff, completionTag=0x7fff36ef1740 "") at utility.c:551 processed = 19943856 pstate = 0x128c220 __func__ = "standard_ProcessUtility" #20 0x000000000072e5ba in PortalRunUtility (portal=0x12f4150, pstmt=0x1234560, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x1234670, completionTag=<optimized out>) at pquery.c:1178 utilityStmt = <optimized out> snapshot = <optimized out> #21 0x000000000072f027 in PortalRunMulti (portal=portal@entry=0x12f4150, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x1234670, altdest=altdest@entry=0x1234670, completionTag=completionTag@entry=0x7fff36ef1740 "") at pquery.c:1331 pstmt = 0x1234560 active_snapshot_set = false stmtlist_item = 0x12bc400 #22 0x000000000072fbd7 in PortalRun (portal=0x12f4150, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x1234670, altdest=0x1234670, completionTag=0x7fff36ef1740 "") at pquery.c:799 save_exception_stack = 0x7fff36ef19b0 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {1, -6064393298424732441, 19645008, 140734115026752, 19874128, 2, 6064270506639647975, -6064395009656331033}, __mask_was_saved = 0, __saved_mask = { __val = {0, 140734115026655, 0, 9981816, 8602692, 1, 19882336, 10328646, 19874128, 10328646, 1, 1, 112, 19644960, 19080960, 9895294}}}} result = <optimized out> nprocessed = <optimized out> saveTopTransactionResourceOwner = 0x1285838 saveTopTransactionContext = 0x13050a0 saveActivePortal = 0x0 saveResourceOwner = 0x1285838 savePortalContext = 0x0 saveMemoryContext = 0x13050a0 __func__ = "PortalRun" #23 0x000000000072be17 in exec_simple_query ( query_string=0x7f607fe1cbb8 "\001") at postgres.c:1145 parsetree = 0x12bc220 portal = 0x12f4150 commandTag = 0x7fff36ef1740 "" completionTag = "\000\335'\001\000\000\000\000\b\375'\001\002\000\000\000\000\000\000\000\002\000\000\000Q\000\000\000\000\000\000\000\060\060#\001\000\000\000\000\b\375'\001\000\000\000\000\b\375'\001", '\000' <repeats 11 times> plantree_list = 0x1234670 receiver = 0x1234670 format = 0 dest = DestRemote parsetree_item = 0x12bc250 msec_str = "\000\335'\001\000\000\000\000\b\375'\001\002\000\000\000\000\000\000\000\002\000\000\000Q\000\000\000\000\000\000" __func__ = "exec_simple_query" #24 0x000000000072dade in PostgresMain (argc=1, argv=0x1233030, dbname=0x127fd20 "wsdb", username=0x127fd08 "koposov") at postgres.c:4193 query_string = 0x1233030 "COPY decals9 (release,brickid,brickname,objid,brick_primary,maskbits,fitbits,type,ra,dec,ra_ivar,dec_ivar,bx,by,dchisq,ebv,mjd_min,mjd_max,ref_cat,ref_id,pmra,pmdec,parallax,pmra_ivar,pmdec_ivar,paral"... firstchar = 19083312 input_message = { data = 0x1233030 "COPY decals9 (release,brickid,brickname,objid,brick_primary,maskbits,fitbits,type,ra,dec,ra_ivar,dec_ivar,bx,by,dchisq,ebv,mjd_min,mjd_max,ref_cat,ref_id,pmra,pmdec,parallax,pmra_ivar,pmdec_ivar,paral"..., len = 2356, maxlen = 4096, cursor = 2356} local_sigjmp_buf = {{__jmpbuf = {2523489864, -6064395710111914777, 19397896, 19397896, 0, 19382096, 6064270506719339751, -6064395007743597337}, __mask_was_saved = 1, __saved_mask = { __val = {0, 19397920, 140734115028992, 47, 19080960, 1024, 19398080, 19397896, 8711789, 0, 140734115027568, 140734115027664, 6479486, 13500160, 8610716, 19086720}}}} send_ready_for_query = false __func__ = "PostgresMain" #25 0x000000000047e812 in BackendRun (port=0x128b780) at postmaster.c:4356 ac = 1 secs = 658859760 usecs = 545108 i = 1 av = 0x127fdc0 maxac = <optimized out> #26 BackendStartup (port=0x128b780) at postmaster.c:4028 bn = 0x127bf50 pid = 0 #27 ServerLoop () at postmaster.c:1707 rmask = {fds_bits = {128, 0 <repeats 15 times>}} selres = <optimized out> now = <optimized out> readmask = {fds_bits = {248, 0 <repeats 15 times>}} last_lockfile_recheck_time = <optimized out> last_touch_time = <optimized out> __func__ = "ServerLoop" #28 0x00000000006c39a3 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x122d3c0) at postmaster.c:1380 opt = <optimized out> status = <optimized out> userDoption = <optimized out> listen_addr_saved = true i = <optimized out> output_config_variable = <optimized out> __func__ = "PostmasterMain" #29 0x000000000047f748 in main (argc=3, argv=0x122d3c0) at main.c:228 No locals. quit A debugging session is active. Inferior 1 [process 115811] will be detached. Quit anyway? (y or n) Detaching from program: /opt/pgsql11/bin/postgres, process 115811 *************************************** SCHEMA: ************************************ Table "koposov.decals9" Column | Type | Collation | Nullable | Default -----------------------------------+--------------------+-----------+----------+--------- release | smallint | | | brickid | integer | | | brickname | text | | | objid | integer | | | brick_primary | smallint | | | maskbits | smallint | | | fitbits | smallint | | | type | text | | | ra | double precision | | | dec | double precision | | | ra_ivar | real | | | dec_ivar | real | | | bx | real | | | by | real | | | dchisq | real[] | | | ebv | real | | | mjd_min | double precision | | | mjd_max | double precision | | | ref_cat | text | | | ref_id | bigint | | | pmra | real | | | pmdec | real | | | parallax | real | | | pmra_ivar | real | | | pmdec_ivar | real | | | parallax_ivar | real | | | ref_epoch | real | | | gaia_phot_g_mean_mag | real | | | gaia_phot_g_mean_flux_over_error | real | | | gaia_phot_g_n_obs | smallint | | | gaia_phot_bp_mean_mag | real | | | gaia_phot_bp_mean_flux_over_error | real | | | gaia_phot_bp_n_obs | smallint | | | gaia_phot_rp_mean_mag | real | | | gaia_phot_rp_mean_flux_over_error | real | | | gaia_phot_rp_n_obs | smallint | | | gaia_phot_variable_flag | smallint | | | gaia_astrometric_excess_noise | real | | | gaia_astrometric_excess_noise_sig | real | | | gaia_astrometric_n_obs_al | smallint | | | gaia_astrometric_n_good_obs_al | smallint | | | gaia_astrometric_weight_al | real | | | gaia_duplicated_source | smallint | | | gaia_a_g_val | real | | | gaia_e_bp_min_rp_val | real | | | gaia_phot_bp_rp_excess_factor | real | | | gaia_astrometric_sigma5d_max | real | | | gaia_astrometric_params_solved | smallint | | | flux_g | real | | | flux_r | real | | | flux_z | real | | | flux_w1 | real | | | flux_w2 | real | | | flux_w3 | real | | | flux_w4 | real | | | flux_ivar_g | real | | | flux_ivar_r | real | | | flux_ivar_z | real | | | flux_ivar_w1 | real | | | flux_ivar_w2 | real | | | flux_ivar_w3 | real | | | flux_ivar_w4 | real | | | fiberflux_g | real | | | fiberflux_r | real | | | fiberflux_z | real | | | fibertotflux_g | real | | | fibertotflux_r | real | | | fibertotflux_z | real | | | apflux_g | real[] | | | apflux_r | real[] | | | apflux_z | real[] | | | apflux_resid_g | real[] | | | apflux_resid_r | real[] | | | apflux_resid_z | real[] | | | apflux_blobresid_g | real[] | | | apflux_blobresid_r | real[] | | | apflux_blobresid_z | real[] | | | apflux_ivar_g | real[] | | | apflux_ivar_r | real[] | | | apflux_ivar_z | real[] | | | apflux_masked_g | real[] | | | apflux_masked_r | real[] | | | apflux_masked_z | real[] | | | apflux_w1 | real[] | | | apflux_w2 | real[] | | | apflux_w3 | real[] | | | apflux_w4 | real[] | | | apflux_resid_w1 | real[] | | | apflux_resid_w2 | real[] | | | apflux_resid_w3 | real[] | | | apflux_resid_w4 | real[] | | | apflux_ivar_w1 | real[] | | | apflux_ivar_w2 | real[] | | | apflux_ivar_w3 | real[] | | | apflux_ivar_w4 | real[] | | | mw_transmission_g | real | | | mw_transmission_r | real | | | mw_transmission_z | real | | | mw_transmission_w1 | real | | | mw_transmission_w2 | real | | | mw_transmission_w3 | real | | | mw_transmission_w4 | real | | | nobs_g | smallint | | | nobs_r | smallint | | | nobs_z | smallint | | | nobs_w1 | smallint | | | nobs_w2 | smallint | | | nobs_w3 | smallint | | | nobs_w4 | smallint | | | rchisq_g | real | | | rchisq_r | real | | | rchisq_z | real | | | rchisq_w1 | real | | | rchisq_w2 | real | | | rchisq_w3 | real | | | rchisq_w4 | real | | | fracflux_g | real | | | fracflux_r | real | | | fracflux_z | real | | | fracflux_w1 | real | | | fracflux_w2 | real | | | fracflux_w3 | real | | | fracflux_w4 | real | | | fracmasked_g | real | | | fracmasked_r | real | | | fracmasked_z | real | | | fracin_g | real | | | fracin_r | real | | | fracin_z | real | | | anymask_g | smallint | | | anymask_r | smallint | | | anymask_z | smallint | | | allmask_g | smallint | | | allmask_r | smallint | | | allmask_z | smallint | | | wisemask_w1 | smallint | | | wisemask_w2 | smallint | | | psfsize_g | real | | | psfsize_r | real | | | psfsize_z | real | | | psfdepth_g | real | | | psfdepth_r | real | | | psfdepth_z | real | | | galdepth_g | real | | | galdepth_r | real | | | galdepth_z | real | | | nea_g | real | | | nea_r | real | | | nea_z | real | | | blob_nea_g | real | | | blob_nea_r | real | | | blob_nea_z | real | | | psfdepth_w1 | real | | | psfdepth_w2 | real | | | psfdepth_w3 | real | | | psfdepth_w4 | real | | | wise_coadd_id | text | | | wise_x | real | | | wise_y | real | | | lc_flux_w1 | real[] | | | lc_flux_w2 | real[] | | | lc_flux_ivar_w1 | real[] | | | lc_flux_ivar_w2 | real[] | | | lc_nobs_w1 | smallint[] | | | lc_nobs_w2 | smallint[] | | | lc_fracflux_w1 | real[] | | | lc_fracflux_w2 | real[] | | | lc_rchisq_w1 | real[] | | | lc_rchisq_w2 | real[] | | | lc_mjd_w1 | double precision[] | | | lc_mjd_w2 | double precision[] | | | lc_epoch_index_w1 | smallint[] | | | lc_epoch_index_w2 | smallint[] | | | sersic | real | | | sersic_ivar | real | | | shape_r | real | | | shape_r_ivar | real | | | shape_e1 | real | | | shape_e1_ivar | real | | | shape_e2 | real | | | shape_e2_ivar | real | | |
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Fujii Masao
Дата:
On 2020/11/17 3:15, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 16722 > Logged by: Sergey Koposov > Email address: skoposov@ed.ac.uk > PostgreSQL version: 11.9 > Operating system: debian > Description: > > Hi, > > When ingesting a billion or so rows in the table that has some array columns > (see schema below), at some point all the backends doing the ingestion hang > with 100% CPU usage. When investigating, I think I can trace this to the > limit of 2^32 toasted records per table. Yes, this is the limitation of the number of out-of-line values in toast. https://wiki.postgresql.org/wiki/TOAST#Total_table_size_limit > See the gdb full backtrace of > hanged backends in the bottom. When the problem occurs, it has 272 mill > records. > Basically the hanging happens in GetNewOidWithIndex called by > toast_save_datum. > While I understand the limit for the toast number is there to stay, but the > behaviour of PG next to the limit is IMO a bug (or at least non-trivial) . > > I would rather prefer to see an error-message as opposed to backends > hanging. To emit an error, we need to check that there is no unused OID for the toast and it would take very long to do that. So I'm not sure if to emit an error message really improves the current situation or not. OTOH it might be good idea to emit a warning message (every time we cannot find unused OID in recent 1 million OIDs, for example) or report the issue as wait event, or something while the record insertion is hanging because of toast limit, so that we can easily detect the issue. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Magnus Hagander
Дата:
On Wed, Nov 18, 2020 at 2:18 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/11/17 3:15, PG Bug reporting form wrote: > > The following bug has been logged on the website: > > > > Bug reference: 16722 > > Logged by: Sergey Koposov > > Email address: skoposov@ed.ac.uk > > PostgreSQL version: 11.9 > > Operating system: debian > > Description: > > > > Hi, > > > > When ingesting a billion or so rows in the table that has some array columns > > (see schema below), at some point all the backends doing the ingestion hang > > with 100% CPU usage. When investigating, I think I can trace this to the > > limit of 2^32 toasted records per table. > > Yes, this is the limitation of the number of out-of-line values in toast. > https://wiki.postgresql.org/wiki/TOAST#Total_table_size_limit > > > > See the gdb full backtrace of > > hanged backends in the bottom. When the problem occurs, it has 272 mill > > records. > > Basically the hanging happens in GetNewOidWithIndex called by > > toast_save_datum. > > While I understand the limit for the toast number is there to stay, but the > > behaviour of PG next to the limit is IMO a bug (or at least non-trivial) . > > > > I would rather prefer to see an error-message as opposed to backends > > hanging. > > To emit an error, we need to check that there is no unused OID for > the toast and it would take very long to do that. So I'm not sure > if to emit an error message really improves the current situation or not. > > OTOH it might be good idea to emit a warning message (every time we > cannot find unused OID in recent 1 million OIDs, for example) or report > the issue as wait event, or something while the record insertion is hanging > because of toast limit, so that we can easily detect the issue. This definitely sounds like something that's worth putting out as a wait event. Even before you start traversing millions of OIDs it might gradually start to show up, and being able to monitor that would definitely be useful. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Andres Freund
Дата:
Hi, On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote: > This definitely sounds like something that's worth putting out as a > wait event. Even before you start traversing millions of OIDs it might > gradually start to show up, and being able to monitor that would > definitely be useful. I don't think this is likely to work well as a wait event. All the index traversals etc will do IO, acquire locks, etc, which will overwrite the wait event and reset it to nothing once done. Greetings, Andres Freund
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Magnus Hagander
Дата:
On Wed, Nov 18, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote: > > This definitely sounds like something that's worth putting out as a > > wait event. Even before you start traversing millions of OIDs it might > > gradually start to show up, and being able to monitor that would > > definitely be useful. > > I don't think this is likely to work well as a wait event. All the index > traversals etc will do IO, acquire locks, etc, which will overwrite the > wait event and reset it to nothing once done. Oh meh. Yeah, I didn't think about the actual implementation of doing it :/ That said, I'd really prefer to see *some* kind of counter that could get people noticing this a bit earlier, rather than just bugging them in the logfile once it's gone over a threshold. A statistics counter maybe, but I guess that'd have to be tracked at a per-table level to be really useful, which would make it potentially fairly expensive to keep around... -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Andres Freund
Дата:
Hi, On 2020-11-18 23:12:10 +0100, Magnus Hagander wrote: > That said, I'd really prefer to see *some* kind of counter that could > get people noticing this a bit earlier, rather than just bugging them > in the logfile once it's gone over a threshold. A statistics counter > maybe, but I guess that'd have to be tracked at a per-table level to > be really useful, which would make it potentially fairly expensive to > keep around... I'd rather spend effort to solve the underlying issue. It's a bit more work, but not crazily so, I think. Even the pg_upgrade pieces aren't that hard to solve. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-11-18 23:12:10 +0100, Magnus Hagander wrote: >> That said, I'd really prefer to see *some* kind of counter that could >> get people noticing this a bit earlier, rather than just bugging them >> in the logfile once it's gone over a threshold. A statistics counter >> maybe, but I guess that'd have to be tracked at a per-table level to >> be really useful, which would make it potentially fairly expensive to >> keep around... > I'd rather spend effort to solve the underlying issue. It's a bit more > work, but not crazily so, I think. Even the pg_upgrade pieces aren't > that hard to solve. Yeah. If we're going to put work into this, widening the IDs used to identify toast values seems like the right work to be doing. Having said that, I had imagined that we might never have to fix it, because if your table is big enough that it has a problem of this ilk then likely you want to partition it anyway. And partitioning solves the problem since each partition has its own toast table. regards, tom lane
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Sergey KOPOSOV
Дата:
On Wed, 2020-11-18 at 19:02 -0500, Tom Lane wrote: > This email was sent to you by someone outside the University. > You should only click on links or attachments if you are certain that the email is genuine and the content is safe. > > Andres Freund <andres@anarazel.de> writes: > > On 2020-11-18 23:12:10 +0100, Magnus Hagander wrote: > > > That said, I'd really prefer to see *some* kind of counter that could > > > get people noticing this a bit earlier, rather than just bugging them > > > in the logfile once it's gone over a threshold. A statistics counter > > > maybe, but I guess that'd have to be tracked at a per-table level to > > > be really useful, which would make it potentially fairly expensive to > > > keep around... > > I'd rather spend effort to solve the underlying issue. It's a bit more > > work, but not crazily so, I think. Even the pg_upgrade pieces aren't > > that hard to solve. > > Yeah. If we're going to put work into this, widening the IDs used > to identify toast values seems like the right work to be doing. > > Having said that, I had imagined that we might never have to fix it, > because if your table is big enough that it has a problem of this > ilk then likely you want to partition it anyway. And partitioning > solves the problem since each partition has its own toast table. I can say that for my case of very large static clustered tables the partitioning doesn't really offer much benefits, only overhead (To the best of my understanding at least), so I'm not using it. Sergey PS And the current workaround that I used for this particular dataset that I reported was to just expand a few arrays that I have in the table into columns (as all arrays had static lengths) to not have any toasts at all. The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Andres Freund
Дата:
Hi, On 2020-11-18 19:02:34 -0500, Tom Lane wrote: > Having said that, I had imagined that we might never have to fix it, > because if your table is big enough that it has a problem of this > ilk then likely you want to partition it anyway. And partitioning > solves the problem since each partition has its own toast table. A few billion rows isn't that much anymore, and partitioning has a fair number of restrictions (plus this only needs 4 billion toasted fields, not 4 billion rows). More importantly, to hit problems around this, one doesn't even have to have all those rows in a table at once - performance will suffer once the oid counter has wrapped around, especially if there's longer sequences of assigned values. > Yeah. If we're going to put work into this, widening the IDs used > to identify toast values seems like the right work to be doing. To outline, here's what I think the two major pieces to get there are: 1) Make toast oid assignment independent of the oid counter. The easiest way likely is to also create a serial and use that. That alone improves the situation considerably, because it takes much longer to to wrap around in each toast table. The overhead of the additional WAL records isn't nothing, but compared to maintaining a btree it's not likely to be measurable. 2) Widen non pg-upgraded toast tables to have 64bit chunk_id field. To reference chunks >= 2^32, add VARTAG_ONDISK64, which is only used when needed. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-11-18 19:02:34 -0500, Tom Lane wrote: >> Yeah. If we're going to put work into this, widening the IDs used >> to identify toast values seems like the right work to be doing. > To outline, here's what I think the two major pieces to get there are: > 1) Make toast oid assignment independent of the oid counter. The easiest > way likely is to also create a serial and use that. That alone > improves the situation considerably, because it takes much longer to > to wrap around in each toast table. I'm not really on board with creating an actual sequence object for each toast table. I don't think they are efficient enough. I had imagined internally widening the OID counter to 64 bits, of which ordinary OID usage would just discard the high 32 bits, but for TOAST chunk IDs we could use the whole value. Now, compared to your idea that would result in using the wider toast-pointer format more often, but I think there are ways around that. For example, we could do (1) Grab the next 64-bit OID. (2) Probe the toast table to see if the low 32 bits correspond to a value in use in that toast table. If not, just use the low 32 bits. (3) Otherwise, use the whole 64 bits and the wider pointer format. (We could do another uniqueness probe here, but I think it's unnecessary, since the 64-bit counter will never wrap around.) Since the existing OID assignment mechanism requires a uniqueness probe already, this isn't any more expensive than what we do now. regards, tom lane
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Andres Freund
Дата:
Hi, On 2020-11-18 20:17:55 -0500, Tom Lane wrote: > I'm not really on board with creating an actual sequence object for each > toast table. I don't think they are efficient enough. I'm not convinced that's true. Sure, nextval() could stand to be cheaper - but on the other hand, this would allow us to entirely avoid doing ahead-of-time uniqueness probes. Needing one extra WAL record every SEQ_LOG_VALS/32 toast insertions seems likely to be win over doing at least SEQ_LOG_VALS-1 additional uniqueness index probes. It's not like the oid counter is free of contention - one benefit of what I suggest would be that the potential contention point would move from system wide to table wide. I think reducing the rate of oid counter advancement is a significant benefit, moving the point of where we have to care about efficiency problems around system oid assignment costs quite a bit further into the future. I think there's also some small potential benefits around indexing in having densely allocated toast chunk ids for each toast table, over the sparser allocation we have right now. Greetings, Andres Freund
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
tomohiro hiramitsu
Дата:
Hi
> Oh meh. Yeah, I didn't think about the actual implementation of doing it :/
>
> That said, I'd really prefer to see *some* kind of counter that could
> get people noticing this a bit earlier, rather than just bugging them
> in the logfile once it's gone over a threshold. A statistics counter
> maybe, but I guess that'd have to be tracked at a per-table level to
> be really useful, which would make it potentially fairly expensive to
> keep around...
I think the statistics on OID assignments per-table is a good idea, but how about simply logging the event to notify users as the first step?
I think it's easy to implement and helpful for users.
First of all, I would like to provide a way for the user to know that it is taking a long time to generate the OID, and provide an opportunity for the user to deal with this problem with statement_timeout etc.
This is an interim measure, but I think it's better than it is.
Since it is complicated to handle WaitEvent and statistics for each table, I first considered a patch that simply outputs logs, giving priority to avoiding a hang state without outputting logs.
* v01-0001-OID_log_output.patch
In this patch, GetNewOidWithIndex outputs a log when it loops too many times to assign a new OID.
Also, if the log is output even once, it will be output to the log when the OID is assigned.
* This patch does not cause an error even if it loops too many times.
How much time can be tolerated in the OID search loop depends on the user, so I think that the user should decide the time to make an error with statement_timeout.
* The log output interval threshold increases by "* = 2"(exponential backoff) and the maximum number of log outputs is set to 5.
The reason for setting this threshold is to reduce the number of log outputs and output the log before stopping at statement_timeout etc. I think the appropriate value for this threshold is controversial.
Since it is not necessary to keep logging, the upper limit is set to 5 times.
* The log output interval threshold is set to 1 million for the first log output.
In my environment, it takes about 4 seconds to loop 1 million times, and the 5th log is output after about 2 minutes.I think this threshold is just right considering statements_timeout, but it's controversial.
Log output does not need to continue output during OID assignment. I think the first few and the last log are enough.
I will add this patch to the commitfest. I look forward to your feedback about the patch.
By the way
How about distributing OIDs so that the GetNewOidWithIndex function doesn't take long to find an available OID?
For example, skip one and assign an OID.
As a side effect of this method, the OID progresses faster.
(Maybe the btree index will be fragmented faster)
It may be better to include skips only for TOAST chunk_id, as this method may have other side effects.
regards.
--
Tomohiro Hiramitsu
NTT Open Source Software Center
> Oh meh. Yeah, I didn't think about the actual implementation of doing it :/
>
> That said, I'd really prefer to see *some* kind of counter that could
> get people noticing this a bit earlier, rather than just bugging them
> in the logfile once it's gone over a threshold. A statistics counter
> maybe, but I guess that'd have to be tracked at a per-table level to
> be really useful, which would make it potentially fairly expensive to
> keep around...
I think the statistics on OID assignments per-table is a good idea, but how about simply logging the event to notify users as the first step?
I think it's easy to implement and helpful for users.
First of all, I would like to provide a way for the user to know that it is taking a long time to generate the OID, and provide an opportunity for the user to deal with this problem with statement_timeout etc.
This is an interim measure, but I think it's better than it is.
Since it is complicated to handle WaitEvent and statistics for each table, I first considered a patch that simply outputs logs, giving priority to avoiding a hang state without outputting logs.
* v01-0001-OID_log_output.patch
In this patch, GetNewOidWithIndex outputs a log when it loops too many times to assign a new OID.
Also, if the log is output even once, it will be output to the log when the OID is assigned.
* This patch does not cause an error even if it loops too many times.
How much time can be tolerated in the OID search loop depends on the user, so I think that the user should decide the time to make an error with statement_timeout.
* The log output interval threshold increases by "* = 2"(exponential backoff) and the maximum number of log outputs is set to 5.
The reason for setting this threshold is to reduce the number of log outputs and output the log before stopping at statement_timeout etc. I think the appropriate value for this threshold is controversial.
Since it is not necessary to keep logging, the upper limit is set to 5 times.
* The log output interval threshold is set to 1 million for the first log output.
In my environment, it takes about 4 seconds to loop 1 million times, and the 5th log is output after about 2 minutes.I think this threshold is just right considering statements_timeout, but it's controversial.
Log output does not need to continue output during OID assignment. I think the first few and the last log are enough.
I will add this patch to the commitfest. I look forward to your feedback about the patch.
By the way
How about distributing OIDs so that the GetNewOidWithIndex function doesn't take long to find an available OID?
For example, skip one and assign an OID.
As a side effect of this method, the OID progresses faster.
(Maybe the btree index will be fragmented faster)
It may be better to include skips only for TOAST chunk_id, as this method may have other side effects.
regards.
--
Tomohiro Hiramitsu
NTT Open Source Software Center
2020年12月21日(月) 16:44 Magnus Hagander <magnus@hagander.net>:
On Wed, Nov 18, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote:
> > This definitely sounds like something that's worth putting out as a
> > wait event. Even before you start traversing millions of OIDs it might
> > gradually start to show up, and being able to monitor that would
> > definitely be useful.
>
> I don't think this is likely to work well as a wait event. All the index
> traversals etc will do IO, acquire locks, etc, which will overwrite the
> wait event and reset it to nothing once done.
Oh meh. Yeah, I didn't think about the actual implementation of doing it :/
That said, I'd really prefer to see *some* kind of counter that could
get people noticing this a bit earlier, rather than just bugging them
in the logfile once it's gone over a threshold. A statistics counter
maybe, but I guess that'd have to be tracked at a per-table level to
be really useful, which would make it potentially fairly expensive to
keep around...
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Вложения
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Kasahara Tatsuhito
Дата:
Hi, On Mon, Dec 28, 2020 at 4:31 PM tomohiro hiramitsu <hiramit.tm@gmail.com> wrote: > > Hi > > > Oh meh. Yeah, I didn't think about the actual implementation of doing it :/ > > > > That said, I'd really prefer to see *some* kind of counter that could > > get people noticing this a bit earlier, rather than just bugging them > > in the logfile once it's gone over a threshold. A statistics counter > > maybe, but I guess that'd have to be tracked at a per-table level to > > be really useful, which would make it potentially fairly expensive to > > keep around... > > I think the statistics on OID assignments per-table is a good idea, but how about simply logging the event to notify usersas the first step? > I think it's easy to implement and helpful for users. +1 It is a simple and quick way for users to know if the OID assignment is taking a long time or not. At least it will be better than now. > First of all, I would like to provide a way for the user to know that it is taking a long time to generate the OID, andprovide an opportunity for the user to deal with this problem with statement_timeout etc. > This is an interim measure, but I think it's better than it is. > > Since it is complicated to handle WaitEvent and statistics for each table, I first considered a patch that simply outputslogs, giving priority to avoiding a hang state without outputting logs. > > * v01-0001-OID_log_output.patch > In this patch, GetNewOidWithIndex outputs a log when it loops too many times to assign a new OID. > Also, if the log is output even once, it will be output to the log when the OID is assigned. > > * This patch does not cause an error even if it loops too many times. > How much time can be tolerated in the OID search loop depends on the user, so I think that the user should decidethe time to make an error with statement_timeout. > > * The log output interval threshold increases by "* = 2"(exponential backoff) and the maximum number of log outputsis set to 5. > The reason for setting this threshold is to reduce the number of log outputs and output the log before stopping atstatement_timeout etc. I think the appropriate value for this threshold is controversial. > Since it is not necessary to keep logging, the upper limit is set to 5 times. > > * The log output interval threshold is set to 1 million for the first log output. > In my environment, it takes about 4 seconds to loop 1 million times, and the 5th log is output after about 2 minutes.Ithink this threshold is just right considering statements_timeout, but it's controversial. > > Log output does not need to continue output during OID assignment. I think the first few and the last log are enough. > > I will add this patch to the commitfest. I look forward to your feedback about the patch. > > > By the way > How about distributing OIDs so that the GetNewOidWithIndex function doesn't take long to find an available OID? > For example, skip one and assign an OID. > > As a side effect of this method, the OID progresses faster. > (Maybe the btree index will be fragmented faster) I think the idea of assigning OIDs to sparse is interesting. However, I think it needs to find out how much it will affect the performance. It could be a non-negligible overhead, especially for large amounts of data insertion/updating. Best regards, > It may be better to include skips only for TOAST chunk_id, as this method may have other side effects. > > regards. > -- > Tomohiro Hiramitsu > NTT Open Source Software Center > > 2020年12月21日(月) 16:44 Magnus Hagander <magnus@hagander.net>: >> >> On Wed, Nov 18, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote: >> > >> > Hi, >> > >> > On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote: >> > > This definitely sounds like something that's worth putting out as a >> > > wait event. Even before you start traversing millions of OIDs it might >> > > gradually start to show up, and being able to monitor that would >> > > definitely be useful. >> > >> > I don't think this is likely to work well as a wait event. All the index >> > traversals etc will do IO, acquire locks, etc, which will overwrite the >> > wait event and reset it to nothing once done. >> >> Oh meh. Yeah, I didn't think about the actual implementation of doing it :/ >> >> That said, I'd really prefer to see *some* kind of counter that could >> get people noticing this a bit earlier, rather than just bugging them >> in the logfile once it's gone over a threshold. A statistics counter >> maybe, but I guess that'd have to be tracked at a per-table level to >> be really useful, which would make it potentially fairly expensive to >> keep around... >> >> -- >> Magnus Hagander >> Me: https://www.hagander.net/ >> Work: https://www.redpill-linpro.com/ >> >> >> >> -- Tatsuhito Kasahara kasahara.tatsuhito _at_ gmail.com
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
tomohiro hiramitsu
Дата:
Hi
On Wed, Jan 6, 2021 at 11:13 AM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote:
> By the way
> How about distributing OIDs so that the GetNewOidWithIndex function doesn't take long to find an available OID?
> For example, skip one and assign an OID.
>
> As a side effect of this method, the OID progresses faster.
> (Maybe the btree index will be fragmented faster)
I think the idea of assigning OIDs to sparse is interesting.
However, I think it needs to find out how much it will affect the performance.
It could be a non-negligible overhead, especially for large amounts of
data insertion/updating.
I agree with you. I can't predict the impact of this method on performance.
I may need to do some performance tests to find out the impact of this method.
I may need to do some performance tests to find out the impact of this method.
Thank you for your reply.
regards.
--
Tomohiro Hiramitsu
NTT Open Source Software Center
--
Tomohiro Hiramitsu
NTT Open Source Software Center
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Kasahara Tatsuhito
Дата:
Hi, On Wed, Jan 6, 2021 at 11:13 AM Kasahara Tatsuhito <kasahara.tatsuhito@gmail.com> wrote: > > Hi, > > On Mon, Dec 28, 2020 at 4:31 PM tomohiro hiramitsu <hiramit.tm@gmail.com> wrote: > > > > Hi > > > > > Oh meh. Yeah, I didn't think about the actual implementation of doing it :/ > > > > > > That said, I'd really prefer to see *some* kind of counter that could > > > get people noticing this a bit earlier, rather than just bugging them > > > in the logfile once it's gone over a threshold. A statistics counter > > > maybe, but I guess that'd have to be tracked at a per-table level to > > > be really useful, which would make it potentially fairly expensive to > > > keep around... > > > > I think the statistics on OID assignments per-table is a good idea, but how about simply logging the event to notifyusers as the first step? > > I think it's easy to implement and helpful for users. > +1 > It is a simple and quick way for users to know if the OID assignment > is taking a long time or not. > At least it will be better than now. > > > First of all, I would like to provide a way for the user to know that it is taking a long time to generate the OID, andprovide an opportunity for the user to deal with this problem with statement_timeout etc. > > This is an interim measure, but I think it's better than it is. > > > > Since it is complicated to handle WaitEvent and statistics for each table, I first considered a patch that simply outputslogs, giving priority to avoiding a hang state without outputting logs. > > > > * v01-0001-OID_log_output.patch > > In this patch, GetNewOidWithIndex outputs a log when it loops too many times to assign a new OID. > > Also, if the log is output even once, it will be output to the log when the OID is assigned. > > > > * This patch does not cause an error even if it loops too many times. > > How much time can be tolerated in the OID search loop depends on the user, so I think that the user should decidethe time to make an error with statement_timeout. > > > > * The log output interval threshold increases by "* = 2"(exponential backoff) and the maximum number of log outputsis set to 5. > > The reason for setting this threshold is to reduce the number of log outputs and output the log before stoppingat statement_timeout etc. I think the appropriate value for this threshold is controversial. > > Since it is not necessary to keep logging, the upper limit is set to 5 times. > > > > * The log output interval threshold is set to 1 million for the first log output. > > In my environment, it takes about 4 seconds to loop 1 million times, and the 5th log is output after about 2 minutes.Ithink this threshold is just right considering statements_timeout, but it's controversial. > > > > Log output does not need to continue output during OID assignment. I think the first few and the last log are enough. > > > > I will add this patch to the commitfest. I look forward to your feedback about the patch. I tested the patch. The patch can be applied with the current HEAD (eb42110d952f8d1ad4049b8f2491e9dfba75ffed) and the regression test passed. I actually tried to reproduce OID assignment conflicts using the following procedure. ================================================================================================ DROP TABLE IF EXISTS t1; -- Create a table. SET TOAST_TUPLE_TARGET for generate much TOAST records. SELECT 'CREATE UNLOGGED TABLE t1 ( c1 int,' || string_agg(str, ',') || ') WITH (TOAST_TUPLE_TARGET = 128);' FROM (SELECT ' c' || i || ' text' as str FROM generate_series(2,21)i )t;\gexec -- Generate data and insert. This query generate 20 TOAST records per-rows. -- So this results in 16 million TOAST rows (same as GETNEWOID_NOTIFICATION_LIMIT). WITH r AS (SELECT string_agg(str, '') as d FROM (SELECT md5(clock_timestamp()::text) AS str FROM generate_series(1,4)t)s) INSERT INTO t1 SELECT generate_series(1,800000) ,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d FROM r; -- Check min(chunk_id) to rewind OID following test SELECT 'SELECT min(chunk_id) FROM pg_toast.' || relname FROM pg_class WHERE oid = (SELECT reltoastrelid FROM pg_class WHERE relname = 't1');\gexec -- Check normal(non OID conflicting) insert's duration \timing WITH r AS (SELECT string_agg(str, '') as d FROM (SELECT md5(clock_timestamp()::text) AS str FROM generate_series(1,4)t)s) INSERT INTO t1 SELECT 1,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d FROM r; INSERT 0 1 Time: 0.899 ms -- Reset OID to the value of min(chunk_id) by using pg_resetwal (-o) -- Check problematic(OID conflicting) insert's duration \timing WITH r AS (SELECT string_agg(str, '') as d FROM (SELECT md5(clock_timestamp()::text) AS str FROM generate_series(1,4)t)s) INSERT INTO t1 SELECT 1,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d,d FROM r; INSERT 0 1 Time: 36399.478 ms (00:36.399) ================================================================================================ And I got following log messages. (The STATEMENT section is long and has been cut) =================================================================================================== 2021-02-18 19:04:59.055 JST [32153] LOG: failed to assign new OID in relation "pg_toast_32018451" after 1000000 retries 2021-02-18 19:04:59.055 JST [32153] STATEMENT: WITH r ... 2021-02-18 19:05:01.299 JST [32153] LOG: failed to assign new OID in relation "pg_toast_32018451" after 2000000 retries 2021-02-18 19:05:01.299 JST [32153] STATEMENT: WITH r ... 2021-02-18 19:05:05.802 JST [32153] LOG: failed to assign new OID in relation "pg_toast_32018451" after 4000000 retries 2021-02-18 19:05:05.802 JST [32153] STATEMENT: WITH r ... 2021-02-18 19:05:14.853 JST [32153] LOG: failed to assign new OID in relation "pg_toast_32018451" after 8000000 retries 2021-02-18 19:05:14.853 JST [32153] STATEMENT: WITH r ... 2021-02-18 19:05:33.185 JST [32153] LOG: failed to assign new OID in relation "pg_toast_32018451" after 16000000 retries 2021-02-18 19:05:33.185 JST [32153] STATEMENT: WITH r ... 2021-02-18 19:05:33.185 JST [32153] LOG: the new OID has been assigned in relation "pg_toast_32018451" after 16000021 retries 2021-02-18 19:05:33.185 JST [32153] STATEMENT: WITH r ... =================================================================================================== It works as expected, and I think the message is useful. Here are some comments about the code. +#define GETNEWOID_NOTIFICATION_INTERVAL_FACTOR 1000000 It seems to be more of a minimum value for conflict detection rather than an interval factor. So, it would be better to use another name, such as GETNEWOID_NOTIFICATION_MINVAL or others. + /* retry count and notification limit check */ + if (retry_count == next_notify && next_notify <= GETNEWOID_NOTIFICATION_LIMIT) + { + ereport(LOG, + (errmsg("failed to assign new OID in relation \"%s\" after "UINT64_FORMAT" retries", + RelationGetRelationName(relation), retry_count))); + next_notify *= 2; /* double it for the next notification */ + } + retry_count++; I think that we should add more comments about why output this LOG message. Best regards, > > > > > > By the way > > How about distributing OIDs so that the GetNewOidWithIndex function doesn't take long to find an available OID? > > For example, skip one and assign an OID. > > > > As a side effect of this method, the OID progresses faster. > > (Maybe the btree index will be fragmented faster) > I think the idea of assigning OIDs to sparse is interesting. > However, I think it needs to find out how much it will affect the performance. > It could be a non-negligible overhead, especially for large amounts of > data insertion/updating. > > Best regards, > > > It may be better to include skips only for TOAST chunk_id, as this method may have other side effects. > > > > regards. > > -- > > Tomohiro Hiramitsu > > NTT Open Source Software Center > > > > 2020年12月21日(月) 16:44 Magnus Hagander <magnus@hagander.net>: > >> > >> On Wed, Nov 18, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote: > >> > > >> > Hi, > >> > > >> > On 2020-11-18 11:57:05 +0100, Magnus Hagander wrote: > >> > > This definitely sounds like something that's worth putting out as a > >> > > wait event. Even before you start traversing millions of OIDs it might > >> > > gradually start to show up, and being able to monitor that would > >> > > definitely be useful. > >> > > >> > I don't think this is likely to work well as a wait event. All the index > >> > traversals etc will do IO, acquire locks, etc, which will overwrite the > >> > wait event and reset it to nothing once done. > >> > >> Oh meh. Yeah, I didn't think about the actual implementation of doing it :/ > >> > >> That said, I'd really prefer to see *some* kind of counter that could > >> get people noticing this a bit earlier, rather than just bugging them > >> in the logfile once it's gone over a threshold. A statistics counter > >> maybe, but I guess that'd have to be tracked at a per-table level to > >> be really useful, which would make it potentially fairly expensive to > >> keep around... > >> > >> -- > >> Magnus Hagander > >> Me: https://www.hagander.net/ > >> Work: https://www.redpill-linpro.com/ > >> > >> > >> > >> > > > -- > Tatsuhito Kasahara > kasahara.tatsuhito _at_ gmail.com -- Tatsuhito Kasahara kasahara.tatsuhito _at_ gmail.com
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Fujii Masao
Дата:
On 2021/03/17 14:30, tomohiro hiramitsu wrote: > v4-0001-GetNewOidWithIndex_log_output.patch Thanks for updating the patch! I applied the following and cosmetic changes to the patch. Attached is the updated version of the patch. + ereport(LOG, + (errmsg("still finding an unused OID within relation \"%s\"", + ereport(LOG, + (errmsg("the new OID has been assigned in relation \"%s\" after \"%llu\" retries", WARNING is more suitable for these kinds of messages? I changed the log level here to WARNING. +#define GETNEWOID_NOTIFY_MINVAL 1000000 +#define GETNEWOID_NOTIFY_MAXVAL 100000000 If we use WARNING, GETNEWOID_WARN_THRESHOLD and GETNEWOID_WARN_MAX_INTERVAL sound better to me for the meanings of them. I renamed them. + if (retry_count >= GETNEWOID_NOTIFY_MINVAL) ">=" should be ">"? Otherwise, the message for completion of OID assignment can be logged even when no message "still finding ..." is output. I fixed this. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
tomohiro hiramitsu
Дата:
On Fri, Mar 19, 2021 at 1:32 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Thanks for updating the patch! I applied the following and cosmetic changes> to the patch. Attached is the updated version of the patch.
Thanks for fixing the patch!
I think your fix is fine.
+#define GETNEWOID_WARN_THRESHOLD 1000000
+#define GETNEWOID_WARN_MAX_INTERVAL 100000000
I think it is better to use the following equation to determine the value of GETNEWOID_WARN_MAX_INTERVAL.
GETNEWOID_WARN_MAX_INTERVAL = 1000000 * 2^n
Example 1:
GETNEWOID_WARN_MAX_INTERVAL = 100000000 /* (Current setting) */
DETAIL: OID candidates were checked "1000000" times, but no unused OID is yet found.
DETAIL: OID candidates were checked "2000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "4000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "8000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "16000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "32000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "64000000" times, but no unused OID is yet found. (*=2) <---GETNEWOID_WARN_MAX_INTERVAL=100000000
DETAIL: OID candidates were checked "164000000" times, but no unused OID is yet found. (+= GETNEWOID_WARN_MAX_INTERVAL)
DETAIL: OID candidates were checked "264000000" times, but no unused OID is yet found. (+= GETNEWOID_WARN_MAX_INTERVAL)
DETAIL: OID candidates were checked "364000000" times, but no unused OID is yet found. (+= GETNEWOID_WARN_MAX_INTERVAL)
Example 2:
GETNEWOID_WARN_MAX_INTERVAL= 128000000 /* (1000000 * 2^7) */
DETAIL: OID candidates were checked "1000000" times, but no unused OID is yet found.
DETAIL: OID candidates were checked "2000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "4000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "8000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "16000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "32000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "64000000" times, but no unused OID is yet found. (*=2)
DETAIL: OID candidates were checked "128000000" times, but no unused OID is yet found. (*=2) <---GETNEWOID_WARN_MAX_INTERVAL=128000000
DETAIL: OID candidates were checked "256000000" times, but no unused OID is yet found. (+= GETNEWOID_WARN_MAX_INTERVAL)
DETAIL: OID candidates were checked "384000000" times, but no unused OID is yet found. (+= GETNEWOID_WARN_MAX_INTERVAL)
I like example 2 better for the notification timing after the threshold is exceeded.
Best regards,
--
Tomohiro Hiramitsu
NTT Open Source Software Center
Вложения
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
tomohiro hiramitsu
Дата:
On Fri, Mar 19, 2021 at 1:32 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> WARNING is more suitable for these kinds of messages?
> I changed the log level here to WARNING.
>
> +#define GETNEWOID_NOTIFY_MINVAL 1000000
> +#define GETNEWOID_NOTIFY_MAXVAL 100000000
>
> If we use WARNING, GETNEWOID_WARN_THRESHOLD and GETNEWOID_WARN_MAX_INTERVAL
> sound better to me for the meanings of them. I renamed them.
When "LOG" is set, SQL statements can be recorded in the server log even if log_min_error_statement is set to "ERROR"(default).
It would be useful for users to know which SQL statement is retrying to generate OIDs.
In this case, GETNEWOID_WARN_THRESHOLD and GETNEWOID_WARN_MAX_INTERVAL should be renamed to the following names.
* GETNEWOID_LOG_THRESHOLD
* GETNEWOID_LOG_MAX_INTERVAL
Best regards,
--
Tomohiro Hiramitsu
NTT Open Source Software Center
> WARNING is more suitable for these kinds of messages?
> I changed the log level here to WARNING.
>
> +#define GETNEWOID_NOTIFY_MINVAL 1000000
> +#define GETNEWOID_NOTIFY_MAXVAL 100000000
>
> If we use WARNING, GETNEWOID_WARN_THRESHOLD and GETNEWOID_WARN_MAX_INTERVAL
> sound better to me for the meanings of them. I renamed them.
When "LOG" is set, SQL statements can be recorded in the server log even if log_min_error_statement is set to "ERROR"(default).
It would be useful for users to know which SQL statement is retrying to generate OIDs.
In this case, GETNEWOID_WARN_THRESHOLD and GETNEWOID_WARN_MAX_INTERVAL should be renamed to the following names.
* GETNEWOID_LOG_THRESHOLD
* GETNEWOID_LOG_MAX_INTERVAL
Best regards,
--
Tomohiro Hiramitsu
NTT Open Source Software Center
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Fujii Masao
Дата:
On 2021/03/22 14:43, tomohiro hiramitsu wrote: > Example 2: > GETNEWOID_WARN_MAX_INTERVAL= 128000000 /* (1000000 * 2^7) */ This looks better to me. On 2021/03/22 18:19, tomohiro hiramitsu wrote: > When "LOG" is set, SQL statements can be recorded in the server log even if log_min_error_statement is set to "ERROR"(default). > It would be useful for users to know which SQL statement is retrying to generate OIDs. This makes sense. > In this case, GETNEWOID_WARN_THRESHOLD and GETNEWOID_WARN_MAX_INTERVAL should be renamed to the following names. > > * GETNEWOID_LOG_THRESHOLD > * GETNEWOID_LOG_MAX_INTERVAL Ok, so I updated the patch that way. Patch attached. Barring any objection, I will commit this patch Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
Fujii Masao
Дата:
On 2021/03/22 19:53, Fujii Masao wrote: > Ok, so I updated the patch that way. Patch attached. > Barring any objection, I will commit this patch Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
От
tomohiro hiramitsu
Дата:
On Wed, Mar 24, 2021 at 10:40 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2021/03/22 19:53, Fujii Masao wrote:
> Ok, so I updated the patch that way. Patch attached.
> Barring any objection, I will commit this patch
Pushed. Thanks!
Thank you very much!
--
Tomohiro Hiramitsu
NTT Open Source Software Center
Tomohiro Hiramitsu
NTT Open Source Software Center