Re: BUG #15427: DROP INDEX did not free up disk space
From | Andres Freund |
---|---|
Subject | Re: BUG #15427: DROP INDEX did not free up disk space |
Date | |
Msg-id | 20181012045148.rhohmjjy7ehrczsi@alap3.anarazel.de |
In response to | Re: BUG #15427: DROP INDEX did not free up disk space (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
Hi,

On 2018-10-12 00:33:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-10-11 23:57:16 -0400, Tom Lane wrote:
> >> Uh, what's that got to do with it?
>
> > If you look at the bugreport: As soon as the op, on my suggestion,
> > triggered sinval processing (by issuing a SELECT 1;) the space was
> > freed. So clearly the open FDs were part of the problem.
>
> TBH, I think the space-freeup was more likely driven off a background
> checkpoint completion, which is where the truncation happens.

Uh, as I wrote, mdunlinkfork(), which backs dropping an index via
index_drop()->RelationDropStorage() and then
smgrDoPendingDeletes()->smgrdounlinkall()->mdunlink()->mdunlinkfork(),
unlinks all segments beyond the first itself:

    static void
    mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
    {
        char       *path;
        int         ret;

        path = relpath(rnode, forkNum);

        /*
         * Delete or truncate the first segment.
         */
        if (isRedo || forkNum != MAIN_FORKNUM || RelFileNodeBackendIsTemp(rnode))
        {
            ret = unlink(path);
            if (ret < 0 && errno != ENOENT)
                ereport(WARNING,
                        (errcode_for_file_access(),
                         errmsg("could not remove file \"%s\": %m", path)));
        }
        else
        {
            /* truncate(2) would be easier here, but Windows hasn't got it */
            int         fd;

            fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
            if (fd >= 0)
            {
                int         save_errno;

                ret = ftruncate(fd, 0);
                save_errno = errno;
                CloseTransientFile(fd);
                errno = save_errno;
            }
            else
                ret = -1;
            if (ret < 0 && errno != ENOENT)
                ereport(WARNING,
                        (errcode_for_file_access(),
                         errmsg("could not truncate file \"%s\": %m", path)));

            /* Register request to unlink first segment later */
            register_unlink(rnode);
        }

        /*
         * Delete any additional segments.
         */
        if (ret >= 0)
        {
            char       *segpath = (char *) palloc(strlen(path) + 12);
            BlockNumber segno;

            /*
             * Note that because we loop until getting ENOENT, we will correctly
             * remove all inactive segments as well as active ones.
             */
            for (segno = 1;; segno++)
            {
                sprintf(segpath, "%s.%u", path, segno);
                if (unlink(segpath) < 0)
                {
                    /* ENOENT is expected after the last segment... */
                    if (errno != ENOENT)
                        ereport(WARNING,
                                (errcode_for_file_access(),
                                 errmsg("could not remove file \"%s\": %m", segpath)));
                    break;
                }
            }
            pfree(segpath);
        }

        pfree(path);
    }

As you clearly can see, unlink() is called directly here for anything
but the first segment (which is registered to be unlinked in
checkpointer via register_unlink()).

Note that the checkpointer based machinery doesn't even *support*
unlinking anything beyond the first segment:

    void
    mdpostckpt(void)
    {
    ...
        while (pendingUnlinks != NIL)
    ...
            /* Unlink the file */
            path = relpathperm(entry->rnode, MAIN_FORKNUM);
            if (unlink(path) < 0)

there's no segment handling here.

You're right that mdtruncate() leaves later segments around in a
truncated manner. But that's because of an orthogonal concern:

     * The full and partial segments are collectively the "active" segments.
     * Inactive segments are those that once contained data but are currently
     * not needed because of an mdtruncate() operation.  The reason for leaving
     * them present at size zero, rather than unlinking them, is that other
     * backends and/or the checkpointer might be holding open file references to
     * such segments.  If the relation expands again after mdtruncate(), such
     * that a deactivated segment becomes active again, it is important that
     * such file references still be valid --- else data might get written
     * out to an unlinked old copy of a segment file that will eventually
     * disappear.

that doesn't apply to dropping relations.

Greetings,

Andres Freund
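
The point about open FDs rests on ordinary POSIX behaviour: unlink() only removes the directory entry, and the filesystem can reclaim the blocks only once the last open descriptor on the file is closed. The following is a minimal sketch of that behaviour, assuming a POSIX system and a local filesystem. The names unlink_while_open() and struct unlink_report are hypothetical and exist only for this illustration; none of it is PostgreSQL code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical report type for this sketch; not a PostgreSQL structure. */
struct unlink_report
{
    int         path_gone;  /* 1 if stat() on the name fails with ENOENT */
    long        nlink;      /* link count seen through the open descriptor */
    long        size;       /* bytes still reachable through the descriptor */
};

/*
 * Hypothetical helper: create a scratch file in "dir", write 8192 bytes,
 * unlink() it while the descriptor is still open, then inspect it with
 * fstat().  Returns 0 on success, -1 if any system call fails.
 */
int
unlink_while_open(const char *dir, struct unlink_report *rep)
{
    char        path[1024];
    char        buf[8192];
    struct stat st;
    int         fd;

    assert(dir != NULL && rep != NULL);

    snprintf(path, sizeof(path), "%s/unlink_demo.%ld", dir, (long) getpid());
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf) ||
        unlink(path) < 0)
    {
        close(fd);
        unlink(path);       /* best-effort cleanup of the scratch file */
        return -1;
    }

    /* The name is gone from the directory... */
    rep->path_gone = (stat(path, &st) < 0 && errno == ENOENT);

    /* ...but the open descriptor still refers to the file and its data. */
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return -1;
    }
    rep->nlink = (long) st.st_nlink;
    rep->size = (long) st.st_size;

    /* Closing the last descriptor is what lets the space be reclaimed. */
    close(fd);
    return 0;
}
```

Called as unlink_while_open("/tmp", &rep), this reports the path as gone while the full 8192 bytes remain reachable through the descriptor until close(). A backend that still holds a descriptor on an already-unlinked segment would keep that segment's space allocated in the same way.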