Re: BUG #15427: DROP INDEX did not free up disk space

From: Andres Freund
Subject: Re: BUG #15427: DROP INDEX did not free up disk space
Msg-id: 20181012045148.rhohmjjy7ehrczsi@alap3.anarazel.de
In reply to: Re: BUG #15427: DROP INDEX did not free up disk space (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
Hi,

On 2018-10-12 00:33:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-10-11 23:57:16 -0400, Tom Lane wrote:
> >> Uh, what's that got to do with it?
>
> > If you look at the bug report: as soon as the OP, on my suggestion,
> > triggered sinval processing (by issuing a SELECT 1;), the space was
> > freed. So clearly the open FDs were part of the problem.
>
> TBH, I think the space-freeup was more likely driven off a background
> checkpoint completion, which is where the truncation happens.

Uh, as I wrote: mdunlinkfork(), which is what ultimately runs when an
index is dropped (index_drop()->RelationDropStorage(), and then, at
commit, smgrDoPendingDeletes()->smgrdounlinkall()->mdunlink()->mdunlinkfork()),
unlinks all segments beyond the first itself, immediately:

static void
mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
{
    char       *path;
    int            ret;

    path = relpath(rnode, forkNum);

    /*
     * Delete or truncate the first segment.
     */
    if (isRedo || forkNum != MAIN_FORKNUM || RelFileNodeBackendIsTemp(rnode))
    {
        ret = unlink(path);
        if (ret < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not remove file \"%s\": %m", path)));
    }
    else
    {
        /* truncate(2) would be easier here, but Windows hasn't got it */
        int            fd;

        fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
        if (fd >= 0)
        {
            int            save_errno;

            ret = ftruncate(fd, 0);
            save_errno = errno;
            CloseTransientFile(fd);
            errno = save_errno;
        }
        else
            ret = -1;
        if (ret < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not truncate file \"%s\": %m", path)));

        /* Register request to unlink first segment later */
        register_unlink(rnode);
    }

    /*
     * Delete any additional segments.
     */
    if (ret >= 0)
    {
        char       *segpath = (char *) palloc(strlen(path) + 12);
        BlockNumber segno;

        /*
         * Note that because we loop until getting ENOENT, we will correctly
         * remove all inactive segments as well as active ones.
         */
        for (segno = 1;; segno++)
        {
            sprintf(segpath, "%s.%u", path, segno);
            if (unlink(segpath) < 0)
            {
                /* ENOENT is expected after the last segment... */
                if (errno != ENOENT)
                    ereport(WARNING,
                            (errcode_for_file_access(),
                             errmsg("could not remove file \"%s\": %m", segpath)));
                break;
            }
        }
        pfree(segpath);
    }

    pfree(path);
}

As you can clearly see, unlink() is called directly here for every
segment except the first (which is registered to be unlinked later by
the checkpointer via register_unlink()).

Note that the checkpointer-based machinery doesn't even *support*
unlinking anything beyond the first segment:

void
mdpostckpt(void)
{
...
    while (pendingUnlinks != NIL)
...
        /* Unlink the file */
        path = relpathperm(entry->rnode, MAIN_FORKNUM);
        if (unlink(path) < 0)

There's no segment handling here: only the first segment's path is ever
unlinked.


You're right that mdtruncate() leaves later segments in place, truncated
to zero length. But that's because of an orthogonal concern:
 *    The full and partial segments are collectively the "active" segments.
 *    Inactive segments are those that once contained data but are currently
 *    not needed because of an mdtruncate() operation.  The reason for leaving
 *    them present at size zero, rather than unlinking them, is that other
 *    backends and/or the checkpointer might be holding open file references to
 *    such segments.  If the relation expands again after mdtruncate(), such
 *    that a deactivated segment becomes active again, it is important that
 *    such file references still be valid --- else data might get written
 *    out to an unlinked old copy of a segment file that will eventually
 *    disappear.

That concern doesn't apply to dropping relations.

Greetings,

Andres Freund

