Обсуждение: Problem with repalloc downsizing patch

Поиск
Список
Период
Сортировка

Problem with repalloc downsizing patch

От
Tom Lane
Дата:
skink just found what seems like a serious problem with commit
c477f3e44:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2019-10-04%2010%3A15%3A05

performing post-bootstrap initialization ... TRAP: FailedAssertion("newlist == list", File:
"/home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/backend/nodes/list.c",Line: 200) 
Aborted

That assert is here:

        /*
         * Currently, asking aset.c to reduce the allocated size of the List
         * header is pointless in terms of reclaiming space, unless the list
         * is very long.  However, it seems worth doing anyway to cause the
         * no-longer-needed initial_elements[] space to be cleared in
         * debugging builds.
         */
        newlist = (List *) repalloc(list, offsetof(List, initial_elements));

        /* That better not have failed, nor moved the list header */
        Assert(newlist == list);

If we invoke realloc() then it's very much within its rights to return
a block that's not where the original block was.  I'm a bit surprised
that initdb runs any SQL commands that create Lists long enough to
reach the threshold where c477f3e44 would make a difference (~1000
initial elements would be needed), but there you have it.

This is a bit sticky to resolve.  I'm inclined to think that the fault
is on list.c, as it really shouldn't assume that repalloc doesn't move
the block.  (In fact, the quoted comment is self-contradictory, since
it evidently thinks that space *could* be given back when a large
enough block is involved; which is exactly what didn't happen before
c477f3e44.)  But I do not think we can give up the invariant that List
headers don't move around.  Hence we basically can't do repalloc here
after all.  Perhaps, to limit the cost of that, we should eat the cost
of separately palloc'ing the ListCell array when the initial list length
exceeds some threshold.

I'm also wondering a bit whether there's anyplace *else* that is
cheating by assuming that a downsizing repalloc doesn't move the block.
We could investigate that by testing with a modified form of
AllocSetRealloc that always moves the block, but of course that won't
find bugs in untested code paths.  So another possibility is to revert
c477f3e44 and then document that AllocSetRealloc does not move a block
when reducing its size.  That does not seem very attractive though.

Any opinions about this?

            regards, tom lane



Re: Problem with repalloc downsizing patch

От
Tom Lane
Дата:
I wrote:
> I'm also wondering a bit whether there's anyplace *else* that is
> cheating by assuming that a downsizing repalloc doesn't move the block.
> We could investigate that by testing with a modified form of
> AllocSetRealloc that always moves the block, but of course that won't
> find bugs in untested code paths.  So another possibility is to revert
> c477f3e44 and then document that AllocSetRealloc does not move a block
> when reducing its size.  That does not seem very attractive though.

I did that testing, and found that check-world does not expose any other
trouble spots beyond the one in enlarge_list().  So I think this option
should be rejected.

That leaves us with needing to decide whether we should or shouldn't
forcibly split off the initial ListCell array if it's large.  I'm
kind of leaning to not doing so, because doing that would add an
extra test (at least) to each list creation, and the frequency of
being able to reclaim space seems like it'd be pretty small.  You
need a large initial list, *plus* a request to make it even larger.

(I haven't been able to reproduce skink's failure though, so maybe
there's something I'm missing.)

Thoughts?

            regards, tom lane