Bugs in our qsort implementation

From
Tom Lane
Date:
I've been trying to figure out the crash in qsort reported here:
http://www.postgresql.org/message-id/flat/CAL8HZuNR2FR1owZHWG-p64GJtNfbbmPx1Y2OXmJ_XuQ3P8YtVA@mail.gmail.com

I first noticed that our qsort code uses an int to hold some transient
values representing numbers of elements.  Since the passed array size
argument is a size_t, this is broken; the integer could overflow.
I do not think this is a live bug so far as our core code is concerned,
because tuplesort.c still limits itself to no more than INT_MAX items
to be sorted, and nothing else in the core would even approach that.
However, it's in principle a hazard for third-party modules that might try
to sort more than that; and in any case it's a bug waiting to bite us on
the rear whenever somebody decides they have enough RAM that they should
be able to sort more than INT_MAX items.
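
For illustration only (this is not the actual qsort_tuple.c code, just a
made-up standalone sketch), here is the kind of silent truncation you get
once a count that really needs to be a size_t is held in an int:

#include <stdio.h>
#include <stddef.h>
#include <limits.h>

int
main(void)
{
    /* an element count a 64-bit build could plausibly be asked to handle */
    size_t      n = (size_t) INT_MAX + 5;

    /* the kind of assignment an int-typed transient count amounts to */
    int         r = (int) (n - 1);

    printf("n - 1 as size_t = %zu\n", n - 1);
    /* implementation-defined result; on typical two's-complement machines
     * it comes out negative, which then poisons later arithmetic */
    printf("n - 1 as int    = %d\n", r);
    return 0;
}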

However, Yiqing reported the crash as occurring here:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000785180 in med3_tuple (a=0x7f31613f1028, b=0x7f31613f1040, c=0xffffffff3ffffffd,
cmp_tuple=0x7f43613f1010,state=0x1) at qsort_tuple.c:66
 
66    {

which is a bit curious because that function does not itself access any
of the data --- it just calls the cmp_tuple function, so even if we'd
somehow computed a bad address, the crash should occur inside the
comparator function, not here.

After a while a theory occurred to me: the qsort functions recurse without
bothering with a stack depth check, so maybe the SEGV actually represents
running out of stack space.  And after a bit of research, that theory
seems pretty plausible.  It turns out that qsort is guaranteed to recurse
no deeper than log(N) levels, but *only if you take care to recurse on the
smaller partition and iterate on the larger one*.  And we're not doing
that; we just blindly recurse on the left partition.  So given a fairly
huge amount of data and some bad luck in partition-picking, it seems
possible that stack overflow explains this report.
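
For concreteness, here is a minimal sketch of the pattern just described
(the names and the partition routine are made up for the example, not
taken from qsort_tuple.c): recurse into the smaller partition, iterate on
the larger one, and the stack can never get deeper than about log2(N):

#include <stdio.h>
#include <stddef.h>

/*
 * Hoare-style partition: returns p with 1 <= p <= n-1 such that every
 * element of a[0..p-1] is <= every element of a[p..n-1].  Requires n >= 2.
 */
static size_t
partition(int *a, size_t n)
{
    int         pivot = a[(n - 1) / 2];
    size_t      i = 0;
    size_t      j = n - 1;

    for (;;)
    {
        while (a[i] < pivot)
            i++;
        while (a[j] > pivot)
            j--;
        if (i >= j)
            return j + 1;
        {
            int         t = a[i];

            a[i] = a[j];
            a[j] = t;
        }
        i++;
        j--;
    }
}

/*
 * The part that matters for stack depth: recurse into the smaller
 * partition, then loop on the larger one.  Each recursive call handles
 * at most half the elements, so the depth is bounded by about log2(N).
 */
static void
sort_bounded_stack(int *a, size_t n)
{
    while (n > 1)
    {
        size_t      p = partition(a, n);
        size_t      left = p;
        size_t      right = n - p;

        if (left <= right)
        {
            sort_bounded_stack(a, left);    /* smaller (or equal) side */
            a += left;                      /* iterate on the larger side */
            n = right;
        }
        else
        {
            sort_bounded_stack(a + left, right);    /* smaller side */
            n = left;
        }
    }
}

int
main(void)
{
    int         data[] = {9, 3, 7, 1, 8, 2, 6, 0, 5, 4};
    size_t      nelem = sizeof(data) / sizeof(data[0]);
    size_t      i;

    sort_bounded_stack(data, nelem);
    for (i = 0; i < nelem; i++)
        printf("%d ", data[i]);
    printf("\n");
    return 0;
}

The only part that matters for the stack-depth argument is the control
flow at the bottom of sort_bounded_stack(); the partition step is just a
plain Hoare-style split so the sketch compiles and runs.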

I propose to
(1) fix the code to use a size_t variable rather than int where
appropriate;
(2) teach it to recurse on the smaller partition.

It's possible that this issue can only manifest on 9.4 and up where
we have the ability for tuplesort to allocate work arrays approaching
INT_MAX elements.  But I don't have a lot of faith in that; I think the
worst-case stack depth for the way we have it now could be as bad as O(N),
so in principle a crash could be possible with significantly smaller input
arrays.  I think we'd better back-patch this all the way.
        regards, tom lane



Re: Bugs in our qsort implementation

From
Peter Geoghegan
Date:
On Thu, Jul 16, 2015 at 5:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's possible that this issue can only manifest on 9.4 and up where
> we have the ability for tuplesort to allocate work arrays approaching
> INT_MAX elements.  But I don't have a lot of faith in that; I think the
> worst-case stack depth for the way we have it now could be as bad as O(N),
> so in principle a crash could be possible with significantly smaller input
> arrays.  I think we'd better back-patch this all the way.

+1.

If you want to generate a worst case, McIlroy wrote a program that
will generate one [1]. AFAIR, it will generate a series of
self-consistent comparisons in the "gas" comparator that produce a
worst-case outcome (as opposed to producing a simple worst-case input,
which would be more convenient in this kind of scenario). This is
known specifically to affect the Bentley & McIlroy implementation, as
the paper goes into.

[1] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
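
For the archives, here is a sketch of that adversary, lightly adapted from
the paper (the variable names follow McIlroy's, but the surrounding driver
is my own approximation): the array handed to qsort() holds indexes, every
value starts out as "gas", and the comparator freezes values as late as it
can while always answering consistently.

#include <stdio.h>
#include <stdlib.h>

static int *val;        /* values seen by the comparator; "gas" until frozen */
static int  gas;        /* the gas value: larger than any frozen value */
static int  nsolid;     /* number of values frozen so far */
static int  candidate;  /* current pivot candidate */
static int  ncmp;       /* comparisons performed */

#define freeze(x) (val[x] = nsolid++)

static int
cmp(const void *px, const void *py)
{
    const int   x = *(const int *) px;
    const int   y = *(const int *) py;

    ncmp++;
    if (val[x] == gas && val[y] == gas)
    {
        /* both still undecided: freeze one so the answer is determined */
        if (x == candidate)
            freeze(x);
        else
            freeze(y);
    }
    if (val[x] == gas)
        candidate = x;
    else if (val[y] == gas)
        candidate = y;
    return val[x] - val[y];     /* only the sign matters */
}

int
main(void)
{
    int     n = 10000;
    int    *ptr = malloc(n * sizeof(*ptr));
    int     i;

    val = malloc(n * sizeof(*val));
    gas = n;                    /* larger than any frozen value (0..n-1) */
    nsolid = ncmp = candidate = 0;
    for (i = 0; i < n; i++)
    {
        ptr[i] = i;             /* the sort permutes these indexes */
        val[i] = gas;
    }
    qsort(ptr, n, sizeof(*ptr), cmp);
    printf("n = %d, comparisons = %d\n", n, ncmp);
    free(ptr);
    free(val);
    return 0;
}

Run against a quicksort without the usual defenses, the reported
comparison count should grow quadratically with n.
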
-- 
Peter Geoghegan



Re: Bugs in our qsort implementation

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Thu, Jul 16, 2015 at 5:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's possible that this issue can only manifest on 9.4 and up where
>> we have the ability for tuplesort to allocate work arrays approaching
>> INT_MAX elements.  But I don't have a lot of faith in that; I think the
>> worst-case stack depth for the way we have it now could be as bad as O(N),
>> so in principle a crash could be possible with significantly smaller input
>> arrays.  I think we'd better back-patch this all the way.

> +1.

> If you want to generate a worst case, McIlroy wrote a program that
> will generate one [1]. AFAIR, it will generate a series of
> self-consistent comparisons in the "gas" comparator that produce a
> worst-case outcome (as opposed to producing a simple worst-case input,
> which would be more convenient in this kind of scenario). This is
> known specifically to affect the Bentley & McIlroy implementation, as
> the paper goes into.
> [1] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

Wow, interesting article.  I stuck that into a test program with our
qsort, and found out that

(1) our "presort" check totally beats that oracle, because it causes
the oracle to freeze the "gas" in perfectly sequential order, so that
the presort run gets all the way through and decides that the input
is sorted.  Hence, N-1 comparisons and no recursion.

(2) if you dike out the presort check, you do see quadratic comparison
behavior but the max recursion depth is only 1.  Apparently, the oracle
effectively always makes the right-hand partition as large as possible,
so that recursing on the left partition is the optimal policy as far
as stack depth goes.
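
The "presort" check mentioned in (1) is, in outline, just one ordered pass
over the array before any partitioning happens.  A generic sketch of that
shape (not the actual PostgreSQL qsort code):

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*cmp_fn) (const void *a, const void *b);

/*
 * Generic shape of a presorted-input check: one pass, N-1 comparisons.
 * If no adjacent pair is out of order, the caller can skip partitioning
 * and recursion entirely.
 */
static bool
input_is_presorted(const char *a, size_t n, size_t es, cmp_fn cmp)
{
    size_t      i;

    for (i = 1; i < n; i++)
        if (cmp(a + (i - 1) * es, a + i * es) > 0)
            return false;       /* out of order; do the real sort instead */
    return true;
}

static int
int_cmp(const void *pa, const void *pb)
{
    int         a = *(const int *) pa;
    int         b = *(const int *) pb;

    return (a > b) - (a < b);
}

int
main(void)
{
    int         sorted[] = {1, 2, 3, 4, 5};
    int         unsorted[] = {1, 3, 2, 4, 5};

    printf("sorted:   %d\n", input_is_presorted((const char *) sorted, 5, sizeof(int), int_cmp));
    printf("unsorted: %d\n", input_is_presorted((const char *) unsorted, 5, sizeof(int), int_cmp));
    return 0;
}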

However, you can trivially modify this oracle to break our code: just
negate its comparison result, ie

-    return val[x] - val[y];        /* only the sign matters */
+    return -(val[x] - val[y]);    /* only the sign matters */

This stops the presort check immediately, since now the data looks
to be reverse-sorted instead of correctly ordered; and now it makes
the left-hand partition as large as possible, instead of the right.

With our unmodified code and the tweaked oracle, I see maximum recursion
depth in the vicinity of N/4.  So it's definitely possible to get O(N)
stack growth without a fix, though it may take very unusual input.  With
the correction to recurse to the smaller partition, we get max recursion
depth of just 1, since that always chooses a very small partition to
recurse to.
        regards, tom lane