Thread: Bug in abbreviated keys abort handling (found with amcheck)

Bug in abbreviated keys abort handling (found with amcheck)

From
Peter Geoghegan
Date:
I found another bug as a result of using amcheck on Heroku customer
databases. This time, the bug is in core Postgres. It's one of mine.

There was a thinko in tuplesort's abbreviation abort logic, causing
certain SortTuples to be spuriously marked NULL (and so, subsequently
sorted as a NULL tuple, despite not actually changing anything about
the representation of caller tuples). The attached patch fixes this
bug.
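
To make the failure mode concrete, here is a minimal, self-contained sketch of this class of bug. It is not the tuplesort.c code and not the attached patch: the `ToySortTuple` type and the `toy_*` functions are hypothetical names invented for illustration. The sketch assumes that, when abbreviation is abandoned, every already-stored tuple must get its first key restored from the original value together with its own NULL flag, and that the buggy variant writes the flag to a different tuple.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sort tuple: a cached first key plus a NULL flag. */
typedef struct ToySortTuple
{
    const char *orig;     /* original caller value; a NULL pointer models SQL NULL */
    const char *datum1;   /* cached first key (abbreviated or full) */
    bool        isnull1;  /* is the first key NULL? */
} ToySortTuple;

/* Correct version: restore the full key and the NULL flag on the same tuple. */
void
toy_abort_abbrev(ToySortTuple *tuples, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        ToySortTuple *mtup = &tuples[i];

        mtup->datum1 = mtup->orig;
        mtup->isnull1 = (mtup->orig == NULL);
    }
}

/*
 * Buggy variant (assumed shape of the mistake): the NULL flag is written to
 * `current`, the tuple being added, instead of to the stored tuple `mtup`.
 * The caller's original value in `current->orig` is never modified.
 */
void
toy_abort_abbrev_buggy(ToySortTuple *tuples, size_t n, ToySortTuple *current)
{
    for (size_t i = 0; i < n; i++)
    {
        ToySortTuple *mtup = &tuples[i];

        mtup->datum1 = mtup->orig;
        current->isnull1 = (mtup->orig == NULL);   /* wrong target */
    }
}
```

In the buggy variant, `current` ends up with whatever NULL flag the last stored tuple had. If that tuple was NULL, a non-NULL `current` is sorted as NULL even though its original value is untouched.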

I noticed this following a complaint by amcheck about a tuple in the
wrong order on a leaf page in some random text index. The leaf page
was entirely full of NULL values, aside from this one tuple at some
seemingly random position. All non-NULL index tuples were of the kind
that you'd expect to trigger abbreviation to abort (many distinct
values, but with little entropy at the beginning).
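
The "many distinct values, but with little entropy at the beginning" pattern can be illustrated with a small sketch. It assumes, for simplicity, that an abbreviated key is just the first 8 bytes of the string; `TOY_ABBREV_LEN` and `toy_count_distinct` are hypothetical names, and this is not how tuplesort actually estimates cardinality.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ABBREV_LEN 8   /* assumed abbreviated key width, in bytes */

/*
 * Count distinct values among the first `len` bytes of each string.
 * len == 0 means "compare whole strings". Quadratic, which is fine for a demo.
 */
size_t
toy_count_distinct(const char *const *vals, size_t n, size_t len)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++)
    {
        int seen = 0;

        for (size_t j = 0; j < i && !seen; j++)
        {
            if (len == 0)
                seen = (strcmp(vals[i], vals[j]) == 0);
            else
                seen = (strncmp(vals[i], vals[j], len) == 0);
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

For values such as `https://example.com/a`, `https://example.com/b` and `https://example.com/c`, the full strings are all distinct, but they share a single 8-byte prefix, so abbreviated comparisons would resolve nothing and abbreviation is a likely candidate for being aborted.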

I believe that this particular problem has been observed on a tiny
fraction of all databases tested, so I don't think it's very common in
the wild.

I'd be surprised if amcheck does not bring more bugs like this to my
attention before too long. We should work on improving it, so that we
have greater visibility into problems that occur in the field.
--
Peter Geoghegan

Attachments

Re: Bug in abbreviated keys abort handling (found with amcheck)

From
Robert Haas
Date:
On Fri, Aug 19, 2016 at 6:07 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I found another bug as a result of using amcheck on Heroku customer
> databases. This time, the bug is in core Postgres. It's one of mine.
>
> There was a thinko in tuplesort's abbreviation abort logic, causing
> certain SortTuples to be spuriously marked NULL (and so, subsequently
> sorted as a NULL tuple, despite not actually changing anything about
> the representation of caller tuples). The attached patch fixes this
> bug.

Ugh, that sucks. Thanks for the report and patch.  Committed and
back-patched to 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bug in abbreviated keys abort handling (found with amcheck)

From
Peter Geoghegan
Date:
On Mon, Aug 22, 2016 at 12:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Ugh, that sucks. Thanks for the report and patch.  Committed and
> back-patched to 9.5.

Thanks.

Within Heroku, there is a lot of enthusiasm for the idea of sharing
hard data about the prevalence of problems like this. I hope to be
able to share figures in the next few weeks, when I finish working
through the backlog.

Separately, I would like amcheck to play a role in how we direct users
to REINDEX, as issues like this come to light. It would be much more
helpful if we didn't have to be so conservative. I hesitate to say
that amcheck will detect cases where this bug led to corruption with
100% reliability, but I think that any case that one can imagine in
which amcheck fails here is unlikely in the extreme. The same applies
to the glibc abbreviated keys issue.

I haven't actually found any glibc strxfrm() issues yet, even though any
instances of corruption of text indexes I've seen originated before
the point release in which strxfrm() became distrusted. I guess that
not that many Heroku users use the "C" locale, which would still be
affected with the latest point release.

-- 
Peter Geoghegan