Re: 16-bit page checksums for 9.2

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: 16-bit page checksums for 9.2
Дата
Msg-id CA+TgmoYuSm_rzkrY0JvD-jLLVHswfPWMChTFGL2mNp6ZWZ3D1g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: 16-bit page checksums for 9.2  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: 16-bit page checksums for 9.2  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On Sun, Feb 19, 2012 at 6:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I thought it was a reasonable and practical idea from Heikki. The bit
> is not selected arbitrarily, it is by design adjacent to one of the
> other bits. So overall, 3 bits need to be set to a precise value and a
> run of 1s or 0s will throw and error.

Sure, but who is to say that a typical error will look anything like
that?  Anyway, you could check even more bits just as easily; we know
exactly which ones have a plausible reason for being non-zero.

>> That having been said, I don't feel very good about the idea of
>> relying on the contents of the page to tell us whether or not the page
>> has a checksum.  There's no guarantee that an error that flips the
>> has-checksum bit will flip any other bit on the page, or that it won't
>> flip everything else we're relying on as a backstop in exactly the
>> manner that foils whatever algorithm we put in place.  Random
>> corruption is, perhaps, unlikely to do that, but somehow I feel like
>> it'll happen more often than random chance suggests.  Things never
>> fail the way you want them to.
>
> You're right. This patch is not the best possible world, given a clean
> slate. But we don't have a clean slate.
>
> The fact is this patch will detect corruptions pretty well and that's
> what Postgres users want.
>
> While developing this, many obstacles could have been blockers. I
> think we're fairly lucky that I managed to find a way through the
> minefield of obstacles.

I think we could do worse than this patch, but I don't really believe
it's ready for commit.  We don't have a single performance number
showing how much of a performance regression this causes, either on
the master or on the standby, on any workload, much less those where a
problem is reasonably forseeable from the design you've chosen.  Many
controversial aspects of the patch, such as the way you're using the
buffer header spinlocks as a surrogate for x-locking the buffer, or
the right way of obtaining the bit-space the patch requires, haven't
really been discussed, and to the extent that they have been
discussed, they have not been agreed.

On the former point, you haven't updated
src/backend/storage/buffer/README at all; but updating is not by
itself sufficient.  Before we change the buffer-locking rules, we
ought to have some discussion of whether it's OK to do that, and why
it's necessary.  "Why it's necessary" would presumably include
demonstrating that the performance of the straightforward
implementation stinks, and that with this change is better.  You
haven't made any effort to do that whatsoever, or if you have, you
haven't posted the results here.  I'm pretty un-excited by that
change, first because I think it's a modularity violation and possibly
unsafe, and second because I believe we've already got a problem with
buffer header spinlock contention which this might exacerbate.

>> Another disadvantage of the current scheme is that there's no
>> particularly easy way to know that your whole cluster has checksums.
>> No matter how we implement checksums, you'll have to rewrite every
>> table in the cluster in order to get them fully turned on.  But with
>> the current design, there's no easy way to know how much of the
>> cluster is actually checksummed.
>
> You can read every block and check.

Using what tool?

>> If you shut checksums off, they'll
>> linger until those pages are rewritten, and there's no easy way to
>> find the relations from which they need to be removed, either.
>
> We can't have it both ways. Either we have an easy upgrade, or we
> don't. I'm told that an easy upgrade is essential.

Easy upgrade and removal of checksums are unrelated issues AFAICS.

>> I'm tempted to suggest a relation-level switch: when you want
>> checksums, you use ALTER TABLE to turn them on, and when you don't
>> want them any more you use ALTER TABLE to shut them off again, in each
>> case rewriting the table.  That way, there's never any ambiguity about
>> what's in the data pages in a given relation: either they're either
>> all checksummed, or none of them are.  This moves the decision about
>> whether checksums are enabled or disabled quite a but further away
>> from the data itself, and also allows you to determine (by catalog
>> inspection) which parts of the cluster do in fact have checksums.  It
>> might be kind of a pain to implement, though: you'd have to pass the
>> information about how any given relation was configured down to the
>> place where we validate page sanity.  I'm not sure whether that's
>> practical.
>
> It's not practical as the only mechanism, given the requirement for upgrade.

Why not?

> The version stuff is catered for by the current patch.

Your patch reuses the version number field for an unrelated purpose
and includes some vague hints of how we might do versioning using some
of the page-level flag bits.  Since bit-space is fungible, I think it
makes more sense to keep the version number where it is and carve the
checksum out of whatever bit space you would have moved the version
number into.  Whether that's pd_tli or pd_flags is somewhat secondary;
the point is that you can certainly arrange to leave the 8 bits that
currently contain the version number as they are.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Billy Earney
Дата:
Сообщение: Re: Future of our regular expression code
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Future of our regular expression code