Re: Why we lost Uber as a user

From: Stephen Frost
Subject: Re: Why we lost Uber as a user
Date:
Msg-id: 20160802201220.GW4028@tamriel.snowman.net
In response to: Re: Why we lost Uber as a user  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Why we lost Uber as a user  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Aug 2, 2016 at 3:07 PM, Alfred Perlstein <alfred@freebsd.org> wrote:
> > You are quite technical, my feeling is that you will understand it,
> > however it will need to be a self learned lesson.
>
> I don't know what this is supposed to mean, but I think that Geoff's
> point is somewhat valid.  No matter how you replicate data, there is
> always the possibility that you will replicate any corruption along
> with the data - or that your copy will be unfaithful to the original.

I believe what Geoff was specifically getting at is probably best
demonstrated with an example.

Consider a bug in the btree index code which will accept a value but not
store it correctly.

INSERT INTO mytable (indexed_column) VALUES (-1000000000);

/* oops, bug, this value gets stored in the wrong place in the btree */

We happily accept the record and insert it into the btree index, but
that insert is incorrect and corrupts the btree, because some bug
doesn't handle values of such large magnitude correctly.

In such a case, either approach to replication (replicating the query
statement, or replicating the changes to the btree page exactly) would
result in corruption on the replica.

The above represents a bug in *just* the btree side of things (the
physical replication did its job correctly, even though the result is a
corrupted index on the replica).
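
To make that concrete, here is a minimal toy sketch in Python. It is not
PostgreSQL code: the "index" is just a sorted list, and the bug trigger
(values below -2**24 get appended at the end) and every name in it are
hypothetical, made up purely for illustration. It shows that re-running
the statement on the replica and copying the primary's pages verbatim both
end up with the same broken index.

```python
import bisect

def buggy_index_insert(index, value):
    """Insert into a toy sorted 'index' (a plain list).

    Hypothetical bug: values below -2**24 are appended at the end
    instead of being placed in sorted position.
    """
    if value < -2**24:
        index.append(value)          # wrong place: breaks the sort order
    else:
        bisect.insort(index, value)  # normal, correct path

def is_sorted(index):
    return all(a <= b for a, b in zip(index, index[1:]))

statements = [5, 10, -1000000000]

# Primary: runs each INSERT through the buggy index code.
primary = []
for v in statements:
    buggy_index_insert(primary, v)

# Statement-based replication: the replica re-runs the same statements
# through the same buggy code.
statement_replica = []
for v in statements:
    buggy_index_insert(statement_replica, v)

# Physical replication: the replica receives an exact copy of the
# primary's (already corrupted) index contents.
physical_replica = list(primary)

print(is_sorted(primary), is_sorted(statement_replica), is_sorted(physical_replica))
```

In this toy model neither replication method introduced the problem; each
one faithfully reproduced what the buggy index code did on the primary.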

With physical replication, there is the concern that a bug in *just* the
physical (WAL) side of things could cause corruption.  That is, we
correctly accept and store the value on the primary, but the records
generated to send that data to the replica are incorrect and result in
an invalid state on the replica.

Of course, a bug in the physical side of things which caused corruption
would mean that *crash recovery* would also cause corruption, since
crash recovery replays the same WAL.  As I understand it, that same
concern exists for MySQL, so moving to logical replication doesn't
remove the need to worry about bugs in the crash recovery side of
things, assuming you depend on the database to come back up in a
consistent manner after a crash.
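
Again as a toy sketch only (Python, not PostgreSQL internals; the clamping
bug, the record format, and all names are hypothetical): the primary
applies the change correctly in memory, but the log record it writes is
wrong. Anything that rebuilds state by replaying that log, whether a
replica or the primary itself after a crash, gets the same wrong answer.

```python
def make_wal_record(key, value):
    """Encode a change as a toy log record.

    Hypothetical bug: the value is clamped to a 16-bit signed range
    while the record is built, so large magnitudes are logged wrongly.
    """
    limit = 2**15
    logged = max(-limit, min(limit - 1, value))
    return (key, logged)

def replay(wal):
    """Rebuild state purely from the log, starting from an empty state."""
    state = {}
    for key, value in wal:
        state[key] = value
    return state

primary_memory = {}
wal = []
for key, value in [("a", 1), ("b", -1000000000)]:
    primary_memory[key] = value               # applied correctly on the primary
    wal.append(make_wal_record(key, value))   # but logged incorrectly

replica = replay(wal)            # physical replica replays the log
recovered_primary = replay(wal)  # crash recovery replays the same log

print(primary_memory["b"], replica["b"], recovered_primary["b"])
```

The running primary looks fine here; the damage only shows up on whatever
has to trust the log.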

Thanks!

Stephen
