Re: Errors with schema migration and logical replication — expected?

Поиск
Список
Период
Сортировка
От Mike Lissner
Тема Re: Errors with schema migration and logical replication — expected?
Дата
Msg-id CAMp9=EzVFLVSf4ff80M7VYpYz0u-nyGPe9eDVjPEX5fAxdTwLw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Errors with schema migration and logical replication — expected?  (Adrian Klaver <adrian.klaver@aklaver.com>)
Ответы Re: Errors with schema migration and logical replication — expected?
Список pgsql-general
This sounds *very* plausible. So I think there are a few takeaways:

1. Should the docs mention that additive changes with NOT NULL constraints are bad?

2. Is there a way this could work without completely breaking replication? For example, should Postgresql realize replication can't work in this instance and then stop it until schemas are back in sync, like it does with other incompatible schema changes? That'd be better than failing in this way and is what I'd expect to happen.

3. Are there other edge cases like this that aren't well documented that we can expect to creep up on us? If so, should we try to spell out exactly *which* additive changes *are* OK?

This feels like a major "gotcha" to me, and I'm trying to avoid those. I feel like the docs are pretty lacking here and that others will find themselves in similarly bad positions.

Better schema migration docs would surely help, too.

Mike


On Wed, Dec 12, 2018 at 7:11 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/12/18 12:15 AM, Mike Lissner wrote:
>
>
> On Tue, Dec 11, 2018 at 3:10 PM Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>
>      >     Well, I was able to fix this by briefly allowing nulls on the
>      >     subscriber, letting it catch up with the publisher, setting all
>      >     nulls to empty strings (a Django convention), and then
>     disallowing
>      >     nulls again. After letting it catch up, there were 118 nulls
>     on the
>      >     subscriber in this column:
>
>     So recap_sequence_number is not actually a number, it is a code?
>
>
> It has sequential values, but they're not necessarily numbers.
>
>
>      >
>      >     I appreciate all the responses. I'm scared to say so, but I think
>      >     this is a bug in logical replication. Somehow a null value
>     appeared
>      >     at the subscriber that was never in the publisher.
>      >
>      >     I also still have this question/suggestion from my first email:
>      >
>      >      > Is the process for schema migrations documented somewhere
>     beyond
>      >     the above?
>
>     Not that I know of. It might help, if possible, to detail the steps in
>     the migration. Also what program you used to do it. Given that is
>     Django
>     I am assuming some combination of migrate, makemigrations and/or
>     sqlmigrate.
>
>
> Pretty simple/standard, I think:
>   - Changed the code.
>   - Generated the migration using manage.py makemigration
>   - Generated the SQL using sqlmigrate
>   - Ran the migration using manage.py migrate on the master and using
> psql on the replica

The only thing I can think of involves this sequence on the subscriber:

ALTER TABLE "search_docketentry" ADD COLUMN "recap_sequence_number"
varchar(50) DEFAULT '' NOT NULL;
ALTER TABLE "search_docketentry" ALTER COLUMN "recap_sequence_number"
DROP DEFAULT;

and then this:

https://www.postgresql.org/docs/11/logical-replication-subscription.html

"Columns of a table are also matched by name. A different order of
columns in the target table is allowed, but the column types have to
match. The target table can have additional columns not provided by the
published table. Those will be filled with their default values."

https://www.postgresql.org/docs/10/sql-createtable.html

"If there is no default for a column, then the default is null."

So the subscriber finished the migration first, as alluded to in an
earlier post. There is no data for recap_sequence_number coming from the
provider so Postgres place holds the data with NULL until such time as
the migration on the provider finishes and actual data for
recap_sequence_number starts flowing.

Going forward, options I see:

1) Making sure there is a DEFAULT other then NULL for a NOT NULL column.

2) Stop the replication and do the migration scripts on both provider
and subscriber until they both complete and then start replication again.




> Mike


--
Adrian Klaver
adrian.klaver@aklaver.com

В списке pgsql-general по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: explain analyze cost
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: psql output in Japanese Code Page