Re: Online enabling of checksums

Поиск

Список

Период

Сортировка

От	Daniel Gustafsson
Тема	Re: Online enabling of checksums
Дата	7 апреля 2018 г. 05:04:50
Msg-id	8A3A67FB-0C69-4736-8E52-D866EBC53C56@yesql.se обсуждение исходный текст
Ответ на	Re: Online enabling of checksums (Andres Freund <andres@anarazel.de>)
Ответы	Re: Online enabling of checksums
Список	pgsql-hackers

Дерево обсуждения

> On 07 Apr 2018, at 00:23, Andres Freund <andres@anarazel.de> wrote:

> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
>> Applying this makes the _cancel test pass, moving the failure instead to the
>> following _enable test (which matches what coypu and mylodon are seeing).
>
> FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
> get the buildfarm green again.

Sorry about that, I’m a bit slow due to various $family situations at the
moment.

> I'm fairly certain that the bug here is a simple race condition in the
> test (not the main code!):

I wonder if it may perhaps be a case of both?

> It's
> exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> make sure that a process has finished exiting.  Then followup tests fail
> because the process is still running

I can reproduce the error when building with --disable-atomics, and it seems
that all the failing members either do that, lack atomic.h, lack atomics or a
combination.  checksumhelper cancellation use pg_atomic_unlocked_test_flag() to
test if it’s running when asked to abort, something which seems unsafe to do in
semaphore simulation as it always returns true.  If I for debugging synthesize
a flag test with testset/clear, the tests pass green (with the upstream patch
for pg_atomic_test_set_flag_impl() applied).  Cancelling with semaphore sim is
thus doomed to never work IIUC.  Or it’s a red herring.

As Magnus mentioned upstream, rewriting to not use an atomic flag is probably
the best option, once the current failure is understood.

> really?  Let's just force the test take at least 6s purely from
> sleeping?

The test needs continuous reading in a session to try and trigger any bugs in
read access on the cluster during checksumming, is there a good way to do that
in the isolationtester?  I have failed to find a good way to repeat a step like
that, but I might be missing something.

cheers ./daniel

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Robert Haas
Дата: 07 апреля 2018 г., 05:04:01
Сообщение: Re: PostgreSQL 11 Release Management Team & Feature Freeze

Следующее

От: Andres Freund
Дата: 07 апреля 2018 г., 05:13:56
Сообщение: Re: Online enabling of checksums

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Online enabling of checksums

Предыдущее

Следующее