Re: failures in t/031_recovery_conflict.pl on CI

From	Andres Freund
Subject	Re: failures in t/031_recovery_conflict.pl on CI
Date	May 3, 2022, 21:20:25
Msg-id	20220503182025.wvbebs2ojk6vpi5f@alap3.anarazel.de
In reply to	Re: failures in t/031_recovery_conflict.pl on CI (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: failures in t/031_recovery_conflict.pl on CI (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Hi,

On 2022-05-03 01:16:46 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-05-02 23:44:32 -0400, Tom Lane wrote:
> >> I can poke into that tomorrow, but are you sure that that isn't an
> >> expectable result?
> 
> > It's not expected. But I think I might see what the problem is:
> > We wait for the FETCH (and thus the buffer pin to be acquired). But that
> > doesn't guarantee that the lock has been acquired. We can't check that with
> > pump_until() afaics, because there'll not be any output. But a query_until()
> > checking pg_locks should do the trick?
> 
> Irritatingly, it doesn't reproduce (at least not easily) in a manual
> build on the same box.

Odd, given how readily it seems to reproduce on the bf. I assume you built with
> Uses -fsanitize=alignment -DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN
> -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS

> So it's almost surely a timing issue, and your theory here seems plausible.

Unfortunately I don't think my theory holds, because I actually had added a
defense against this into the test that I forgot about momentarily...

# just to make sure we're waiting for lock already
ok( $node_standby->poll_query_until(
        'postgres', qq[
SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
], 'waiting'),
    "$sect: lock acquisition is waiting");

and on longfin that step completes successfully.

I think what happens is that we get a buffer pin conflict, because these days
we can actually process buffer pin conflicts while waiting for a lock. The
easiest way to get around that is to increase the replay timeout for that
test, I think?

I think we need a restart, not a reload, because reloads aren't guaranteed to
be processed at any certain point in time :/.
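
Roughly, the change could look like the sketch below. This is only an
illustration, not the fix being tested: it assumes that the "replay timeout"
in question is max_standby_streaming_delay, that $node_standby is the
test's existing standby PostgreSQL::Test::Cluster object, and the helper name
and the '30s' value are made up for the example.

```perl
use strict;
use warnings;
use PostgreSQL::Test::Cluster;

# Hypothetical helper: raise the standby's replay timeout for one test
# section. $delay is an illustrative value such as '30s'.
sub raise_replay_timeout
{
	my ($node_standby, $delay) = @_;

	# Append the new setting; when a parameter appears more than once in
	# postgresql.conf, the last occurrence wins.
	$node_standby->append_conf('postgresql.conf',
		"max_standby_streaming_delay = '$delay'\n");

	# Restart instead of reload: after the restart returns, the server is
	# running with the new value, whereas a reload is only picked up at
	# some later, unspecified point.
	$node_standby->restart;

	return;
}

# Usage, assuming $node_standby has been set up earlier in the test:
# raise_replay_timeout($node_standby, '30s');
```

Note that a restart drops existing sessions on the standby, so any
background psql connection the test keeps open would need to be re-established
afterwards.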

Testing a fix in a variety of timing circumstances now...

Greetings,

Andres Freund
