RE: Random pg_upgrade test failure on drongo

From: Hayato Kuroda (Fujitsu)
Subject: RE: Random pg_upgrade test failure on drongo
Msg-id: TY3PR01MB98894D8BE99AE53217C96C0AF56A2@TY3PR01MB9889.jpnprd01.prod.outlook.com
In reply to: Re: Random pg_upgrade test failure on drongo  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Random pg_upgrade test failure on drongo  (Alexander Lakhin <exclusion@gmail.com>)
List: pgsql-hackers

Dear Amit, Alexander,

> > We get the effect discussed when the background writer process decides to
> > flush a file buffer for pg_largeobject during stage 1.
> > (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
> > the result must be the same.)
> > And another important factor is shared_buffers = 1MB (set during the test).
> > With the default setting of 128MB I couldn't see the failure.
> >
> > It can be reproduced easily (on old Windows versions) just by running
> > pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
> > default cluster)).
> > If an old cluster contains a dozen databases, this increases the failure
> > probability significantly (with 10 additional databases I've got failures
> > on iterations 4, 1, 6).
> >
> 
> I don't have an old Windows environment to test but I agree with your
> analysis and theory. The question is what should we do for these new
> random BF failures? I think we should set bgwriter_lru_maxpages to 0
> and checkpoint_timeout to 1hr for these new tests. Doing some invasive
> fix as part of this doesn't sound reasonable because this is an
> existing problem and there seems to be another patch by Thomas that
> probably deals with the root cause of the existing problem [1] as
> pointed out by you.
> 
> [1] - https://commitfest.postgresql.org/40/3951/

Based on Amit's suggestion, I have created a patch that takes the alternative
approach: it only changes GUC settings in the tests. The reported failure is only
for 003_logical_slots, but the patch also includes changes for the recently added
test, 004_subscription. IIUC, 004 could fail in the same way.
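
For reference, a minimal sketch of the settings Amit suggested, written as
postgresql.conf lines. This assumes they are applied to the clusters used by both
tests. The exact lines and placement in the attached patch may differ.

```conf
# Disable background writing, so the bgwriter does not flush
# pg_largeobject buffers while pg_upgrade is running.
bgwriter_lru_maxpages = 0

# Make a timed checkpoint during the test run very unlikely.
checkpoint_timeout = 1h
```

Neither setting addresses the root cause. They only avoid the buffer flushes that
trigger the failure in these tests.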

As far as we understand, this patch should stop the random failures. Alexander,
could you test it to confirm?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

