Re: xlog.c: removing ReadRecPtr and EndRecPtr

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: xlog.c: removing ReadRecPtr and EndRecPtr
Дата
Msg-id CA+TgmoaBQvZY6cNb5moWNH11Gw2eB8=UBkgCQx9L+8bWHxs+Fw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: xlog.c: removing ReadRecPtr and EndRecPtr  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: xlog.c: removing ReadRecPtr and EndRecPtr  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, Nov 18, 2021 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > There's a second place where the patch needs to wait for something
> > also, and that one I've crudely kludged with sleep(10). If anybody
> > around here who is good at figuring out how to write clever TAP tests
> > can tell me how to fix this test to be non-stupid, I will happily do
> > so.
>
> As far as that goes, if you conceptualize it as "wait for this text
> to appear in the log file", there's prior art in existing TAP tests.
> Basically, sleep for some reasonable short period and check the
> log file; if not there, repeat until timeout.

Yeah, there's something like that in the form of find_in_log in
019_replslot_init.pl. I thought about copying that, but that didn't
seem great, and I also thought about trying to move into a common
module, which seems maybe better but also more work, and thus not
worth doing unless we have agreement that it's what we should do.

> I'm a little dubious that this test case is valuable enough to
> mess around with a nonstandard postmaster startup protocol, though.
> The main reason I dislike that idea is that any fixes we apply to
> the TAP tests' normal postmaster-startup code would almost inevitably
> miss fixing this test.  IIRC there have been security-related fixes in
> that area (e.g. where do we put the postmaster's socket), so I find
> that prospect pretty scary.

The problem that I have with the present situation is that the test
coverage of xlog.c is pretty abysmal. It actually doesn't look bad if
you just run a coverage report, but there are a shazillion flag
variables in that file and bugs like this make it quite clear that we
don't come close to testing all the possible combinations. It's really
borderline unmaintainable. I don't know whether there's a specific
individual who wrote most of this code and didn't get the memo that
global variables are best avoided, or whether this is sort of case
where we started over 1 or 2 and then it just gradually ballooned into
the giant mess that it now is, but the present situation is pretty
outrageous. It's taking me weeks of time to figure out how to make
changes that would normally take days, or maybe hours. We clearly need
to try both to get the number of cases under control by eliminating
stupid variables that are almost but not quite the same as something
else, and also get proper test coverage for the things that remain so
that it's possible to modify the code without excesive risk of
shooting ourselves in the foot.

That said, I'm not wedded to this particular test case, either. It's
an extremely specific bug that is unlikely to reappear once squashed,
and making the test case robust enough to avoid having the buildfarm
complain seems fraught.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: [RFC] building postgres with meson
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Mixing CC and a different CLANG seems like a bad idea