Обсуждение: [MASSMAIL]Differential code coverage between 16 and HEAD

Поиск
Список
Период
Сортировка

[MASSMAIL]Differential code coverage between 16 and HEAD

От
Andres Freund
Дата:
Hi,

To see how well we're doing testing newly introduced code, I computed the
differential code coverage between REL_16_STABLE and HEAD.

While arguably comparing HEAD to the the merge-base between REL_16_STABLE and
HEAD would be more accurate, I chose REL_16_STABLE because we've backpatched
bugfixes with tests etc.


I first got some nonsensical differences. That turns out to be due to
immediate shutdowns in the tests, which

a) can loose coverage, e.g. there were no hits for most of walsummarizer.c,
   because the test shuts always shuts it down immediately
b) can cause corrupted coverage files if a process is shut down while writing
   out coverage files

I partially worked around a) by writing out coverage files during abnormal
shutdowns. That requires some care, I'll send a separate email about that. I
worked around b) by rerunning tests until that didn't occur.


The differential code coverage view in lcov is still somewhat raw. I had to
weaken two error checks to get it to succeed in postgres.  You can hover over
the code coverage columns to get more details about what the various three
letter acronyms mean. The most important ones are:
- UNC - uncovered new code, i.e. we added code that's not tested
- LBC - lost baseline coverage, i.e previously tested code isn't anymore
- UBC - untested baseline, i.e. code that's still not tested
- GBC - gained baseline coverage - unchanged code that's now tested
- GNC - gained new coverage - new code that's tested

https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/


This includes "branch coverage" - I'm not sure that's worth the additional
clutter it generates.

Looking at the differential coverage results, at least the following seem
notable:

- We added a bit less uncovered code than last year, but it's not quite a fair
  comparison, because I ran the numbers for 16 2023-04-08. Since the feature
  freeze, 17's coverage has improved by a few hundred lines (8225c2fd40c).

- A good bit of the newly uncovered code is in branches that are legitimately
  hard to reach (unlikely errors etc).

- Some of the new walsummary code could use more tests.
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/backup/walsummaryfuncs.c.gcov.html#L69
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_combinebackup/pg_combinebackup.c.gcov.html#L424
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/blkreftable.c.gcov.html#L790

- the new buffer eviction paths aren't tested at all:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/storage/buffer/bufmgr.c.gcov.html#L6023
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/contrib/pg_buffercache/pg_buffercache_pages.c.gcov.html#L356
  It looks like it should be fairly trivial to test at least the basics?

- Coverage for some of the new unicode code is pretty poor:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/unicode_category.c.gcov.html#L122

- Some of the new nbtree code could use a bit more tests:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/nbtree/nbtutils.c.gcov.html#L1468

- Our coverage of the non-default copy modes of pg_upgrade, pg_combinebackup
  is nonexistent, and that got worse with the introduction of a new method
  this release:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_upgrade/file.c.gcov.html#L360
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_upgrade/file.c.gcov.html#L400
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_combinebackup/copy_file.c.gcov.html#L209

- Code coverage of acl.c is atrocious and got worse.

- The new bump allocator has a fair amount of uncovered functionality:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L293
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L613

- A lot of the new resowner functions aren't covered, but I guess the
  equivalent functionality wasn't covered before, either:

  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/cache/catcache.c.gcov.html#L2317
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/cache/relcache.c.gcov.html#L6868
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/storage/buffer/bufmgr.c.gcov.html#L3608
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/storage/buffer/bufmgr.c.gcov.html#L5978
  ...

Greetings,

Andres Freund



Re: Differential code coverage between 16 and HEAD

От
Peter Geoghegan
Дата:
On Sun, Apr 14, 2024 at 6:33 PM Andres Freund <andres@anarazel.de> wrote:
> - Some of the new nbtree code could use a bit more tests:
>   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/nbtree/nbtutils.c.gcov.html#L1468

I made a conscious decision to not add coverage for the function that
you've highlighted here (_bt_rewind_nonrequired_arrays) back when I
reviewed the coverage situation for the patch, which was about a month
ago now. (FWIW I also decided against adding coverage for the
recursive call to _bt_advance_array_keys, for similar reasons.)

I don't mind adding _bt_rewind_nonrequired_arrays coverage now. I
already wrote 4 tests that show wrong answers (and assertion failures)
when the call to _bt_rewind_nonrequired_arrays is temporarily
commented out. The problem with committing such a test, if any, is
that it'll necessitate creating an index with at least 3 columns,
crafted to trip up this exact issue with non-required arrays -- and it
has to be bigger than one page (probably several pages, at a minimum).
That's a relatively large number of cycles to spend on this fairly
narrow issue -- it's at least a lot relative to the prevailing
standard for these things. Plus I'd be relying on implementation
details that might change, as well as relying on things like BLCKSZ
(not that it'd be the first time that I committed a test like that).

Note also that there is a general rule (explained above
_bt_rewind_nonrequired_arrays) requiring that all non-required arrays
be reset to their initial positions/element (first in the current scan
direction) once _bt_first is reached. If _bt_advance_array_keys
somehow failed to follow that rule (not necessarily due to an issue
with missing the call to _bt_rewind_nonrequired_arrays), then we'd get
an assertion failure within _bt_first -- the
_bt_verify_arrays_bt_first assertion would catch the violation of the
invariant (the_bt_advance_array_keys postcondition invariant/_bt_first
precondition invariant). So we kinda do have some test coverage for
this function already.

--
Peter Geoghegan



Re: Differential code coverage between 16 and HEAD

От
Robert Haas
Дата:
On Sun, Apr 14, 2024 at 6:33 PM Andres Freund <andres@anarazel.de> wrote:
> - Some of the new walsummary code could use more tests.
>   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/backup/walsummaryfuncs.c.gcov.html#L69

So this is pg_wal_summary_contents() and
pg_get_wal_summarizer_state(). I was reluctant to try to cover these
because I thought it would be hard to get the tests to be stable. The
difficulties in stabilizing src/bin/pg_walsummary/t/002_blocks.pl seem
to demonstrate that this concern wasn't entire unfounded, but as far
as I know that test is now stable, so we could probably use the same
technique to test pg_wal_summary_contents(), maybe even as part of the
same test case. I don't really know what a good test for
pg_get_wal_summarizer_state() would look like, though.

>   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_combinebackup/pg_combinebackup.c.gcov.html#L424

I guess we could test this by adding a tablespace, and a tablespace
mapping, to one of the pg_combinebackup tests.

>   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/blkreftable.c.gcov.html#L790

This is dead code. I thought we might need to use this as a way of
managing memory pressure, but it didn't end up being needed. We could
remove it, or mark it #if NOT_USED, or whatever.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Differential code coverage between 16 and HEAD

От
Andres Freund
Дата:
Hi,

On 2024-04-15 15:36:04 -0400, Robert Haas wrote:
> On Sun, Apr 14, 2024 at 6:33 PM Andres Freund <andres@anarazel.de> wrote:
> > - Some of the new walsummary code could use more tests.
> >   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/backup/walsummaryfuncs.c.gcov.html#L69
> 
> So this is pg_wal_summary_contents() and
> pg_get_wal_summarizer_state(). I was reluctant to try to cover these
> because I thought it would be hard to get the tests to be stable. The
> difficulties in stabilizing src/bin/pg_walsummary/t/002_blocks.pl seem
> to demonstrate that this concern wasn't entire unfounded, but as far
> as I know that test is now stable, so we could probably use the same
> technique to test pg_wal_summary_contents(), maybe even as part of the
> same test case. I don't really know what a good test for
> pg_get_wal_summarizer_state() would look like, though.

I think even just reaching the code, without a lot of of verification of the
returned data, is better than not reaching the code at all. I.e. the test
could just check that the pid is set, the tli is right.

That'd also add at least some coverage of
https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/postmaster/walsummarizer.c.gcov.html#L433


> >
https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/bin/pg_combinebackup/pg_combinebackup.c.gcov.html#L424
> 
> I guess we could test this by adding a tablespace, and a tablespace
> mapping, to one of the pg_combinebackup tests.

Seems worthwhile to me.


> >   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/blkreftable.c.gcov.html#L790
> 
> This is dead code. I thought we might need to use this as a way of
> managing memory pressure, but it didn't end up being needed. We could
> remove it, or mark it #if NOT_USED, or whatever.

Don't really have an opinion on that. How likely do you think we'll need it
going forward?


Note that I didn't look exhaustively through the coverage of the walsummarizer
code - I just looked at a few things that stood out.  I looked for a few
minutes more:

- It seems worth explicitly covering the various
  record types that walsummarizer needs to understand:
  https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/postmaster/walsummarizer.c.gcov.html#L1184
  i.e. XLOG_SMGR_TRUNCATE, XLOG_XACT_COMMIT_PREPARED, XLOG_XACT_ABORT, XLOG_XACT_ABORT_PREPARED.

- Another thing that looks to be not covered is dealing with
  enabling/disabling summarize_wal, that also seems worth testing?


Greetings,

Andres Freund



Re: Differential code coverage between 16 and HEAD

От
David Rowley
Дата:
On Mon, 15 Apr 2024 at 10:33, Andres Freund <andres@anarazel.de> wrote:
> - The new bump allocator has a fair amount of uncovered functionality:
>   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L293

The attached adds a test to tuplesort to exercise BumpAllocLarge()

>   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L613

I don't see a way to exercise those. They're meant to be "can't
happen" ERRORs.  I could delete them and use BogusFree, BogusRealloc,
BogusGetChunkContext, BogusGetChunkSpace instead, but the ERROR
message would be misleading. I think it's best just to leave this.

David

Вложения

Re: Differential code coverage between 16 and HEAD

От
Andres Freund
Дата:
Hi,

On 2024-04-16 10:26:57 +1200, David Rowley wrote:
> On Mon, 15 Apr 2024 at 10:33, Andres Freund <andres@anarazel.de> wrote:
> > - The new bump allocator has a fair amount of uncovered functionality:
> >   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L293
> 
> The attached adds a test to tuplesort to exercise BumpAllocLarge()

Cool.

> >   https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/utils/mmgr/bump.c.gcov.html#L613
> 
> I don't see a way to exercise those. They're meant to be "can't
> happen" ERRORs.  I could delete them and use BogusFree, BogusRealloc,
> BogusGetChunkContext, BogusGetChunkSpace instead, but the ERROR
> message would be misleading. I think it's best just to leave this.

I guess was thinking more about BumpIsEmpty() and BumpStats() then the "bogus"
cases. But BumpIsEmpty() likely is unreachable as well. BumpStats() is
reachable, but perhaps it's not worth it?

BEGIN;
DECLARE foo CURSOR FOR SELECT LEFT(a,10),b FROM (VALUES(REPEAT('a', 512 * 1024),1),(REPEAT('b', 512 * 1024),2)) v(a,b)
ORDERBY v.a DESC;
 
FETCH 1 FROM foo;
SELECT * FROM pg_backend_memory_contexts WHERE name = 'Caller tuples';


Hm, independent of this, seems a bit odd that we don't include the memory
context type in pg_backend_memory_contexts?

Greetings,

Andres Freund



Re: Differential code coverage between 16 and HEAD

От
Jeff Davis
Дата:
On Sun, 2024-04-14 at 15:33 -0700, Andres Freund wrote:
> - Coverage for some of the new unicode code is pretty poor:
>  
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/unicode_category.c.gcov.html#L122

Thank you for looking. Those functions are tested by category_test.c
which is run with the 'update-unicode' target.

Better testing in the SQL tests might be good, but the existing tests
are near-exhaustive, so I'm not terribly worried. Also, it's possible
not all of them are reachable by SQL, yet, because some of the later
patches in the series didn't land in 17.

Regards,
    Jeff Davis




Re: Differential code coverage between 16 and HEAD

От
Andres Freund
Дата:
Hi,

On 2024-04-15 16:53:48 -0700, Jeff Davis wrote:
> On Sun, 2024-04-14 at 15:33 -0700, Andres Freund wrote:
> > - Coverage for some of the new unicode code is pretty poor:
> >  
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/common/unicode_category.c.gcov.html#L122
> 
> Thank you for looking. Those functions are tested by category_test.c
> which is run with the 'update-unicode' target.

Testing just during update-unicode doesn't strike me as a great - that way
portability issues wouldn't be found. And if it were tested that way, coverage
would understand it too.   I can just include update-unicode when running
coverage, but that doesn't seem great.

Can't we test this as part of the normal testsuite?

I don't at all like that the tests depend on downloading new unicode
data. What if there was an update but I just want to test the current state?

Greetings,

Andres Freund



Re: Differential code coverage between 16 and HEAD

От
Jeff Davis
Дата:
On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
> Can't we test this as part of the normal testsuite?

One thing that complicates things a bit is that the test compares the
results against ICU, so a mismatch in Unicode version between ICU and
Postgres can cause test failures. The test ignores unassigned code
points, so normally it just results in less-exhaustive test coverage.
But sometimes things really do change, and that would cause a failure.

I'm not quite sure how we should handle that -- maybe only run the test
when the ICU version is known to be in a range where that's not a
problem?

Another option is to look for another way to test this code without
ICU. We could generate a list of known mappings and compare to that,
but we'd have to do it some way other than what the code is doing now,
otherwise we'd just be testing the code against itself. Maybe we can
load the Unicode data into a Postgres table and then test with a SELECT
statement or something?

I am worried that it will end looking like an over-engineered way to
compare a text file to itself.

Stepping back a moment, my top worry is really not to test those C
functions, but to test the perl code that parses the text files and
generates those arrays. Imagine a future Unicode version does something
that the perl scripts didn't anticipate, and they fail to add array
entries for half the code points, or something like that. By testing
the arrays generated from freshly-parsed files exhaustively against
ICU, then we have a good defense against that. That situation really
only comes up when updating Unicode.

That's not to say that the C code shouldn't be tested, of course. Maybe
we can just do some spot checks for the functions that are reachable
via SQL and get rid of the functions that aren't yet reachable (and re-
add them when they are)?

> I don't at all like that the tests depend on downloading new unicode
> data. What if there was an update but I just want to test the current
> state?

I was mostly following the precedent for normalization. Should we
change that, also?

Regards,
    Jeff Davis




Re: Differential code coverage between 16 and HEAD

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
>> I don't at all like that the tests depend on downloading new unicode
>> data. What if there was an update but I just want to test the current
>> state?

> I was mostly following the precedent for normalization. Should we
> change that, also?

It's definitely not OK for the standard test suite to include
internet access.  Seems like we need to separate "download new
source files" from "generate the derived files".

            regards, tom lane



Re: Differential code coverage between 16 and HEAD

От
David Rowley
Дата:
On Tue, 16 Apr 2024 at 10:57, Andres Freund <andres@anarazel.de> wrote:
> I guess was thinking more about BumpIsEmpty() and BumpStats() then the "bogus"
> cases. But BumpIsEmpty() likely is unreachable as well.

The only call to MemoryContextIsEmpty() I see is AtSubCommit_Memory()
and it's on an aset.c context type. I see generation.c has the same
issue per [1].

> BumpStats() is
> reachable, but perhaps it's not worth it?
>
> BEGIN;
> DECLARE foo CURSOR FOR SELECT LEFT(a,10),b FROM (VALUES(REPEAT('a', 512 * 1024),1),(REPEAT('b', 512 * 1024),2))
v(a,b)ORDER BY v.a DESC;
 
> FETCH 1 FROM foo;
> SELECT * FROM pg_backend_memory_contexts WHERE name = 'Caller tuples';

I think primarily it's good to exercise that code just to make sure it
does not crash.  Looking at the output of the above on my machine:

     name      | ident |     parent     | level | total_bytes |
total_nblocks | free_bytes | free_chunks | used_bytes
---------------+-------+----------------+-------+-------------+---------------+------------+-------------+------------
 Caller tuples |       | TupleSort sort |     6 |     1056848 |
     3 |       8040 |           0 |    1048808
(1 row)

I'd say:

Name: stable
ident: stable
parent: stable
level: could change from a refactor of code
total_bytes: could be different on other platforms or dependent on
MEMORY_CONTEXT_CHECKING
total_nblocks: stable enough
free_bytes: could be different on other platforms or dependent on
MEMORY_CONTEXT_CHECKING
free_chunks: always 0
used_bytes: could be different on other platforms or dependent on
MEMORY_CONTEXT_CHECKING

I've attached a patch which includes your test with unstable columns
stripped out.

I cut the 2nd row down to just 512 bytes as I didn't see the need to
add two large datums.  Annoyingly it still uses 3 blocks as I've opted
to do dlist_push_head(&set->blocks, &block->node); in BumpAllocLarge()
which is the block that's picked up again in BumpAlloc() per block =
dlist_container(BumpBlock, node, dlist_head_node(&set->blocks));
wonder if the large blocks should push tail instead.

> Hm, independent of this, seems a bit odd that we don't include the memory
> context type in pg_backend_memory_contexts?

That seems like useful information to include.  It sure would be
useful to have in there to verify that I'm testing BumpStats().  I've
written a patch [2].

David

[1] https://coverage.postgresql.org/src/backend/utils/mmgr/generation.c.gcov.html#997
[2] https://postgr.es/m/CAApHDvrXX1OR09Zjb5TnB0AwCKze9exZN=9Nxxg1ZCVV8W-3BA@mail.gmail.com

Вложения

Re: Differential code coverage between 16 and HEAD

От
Andres Freund
Дата:
Hi,

On 2024-04-16 13:50:14 +1200, David Rowley wrote:
> I think primarily it's good to exercise that code just to make sure it
> does not crash.  Looking at the output of the above on my machine:

Agreed.


>      name      | ident |     parent     | level | total_bytes |
> total_nblocks | free_bytes | free_chunks | used_bytes
>
---------------+-------+----------------+-------+-------------+---------------+------------+-------------+------------
>  Caller tuples |       | TupleSort sort |     6 |     1056848 |
>      3 |       8040 |           0 |    1048808
> (1 row)
> 
> I'd say:
> 
> Name: stable
> ident: stable
> parent: stable
> level: could change from a refactor of code
> total_bytes: could be different on other platforms or dependent on
> MEMORY_CONTEXT_CHECKING
> total_nblocks: stable enough
> free_bytes: could be different on other platforms or dependent on
> MEMORY_CONTEXT_CHECKING
> free_chunks: always 0
> used_bytes: could be different on other platforms or dependent on
> MEMORY_CONTEXT_CHECKING

I think total_nblocks might also not be entirely stable?  How about just
checking if total_bytes, total_nblocks, free_bytes and used_bytes are bigger
than 0?

> I cut the 2nd row down to just 512 bytes as I didn't see the need to
> add two large datums.

Agreed, I just quickly hacked the statement up based on your earlier one.


Looks good to me, either testing the other columns with > 0 or as you have it.


> > Hm, independent of this, seems a bit odd that we don't include the memory
> > context type in pg_backend_memory_contexts?
> 
> That seems like useful information to include.  It sure would be
> useful to have in there to verify that I'm testing BumpStats().  I've
> written a patch [2].

Nice!

Greetings,

Andres Freund



Re: Differential code coverage between 16 and HEAD

От
David Rowley
Дата:
On Tue, 16 Apr 2024 at 14:29, Andres Freund <andres@anarazel.de> wrote:
> I think total_nblocks might also not be entirely stable?

I think it is stable for this test.  However, I'll let the buildfarm
make the final call on that.

The reason I want to include it is that I'd like to push the large
allocations to the tail of the block list and make this workload use 2
blocks rather than 3.  If I fix that and update the test then it's a
bit of coverage to help ensure that doesn't get broken again.

> How about just
> checking if total_bytes, total_nblocks, free_bytes and used_bytes are bigger
> than 0?

Seems like a good idea.  I've done it that way and pushed.

Thanks

David



Re: Differential code coverage between 16 and HEAD

От
Andres Freund
Дата:
Hi,

On 2024-04-15 18:23:21 -0700, Jeff Davis wrote:
> On Mon, 2024-04-15 at 17:05 -0700, Andres Freund wrote:
> > Can't we test this as part of the normal testsuite?
> 
> One thing that complicates things a bit is that the test compares the
> results against ICU, so a mismatch in Unicode version between ICU and
> Postgres can cause test failures. The test ignores unassigned code
> points, so normally it just results in less-exhaustive test coverage.
> But sometimes things really do change, and that would cause a failure.

Hm, that seems annoying, even for update-unicode :/. But I guess it won't be
very common to have such failures?


> Stepping back a moment, my top worry is really not to test those C
> functions, but to test the perl code that parses the text files and
> generates those arrays. Imagine a future Unicode version does something
> that the perl scripts didn't anticipate, and they fail to add array
> entries for half the code points, or something like that. By testing
> the arrays generated from freshly-parsed files exhaustively against
> ICU, then we have a good defense against that. That situation really
> only comes up when updating Unicode.

That's a good point.


> That's not to say that the C code shouldn't be tested, of course. Maybe
> we can just do some spot checks for the functions that are reachable
> via SQL and get rid of the functions that aren't yet reachable (and re-
> add them when they are)?

Yes, I think that'd be a good start. I don't think we necessarily need
exhaustive coverage, just a bit more coverage than we have.


> > I don't at all like that the tests depend on downloading new unicode
> > data. What if there was an update but I just want to test the current
> > state?
> 
> I was mostly following the precedent for normalization. Should we
> change that, also?

Yea, I think we should. But I think it's less urgent if we end up testing more
of the code without those test binaries.  I don't immediately know what
dependencies would be best, tbh.

Greetings,

Andres Freund



Re: Differential code coverage between 16 and HEAD

От
Jeff Davis
Дата:
On Mon, 2024-04-15 at 21:35 -0400, Tom Lane wrote:
> It's definitely not OK for the standard test suite to include
> internet access.

The update-unicode target is not run as part of the standard test
suite.

>   Seems like we need to separate "download new
> source files" from "generate the derived files".

I'm not sure that's the right dividing line. There are three-ish steps:

1. Download the Unicode files
2. Generate the derived .h files
3. Run tests

If we stop after 1, then do we check in the Unicode files? If so, then
there's inconsistency between the Unicode files and the .h files, which
doesn't seem like a good idea. If we don't check in the files, then
nobody can skip to step 2, so I don't see the point in separating the
steps.

If we separate out step 3 that makes more sense: we check in the result
after step 2, and anyone can run step 3 without downloading anything.
The only problem with that is the tests I added depend on a recent-
enough version of ICU, so I'm not sure how many people will run it,
anyway.

Andres's complaints seem mainly about code coverage in the standard
test suite for the thin layer of C code above the generated arrays. I
agree: code coverage is a good goal by itself, and having a few more
platforms exercising that C code can't hurt. I think we should just
address that concern directly by spot-checking the results for a few
code points rather than trying to make the exhaustive ICU tests run on
more hosts.

Regards,
    Jeff Davis




Re: Differential code coverage between 16 and HEAD

От
Jeff Davis
Дата:
On Tue, 2024-04-16 at 11:58 -0700, Andres Freund wrote:
>
> Hm, that seems annoying, even for update-unicode :/. But I guess it
> won't be
> very common to have such failures?

Things don't change a lot between Unicode versions (and are subject to
the stability policy), but the tests are exhaustive, so even a single
character's property being changed will cause a failure when compared
against an older version of ICU. The case mapping test succeeds back to
ICU 64 (based on Unicode 12.1), but the category/properties test
succeeds only back to ICU 72 (based on Unicode 15.0).

I agree this is annoying, and I briefly documented it in
src/common/unicode/README. It means whoever updates Unicode for a
Postgres version should probably know how to build ICU from source and
point the Postgres build process at it. Maybe I should add more details
in the README to make that easier for others.

But it's also a really good test. The ICU parsing, interpretation of
data files, and lookup code is entirely independent of ours. Therefore,
if the results agree for all codepoints, we have a high degree of
confidence that the results are correct. That level of confidence seems
worth a bit of annoyance.

This kind of test is possible because the category/property and case
mapping functions accept a single code point, and there are only
0x10FFFF code points.

> > That's not to say that the C code shouldn't be tested, of course.
> > Maybe
> > we can just do some spot checks for the functions that are
> > reachable
> > via SQL and get rid of the functions that aren't yet reachable (and
> > re-
> > add them when they are)?
>
> Yes, I think that'd be a good start. I don't think we necessarily
> need
> exhaustive coverage, just a bit more coverage than we have.

OK, I'll submit a test module or something.

Regards,
    Jeff Davis