Over the past ~1 week, I tried to space out all other tasks on the machine, so as to
ensure that 1-min CPU is mostly <2 (and thus not many things are hammering the disk),
and with that I have seen 0 failures these past few days. This isn't conclusive by any
means, but it does seem that reducing IO contention has helped remove the errors, like
what Alexander suspects / repros here.
Just a note that I've now reverted some of those recent changes, so if the theory
holds true, I wouldn't be surprised if some of these errors restarted on dodo.
Looking back at the test failures, I can see that the errors did indeed reappear just after your revert (on 2024-06-28), so that theory proved true. However, I see none of them since 2024-07-02. Does that mean you changed something on dodo or fixed that performance issue?
Could you please describe how you resolved this issue, just for the record?