Обсуждение: Re: BUG #15307: Low numerical precision of (Co-) Variance
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested This is my first PostgreSQL review. Hopefully I'm getting it right. I independently ran into the bug in question and foundthat the author had already written a patch. I'm happy the author wrote this. (1) I am not experienced with PostgreSQL internals, so I can't comment on the coding style or usage of internal functions. (2) The patch applies cleanly to ce4887bd025b95c7b455fefd817a418844c6aad3. "make check", "make check-world", and "make install"pass. (3) The patch addresses the numeric inaccuracy reported in the bug. (Yay!) I believe the tests are sufficient to confirmthat it works as intended. I don't think the test coverage *before* this patch was sufficient, and I appreciate theimproved test coverage of the combine functions. I verified the new tests manually. (4) The comments cite Youngs & Cramer (1971). This is a dated citation. It justifies its algorithms based on pre-IEEE754single-precision arithmetic. Most notably, it assumes that floating-point division is not exact. That said, thealgorithm is still a good choice; it is justified now because compared to the standard Welford variance, it is less proneto causing pipeline stalls on a modern CPU. Maybe link to Schubert & Gentz 2018 (https://dl.acm.org/citation.cfm?id=3223036)instead. The new float8_combine combine code is (almost) S&G eqn. 22; the float8_accumcode is S&G eqn. 28. float8_regr_accum and float8_regr_combine are S&G eqn. 22. (5) I think the comment /* Watch out for roundoff error producing a negative variance */ and associated check are obsoletenow. The new status of this patch is: Waiting on Author
By the way, I did not see the discussion thread or notice that an author of the paper I mentioned and the reporter of thebug were the same person until after I wrote the review. Oops.
On 27 September 2018 at 06:12, Madeleine Thompson <madeleineth@gmail.com> wrote: > This is my first PostgreSQL review. Hopefully I'm getting it right. I independently ran into the bug in question and foundthat the author had already written a patch. I'm happy the author wrote this. > Thanks for the review. And yes, this sort of review is exactly what we need. > (1) I am not experienced with PostgreSQL internals, so I can't comment on the coding style or usage of internal functions. > > (2) The patch applies cleanly to ce4887bd025b95c7b455fefd817a418844c6aad3. "make check", "make check-world", and "makeinstall" pass. > > (3) The patch addresses the numeric inaccuracy reported in the bug. (Yay!) I believe the tests are sufficient to confirmthat it works as intended. I don't think the test coverage *before* this patch was sufficient, and I appreciate theimproved test coverage of the combine functions. I verified the new tests manually. > Excellent. Thanks for testing. > (4) The comments cite Youngs & Cramer (1971). This is a dated citation. It justifies its algorithms based on pre-IEEE754single-precision arithmetic. Most notably, it assumes that floating-point division is not exact. That said, thealgorithm is still a good choice; it is justified now because compared to the standard Welford variance, it is less proneto causing pipeline stalls on a modern CPU. Maybe link to Schubert & Gentz 2018 (https://dl.acm.org/citation.cfm?id=3223036)instead. The new float8_combine combine code is (almost) S&G eqn. 22; the float8_accumcode is S&G eqn. 28. float8_regr_accum and float8_regr_combine are S&G eqn. 22. > Hmm, I think that Youngs & Cramer should be cited as the original inventors of the algorithm (which pre-dates the first version of IEEE754 by a few years), although I'm happy to also cite Schubert & Gentz as a more modern justification for the choice of algorithm. Regarding the combine functions, I think perhaps Chan, Golub & LeVeque [1] should be cited as the original inventors. I think they only did variance, not covariance, but that's a straightforward generalisation of their work. [1] Updating Formulae and a Pairwise Algorithm for Computing Sample Variances, T. F. Chan, G. H. Golub & R. J. LeVeque, COMPSTAT 1982. > (5) I think the comment /* Watch out for roundoff error producing a negative variance */ and associated check are obsoletenow. > Ah, good point. We clearly only ever add positive quantities to Sxx and Syy, and likewise when they are combined, so there is no way that they can ever become negative now. Another neat consequence of this algorithm. I'll post an updated patch in a while. Regards, Dean
On Thu, 27 Sep 2018 at 14:22, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I'll post an updated patch in a while. > I finally got round to looking at this again. Here is an update, based on the review comments. The next question is whether or not to back-patch this. Although this was reported as a bug, my inclination is to apply this to HEAD only, based on the lack of prior complaints. That also matches our decision for other similar patches, e.g., 7d9a4737c2. Regards, Dean
Вложения
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > The next question is whether or not to back-patch this. Although this > was reported as a bug, my inclination is to apply this to HEAD only, > based on the lack of prior complaints. That also matches our decision > for other similar patches, e.g., 7d9a4737c2. +1 for no back-patch. Even though the new results are hopefully more accurate, they're still different from before, and that might cause issues for somebody if it happens in a stable branch. regards, tom lane
On Wed, Oct 3, 2018 at 9:04 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Thu, 27 Sep 2018 at 14:22, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > I'll post an updated patch in a while. > > > > I finally got round to looking at this again. Here is an update, based > on the review comments. > > The next question is whether or not to back-patch this. Although this > was reported as a bug, my inclination is to apply this to HEAD only, > based on the lack of prior complaints. That also matches our decision > for other similar patches, e.g., 7d9a4737c2. This diff looks good to me. Also, it applies cleanly against abd9ca377d669a6e0560e854d7e987438d0e612e and passes `make check-world`. I agree that this is not suitable for a patch release. Madeleine
On Wed, 3 Oct 2018 at 15:58, Madeleine Thompson <madeleineth@gmail.com> wrote: > This diff looks good to me. Also, it applies cleanly against > abd9ca377d669a6e0560e854d7e987438d0e612e and passes `make > check-world`. > > I agree that this is not suitable for a patch release. > Pushed to master. Thanks for the review. Regards, Dean