Justin Pryzby <pryzby@telsasoft.com> writes:
> On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
>> This answer is simply broken. You've caused it to estimate half
>> of the bucket, which is an insane estimate for the given bucket
>> boundaries and WHERE constraint.
> I'm fine if the isnan() logic changes, but the comment indicates it's intended
> to be hit for an infinite histogram bound, but that doesn't work for timestamps
> (convert_to_scalar() should return (double)INFINITY and not
> (double)INT64_MIN/MAX).
I suppose the code you're looking at is
binfrac = (val - low) / (high - low);
/*
* Watch out for the possibility that we got a NaN or
* Infinity from the division. This can happen
* despite the previous checks, if for example "low"
* is -Infinity.
*/
if (isnan(binfrac) ||
binfrac < 0.0 || binfrac > 1.0)
binfrac = 0.5;
This doesn't really have any goals beyond "make sure we get a result
between 0.0 and 1.0, even if the calculation went pear-shaped for
some reason". You could make an argument that it should be like
if (isnan(binfrac))
binfrac = 0.5; /* throw up our hands for NaN */
else if (binfrac <= 0.0)
binfrac = 0.0; /* clamp in case of -Inf or -0 */
else if (binfrac > 1.0)
binfrac = 1.0; /* clamp in case of +Inf */
which would probably produce saner results in edge cases like these.
I think it'd also obviate the need for fooling with the conversion in
convert_to_scalar: while DT_NOBEGIN/DT_NOEND wouldn't produce exactly
the same result (hard 0.0 or 1.0) as an infinity, they'd produce
results very close to that.
regards, tom lane