On 2/9/23 04:53, John Naylor wrote:
>
> On Wed, Feb 8, 2023 at 11:05 PM Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>>
>> Dmitry Dolgov <9erthalion6@gmail.com <mailto:9erthalion6@gmail.com>>
> writes:
>> >> On Wed, Feb 08, 2023 at 10:03:01AM -0500, Tom Lane wrote:
>> >> I'd believe this argument more readily if the calculation weren't being
>> >> done in float arithmetic. Since it is, you're at the mercy of roundoff
>> >> error ... and that small negative delta could certainly pass for
>> >> roundoff error.
>>
>> > Hmm...yeah, good point. In both the reproducer I've posted and the
>> > backtrace from the thread the delta is indeed rather small.
>>
>> I bet also it only fails when dealing with IPv6 addresses.
>> With 32-bit IPv4 addresses, a float8 would have enough mantissa
>> bits that the calculation wouldn't become imprecise.
>
> Addresses from different families are treated as a distance of one, so I
> don't think inserting a single IPv6 address would get this far, and in
> fact I still get a crash in an empty table by taking out the IPv6
> address from Dmitry's example:
>
> insert into brin_test values('127.0.0.1/0' <http://127.0.0.1/0'>);
> insert into brin_test values('0.0.0.0/12' <http://0.0.0.0/12'>);
>
> Adding some debug calls shows it starts off negative from the start:
>
> =# insert into brin_test values('127.0.0.1/0' <http://127.0.0.1/0'>);
> INSERT 0 1
> =# insert into brin_test values('0.0.0.0/12' <http://0.0.0.0/12'>);
> NOTICE: idx: 3 before div: -1.000000
> NOTICE: idx: 3 after div: -0.003906
> NOTICE: idx: 2 before div: -0.003906
> NOTICE: idx: 2 after div: -0.000015
> NOTICE: idx: 1 before div: -0.000015
> NOTICE: idx: 1 after div: -0.000000
> NOTICE: idx: 0 before div: -0.000000
> NOTICE: idx: 0 after div: -0.000000
>
> ...so something else is wrong here but I haven't dug deeper yet.
>
I believe the bug is pretty trivial - the code applies the netmask
incorrectly, so that with 127.0.0.1/0 it ends with 0.0.0.1, and because
it assumes 0.0.0.1 < 0.0.0.0 it ends with negative delta.
In particular, the issue is that the code does this:
lena = ip_bits(ipa); -- 0
len = ip_addrsize(ipa); -- 4
for (for (i = 0; i < len; i++)
{
nbits = lena - (i * 8);
...
mask = (0xFF << (8 - nbits));
...
}
But for 127.0.0.1/0 we get lena=0, so for i>0 nbits gets negative, and
the shift is probably going to do something silly (not sure what
exactly, but AFAICS it's undefined behavior).
Attached is a fixup that resolves this failure for me. I need to look a
bit closer if there are some other issues (e.g. with the float rounding
errors, etc.).
Good thing is this is this won't break anything without the assert - we
may pick incorrect ranges to merge, so the index is a bit less
efficient, but still valid/correct.
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company