Обсуждение: [PATCH] btree_gist: fix union implementation for variable length columns
Hi hackers, while I tried to debug 'gcc -fstack-protector -O3' problems in [1], I noticed that gbt_var_union() mistreats the first vector element. Patch is attached. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1544349 Pavel
Вложения
Pavel Raiskup <praiskup@redhat.com> writes:
> while I tried to debug 'gcc -fstack-protector -O3' problems in [1], I noticed
> that gbt_var_union() mistreats the first vector element. Patch is attached.
Hi Pavel! For patches that purport to resolve bugs, we usually like to
add a regression test case that demonstrates the bug in unpatched code.
Can you provide a small test case that does so? (The BZ you pointed to
doesn't seem to address this...)
regards, tom lane
Hi Tom,
On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
> Pavel Raiskup <praiskup@redhat.com> writes:
> > while I tried to debug 'gcc -fstack-protector -O3' problems in [1], I noticed
> > that gbt_var_union() mistreats the first vector element. Patch is attached.
>
> Hi Pavel! For patches that purport to resolve bugs, we usually like to
> add a regression test case that demonstrates the bug in unpatched code.
> Can you provide a small test case that does so? (The BZ you pointed to
> doesn't seem to address this...)
I've been looking at the reproducer for a while now... and I probably
need a hint if that's necessary.
Turns out the problem is only related to bit/bit varying type (so the
patch comments need to be reworded properly, at least) since those are the
only types which have implemented the f_l2n() callback.
Considering testcase 'bit.sql' (bit(33) type), it's pretty clear that on
little endian boxes the problematic strings would be '00100001*' (int32
value for '33', because there's varbit header). Big endian boxes would
have problems with char[] like {0, 0, 0, 33, ...}. But I fail to
construct right order of correctly picked elements into 'bit.data'.
The problem probably is (a) that picksplit reorders the elements very
often, and probably that (b) random() brings non-determinism into the
final tree shape (if the reproducer was to be based on many equal keys in
the index). It's amazing to see how the index fixes itself :-) for this
bug.
Can you help me with the reproducer idea, resp. can we live without the
reproducer in this particular case?
Pavel
Pavel Raiskup <praiskup@redhat.com> writes:
> On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote:
>> Hi Pavel! For patches that purport to resolve bugs, we usually like to
>> add a regression test case that demonstrates the bug in unpatched code.
>> Can you provide a small test case that does so? (The BZ you pointed to
>> doesn't seem to address this...)
> Turns out the problem is only related to bit/bit varying type (so the
> patch comments need to be reworded properly, at least) since those are the
> only types which have implemented the f_l2n() callback.
What I'm failing to wrap my head around is why this code exists at all.
AFAICS, gbt_bit_xfrm just forces the bitstring to be zero-padded out to
an INTALIGN boundary, which it wouldn't necessarily be otherwise. But
why bother? That should have no effect on the behavior of bit_cmp().
So I'm speculating that the reason nobody has noticed a problem is that
there is no problem because this code is useless and could be ripped out.
It would be easier to figure this out if the btree_gist code weren't
so desperately undocumented. Teodor, do you remember why it's like
this?
regards, tom lane
On Wednesday, July 11, 2018 7:26:40 PM CEST Tom Lane wrote: > Pavel Raiskup <praiskup@redhat.com> writes: > > On Monday, July 9, 2018 7:41:59 PM CEST Tom Lane wrote: > >> Hi Pavel! For patches that purport to resolve bugs, we usually like to > >> add a regression test case that demonstrates the bug in unpatched code. > >> Can you provide a small test case that does so? (The BZ you pointed to > >> doesn't seem to address this...) > > > Turns out the problem is only related to bit/bit varying type (so the > > patch comments need to be reworded properly, at least) since those are the > > only types which have implemented the f_l2n() callback. > > What I'm failing to wrap my head around is why this code exists at all. > AFAICS, gbt_bit_xfrm just forces the bitstring to be zero-padded out to > an INTALIGN boundary, which it wouldn't necessarily be otherwise. > But why bother? That should have no effect on the behavior of bit_cmp(). The gbt_bit_xfrm() method actually also skips the varbit header (number of bits, stored in VarBit.bit_len), the magic is done by VARBITS macro. If the header is dropped, it's OK to just use existing byteacmp() for bit array comparison. Pavel
> It would be easier to figure this out if the btree_gist code weren't
> so desperately undocumented. Teodor, do you remember why it's like
> this?
Will look.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Thursday, July 12, 2018 5:53:18 PM CET Teodor Sigaev wrote: > > It would be easier to figure this out if the btree_gist code weren't > > so desperately undocumented. Teodor, do you remember why it's like > > this? > > Will look. Ping on this issue. I guess the patch I proposed isn't wrong in this case, I'm just not sure about the commit message. Pavel