On Tue, Nov 29, 2022 at 1:36 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> While creating a benchmark for inserting into node128-inner, I found a bug. If a caller deletes from a node128, the
slotindex is set to invalid, but the child pointer is still valid. Do that a few times, and every child pointer is
valid,even if no slot index points to it. When the next inserter comes along, something surprising happens. This
function:
>
> /* Return an unused slot in node-128 */
> static int
> node_inner_128_find_unused_slot(rt_node_inner_128 *node, uint8 chunk)
> {
> int slotpos = 0;
>
> Assert(!NODE_IS_LEAF(node));
> while (node_inner_128_is_slot_used(node, slotpos))
> slotpos++;
>
> return slotpos;
> }
>
> ...passes an integer to this function, whose parameter is a uint8:
>
> /* Is the slot in the node used? */
> static inline bool
> node_inner_128_is_slot_used(rt_node_inner_128 *node, uint8 slot)
> {
> Assert(!NODE_IS_LEAF(node));
> return (node->children[slot] != NULL);
> }
>
> ...so instead of growing the node unnecessarily or segfaulting, it enters an infinite loop doing this:
>
> add eax, 1
> movzx ecx, al
> cmp QWORD PTR [rbx+264+rcx*8], 0
> jne .L147
>
> The fix is easy enough -- set the child pointer to null upon deletion,
Good catch!
> but I'm somewhat astonished that the regression tests didn't hit this. I do still intend to replace this code with
somethingfaster, but before I do so the tests should probably exercise the deletion paths more. Since VACUUM
Indeed, there are some tests for deletion but all of them delete all
keys in the node so we end up deleting the node. I've added tests of
repeating deletion and insertion as well as additional assertions.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com