Re: suspicious valgrind reports about radixtree/tidstore on arm64

Поиск
Список
Период
Сортировка
I wrote:
> I've reproduced what looks like about the same thing on
> my Pi 4 using Fedora 38: just run "make installcheck-parallel"
> under valgrind, and kaboom.  Definitely needs investigation.

The problem appears to be that RT_ALLOC_NODE doesn't bother to
initialize the chunks[] array when making a RT_NODE_16 node.
If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries,
then when RT_NODE_16_SEARCH_EQ applies vector operations that
read the entire array, it's operating partially on uninitialized
data.  Now, that's actually OK because of the "mask off invalid
entries" step, but aarch64 valgrind complains anyway.

I hypothesize that the reason we're not seeing equivalent failures
on x86_64 is one of

1. x86_64 valgrind is stupider than aarch64, and fails to track that
the contents of the SIMD registers are only partially defined.

2. x86_64 valgrind is smarter than aarch64, and is able to see
that the "mask off invalid entries" step removes all the
potentially-uninitialized bits.

The first attached patch, "radixtree-fix-minimal.patch", is enough
to stop the aarch64 valgrind failure for me.  However, I think
that the coding here is pretty penny-wise and pound-foolish,
and that what we really ought to do is the second patch,
"radixtree-fix-proposed.patch".  I do not believe that asking
memset to zero the three-byte RT_NODE structure produces code
that's either shorter or faster than having it zero 8 bytes
(as for RT_NODE_4) or having it do that and then separately
zero some more stuff (as for the larger node types).  Leaving
RT_NODE_4's chunks[] array uninitialized is going to bite us
someday, too, even if it doesn't right now.  So I think we
ought to just zero the whole fixed-size part of the nodes,
which is what radixtree-fix-proposed.patch does.

(The RT_NODE_48 case could be optimized a bit if we cared
to swap the order of its slot_idxs[] and isset[] arrays;
then the initial zeroing could just go up to slot_idxs[].
I don't know if there's any reason why the current order
is preferable.)

            regards, tom lane

diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 338e1d741d..e21f7be3f9 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -849,8 +849,14 @@ RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_c
     switch (kind)
     {
         case RT_NODE_KIND_4:
-        case RT_NODE_KIND_16:
             break;
+        case RT_NODE_KIND_16:
+            {
+                RT_NODE_16 *n16 = (RT_NODE_16 *) node;
+
+                memset(n16->chunks, 0, sizeof(n16->chunks));
+                break;
+            }
         case RT_NODE_KIND_48:
             {
                 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 338e1d741d..4510f7c4cd 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -845,27 +845,25 @@ RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_c

     /* initialize contents */

-    memset(node, 0, sizeof(RT_NODE));
     switch (kind)
     {
         case RT_NODE_KIND_4:
+            memset(node, 0, offsetof(RT_NODE_4, children));
+            break;
         case RT_NODE_KIND_16:
+            memset(node, 0, offsetof(RT_NODE_16, children));
             break;
         case RT_NODE_KIND_48:
             {
                 RT_NODE_48 *n48 = (RT_NODE_48 *) node;

-                memset(n48->isset, 0, sizeof(n48->isset));
+                memset(n48, 0, offsetof(RT_NODE_48, children));
                 memset(n48->slot_idxs, RT_INVALID_SLOT_IDX, sizeof(n48->slot_idxs));
                 break;
             }
         case RT_NODE_KIND_256:
-            {
-                RT_NODE_256 *n256 = (RT_NODE_256 *) node;
-
-                memset(n256->isset, 0, sizeof(n256->isset));
-                break;
-            }
+            memset(node, 0, offsetof(RT_NODE_256, children));
+            break;
         default:
             pg_unreachable();
     }

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: IPC::Run accepts bug reports
Следующее
От: Tom Lane
Дата:
Сообщение: Re: suspicious valgrind reports about radixtree/tidstore on arm64