Re: [HACKERS] A design for amcheck heapam verification

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: [HACKERS] A design for amcheck heapam verification
Дата
Msg-id CAEepm=3yUCJmeMON45ZOHGTT1gry0W1GvvN_gjzyXMrFOkGb=w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: [HACKERS] A design for amcheck heapam verification  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Wed, Aug 30, 2017 at 7:58 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, May 11, 2017 at 4:30 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I spent only a few hours writing a rough prototype, and came up with
>> something that does an IndexBuildHeapScan() scan following the
>> existing index verification steps. Its amcheck callback does an
>> index_form_tuple() call, hashes the resulting IndexTuple (heap TID and
>> all), and tests it for membership of a bloom filter generated as part
>> of the main B-Tree verification phase. The IndexTuple memory is freed
>> immediately following hashing.
>
> I attach a cleaned-up version of this. It has extensive documentation.
> My bloom filter implementation is broken out as a separate patch,
> added as core infrastructure under "lib".

Some drive-by comments on the lib patch:

+bloom_add_element(bloom_filter *filter, unsigned char *elem, Size len)

I think the plan is to use size_t for new stuff[1].

+/*
+ * Which element of the sequence of powers-of-two is less than or equal to n?
+ *
+ * Used to size bitset, which in practice is never allowed to exceed 2 ^ 30
+ * bits (128MB).  This frees us from giving further consideration to int
+ * overflow.
+ */
+static int
+pow2_truncate(int64 target_bitset_size)
+{
+    int v = 0;
+
+    while (target_bitset_size > 0)
+    {
+        v++;
+        target_bitset_size = target_bitset_size >> 1;
+    }
+
+    return Min(v - 1, 30);
+}

This is another my_log2(), right?

It'd be nice to replace both with fls() or flsl(), though it's
annoying to have to think about long vs int64 etc.  We already use
fls() in two places and supply an implementation in src/port/fls.c for
platforms that lack it (Windows?), but not the long version.

+/*
+ * Hash function is taken from sdbm, a public-domain reimplementation of the
+ * ndbm database library.
+ */
+static uint32
+sdbmhash(unsigned char *elem, Size len)
+{
+    uint32    hash = 0;
+    int        i;
+
+    for (i = 0; i < len; elem++, i++)
+    {
+        hash = (*elem) + (hash << 6) + (hash << 16) - hash;
+    }
+
+    return hash;
+}

I see that this is used in gawk, BerkeleyDB and all over the place[2].
Nice.  I understand that this point of this is to be a hash function
that is different from our usual one, for use by k_hashes().  Do you
think it belongs somewhere more common than this?  It seems a bit like
our hash-related code is scattered all over the place but should be
consolidated, but I suppose that's a separate project.

Unnecessary braces here and elsewhere for single line body of for loops.

+bloom_prop_bits_set(bloom_filter *filter)
+{
+    int        bitset_bytes = NBITS(filter) / BITS_PER_BYTE;
+    int64    bits_set = 0;
+    int        i;
+
+    for (i = 0; i < bitset_bytes; i++)
+    {
+        unsigned char byte = filter->bitset[i];
+
+        while (byte)
+        {
+            bits_set++;
+            byte &= (byte - 1);
+        }
+    }

Sorry I didn't follow up with my threat[3] to provide a central
popcount() function to replace the implementations all over the tree.

[1] https://www.postgresql.org/message-id/25076.1489699457%40sss.pgh.pa.us
[2] http://www.cse.yorku.ca/~oz/hash.html
[3] https://www.postgresql.org/message-id/CAEepm%3D3g1_fjJGp38QGv%2B38BC2HHVkzUq6s69nk3mWLgPHqC3A%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] error-handling in ReorderBufferCommit() seems somewhatbroken
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: [HACKERS] error-handling in ReorderBufferCommit() seems somewhatbroken