Обсуждение: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

Поиск
Список
Период
Сортировка

[PATCH] Use cached hash to skip unnecessary key comparisons in dshash

От
CharSyam
Дата:
Dear PostgreSQL Hackers,

  This email proposes a patch to optimize hash table lookups and
  deletions in dshash.  The patch file,
  0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch,
  is attached.

  Problem:

  Each dshash_table_item already stores the hash value in its 'hash'
  field, but find_in_bucket() and delete_key_from_bucket() never use
  it.  They unconditionally call the compare function for every item
  in the bucket chain, incurring unnecessary overhead from DSM address
  translation and key comparison on non-matching items.

  Solution:

  This patch adds a hash pre-check (item->hash == hash) before calling
  equal_keys() in both find_in_bucket() and delete_key_from_bucket().
  Items with non-matching hash values are skipped with a single integer
  comparison, avoiding the expensive key comparison entirely.

  All callers -- dshash_find(), dshash_find_or_insert_extended(), and
  dshash_delete_key() -- already have the hash value computed, so it
  is simply passed through as a new parameter.  Since both functions
  are static, there is no API change.

  Benchmark:

  I ran a benchmark using the test_dsm_registry extension with 10,000
  string-keyed entries (key = 'key_1' .. 'key_10000') on an Apple
  Silicon machine:

    Test                     | Before    | After     | Improvement
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37%
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47%
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41%
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21%

  The improvement is most significant when bucket chains are longer
  and key comparison is expensive (e.g., string keys).

  Testing & Compatibility:

  The patch compiles cleanly and passes all core regression tests
  (make check) on macOS (arm64).  The changes are limited to internal
  static functions in dshash.c and do not affect any public API, so
  full backward compatibility is maintained.

  I believe this patch is ready for review.  I look forward to any
  feedback and am happy to make revisions.

  Thank you,
  charsyam
Вложения

Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

От
Nathan Bossart
Дата:
On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> This patch adds a hash pre-check (item->hash == hash) before calling
> equal_keys() in both find_in_bucket() and delete_key_from_bucket().
> Items with non-matching hash values are skipped with a single integer
> comparison, avoiding the expensive key comparison entirely.

This relies on the fact that matching keys will have matching hashes, but
matching hashes does not necessarily imply matching keys.  IIUC this is a
safe assumption, although a short comment to this effect might be a nice
addition.

>   Test                     | Before    | After     | Improvement
>   -------------------------+-----------+-----------+------------
>   INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37%
>   LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47%
>   LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41%
>   LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21%
> 
> The improvement is most significant when bucket chains are longer and key
> comparison is expensive (e.g., string keys).

Nice results.  Are there any regressions when the bucket chains are short
or when key comparisons are inexpensive?

> I believe this patch is ready for review.  I look forward to any feedback
> and am happy to make revisions.

We are in feature-freeze for PostgreSQL v19 now, so this will unfortunately
need to wait until July when v20 development begins.  Please add an entry
to the commitfest site to ensure this doesn't get lost:

    https://commitfest.postgresql.org/

-- 
nathan



Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

От
Nathan Bossart
Дата:
On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> I believe this patch is ready for review.  I look forward to any feedback
> and am happy to make revisions.

Sorry, I forgot to ask whether we could move the "pre-check" to within the
equal_keys() function so that it needn't be added to every one of its
callers.

-- 
nathan



Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

От
CharSyam
Дата:
Thank you for the thoughtful review. I've attached an updated patch (v2)
  addressing your feedback.

  1. Comment added

  I've added a comment in find_in_bucket() explaining the hash pre-check
  assumption:

    /*
     * If the hash values don't match, the keys certainly don't match
     * either, so we can skip the expensive key comparison.  Matching
     * hashes don't guarantee matching keys, so equal_keys() is still
     * needed for confirmation.
     */

  A short "See comment in find_in_bucket()" reference is also added in
  delete_key_from_bucket().

  2. No regressions with short keys / cheap comparisons

  I ran an additional benchmark with short keys ("1" ~ "10000", 1-5 chars)
  where strcmp cost is minimal:

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 13.26 ms  |  6.44 ms  | ~51% faster
    LOOKUP 10000 hits        |  7.94 ms  |  5.03 ms  | ~37% faster
    LOOKUP 10000 misses      |  8.08 ms  |  4.80 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.05 ms  | 24.06 ms  | ~27% faster

  For reference, here are the original string key results ("key_1" ~
  "key_10000"):

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37% faster
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47% faster
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21% faster

  No regressions in either scenario.  Even with short keys, the integer
  hash pre-check is always cheaper than the DSM address translation plus
  compare function call it avoids.

  Both tests ran on macOS (arm64) using the test_dsm_registry extension
  with 10,000 entries.

  3. Keeping the pre-check outside equal_keys()

  I considered moving the hash comparison into equal_keys(), but I think
  keeping it at the call site is a better fit for the following reasons:

    - equal_keys() is a pure key comparison function.  Mixing in hash
      comparison would change its semantics and make the name misleading.

    - To pass hashes into equal_keys(), we would need to add two hash
      parameters (one for the search key, one for the item).  The caller
      still needs to extract item->hash before the call, so the call site
      doesn't actually get simpler.

    - There are only two call sites, both already modified in this patch,
      so there is no real maintenance burden from having the check at each
      call site.

    - This pattern is consistent with other hash table implementations in
      PostgreSQL (e.g., dynahash.c), which also compare hashes outside
      the key equality function.

  I'm happy to reconsider if you feel differently.

  Regards,
  charsyam

2026년 4월 11일 (토) 오전 3:10, Nathan Bossart <nathandbossart@gmail.com>님이 작성:
On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> I believe this patch is ready for review.  I look forward to any feedback
> and am happy to make revisions.

Sorry, I forgot to ask whether we could move the "pre-check" to within the
equal_keys() function so that it needn't be added to every one of its
callers.

--
nathan
Вложения