[COMMITTERS] pgsql: Make simplehash.h grow hashtable in additional cases.

Поиск
Список
Период
Сортировка
От Andres Freund
Тема [COMMITTERS] pgsql: Make simplehash.h grow hashtable in additional cases.
Дата
Msg-id E1cl0tJ-0003LX-SR@gemulon.postgresql.org
обсуждение исходный текст
Список pgsql-committers
Make simplehash.h grow hashtable in additional cases.

Increase the size when either the distance between actual and optimal
slot grows too large, or when too many subsequent entries would have
to be moved.

This addresses reports that the simplehash performed, sometimes
considerably, worse than dynahash.

The reason turned out to be that insertions into the hashtable where,
due to the use of parallel query, in effect done from another
hashtable, in hash-value order.  If the target hashtable, due to
mis-estimation, was sized a lot smaller than the source table(s) that
lead to very imbalanced tables; a lot of entries in many close-by
buckets from the source tables were inserted into a single, wider,
bucket on the target table.  As the growth factor was solely computed
based on the fillfactor, the performance of the table decreased
further and further.

b81b5a96f424531b was an attempt to address this problem for hash
aggregates (but not for bitmap scans), but it turns out that the
current method of mixing hash values often actually leaves neighboring
hash-values close to each other, just in different value range.  It
might be worth revisiting that independently of the performance issues
addressed in this patch..

To address that problem resize tables in two additional cases: Firstly
when the optimal position for an entry would be far from the actual
position, secondly when many entries would have to be moved to make
space for the new entry (while satisfying the robin hood property).

Due to the additional resizing threshold it seems possible, and
testing confirms that so far, that a higher fillfactor doesn't hurt
performance and saves a bit of memory.  It seems better to increase it
now, before a release containing any of this code, rather than wonder
in some later release.

The various boundaries aren't determined in a particularly scientific
manner, they might need some fine-tuning.

In all my tests the new code now, even with parallelism, performs at
least as good as the old code, in several scenarios significantly
better.

Reported-By: Dilip Kumar, Robert Haas, Kuntal Ghosh
Discussion:
    https://postgr.es/m/CAFiTN-vagvuAydKG9VnWcoK=ADAhxmOa4ZTrmNsViBBooTnriQ@mail.gmail.com
    https://postgr.es/m/CAGz5QC+=fNTYgzMLTBUNeKt6uaWZFXJbkB5+7oWm-n9DwVxcLA@mail.gmail.com

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/d4c62a6b623d6eef88218158e9fa3cf974c6c7e5

Modified Files
--------------
src/include/lib/simplehash.h | 55 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 50 insertions(+), 5 deletions(-)


В списке pgsql-committers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: [COMMITTERS] pgsql: pg_upgrade: Fix large object COMMENTS, SECURITY LABELS
Следующее
От: Tom Lane
Дата:
Сообщение: [COMMITTERS] pgsql: Repair incorrect pg_dump labeling for some comments andsecurity