Re: Experimenting with hash join prefetch
From | KAZAR Ayoub |
---|---|
Subject | Re: Experimenting with hash join prefetch |
Date | |
Msg-id | CA+K2Rum_79Q_BbgJG_voe7sTE9uD5Cj8h17NZFNVWfRjRbgrfQ@mail.gmail.com |
In reply to | Re: Experimenting with hash join prefetch (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-hackers |
Hi,
I thought it might be interesting to revive this thread, because the improvements I saw from Thomas's work are still there. Even simple prefetching of bucket headers during the in-memory probe phase (done just to see the effect of prefetching) still shows nice improvements. Here are some results for simple prefetching in the probe phase only, on Thomas's last benchmark query (in-memory self join):
Task clock: -25.6%
Page faults: -21.46%
Cycles: -17.39%
L1 dcache loads: -13.78%
L1 dcache load misses: -30.1%
LLC loads: -36.7%
LLC load misses: -55.1%
dTLB loads: -13.77%
dTLB misses: +0.5%
Cache references: -9.5%
Cache misses: -7.9%
IPC: -6.4%
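To make "simple prefetching of bucket headers in the probe phase" concrete, here is a minimal sketch. It assumes a toy chained hash table, not PostgreSQL's actual hash join structures: `Tuple`, `HashTable`, `hash_key`, `ht_init`, `ht_insert`, `probe_with_prefetch` and `PREFETCH_DISTANCE` are all hypothetical names, and `__builtin_prefetch` is the GCC/Clang builtin. It is an illustration only, not the code that produced the numbers above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy structures for illustration; not PostgreSQL's. */
typedef struct Tuple {
    struct Tuple *next;   /* next tuple in the same bucket chain */
    uint32_t      hash;   /* cached hash value of key */
    uint32_t      key;
} Tuple;

typedef struct {
    Tuple  **buckets;     /* array of chain heads ("bucket headers") */
    uint32_t nbuckets;    /* must be a power of two */
} HashTable;

/* Simple 32-bit integer mixer; any reasonable hash would do here. */
uint32_t hash_key(uint32_t k)
{
    k ^= k >> 16; k *= 0x85ebca6bu;
    k ^= k >> 13; k *= 0xc2b2ae35u;
    k ^= k >> 16;
    return k;
}

/* Returns 0 on success, -1 on bad size or allocation failure. */
int ht_init(HashTable *ht, uint32_t nbuckets)
{
    if (nbuckets == 0 || (nbuckets & (nbuckets - 1)) != 0)
        return -1;
    ht->buckets = calloc(nbuckets, sizeof(Tuple *));
    ht->nbuckets = nbuckets;
    return ht->buckets ? 0 : -1;
}

/* Build phase: push the tuple onto the head of its bucket chain. */
void ht_insert(HashTable *ht, Tuple *t)
{
    uint32_t b = (t->hash = hash_key(t->key)) & (ht->nbuckets - 1);
    t->next = ht->buckets[b];
    ht->buckets[b] = t;
}

/* Assumed tuning knob: how many probe keys ahead to prefetch. */
#define PREFETCH_DISTANCE 8

/* Probe phase: returns the number of (probe key, build tuple) matches. */
size_t probe_with_prefetch(const HashTable *ht,
                           const uint32_t *keys, size_t nkeys)
{
    uint32_t mask = ht->nbuckets - 1;
    size_t   matches = 0;

    for (size_t i = 0; i < nkeys; i++) {
        /* Prefetch the bucket header a later key will need. The address
         * always lies inside the bucket array, so it is never NULL.
         * (The hash is recomputed later; a real version would cache it.) */
        if (i + PREFETCH_DISTANCE < nkeys) {
            uint32_t ahead = hash_key(keys[i + PREFETCH_DISTANCE]);
            __builtin_prefetch(&ht->buckets[ahead & mask], 0, 3);
        }

        /* Ordinary probe of the current key: walk its chain. */
        uint32_t h = hash_key(keys[i]);
        for (const Tuple *t = ht->buckets[h & mask]; t != NULL; t = t->next)
            if (t->hash == h && t->key == keys[i])
                matches++;
    }
    return matches;
}
```

Only the bucket header is prefetched here; the chain itself is walked without any prefetch hints. The right prefetch distance is hardware- and workload-dependent and would have to be found by measurement.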
So, I thought it might be worth taking another look at this, even if we avoid the major architectural changes in the hash join executor that more advanced techniques would require. It will take a lot of perf benchmarking to demonstrate the gains, but I think it's doable to confirm or refute what can be achieved with minimal architectural changes.
Also, about the Linux experience: that was about prefetching in linked lists (pointer chasing; see the Linux thread). The problem showed up on Intel as prefetch(NULL), because prefetching while walking short lists hits the end of the list very often (as in chained hash tables). This is still noticeable in Postgres: if we try to prefetch during the intra-bucket scan, performance is about the same or even worse.
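For reference, the pointer-chasing pattern in question looks roughly like the sketch below. `ChainNode` and `chain_sum_prefetch` are hypothetical names for a generic linked-list walk, not code from Linux or PostgreSQL, and `__builtin_prefetch` is the GCC/Clang builtin. The NULL check is one way to avoid issuing a prefetch at the end of the list, at the cost of an extra branch per node; whether that trade is worth it is something only measurement can tell.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chain node, standing in for a tuple in a bucket chain. */
typedef struct ChainNode {
    struct ChainNode *next;
    uint32_t          value;
} ChainNode;

/* Walk a chain, prefetching the next node while processing this one. */
uint64_t chain_sum_prefetch(const ChainNode *head)
{
    uint64_t sum = 0;

    for (const ChainNode *n = head; n != NULL; n = n->next) {
        /* On short chains n->next is NULL most of the time, so the
         * unguarded form of this hint would mostly be prefetch(NULL). */
        if (n->next != NULL)
            __builtin_prefetch(n->next, 0, 3);

        /* Very little work happens per node, so the prefetch has
         * almost no time to complete before n->next is dereferenced. */
        sum += n->value;
    }
    return sum;
}
```

With chains of only one or two entries, there is little latency left to hide by the time the next pointer is known, which fits the observation that intra-bucket prefetching does not pay off.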
Thoughts?