Re: Change GUC hashtable to use simplehash?

From John Naylor
Subject Re: Change GUC hashtable to use simplehash?
Msg-id CANWCAZbwvp7oUEkbw-xP4L0_S_WNKq-J-ucP4RCNDPJnrakUPw@mail.gmail.com
In reply to Re: Change GUC hashtable to use simplehash?  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: Change GUC hashtable to use simplehash?  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, Mar 5, 2024 at 5:30 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Tue, Jan 30, 2024 at 5:04 PM John Naylor <johncnaylorls@gmail.com> wrote:
> >
> > On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma <ants.aasma@cybertec.at> wrote:
> > > But given that we know the data length and we have it in a register
> > > already, it's easy enough to just mask out data past the end with a
> > > shift. See patch 1. Performance benefit is about 1.5x Measured on a
> > > small test harness that just hashes and finalizes an array of strings,
> > > with a data dependency between consecutive hashes (next address
> > > depends on the previous hash output).
> >
> > Interesting work! I've taken this idea and (I'm guessing, haven't
> > tested) improved it by re-using an intermediate step for the
> > conditional, simplifying the creation of the mask, and moving the
> > bitscan out of the longest dependency chain.
>
> This needed a rebase, and is now 0001. I plan to push this soon.

I held off on this because CI was failing, but it wasn't because of this.

> I also went and looked at the simplehash instances and found a few
> that would be easy to switch over. Rather than try to figure out which
> could benefit from shaving cycles, I changed all the string hashes,
> and one more, in 0002 so they can act as examples.

This was the culprit. The search path cache didn't trigger this when
it went in, but it seems that in frontend code a read past the end of
a malloc'd block fails -fsanitize=address. By the same token, I'm
guessing the only reason this didn't fail for backend is that almost
all strings you'd want to use as a hash key won't use a malloc'd
external block.

I found that adding __attribute__((no_sanitize_address)) to
fasthash_accum_cstring_aligned() makes CI pass. While this kind of
exception is warned against (for good reason), I think it's fine here
given that glibc and NetBSD, and probably others, do something similar
for optimized strlen(). Before I write the proper macro for that, are
there any objections? Better ideas?
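
For illustration, a minimal sketch of what such a macro and its use
could look like. Everything prefixed sketch_ or SKETCH_ is a
hypothetical name, not something from the patch, and the function is a
simplified word-at-a-time strlen() in the spirit of the glibc/NetBSD
approach rather than the actual fasthash_accum_cstring_aligned():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Detect AddressSanitizer: clang uses __has_feature, gcc a predefined macro. */
#if defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_ASAN_ENABLED 1
#endif
#endif
#if defined(__SANITIZE_ADDRESS__) && !defined(SKETCH_ASAN_ENABLED)
#define SKETCH_ASAN_ENABLED 1
#endif

/* Hypothetical macro name; expands to nothing when ASan is not in use. */
#if defined(SKETCH_ASAN_ENABLED) && (defined(__GNUC__) || defined(__clang__))
#define sketch_attribute_no_sanitize_address __attribute__((no_sanitize_address))
#else
#define sketch_attribute_no_sanitize_address
#endif

/* Word type that is allowed to alias char data on gcc/clang. */
#if defined(__GNUC__) || defined(__clang__)
typedef uint64_t __attribute__((may_alias)) sketch_word;
#else
typedef uint64_t sketch_word;
#endif

/*
 * Length of a NUL-terminated string whose start is 8-byte aligned.
 * The word containing the terminator is loaded whole, so the load can
 * touch bytes past the end of the string. An aligned 8-byte load never
 * crosses a page boundary, but ASan would still flag it, hence the
 * attribute.
 */
sketch_attribute_no_sanitize_address
static size_t
sketch_strlen_aligned(const char *str)
{
    const char *p = str;

    assert(((uintptr_t) str % sizeof(uint64_t)) == 0);

    for (;;)
    {
        uint64_t chunk = *(const sketch_word *) p;

        /* Nonzero if and only if some byte of chunk is zero. */
        if ((chunk - 0x0101010101010101ULL) & ~chunk & 0x8080808080808080ULL)
            break;
        p += sizeof(uint64_t);
    }

    /* Locate the terminator within the final word, one byte at a time. */
    while (*p != '\0')
        p++;

    return (size_t) (p - str);
}
```

The attribute is applied to the one function that does the over-read,
so the rest of the file stays instrumented.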

> Commit 42a1de3013 added a new use for string_hash, but I can't tell
> from a quick glance whether it uses the truncation, so I'm going to
> take a closer look before re-attaching the proposed dynahash change
> again.

After looking, I think the thing to do here is create a
hashfn_unstable.c file for global functions:
- hash_string() to replace all those duplicate definitions of
hash_string_pointer() in all the frontend code
- hash_string_with_limit() for dynahash and dshash.
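
As a rough sketch of how the two functions could relate to each other:
the names are the ones proposed above, but the signatures are assumed,
and 32-bit FNV-1a is used only as a stand-in so the example is
self-contained (the real versions would presumably build on the
fasthash code instead):

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash constants: 32-bit FNV-1a, NOT the fasthash code. */
#define SKETCH_FNV_OFFSET 2166136261u
#define SKETCH_FNV_PRIME  16777619u

/*
 * Assumed signature: hash at most "limit" bytes of a NUL-terminated
 * string, stopping early at the terminator.
 */
static uint32_t
hash_string_with_limit(const char *s, size_t limit)
{
    uint32_t h = SKETCH_FNV_OFFSET;

    for (size_t i = 0; i < limit && s[i] != '\0'; i++)
    {
        h ^= (unsigned char) s[i];
        h *= SKETCH_FNV_PRIME;
    }
    return h;
}

/* Assumed signature: hash the whole string, with no truncation. */
static uint32_t
hash_string(const char *s)
{
    return hash_string_with_limit(s, SIZE_MAX);
}
```

With this shape, a key that exceeds the limit hashes the same as its
truncated prefix, e.g. hash_string_with_limit("abcdef", 3) equals
hash_string("abc").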