Re: Change GUC hashtable to use simplehash?

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: Change GUC hashtable to use simplehash?
Дата
Msg-id CANWCAZZY+kZpMQh4OCJ1jrQigh_oCDWrbzWWOPTmJKj+0enR3A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Change GUC hashtable to use simplehash?  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Change GUC hashtable to use simplehash?  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Sat, Jan 20, 2024 at 7:13 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2024-01-19 at 13:38 -0800, Jeff Davis wrote:
> > One post-commit question on 0aba255440: why do
> > haszero64(pg_bswap64(chunk)) rather than just haszero64(chunk)? How
> > does byteswapping affect whether a zero byte exists or not?
>
> I missed that it was used later when finding the rightmost one
> position.
>
> The placement of the comment was slightly confusing. Is:
>
>   haszero64(pg_bswap64(chunk)) == pg_bswap64(haszero64(chunk))
>
> ? If so, perhaps we can do the byte swapping outside of the loop, which
> might save a few cycles on longer strings and would be more readable.

The above identity is not true for this haszero64 macro. I phrased it
as "The rest of the bytes are indeterminate", but that's not very
clear. It can only be true if it set bytes for all and only those
bytes where the input had zeros. In the NetBSD strlen source, there is
a comment telling of a way to do this:

~(((x & 0x7f....7f) + 0x7f....7f) | (x | 0x7f....7f))

https://github.com/NetBSD/src/blob/trunk/common/lib/libc/arch/x86_64/string/strlen.S

(They don't actually use it of course, since x86_64 is little-endian)
From the commentary there, it sounds like 1 or 2 more instructions.
One unmentioned assumption I had was that the byte swap would be a
single instruction on all platforms where we care about performance
(*). If that's not the case, we could switch to the above macro for
big-endian machines. It'd be less readable since we'd then need an
additional #ifdef for counting leading, rather than trailing zeros
(that would avoid byte-swapping entirely). Either way, I'm afraid
big-endian is stuck doing a bit of extra work somewhere. That work
could be amortized by doing a quick check in the loop and afterwards
completely redoing the zero check (or a bytewise check same as the
unaligned path), but that would penalize short strings.

(*) 32-bit platforms don't take this path, but mamba's build failed
because the previously-misspelled symbol was still in the source file.
We could also #ifdef around the whole aligned-path function, although
it's redundant.

I hope this makes it more clear. Maybe the comment could use some work.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: Change GUC hashtable to use simplehash?
Следующее
От: Yongtao Huang
Дата:
Сообщение: Re: Optimize duplicate code and fix memory leak in function fetch_remote_table_info()