Обсуждение: HASH INDEX builds seems confused

Поиск

Список

Период

Сортировка

HASH INDEX builds seems confused

От

John Naylor

Дата:

11 ноября, 08:56:42

hashbuild() says:

 * If we just insert the tuples into the index in scan order, then
 * (assuming their hash codes are pretty random) there will be no locality
 * of access to the index, and if the index is bigger than available RAM
 * then we'll thrash horribly.  To prevent that scenario, we can sort the
 * tuples by (expected) bucket number.  However, such a sort is useless
 * overhead when the index does fit in RAM.  We choose to sort if the
 * initial index size exceeds maintenance_work_mem, or the number of
 * buffers usable for the index, whichever is less.  (Limiting by the

However, since commit e09d7a126 it's harder to believe sorts are ever
useless, since we then decided that sorts should have a more strict
sort order for the sake of sequential access. Further, d09dbeb9b built
upon that to remove wasteful binary search when inserting into the
page. Looking at some of the numbers in the linked threads, I wonder
if all test environments were actually hitting the sort path at all,
since you'd have to exceed m_w_m or s_b to take advantage. Unless I'm
missing something, it seems like we should just sort unconditionally.
That would be a nice simplification, and might speed up index builds
even when there's plenty of memory. (If I am in fact missing
something, maybe comments need updating)

Now that I'm looking, I'm also wondering how hard it would be to have
datum1 contain both the bucket (high bits) and hash (lower bits),
since we can now count on Datums being 8 bytes on all platforms. It
might be harder in turn to hack things so that the appropriate sort
specialization could be applied (it'd need a fake sortKey at least),
but that would be a possible future project.

--
John Naylor
Amazon Web Services

Re: HASH INDEX builds seems confused

От

Amit Kapila

Дата:

11 ноября, 14:08:13

On Tue, Nov 11, 2025 at 11:27 AM John Naylor <johncnaylorls@gmail.com> wrote:
>
> hashbuild() says:
>
>  * If we just insert the tuples into the index in scan order, then
>  * (assuming their hash codes are pretty random) there will be no locality
>  * of access to the index, and if the index is bigger than available RAM
>  * then we'll thrash horribly.  To prevent that scenario, we can sort the
>  * tuples by (expected) bucket number.  However, such a sort is useless
>  * overhead when the index does fit in RAM.  We choose to sort if the
>  * initial index size exceeds maintenance_work_mem, or the number of
>  * buffers usable for the index, whichever is less.  (Limiting by the
>
> However, since commit e09d7a126 it's harder to believe sorts are ever
> useless, since we then decided that sorts should have a more strict
> sort order for the sake of sequential access. Further, d09dbeb9b built
> upon that to remove wasteful binary search when inserting into the
> page. Looking at some of the numbers in the linked threads, I wonder
> if all test environments were actually hitting the sort path at all,
> since you'd have to exceed m_w_m or s_b to take advantage. Unless I'm
> missing something, it seems like we should just sort unconditionally.
> That would be a nice simplification, and might speed up index builds
> even when there's plenty of memory. (If I am in fact missing
> something, maybe comments need updating)
>

+1. It seems worth pursuing this. We can establish the benefits by
taking some performance data.

> Now that I'm looking, I'm also wondering how hard it would be to have
> datum1 contain both the bucket (high bits) and hash (lower bits),
> since we can now count on Datums being 8 bytes on all platforms. It
> might be harder in turn to hack things so that the appropriate sort
> specialization could be applied (it'd need a fake sortKey at least),
> but that would be a possible future project.
>

Yeah that also sounds worth exploring but what benefit are you
expecting out of it?

--
With Regards,
Amit Kapila.

Re: HASH INDEX builds seems confused

От

John Naylor

Дата:

12 ноября, 10:13:05

On Tue, Nov 11, 2025 at 6:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 11, 2025 at 11:27 AM John Naylor <johncnaylorls@gmail.com> wrote:
> > However, since commit e09d7a126 it's harder to believe sorts are ever
> > useless, since we then decided that sorts should have a more strict
> > sort order for the sake of sequential access. Further, d09dbeb9b built
> > upon that to remove wasteful binary search when inserting into the
> > page. Looking at some of the numbers in the linked threads, I wonder
> > if all test environments were actually hitting the sort path at all,
> > since you'd have to exceed m_w_m or s_b to take advantage. Unless I'm
> > missing something, it seems like we should just sort unconditionally.
> > That would be a nice simplification, and might speed up index builds
> > even when there's plenty of memory. (If I am in fact missing
> > something, maybe comments need updating)
> >
>
> +1. It seems worth pursuing this. We can establish the benefits by
> taking some performance data.

I'll investigate this.

> > Now that I'm looking, I'm also wondering how hard it would be to have
> > datum1 contain both the bucket (high bits) and hash (lower bits),
> > since we can now count on Datums being 8 bytes on all platforms. It
> > might be harder in turn to hack things so that the appropriate sort
> > specialization could be applied (it'd need a fake sortKey at least),
> > but that would be a possible future project.
> >
>
> Yeah that also sounds worth exploring but what benefit are you
> expecting out of it?

In this case, anything that benefits unsigned integer sorts would
automatically benefit building hash indexes. It seems more invasive to
get there though, and I'm not planning on looking at this in the short
term. In the thread that lead to the two commits mentioned above

https://www.postgresql.org/message-id/CANbhV-FG-1ZNMBuwhUF7AxxJz3u5137dYL-o6hchK1V_dMw86g%40mail.gmail.com

...Simon mentioned currently "...the Tuplesortstate fixes the size of
max_buckets
at tuplesort_begin() time rather than tuplesort_performsort(), forcing
us to estimate the number of tuples ahead of time rather than using
the exact number. Next trick would be to alter the APIs to allow exact
values to be used for sorting, which would allow page at a time
builds."

I'm not quite sure of the details here, but if this presents an
additional benefit from a different API, that could make it more
worthwhile.

--
John Naylor
Amazon Web Services

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: HASH INDEX builds seems confused

HASH INDEX builds seems confused

Re: HASH INDEX builds seems confused

Re: HASH INDEX builds seems confused