Re: HASH INDEX builds seems confused
| От | Amit Kapila |
|---|---|
| Тема | Re: HASH INDEX builds seems confused |
| Дата | |
| Msg-id | CAA4eK1Jb9mxYfV2eThEkAv1r0bvR+Sd8Z7e15AfbC1hFWn3W0g@mail.gmail.com обсуждение исходный текст |
| Ответ на | HASH INDEX builds seems confused (John Naylor <johncnaylorls@gmail.com>) |
| Ответы |
Re: HASH INDEX builds seems confused
|
| Список | pgsql-hackers |
On Tue, Nov 11, 2025 at 11:27 AM John Naylor <johncnaylorls@gmail.com> wrote: > > hashbuild() says: > > * If we just insert the tuples into the index in scan order, then > * (assuming their hash codes are pretty random) there will be no locality > * of access to the index, and if the index is bigger than available RAM > * then we'll thrash horribly. To prevent that scenario, we can sort the > * tuples by (expected) bucket number. However, such a sort is useless > * overhead when the index does fit in RAM. We choose to sort if the > * initial index size exceeds maintenance_work_mem, or the number of > * buffers usable for the index, whichever is less. (Limiting by the > > However, since commit e09d7a126 it's harder to believe sorts are ever > useless, since we then decided that sorts should have a more strict > sort order for the sake of sequential access. Further, d09dbeb9b built > upon that to remove wasteful binary search when inserting into the > page. Looking at some of the numbers in the linked threads, I wonder > if all test environments were actually hitting the sort path at all, > since you'd have to exceed m_w_m or s_b to take advantage. Unless I'm > missing something, it seems like we should just sort unconditionally. > That would be a nice simplification, and might speed up index builds > even when there's plenty of memory. (If I am in fact missing > something, maybe comments need updating) > +1. It seems worth pursuing this. We can establish the benefits by taking some performance data. > Now that I'm looking, I'm also wondering how hard it would be to have > datum1 contain both the bucket (high bits) and hash (lower bits), > since we can now count on Datums being 8 bytes on all platforms. It > might be harder in turn to hack things so that the appropriate sort > specialization could be applied (it'd need a fake sortKey at least), > but that would be a possible future project. > Yeah that also sounds worth exploring but what benefit are you expecting out of it? -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления: