Re: tuple radix sort
| От | John Naylor |
|---|---|
| Тема | Re: tuple radix sort |
| Дата | |
| Msg-id | CANWCAZaN8kW6o7Ymc_jzPO_Z5SqekQTTNra3xTGYTH+cjrVp8g@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: tuple radix sort (John Naylor <johncnaylorls@gmail.com>) |
| Список | pgsql-hackers |
On Mon, Nov 3, 2025 at 8:24 PM I wrote: > v1 was careful to restore isnull1 to false when diverting to quicksort > for the tiebreak. v2 doesn't bother, since the only tiebreak in core > that looks at isnull1 is comparetup_datum_tiebreak, which is not > reachable by radix sort, requiring a pass-by-value datum that compares > like an integer. This is a bit of a risk, since it's possible a third > party fork could be doing something weird. Seems unlikely, but > something to keep in mind. I decided I wasn't quite comfortable with the full normalized datum sharing space in SortTuple with isnull1. There's too much of a cognitive burden involved in deciding when we do or don't need to reset isnull1, and there's a non-zero risk of difficult-to-detect bugs. For v4 I've instead used one byte of padding space in SortTuple to store only the byte used for the current pass. That means we must compute the normalized datum on every pass. That's actually better than it sounds, since that one byte can now be used directly during the "deal" step, rather than having to extract the byte from the normalized datum by shifting and masking. That extraction step might add significant cycles in cases where a pass requires multiple iterations through the "deal" loop. It doesn't seem to make much difference in practice, performance-wise, even with the following pessimization: I had to scrap the qsort specialization on the normalized datum for small sorts, since it's no longer stored. It could still be worth it to compute the "next byte of the normalized datum" and perform a qsort on that (falling back to the comparator function in the usual way), but I haven't felt the need to resort to that yet. For v4, I just divert to qsort_tuple in non-assert builds, with a threshold of 40. > - I don't quite like how the NULL partitioning step looks, and it > could be costly when the distribution of NULL is not predictable, so > I'm thinking of turning part of that into a branch-free cyclic > permutation, similar to > https://www.postgresql.org/message-id/CANWCAZbAmaZ7P%2BARjS97sJLXsBB5CPZyzFgqNDiqe-L%2BBqXzug%40mail.gmail.com This is done. Even though the inner loop is mysterious at first glance, it's really quite elegant. I made an attempt at clean-up, but it's still under-commented. The common prefix detection has moved to a separate patch (v4-0004). I've been forcing all eligible sorts to use radix sort in assert builds, even when small enough that qsort would be faster. Since both qsort and in-place radix sort are unstable, it's expected that some regression tests need adjustment (v4-0002). One thing surprised me, however: The pg_upgrade TAP test that runs regression tests on the old cluster showed additional failures that I can't explain. I haven't seen this before, but it's possible I never ran TAP tests when testing new sort algorithms previously. This doesn't happen if you change the current insertion sort threshold, so I haven't been able to reproduce it aside from this patch. For that reason I can't completely rule out an actual bug, although I actually have more confidence in the verification of correct sort order in v4, since isnull1 now never changes, just as in master. I found that changing some tests to have additional sort keys seems to fix it (v4-0003). I did this in a rather quick and haphazard fashion. There's probably a longer conversation to be had about making test output more deterministic while still covering the intended executor paths. Aside from that, this seems like a good place to settle down, so I'm going to create a CF entry for this. I'll start more rigorous performance testing in the near future. -- John Naylor Amazon Web Services
Вложения
В списке pgsql-hackers по дате отправления: