On Thu, Nov 2, 2017 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sokolov Yura <funny.falcon@postgrespro.ru> writes:
>> [ 0001-Improve-compactify_tuples.patch, v5 or thereabouts ]
>
> I started to review this patch. I spent a fair amount of time on
> beautifying the code, because I found it rather ugly and drastically
> undercommented. Once I had it to the point where it seemed readable,
> I went to check the shellsort algorithm against Wikipedia's entry,
> and found that this appears to be an incorrect implementation of
> shellsort: where pg_shell_sort_pass has
>
> for (_i = off; _i < _n; _i += off) \
>
> it seems to me that we need to have
>
> for (_i = off; _i < _n; _i += 1) \
>
> or maybe just _i++. As-is, this isn't h-sorting the whole file,
> but just the subset of entries that have multiple-of-h indexes
> (ie, the first of the h distinct subfiles that should get sorted).
> The bug is masked by the final pass of plain insertion sort, but
> we are not getting the benefit we should get from the earlier passes.
>
> However, I'm a bit dubious that it's worth fixing that; instead
> my inclination would be to rip out the shellsort implementation
> entirely. The code is only using it for the nitems <= 48 case
> (which makes the first three offset steps certainly no-ops) and
> I am really unconvinced that it's worth expending the code space
> for a shellsort rather than plain insertion sort in that case,
> especially when we have good reason to think that the input data
> is nearly sorted.
I actually noticed that and benchmarked some variants. Neither
made any noticeable difference in performance, so I decided not
to complain about them.
I guess the same case can be made for removing the shell sort.
So I'm inclined to agree.
> BTW, the originally given test case shows no measurable improvement
> on my box.
I did manage to reproduce the original test and got a consistent improvement.
> I was eventually able to convince myself by profiling
> that the patch makes us spend less time in compactify_tuples, but
> this test case isn't a very convincing one.
>
> So, quite aside from the bug, I'm not excited about committing the
> attached as-is. I think we should remove pg_shell_sort and just
> use pg_insertion_sort. If somebody can show a test case that
> provides a measurable speed improvement from the extra code,
> I could be persuaded to reconsider.
My tests modifying the shell sort didn't produce any measurable
difference, but I didn't test removing it altogether.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers