Re: PageRepairFragmentation performance

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: PageRepairFragmentation performance
Дата
Msg-id CAM3SWZSpHtZXjQ3eSYfeFg0d9tHRuAtQ7nR2uqJuT1_cKphOAA@mail.gmail.com
обсуждение исходный текст
Ответ на PageRepairFragmentation performance  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: PageRepairFragmentation performance  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On Tue, Nov 18, 2014 at 10:03 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> So there's clearly some room for improvement here. A couple of ideas:
>
> 1. Replace the qsort with something cheaper. The itemid arrays being sorted
> are small, a few hundred item at most, usually even smaller. In this pgbench
> test case I used, the typical size is about 60. With a small array a plain
> insertion sort is cheaper than the generic qsort(), because it can avoid the
> function overhead etc. involved with generic qsort. Or we could use
> something smarter, like a radix sort, knowing that we're sorting small
> integers. Or we could implement an inlineable version of qsort and use that.

> I spent some time hacking approach 1, and replaced the qsort() call with a
> bucket sort. I'm not sure if a bucket sort is optimal, or better than a
> specialized quicksort implementation, but it seemed simple.

Nice result.

Most quicksort implementations, including our own, switch to insertion
sort if the number of elements to be sorted is low enough (It occurs
when n less than 7, in our case). Also, specializing qsort() with
integers in isolation showed about a 3x improvement last I checked,
which was a few years ago now, so cache misses (reductions in which we
can gain through compile-time specialization) are likely by far the
dominant consideration here.

pgbench is probably a very conservative way of estimating how big the
array can get, given the structure of those tables. Even 60 seems high
if we're looking for some kind of rough idea of an average.

How did you arrive at this value? It seems rather high:

+#define N_SORT_BUCKETS 32

I think you're not sufficiently clear on why you're using bucket sort
rather than some other alternative. Maybe it's so obvious to you that
you forgot. I *think* it's because it's reasonable to suppose that for
this particular use-case, you know that in general there'll be a more
or less even distribution between the buckets. Right? That seems to
make sense to me. All the work that qsort() does to make sure it has a
good pivot at each level of recursion may be wasted here.

The refactoring patch certainly looks very reasonable.
-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jim Nasby
Дата:
Сообщение: Re: parallel mode and parallel contexts
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Overhauling our interrupt handling (was Escaping from blocked send() reprised.)