Re: Double sorting split patch

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Double sorting split patch
Дата
Msg-id CAPpHfds333E_V28nRkDY2CBSDC+TDHykZ1HBdPpYkGAjb9w6dg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Double sorting split patch  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Ответы Re: Double sorting split patch
Список pgsql-hackers
On Tue, Oct 4, 2011 at 12:12 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
Can you elaborate the consider-split algorithm? The criteria to select the new split over the previously selected one is this:
!               /*
!                * If ratio is acceptable, we should compare current split with
!                * previously selected one. If no split was selected then we select
!                * current anyway. Between splits of one dimension we search for
!                * minimal overlap (allowing negative values) and minimal ration
!                * (between same overlaps. We switch dimension if find less overlap
!                * (non-negative) or less range with same overlap.
!                */
!               range = diminfo->upper - diminfo->lower;
!               overlap = ((leftUpper) - (rightLower)) / range;
!               if (context->first ||
!                       (context->dim == dimNum &&
!                        (overlap < context->overlap ||
!                         (overlap == context->overlap && ratio > context->ratio))) ||
!                       (context->dim != dimNum &&
!                        ((range > context->range &&
!                          non_negative(overlap) <= non_negative(context->overlap)) ||
!                         non_negative(overlap) < non_negative(context->overlap)))
!                       )
!               {

Why are negative overlaps handled differently across dimensions and within the same dimension? Your considerSplit algorithm in the SYRCoSE 2011 paper doesn't seem to make that distinction.

Yes, I've changed this behaviour after experiments on real-life datasets. On the datasets where data don't overlap themselfs (points are always such datasets), non-overlapping splits are frequently possible. In this case splits tends to be along one dimension, because most distant non-overlapping splits (i.e. having lowest negative overlap) appears to be in the same dimension as previous split. Therefore MBRs appear to be very prolonged along another dimension, that's bad for search. In order to prevent this behavour I've made transition to another dimension by non-nagative part of overlap and range. Using range as the split criteria makes MBRs more quadratic, and using non-negative part of overlap as priority criteria give to range chance to matter.

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alex Hunsaker
Дата:
Сообщение: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Следующее
От: Amit Khandekar
Дата:
Сообщение: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.