Re: Statistics and selectivity estimation for ranges

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Statistics and selectivity estimation for ranges
Дата
Msg-id CAPpHfdt92Fh5Jbovn-EXM_7Yv4Mhr6CvaVo=37jE85h1G2RGkA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Statistics and selectivity estimation for ranges  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: Statistics and selectivity estimation for ranges  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On Wed, Feb 13, 2013 at 5:55 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Feb 13, 2013 at 5:28 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 04.01.2013 10:42, Alexander Korotkov wrote:
/*
 * Calculate selectivity of "&&" operator using histograms of range lower bounds
 * and histogram of range lengths.
 */
static double
calc_hist_selectivity_overlap(TypeCacheEntry *typcache, RangeBound *lower,
                                        RangeBound *upper, RangeBound *hist_lower, int hist_nvalues,
                                                        Datum *length_hist_values, int length_hist_nvalues)

We already have code to estimate &&, based on the lower and upper bound histograms:

                case OID_RANGE_OVERLAP_OP:
                case OID_RANGE_CONTAINS_ELEM_OP:
                        /*
                         * A && B <=> NOT (A << B OR A >> B).
                         *
                         * "range @> elem" is equivalent to "range && [elem,elem]". The
                         * caller already constructed the singular range from the element
                         * constant, so just treat it the same as &&.
                         */
                        hist_selec =
                                calc_hist_selectivity_scalar(typcache, &const_lower, hist_upper,
                                                                                         nhist, false);
                        hist_selec +=
                                (1.0 - calc_hist_selectivity_scalar(typcache, &const_upper, hist_lower,
                                                                                                  nhist, true));
                        hist_selec = 1.0 - hist_selec;
                        break;

I don't think the method based on lower bound and length histograms is any better. In fact, my gut feeling is that it's less accurate. I'd suggest dropping that part of the patch.

Right. This estimation has an accuracy of histogram, while estimation based on lower bound and length histograms rely on additional assumption about independence of lower bound and length histogram. We can sum A << B and A >> B probabilities because they are mutually exclusive. It's pretty evident but I would like to mention it in the comments, because typical assumption about events in statistics calculation is their independence.
 
These changes were made in attached patch.

------
With best regards,
Alexander Korotkov.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: Materialized views WIP patch
Следующее
От: Ants Aasma
Дата:
Сообщение: Re: Materialized views WIP patch