Re: benchmarking the query planner

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: benchmarking the query planner
Дата
Msg-id 603c8f070812111904j4b839c17sabde937e3d4cf08a@mail.gmail.com
обсуждение исходный текст
Ответ на Re: benchmarking the query planner  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: benchmarking the query planner  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
>> OK, I'll bite.  How do we decide where to put the cutoff?  If we make
>> it too high, it will have a negative effect on join selectivity
>> estimates; if it's too low, it won't really address the problem we're
>> trying to fix.  I randomly propose p = 0.001, which should limit
>> eqjoinsel() to about a million equality tests in the worst case.  In
>> the synthetic example we were just benchmarking, that causes the
>> entire MCV array to be tossed out the window (which feels about
>> right).
>
> Yeah.  One idle thought I had was that maybe the cutoff needs to
> consider both probabilities: if the high-frequency MCVs on one side
> chance to match to not-so-high-frequency MCVs on the other, you
> would like to know about that.  As long as we keep the lists in
> frequency order, it seems easy to implement this: for each MCV
> examined by the outer loop, you run the inner loop until the product of
> the outer and current inner frequency drops below whatever your
> threshold is.  This doesn't immediately suggest what the threshold

I had this idle thought too, but I didn't write it down because...

> ought to be, but it seems like it ought to be possible to determine
> that given a desired maximum error in the overall estimate.  I'm also
> not very clear on what the "total frequency" computations (matchfreq2
> and unmatchfreq2 in the current code) ought to look like if we are using
> a variable subset of the inner list.

...of this exact concern, which I think is an insurmountable problem.
If you don't consider some of the MCVs AT ALL, I think you can add
their frequencies back in to otherfreq{1,2} and go home, but if you
consider them for some elements of the other list but not all, I'm not
sure there's an appropriate way to proceed.

...Robert


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Nathan Boley"
Дата:
Сообщение: Re: benchmarking the query planner
Следующее
От: "Robert Haas"
Дата:
Сообщение: Re: Updates of SE-PostgreSQL 8.4devel patches (r1268)