Re: Query optimizer 8.0.1 (and 8.0)

Поиск
Список
Период
Сортировка
От pgsql@mohawksoft.com
Тема Re: Query optimizer 8.0.1 (and 8.0)
Дата
Msg-id 16625.24.91.171.78.1107873206.squirrel@mail.mohawksoft.com
обсуждение исходный текст
Ответ на Re: Query optimizer 8.0.1 (and 8.0)  (Greg Stark <gsstark@mit.edu>)
Список pgsql-hackers
> pgsql@mohawksoft.com writes:
>
>> The basic problem with a fixed sample is that is assumes a normal
>> distribution.
>
> That's sort of true, but not in the way you think it is.
>
[snip]

Greg, I think you have an excellent ability to articulate stats, but I
think that the view that this is like election polling is incorrect.

Election polling assumes a very simple outcome: Some standard ditribution
of a limited number options. I don't think it applies to this.

>
>> When you look at a sine wave on an oscilloscope, you can see it clear as
>> day. When you look at music on the scope, you know there are many waves
>> there, but it is difficult to make heads or tails of it. (use xmms or
>> winamp to see for yourself) The waves change in frequency, amplitude,
>> and
>> duration over a very large scale. That's why you use a spectrum analyzer
>> to go from time domain to frequency domain. In frequency domain, you can
>> see the trends better.
>
> That's not a bad analogy to many problems where you're measuring data that
> has
> non-randomness in it but that are not visible in the domain that the
> statistics that are being analyzed. This seems to happen a lot with
> geographic
> data, for instance.

EXACTLY!!!

>
> If you find that increasing the stats targets improves things then this
> isn't true. If you find that it doesn't then what's really needed is a
> cleverer set of statistics to look for.

I will be the first one to say that increasing the samples is not perfect,
but it is a methodology that will help without major changes in postgres.
Simply increasing the samples to a percentage of the estimated number of
rows (with some upper and lower limits of course) will increase the
accuracy of the "n_distinct" and "correlation" settings (at least a little
bit), and that will make a huge impact with very little work.

If we want to discuss improved statatistics, then we should include a
standard deviation and a sliding window deviation, or something like that.
Hell, maybe even FFT.

The basic problem, I think, is that the sampling mechanism is more like an
oscilloscope looking for large trends instead of a spectrum analyzer
looking for the smaller ones.

We have to be able to tell the planner that adjacent values are less
random even though, as a whole, they are seemingly random.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jim Buttafuoco"
Дата:
Сообщение: Fw: Re: float4 regression test failed on linux parisc
Следующее
От: Tom Lane
Дата:
Сообщение: Re: float4 regression test failed on linux parisc