Re: Cross-column statistics revisited

From: Greg Stark
Subject: Re: Cross-column statistics revisited
Msg-id: B71B9E9E-3F8D-48B2-9D99-A342AB043322@enterprisedb.com
In response to: Re: Cross-column statistics revisited  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
[sorry for top-posting - damn phone]

It's pretty straightforward to do a chi-squared test on all the pairs.
But that only tells you that the product is more likely to be wrong. It
doesn't tell you whether it's going to be too high or too low...
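
A minimal sketch of that point, assuming small categorical columns held in memory (the toy table and the helper names `chi_squared` and `pairwise_chi_squared` are made up for illustration and are not anything in PostgreSQL). It computes the Pearson chi-squared statistic for every pair of columns, then shows that for one strongly dependent pair the product of the per-column fractions is too low for one clause and too high for another:

```python
from collections import Counter
from itertools import combinations

def chi_squared(xs, ys):
    """Pearson chi-squared statistic for independence of two columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    stat = 0.0
    for x, rx in row.items():
        for y, cy in col.items():
            expected = rx * cy / n          # count expected under independence
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat

def pairwise_chi_squared(columns):
    """columns: dict of name -> list of values; returns {(a, b): statistic}."""
    return {(a, b): chi_squared(columns[a], columns[b])
            for a, b in combinations(sorted(columns), 2)}

# Hypothetical toy table: b is a copy of a, c is unrelated to both.
columns = {
    "a": [0, 0, 1, 1] * 25,
    "b": [0, 0, 1, 1] * 25,
    "c": [0, 1, 0, 1] * 25,
}
stats = pairwise_chi_squared(columns)

rows = list(zip(columns["a"], columns["b"]))

def frac(pred):
    """Fraction of rows satisfying pred."""
    return sum(1 for r in rows if pred(r)) / len(rows)

# Product of single-column fractions vs. the true joint fraction.
product_same = frac(lambda r: r[0] == 0) * frac(lambda r: r[1] == 0)
actual_same = frac(lambda r: r[0] == 0 and r[1] == 0)   # product is too low
product_diff = frac(lambda r: r[0] == 0) * frac(lambda r: r[1] == 1)
actual_diff = frac(lambda r: r[0] == 0 and r[1] == 1)   # product is too high
```

The statistic flags the pair (a, b) as dependent, but the same pair gives an underestimate for `a = 0 AND b = 0` and an overestimate for `a = 0 AND b = 1`, so the test alone cannot say which way to correct.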

greg

On 16 Oct 2008, at 07:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Martijn van Oosterhout <kleptog@svana.org> writes:
>> I think you need to go a step back: how are you going to use this data?
>
> The fundamental issue as the planner sees it is not having to assume
> independence of WHERE clauses.  For instance, given
>
>    WHERE a < 5 AND b > 10
>
> our current approach is to estimate the fraction of rows with a < 5
> (using stats for a), likewise estimate the fraction with b > 10
> (using stats for b), and then multiply these fractions together.
> This is correct if a and b are independent, but can be very bad if
> they aren't.  So if we had joint statistics on a and b, we'd want to
> somehow match that up to clauses for a and b and properly derive
> the joint probability.
>
> (I'm not certain of how to do that efficiently, even if we had the
> right stats :-()
>
>            regards, tom lane
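
To illustrate the quoted example, here is a naive sketch of matching the clause `a < 5 AND b > 10` against a joint statistic. Everything in it is an assumption made for illustration: the toy data, the two-dimensional bucket histogram as the form of the joint statistic, and the bucket widths, which are chosen so that the clause boundaries fall exactly on bucket edges. It is not how the planner works and makes no attempt at efficiency.

```python
from collections import Counter

# Hypothetical, strongly correlated data: b is determined by a.
rows = [(a, 2 * a + 0.5) for a in range(20)]
n = len(rows)

A_WIDTH = 5    # a buckets: [0,5), [5,10), ...
B_WIDTH = 10   # b buckets: [0,10), [10,20), ...

def buckets(a, b):
    """Map a row to its (a_bucket, b_bucket) pair."""
    return (int(a // A_WIDTH), int(b // B_WIDTH))

# Joint statistic: row counts per 2-D bucket.
joint_hist = Counter(buckets(a, b) for a, b in rows)

# Per-column statistics, as used by the independence assumption.
a_hist = Counter(int(a // A_WIDTH) for a, _ in rows)
b_hist = Counter(int(b // B_WIDTH) for _, b in rows)

# "a < 5" is a-bucket 0; "b > 10" is b-bucket >= 1 (no b equals 10 here).
sel_a = a_hist[0] / n
sel_b = sum(c for k, c in b_hist.items() if k >= 1) / n
independent_estimate = sel_a * sel_b

# Joint estimate: sum only the 2-D buckets that satisfy both clauses.
joint_estimate = sum(c for (ka, kb), c in joint_hist.items()
                     if ka == 0 and kb >= 1) / n

# Ground truth from the raw rows.
actual = sum(1 for a, b in rows if a < 5 and b > 10) / n
```

On this data the product of the two fractions predicts a noticeable share of rows, while both the joint histogram and the raw rows show that no row satisfies both clauses.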

