Re: new correlation metric

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: new correlation metric
Дата
Msg-id 4B694644-E4C2-4F82-B661-9F0CEEAAC5F6@enterprisedb.com
обсуждение исходный текст
Ответ на Re: new correlation metric  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I haven't look at the patch yet -- I'm actually on a train now. I'm  
sorry if these questions are answered in the patch.

I think there are three questions here:
A) what metric are we going to use
B) how do we estimate thy metric given a sample
C) how do we draw the conclusions we need based on this metric

I looked recently on this topic and I found papers on estimating two  
metrics based on samples: longest sorted subsequence and number of  
sorted subsequence. The latter of which sounds like it might be  
equivalent to what you have.

I didn't did any papers on drawing conclusions from either of these  
metrics though.

I think we can just look at block number to avoid intra-block disorder  
confusing us. It does mean we have effectively a much smaller sample  
though.

I don't think the other objection is worth worrying about though. If  
our sample has 15263748 then it may be that readahead will save us in  
that particular instance but it's clear the table isn't well clustered  
and it's just luck that those blocks were intermixed. In a very  
particular way. The rest of the table is unlikely to be so lucky.


greg

On 26 Oct 2008, at 03:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
>>> I worked with Nathan Boley to come up with what we think is a better
>>> metric for measuring this cost.
>
>> I think the code is in the right direction, but I think want you want
>> is some kind of estimate of "given I've looked for tuple X, how many
>> tuples in the next k pages are near this one".
>
> ISTM that some experimental studies would be required to justify any
> proposal in this area.
>
> Having said that ... one of the things I never liked about the  
> existing
> code is that it pays no attention to block-boundary effects.  It  
> doesn't
> matter to an indexscan how badly tuples within a single block are
> misordered --- what matters is how many block reads you have to do.
> So I think that any changes here ought to include measuring the
> correlation on the basis of block numbers not tuple numbers.  But  
> what's
> not too clear to me is whether our sampling methods would mess that  
> up.
>
>            regards, tom lane
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: new correlation metric
Следующее
От: Tom Lane
Дата:
Сообщение: Re: new correlation metric