Re: TODO item: adding VERBOSE option to CLUSTER [with patch]

From Gregory Stark
Subject Re: TODO item: adding VERBOSE option to CLUSTER [with patch]
Msg-id 87skr09sgo.fsf@oxford.xeocode.com
In reply to Re: TODO item: adding VERBOSE option to CLUSTER [with patch]  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: TODO item: adding VERBOSE option to CLUSTER [with patch]  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

> Jim Cox wrote:
>> On Mon, Oct 13, 2008 at 8:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>>
>>> It'd be possible to count the number of order reversals during the
>>> indexscan, ie the number of tuples with CTID lower than the previous
>>> one's.  But I'm not sure how useful that number really is.  

Incidentally it finally occurred to me that "sortedness" is actually a pretty
good term to search on. I found several papers for estimating metrics of
sortedness from samples even. Though the best looks like it requires a sample
of size O(sqrt(n)) which is more than we currently take.

The two metrics that seem popular are the length of the longest sorted
subsequence and the number of sorted subsequences. I think the latter is
equivalent to counting the inversions.
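
To make the two metrics concrete, here is a minimal Python sketch. It assumes
"inversions" is meant in the sense used earlier in the thread, i.e. adjacent
order reversals, and that "sorted subsequences" means maximal sorted runs;
under that reading the number of runs is just the reversal count plus one. The
function names are made up for illustration and are not PostgreSQL code.

```python
from bisect import bisect_right


def order_reversals(seq):
    """Count elements that are smaller than their immediate predecessor."""
    return sum(1 for prev, cur in zip(seq, seq[1:]) if cur < prev)


def sorted_runs(seq):
    """Number of maximal non-decreasing runs (0 for an empty sequence)."""
    return 0 if not seq else order_reversals(seq) + 1


def longest_sorted_subsequence(seq):
    """Length of the longest non-decreasing subsequence, in O(n log n).

    tails[k] holds the smallest possible last element of a non-decreasing
    subsequence of length k + 1 seen so far.
    """
    tails = []
    for x in seq:
        i = bisect_right(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)


# Hypothetical heap positions, in the order an index scan would visit them.
positions = [1, 2, 5, 3, 4, 9, 6]
print(order_reversals(positions), sorted_runs(positions),
      longest_sorted_subsequence(positions))
```

Both metrics here are computed over the full sequence; estimating them from a
sample, as the papers do, is a separate problem that this sketch does not
address.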

I didn't find any papers which claimed to present good ways to draw
conclusions based on these metrics but I only did a quick search. I imagine if
everyone is looking for ways to estimate them, they must be useful for
something...

For some reason my access to the ACM digital library stopped working. Does
anyone else have access?


> It will look bad for patterns like:
> 2
> 1
> 4
> 3
> 6
> 5
> ..

Hm, you could include some measure of how far the inversion goes -- but I
think that's counter-productive. Sure some of them will be cached but others
won't and that'll be equally bad regardless of how far back it goes.
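
For illustration only, a measure of "how far the inversion goes" could be
sketched as the mean backward jump across reversals. The helper name and the
second test sequence are assumptions invented for this example; nothing like
this exists in the patch.

```python
def reversal_stats(positions):
    """Return (reversal count, mean backward jump) over adjacent pairs."""
    jumps = [prev - cur
             for prev, cur in zip(positions, positions[1:])
             if cur < prev]
    mean_jump = sum(jumps) / len(jumps) if jumps else 0.0
    return len(jumps), mean_jump


swapped_pairs = [2, 1, 4, 3, 6, 5]   # the pattern quoted above
far_jumps = [4, 1, 5, 2, 6, 3]       # same number of reversals, longer jumps

print(reversal_stats(swapped_pairs))
print(reversal_stats(far_jumps))
```

The two sequences have the same reversal count and differ only in jump
distance, which is exactly the extra information the argument above considers
unhelpful for predicting cache misses.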

> Until we have a better metric for "sortedness", my earlier suggestion to print
> it was probably a bad idea. If anything, should probably print the same
> correlation metric that ANALYZE calculates, so that it would at least match
> what the planner uses for decision-making.

I agree with that. I like the idea of printing a message though -- we should
just have it print the correlation for now and when we improve the stats we'll
print the new metric.
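
As a rough picture of what a correlation figure says about ordering, the
sketch below computes the Pearson correlation between each row's physical
position and the rank of its value. This is a simplification and not
PostgreSQL's actual code: ANALYZE works from a sample, while this version
assumes the whole sequence is available and that all values are distinct. The
function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient; 0.0 if either side is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def ordering_correlation(values_in_physical_order):
    """Correlate physical position with the logical rank of each value."""
    n = len(values_in_physical_order)
    if n < 2:
        return 0.0
    # by_value[r] is the physical position of the value with rank r.
    by_value = sorted(range(n), key=values_in_physical_order.__getitem__)
    ranks = [0] * n
    for rank, pos in enumerate(by_value):
        ranks[pos] = rank
    return pearson(list(range(n)), ranks)


print(ordering_correlation([1, 2, 3, 4, 5, 6]))   # fully sorted
print(ordering_correlation([2, 1, 4, 3, 6, 5]))   # the swapped-pairs pattern
```

On the swapped-pairs pattern this gives a value of roughly 0.83, so a
correlation-style metric treats that layout as nearly sorted even though half
of its adjacent pairs are reversals.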

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

