Re: Index/Function organized table layout

Поиск
Список
Период
Сортировка
От Hannu Krosing
Тема Re: Index/Function organized table layout
Дата
Msg-id 1065121751.2535.18.camel@fuji.krosing.net
обсуждение исходный текст
Ответ на Re: Index/Function organized table layout  (James Rogers <jamesr@best.com>)
Ответы Re: Index/Function organized table layout  (James Rogers <jamesr@best.com>)
Re: Index/Function organized table layout  (Alvaro Herrera Munoz <alvherre@dcc.uchile.cl>)
Список pgsql-hackers
James Rogers kirjutas N, 02.10.2003 kell 20:50:

> To give a real world example, a standard query on one of our tables that
> has not been CLUSTER-ed recently (i.e. within the last several days)
> generates an average of ~2,000 cache misses.  Recently CLUSTER-ed, it
> generates ~0 cache misses on average.  Needless to say, one is *much*
> faster than the other. 

So what you really need is the CLUSTER command to leave pages half-empty
and the tuple placement logic on inserts/updates to place new tuples
near the place where they would be placed by CLUSTER. I.e. the code that
does actual inserting should be aware of CLUSTERing.

I guess that similar behaviour (half-empty pages, or even each second
page empty which is better as it creates less dirty buffers) could also
significantly speed up updates on huge number of tuples, as then code
could always select a place near the old one and thus avoid needless
head-movements between reading and writing areas.

> In my case, not only does CLUSTER-ing increase the number of concurrent
> queries possible without disk thrashing by an integer factor, but the
> number of buffers touched on a query that generates a cache misses is
> greatly reduced as well.  The problem is that CLUSTER-ing is costly and
> index-organizing some of the tables would reduce the buffer needs, since
> the index tuple in these cases are almost as large as the heap tuples
> they reference.

True, but my above suggestion would be much easier to implement
near-term. It seems to be a nice incremental improvement just needs 
to touch places:

1. selecting where new tuples go : 
 * updated ones go near old ones if not clustered and near the place   CLUSTER would place them if clustered. 
 * inserted ones go to the less than half-empty pages if not clustered   and near the place CLUSTER would place them if
clustered.
 

2. making reorganization code (CLUSTER and VACUUM FULL) to leave space 
in pages for clustered updates/inserts.

the "half" above could of course mean anything from 10% to 95% depending
on access patterns.

---------------------
Hannu



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruno Wolff III
Дата:
Сообщение: Re: Thoughts on maintaining 7.3
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Large block size problems and notes...