Re: Grouped Index Tuples

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Grouped Index Tuples
Дата
Msg-id 457C666D.4050005@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Grouped Index Tuples  ("Jim C. Nasby" <jim@nasby.net>)
Ответы Re: Grouped Index Tuples  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
Jim C. Nasby wrote:
> On Thu, Dec 07, 2006 at 10:30:11AM +0000, Heikki Linnakangas wrote:
>> I've cut a new version of the GIT patch I posted earlier, and collected
>> all my dispersed todo-lists, post-it notes, performance results,
>> supplementary patches etc. I had to a single web-page:
>>
>> http://community.enterprisedb.com/git/
>>
>> Perhaps the most interesting stuff apart from the patch itself is the 
>> performance results. I've run some CPU bound tests to measure the extra 
>> CPU overhead it causes. The CPU overhead is significant, the worst case 
>> being a select of a single row from a table with just one integer column.
>>
>> However, the I/O savings are also the greatest for that same test case, 
>> as the table grows and the test becomes I/O bound. I don't have the 
>> numbers now, but earlier runs showed that the duration of the test was 
>> roughly halved, which makes sense because the patch reduced the index 
>> size so that it fit in memory, reducing the number of physical I/Os 
>> required per select from 2 to 1.
>>
>> ISTM that if we want to enable GIT automatically, we need a way to 
>> either reduce the CPU overhead, or have a smart heuristic to tune the 
>> feature so that it's only enabled when it's beneficial.
> 
> The maintain_cluster_order patch is useful by itself, and handles an
> existing TODO regarding pulling pages out of WAL in a specified order to
> maintain clustering.

Pull pages out of WAL? That must be a typo...

> I think it'd be good to submit that patch
> separately. Even if we get HOT into the backend, the cluster patch would
> still be useful for cases where you sometimes have to update fields in a
> clustered index.

Yeah, I submitted it in August for the first time. The way it's written 
now is the most non-intrusive way I could think of: it just adds a new 
optional indexam method. That has a small performance drawback: When 
inserting, the B-tree needs to be descended twice, once for the 
amsuggestblock, and then second time in aminsert. It would make sense to 
keep the index page pinned to avoid the descend, but that requires API 
changes.

> On usage, ISTM it would be better to turn on GIT only for a clustered
> index and not the PK? I'm guessing your automatic case is intended for
> SERIAL PKs, but maybe it would be better to just make that explicit.

As the patch stands, GIT is enabled by default for clustered indexes. 
And also by default, a PK index is created as the clustered index for 
table, if it's a simple single column integer key. It's a bit arbitrary, 
but lacking a better heuristic, it's a reasonable guess that should 
enable the feature in the most common cases where it helps.

Yeah, I'm guessing that a table with a serial PK becomes naturally 
clustered by PK.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: Grouped Index Tuples
Следующее
От: Jeff Davis
Дата:
Сообщение: Synchronized Scan preliminary results