Re: The need for clustered indexes to boost TPC-V performance

Поиск
Список
Период
Сортировка
От Claudio Freire
Тема Re: The need for clustered indexes to boost TPC-V performance
Дата
Msg-id CAGTBQpY4J2on5=gN5-ykSpZkxVeGHAfiZwu0mXpqsSXwoxxnPg@mail.gmail.com
обсуждение исходный текст
Ответ на The need for clustered indexes to boost TPC-V performance  (Reza Taheri <rtaheri@vmware.com>)
Список pgsql-performance
On Tue, Jul 3, 2012 at 8:13 PM, Reza Taheri <rtaheri@vmware.com> wrote:
> So I looked more closely at the indexes. I chose the CASH_TRANSACTION
> table since it has a single index, and we can compare it more directly to the
> Dell data. If you look at page 34 of http://bit.ly/QeWXhE, the index size of CT
> is 1,278,720KB for 6,120,529,488 rows. That’s less than one byte of index
> per data row!  How could that be?  Well, MS SQL used a “clustered index”
> for CT, i.e., the data is held in the leaf pages of the index B-Tree.
> The data and index are in one data structure. Once you lookup the index,
> you also have the data at zero additional cost. For PGSQL, we had to create
> a regular index, which took up 55GB. Once you do the math, this works out
> to around 30 bytes per row. I imagine we have the 15-byte key along with a
> couple of 4-byte or 8-byte pointers.
...
> So MS SQL beats PGSQL by a) having a lower I/O rate due to no competition
> for the buffer pool from indexes (except for secondary indexes); and b) by
> getting the data with a free lookup, whereas we have to work our way down
> both the index and the data trees.

15-byte key?

What about not storing the keys, but a hash, for leaf nodes?

Assuming it can be made to work for both "range" and "equality" scans,
holding only hashes on leaf nodes would reduce index size, but how
much?

I think it's doable, and I could come up with a spec if it's worth it.
It would have to scan the heap for only two extra index pages (the
extremes that cannot be ruled out) and hash collisions, which doesn't
seem like a big loss versus the reduced index.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Daniel Farina
Дата:
Сообщение: Re: The need for clustered indexes to boost TPC-V performance
Следующее
От: Reza Taheri
Дата:
Сообщение: Re: Introducing the TPC-V benchmark, and its relationship to PostgreSQL