Re: proposal for smaller indexes on index-ordered tables

Поиск
Список
Период
Сортировка
От Jeffrey Baker
Тема Re: proposal for smaller indexes on index-ordered tables
Дата
Msg-id fd145f7d0806241408he74dd04i38f63f857f5d4adf@mail.gmail.com
обсуждение исходный текст
Ответ на Re: proposal for smaller indexes on index-ordered tables  (Zoltan Boszormenyi <zb@cybertec.at>)
Ответы Re: proposal for smaller indexes on index-ordered tables
Re: proposal for smaller indexes on index-ordered tables
Список pgsql-hackers
On Tue, Jun 24, 2008 at 1:59 PM, Zoltan Boszormenyi <zb@cybertec.at> wrote:
Jeffrey Baker írta:
> The way I read it, the current btree index stores the index value and
> the TID of every tuple having that value.  When you have a table with
> three columns, you index one of them and you get an index which is
> practically as large as the table itself.
>
> Supposing the table is generally or strictly ordered by the column to
> be indexed, it would be more compact if the index stored ranges of
> tuples.  Instead of storing the TID of every tuple with that value,
> the index would store a first and last TID, between which all tuples
> have the value.
>
> Example: table with one million rows indexed on a column having one
> thousand distinct values.  Table is in-order by the indexed column.
> The traditional index would contain a million TIDs, whereas a range
> index would contain only two thousand.  The range index would be 500
> times smaller, more likely to be cached, etc.
>
> Thoughts?
>
> -jwb

Example with your theory:
One (not yet committed) transaction changes one tuple
that was in the middle of a range before but the tuple's indexed
column changed. What would you do?

Insert the new tuple at the end of the table and add another range to the index.  Leave the old tuple in place and don't touch the original index range.
 
You need to keep track of multiple index versions:
1. the range has to be split for the not-yet-committed modifier transaction,
   it might need to re-read the same table.
2. the old range has to be kept for reader transactions that still see
the old data

This is only true if you update the tuple in-place. 
 
Imagine you have thousands of UPDATEs in flight on different rows.

I'm quite aware of the problems of maintaining such a table and index, but the fact is that data warehouse type tables may never be updated after being created.  The particular application I'm struggling with does a SELECT ... INTO ... ORDER BY to make an ordered table for querying every night.  The problem is it takes longer, much longer, to create the index than to create the table, and in the end the index is as big as half the table anyway.

So this type of index would only be useful for an essentially read-only table.  I agree.

Quite another proposal would be to somehow instruct the database that the table is strictly in-order by a column and allow a binary search access method.  Then you don't need any index at all.

-jwb

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Zoltan Boszormenyi
Дата:
Сообщение: Re: proposal for smaller indexes on index-ordered tables
Следующее
От: "Stephen R. van den Berg"
Дата:
Сообщение: Re: Dept of ugly hacks: eliminating padding space in system indexes