Re: Including Snapshot Info with Indexes

Поиск
Список
Период
Сортировка
От Gokulakannan Somasundaram
Тема Re: Including Snapshot Info with Indexes
Дата
Msg-id 9362e74e0710080301n2583ab33hb228d0fb78d812e9@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Including Snapshot Info with Indexes  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Ответы Re: Including Snapshot Info with Indexes  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Список pgsql-hackers


On 10/8/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Gokulakannan Somasundaram wrote:
>     Currently The index implementation in Postgresql does not store the
> Snapshot information in the Index. If we add the snapshot information into
> the indexing structure, we will have the following advantages.

This idea has been discussed to death many times before. Please search
the archives.

> a) There can be index only scans like Oracle

IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

Index only scans means that  in order to get certain results, we may not goto the table at all. For example, if you have an index on columns a and b, and if there is a query like "select b from table where a between a1 and a2", then the explain plan need not goto the table. I can't understand how dead space map will provide such a functionality. In short each index will act like an Index Organized Table, if the all the columns of the query are present in the index.

> b) Unique indexes will become less costly, as older index tuples can be
> found out.

Doesn't seem like a big benefit, considering that in most cases there
won't be any tuples in the index with a duplicate key. A common
exception to that is (non-HOT) updating a row. But in that case, the
page containing the old tuple is already in cache, so the lookup of the
visibility from the heap is cheap.

Its not a big benefit. agreed.

> c) Even the index scans will get faster, since some of the index tuples
> won't translate into HeapScans.

That's the same as doing an index-only-scan, right?

No here if you have an index on a(say).  If there is a query like select * form table where a between a1 and a2, currently the scan goes to the table to verify the visibility. Of course if the tuple satisfies vacuum, then it is marked in the index, which is an optimization. This is not index-only scan. This is a normal index scan, which can skip certain random I/Os.

> d) Deletes and Updates will become slightly costly, as they have to update
> these indexes.

I think you're grossly underestimating the cost of that. For example, on
a table with 3 indexes. a delete currently requires one index lookup +
one heap lookup. With visibility in the indexes, that would require 3
index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
taking into account the non-leaf b-tree pages. The real impact will
depend on what's in cache, but the cost can be very high.

That's true. But i am not asking to replace the current index implementation, but to provide an extra option while indexing. Say if a particular database setup doesn't do much deletes and updates(imagine tables with partitioning, where the partitions/tables are dropped instead of deletes. They can have an option to "create index .. with snapshot"

Imagine the Index Vacuum also will do lesser Random I/Os

Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.

I am thinking of certain optimizations here.  we  have a bit unused in indextuple structure.  If a particular tuple is not deleted, then we can signify that using that bit and save 6 bytes of saving the xmax and cmax. We are trading of this space efficiency in place of Random I/Os, which is not a bad trade-off , i suppose. Again this is going to optional for the user. If users have an option to create Bitmap index/ Binary index, why can't they have this option as well?

There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.


I think Vacuum need not goto the table, as the visibility information is present in the index itself.  I don't know whether i have given the correct answer here.

Expecting your reply..


Thanks,
Gokul.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Heikki Linnakangas"
Дата:
Сообщение: Re: Improving the Performance of Full Table Updates
Следующее
От: "Gokulakannan Somasundaram"
Дата:
Сообщение: Re: Including Snapshot Info with Indexes