GSoC proposal. Index-only scans for GIST

Поиск
Список
Период
Сортировка
От Anastasia Lubennikova
Тема GSoC proposal. Index-only scans for GIST
Дата
Msg-id CAP4vRV4CKk-zG-askGMjKhFmmufwXiV2-px3RHC32XDCekaEOQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: GSoC proposal. Index-only scans for GIST  (Robert Haas <robertmhaas@gmail.com>)
Re: GSoC proposal. Index-only scans for GIST  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
Hello!
Here is the text of my proposal which I've applied to GSoC.
Any suggestions and comments are welcome.

Project name

Support for index-only scans for GIST index

 

Brief review

Currently GiST index don’t have index-only scan functionality. Index-only scans are a major performance feature added to Postgres 9.2. They allow certain types of queries to be satisfied just by retrieving data from indexes, and not from tables. This feature for B-trees (implemented since PostgreSQL-9.2) allows doing certain types of queries significantly faster. This is achieved by reduction in the amount of I/O necessary to satisfy queries. I think it will be useful to implement this feature for user defined data types that use GiST index.

 

Benefits to the PostgreSQL Community

Faster GiST index search would be actual for many PostgreSQL applications (for example some GIS systems).


Quantifiable results

Acceleration of GiST index search.


Project details

1. The GiST is a balanced tree structure like a B-tree, containing <key, pointer> pairs. GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. The GiST provides a possibility to create custom data types with indexed access methods and extensible set of queries.

There are seven methods that an index operator class for GiST must provide, and an eighth that is optional. 

-consistent

-union

-compress

-decompress

-penalty

-picksplit

-equal

-distance (optional)

I’m going to create new optional method fetch() in addition to existing. Thus user can create a method of retrieving data from the index without loss. It will be used when performing search queries to speed data retrieving.

 

2. gistget algorithm (several parts omitted):

Check the key
gistindex_keytest() – does this index tuple satisfy the scan key(s)?

Scan all items on the GiST index page and insert them into the queue (or directly to output areas)

plain scan

Heap tuple TIDs are returned into so->pageData[]

ordered scan

Heap tuple TIDs are pushed into individual search queue items

If the fetch() is specified by the developer, then using it, algorithm can retrieve the data directly to output areas at this stage, without reference to the heap.


3. Create function gistcanreturn to check whether fetch() is specified by user.

Amcanreturn – Function to check whether index supports index-only scans, or zero if none

There is the question of support index-only scans for multicolumn indexes. Probably it would require extend the interface to add separate columns checking.

To solve this question I need the community’s help.


4. Add details for index only scans into gistcostestimate function

 

 Links

1)     Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search trees for database systems. – September, 1995.

2)     http://www.sai.msu.su/~megera/postgres/gist/

3)     PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST Indexes.

-- 

Best regards,
Lubennikova Anastasia

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Is this a bug?
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: pg_archivecleanup bug