GSoC 2015 proposal. Bitmap Index-only Count

Поиск
Список
Период
Сортировка
От Anastasia Lubennikova
Тема GSoC 2015 proposal. Bitmap Index-only Count
Дата
Msg-id CAP4vRV7xdUE7Eq7ZFgoWSC6aXmH9AM3qeSQ139dzZhpCcQGr3A@mail.gmail.com
обсуждение исходный текст
Ответы Re: GSoC 2015 proposal. Bitmap Index-only Count  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi, hackers!
Here is the text of my proposal which I've applied to GSoC.
Any suggestions and comments are welcome.


Project name

Bitmap Index-only Count

 

Brief review

There is a problem of slow counting in PostgreSQL [1]. The reason why this is slow is related to the MVCC implementation in PostgreSQL. Index-only scans (implemented since PostgreSQL-9.2) providing some performance improvements where the visibility map of the table allows it. That’s good. But it works only for access methods which provide amgettuple method. Unfortunately GIN supports only BitmapIndexScan and has no implementation of index_getnext() interface [2]. Idea of the proposal is to implement Bitmap Index-Only Count method that allows to count elements, without reference to the heap.

 

Benefits to the PostgreSQL Community

Faster count(*) would be actual for GIN. Probably it would accelerate count(*) for other access methods too, but I do not sure if it would be significant difference.

 

Quantifiable results

Acceleration of count(*) queries with WHERE clause which use GIN.

 

Project details

Let’s look at count(*) query:

EXPLAIN  SELECT count(*) from tbl_mytbl where val @> '{"b":2}';

 

Now the query plan looks like this:

Aggregate

   ->  Bitmap Heap Scan on tbl_mytbl 

         Recheck Cond: (val @> '{"b": 2}'::jsonb)

         ->  Bitmap Index Scan on idx_mytbl 

               Index Cond: (val @> '{"b": 2}'::jsonb)

 

Idea of the proposal is to implement Bitmap Index-Only Count method that allows to count elements, without reference to the heap.

Conditions:

  •  all tuples are visible (the same problem as for Index-only count(*));
  • result TIDBitmap is not lossy. nchunks = 0;

int nchunks; /* number of lossy entries in pagetable */

  • pages in TIDBitmap don’t require recheck

* recheck is used only on exact pages --- it indicates that although

* only the stated tuples need be checked, the full index qual condition

* must be checked for each (ie, these are candidate matches).

 

If all conditions are true, it’s possible to call aggregate COUNT function for each tuple from TIDBitmap returned by Bitmap Index Scan (or BitmapAnd/BitmapOr nodes). And there’s no necessity to call Bitmap Heap Scan.

 

 

As a GSoC student I will create new Node “Bitmap Index-Only Scan”, which would catch tuples from Bitmap Index Scan node and pass them to Aggregate node. Thus, new query plan will be as follow:

 

Aggregate

   ->  Bitmap Index-only Count

         ->  Bitmap Index Scan on idx_mytbl 

               Index Cond: (val @> '{"b": 2}'::jsonb)

 

 

Project Schedule

until May 25

Read documentation and source code, clarify details of implementation.

until June 30

Implement Bitmap Index-only Count node.

1 July – 31 July

Add Bitmap Index-only Count node to Planner.

1 August -15 August

Final refactoring, testing and submitting a patch.

 

Links

  1.  https://wiki.postgresql.org/wiki/Slow_Counting
  2. http://postgresql.nabble.com/Todo-item-Support-amgettuple-in-GIN-td5780810.html 


--
Best regards,
Lubennikova Anastasia

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thom Brown
Дата:
Сообщение: Re: Error with index on unlogged table
Следующее
От: Thom Brown
Дата:
Сообщение: Re: Error with index on unlogged table