Re: Improving count(*)

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Improving count(*)
Дата
Msg-id 878xvmr3t8.fsf@stark.xeocode.com
обсуждение исходный текст
Ответ на Improving count(*)  (Simon Riggs <simon@2ndquadrant.com>)
Список pgsql-hackers
I think the important thing to keep track of is a single bit: 
 Which the following apply?
 a) all of the tuples in this block are visible
 b) at least one tuple in this block is in-doubt and may need to be vacuumed

That isn't enough to calculate count(*) on its own but it means you could scan
an index like any other database and avoid checking every single tuple. If the
tuple lies in a block that is known not to contain any in-doubt records then
the tuple can be counted immediately.

This has the advantage that it helps with a lot more cases than a simple
"select count(*) from tab". As Tom pointed out that case can be tackled more
directly with a O(1) solution anyways. More complex cases are where fast index
scans are really important.

So you could do "SELECT count(*) FROM tab WHERE a > ?" and have it scan an
index on <a> without having to check the visibility of every single tuple. It
only has to check the visibility of tuples that lie on blocks that contain at
least one in-doubt tuple.

You could even imagine using the new bitmap index scan machinery to combine
these bits of information too.

And this is exactly the same information that vacuum needs. Once vacuum has
run and cleaned out a block it knows whether there are any records that are
still in-doubt or whether every record it left is universally visible. It can
note that and allow future vacuums to skip that block if no deletes or inserts
have changed that bit since.

-- 
greg



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Improving count(*)
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Bug in predicate indexes?