Re: Query about index usage

From Greg Smith
Subject Re: Query about index usage
Date
Msg-id 4C21A438.9030903@2ndquadrant.com
In response to Re: Query about index usage  (Jayadevan M <Jayadevan.Maymala@ibsplc.com>)
Responses Re: Query about index usage  (Jayadevan M <Jayadevan.Maymala@ibsplc.com>)
List pgsql-performance
Jayadevan M wrote:
> It is mentioned that table data blocks have data about tuple visibility and hence table scans
> are always necessary. So how does PostgreSQL reduce the number of blocks
> to be read by using indexes?

To be useful, a query using an index must be selective: it must return
only a fraction of the possible rows in the table.  Scanning the index
produces a list of the blocks containing potentially visible data, and
then only those data blocks are retrieved and tested for visibility.
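
You can watch this happening directly with EXPLAIN.  A minimal sketch,
assuming a hypothetical table "orders" with an index on "customer_id";
the Buffers line in the output counts the index and table blocks that
were actually fetched (the BUFFERS option requires PostgreSQL 9.0+):

EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM orders WHERE customer_id = 42;
-- "Buffers: shared hit=N read=M" shows how many 8KB blocks the index
-- scan touched, versus the whole table a sequential scan would read.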

Let's say you have a table that's 100 pages (pages are 8KB) and an index
of 50 pages against it.  You run a query that only selects 5% of the
rows in the table, from a contiguous section.  As a very rough estimate,
it will look at 5% * 50 = 3 index pages.  Those will point to a matching
set of 5% * 100 = 5 data pages.  Now you've found the right subset of
the data by retrieving only 8 random pages instead of 100.  With
random_page_cost=4.0, that gives this plan a cost of around 8 * 4.0 =
32, while the sequential scan would cost 100 * 1.0 (sequential
accesses), around 100.  (Both plans would also have some smaller
row-processing cost added in.)
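
If you want to double-check that arithmetic, the same rough estimate can
be written as a query.  This is just the back-of-the-envelope math
above, not the planner's actual cost formula, which also adds per-tuple
CPU terms:

SELECT ceil(0.05 * 50)  AS index_pages,     -- 3
       ceil(0.05 * 100) AS data_pages,      -- 5
       (ceil(0.05 * 50) + ceil(0.05 * 100)) * 4.0
                        AS index_scan_cost, -- 8 random pages * 4.0 = 32
       100 * 1.0        AS seq_scan_cost;   -- 100 sequential pages * 1.0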

It's actually a bit more complicated than that--the way indexes are
built means you can't just linearly estimate their usage, and scans of
non-contiguous sections are harder to model simply--but that should give
you an idea.  Only when using the index significantly narrows the number
of data pages expected will it be an improvement over ignoring the index
and just scanning the whole table.

If the expected use of the index were only 20% selective for another
query, you'd be getting 20% * 50 = 10 index pages and 20% * 100 = 20
data pages, for a potential total of 30 random page lookups.  That could
end up costing 30 * 4.0 = 120, higher than the sequential scan's 100.
The breakpoint where scanning the whole table sequentially is considered
cheaper usually happens near 20% of the table, and you can shift it
around by adjusting random_page_cost.  Make it lower, and you can end up
preferring index scans even for 30 or 40% of a table.
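
One way to watch that breakpoint move is to flip random_page_cost in a
session and compare the plans.  A sketch, assuming a hypothetical table
"t" with an index on "id":

SET random_page_cost = 4.0;  -- the default; random I/O costs 4x sequential
EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 30000;
SET random_page_cost = 1.5;  -- cheaper random I/O, e.g. well-cached data
EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 30000;
RESET random_page_cost;

With the lower setting, the planner keeps choosing an index scan at
selectivities where the default would already have switched to a
sequential scan.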

> Do index data get updated as and when data is committed and made 'visible' or is it
> that index data get updated as soon as data is changed, before commit is
> issued and rollback of transaction results in a rollback of the index data

Index changes happen when the data goes into the table, including
situations where it might not be committed.  The index change never gets
deferred to commit time, the way you can defer things like foreign key
checks.  When a transaction is rolled back, the aborted row eventually
gets marked as dead by vacuum, at which point any index records pointing
to it can also be cleaned up.
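
You can demonstrate that with a hypothetical table "t" that has an index
on "id"; the index entry appears at INSERT time and outlives the
rollback until vacuum runs:

BEGIN;
INSERT INTO t (id) VALUES (1);  -- heap tuple and index entry created now
ROLLBACK;                       -- row never becomes visible, but both
                                -- stay on disk
VACUUM VERBOSE t;               -- reports the dead row; index pointers
                                -- to it get cleaned up in the same pass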

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

