Обсуждение: We need index-only scans

Поиск
Список
Период
Сортировка

We need index-only scans

От
Bruce Momjian
Дата:
We last researched index-only scans, also called covering indexes, in
September of 2008, but have made little progress on it since.  Many have
been waiting for Heikki to implement this but I talked to him and he
doesn't have time.  

I believe it is time for the community to move forward and I would like
to assemble a team to work on this feature.  We might not be able to
implement it for Postgres 9.1, but hopefully we can make some progress
on this.

I have created a wiki page for this TODO item:
http://wiki.postgresql.org/wiki/Index-only_scans

I am interested in people improving this wiki page, and in people
discussing and coding some of the items needed to implement this
feature.  I have personally talked to a few people already who seemed
receptive.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: We need index-only scans

От
Greg Stark
Дата:
On Wed, Nov 10, 2010 at 4:04 AM, Bruce Momjian <bruce@momjian.us> wrote:
> We last researched index-only scans, also called covering indexes, in
> September of 2008, but have made little progress on it since.  Many have
> been waiting for Heikki to implement this but I talked to him and he
> doesn't have time.
>
> I believe it is time for the community to move forward and I would like
> to assemble a team to work on this feature.  We might not be able to
> implement it for Postgres 9.1, but hopefully we can make some progress
> on this.

Just so everyone is on the same page.... Even once we have index-only
scans they won't be anywhere near as useful with Postgres as they are
with Oracle and other databases. At least not unless we find a
solution for a different problem -- our inability to scan btree
indexes sequentially.

In Oracle "Fast Full Index" scans are particularly useful for things
like unconstrained select count(*). Since the scan can scan through
the index sequentially and the index is much smaller than the table it
can count all the values fairly quickly even on a very wide table.

In Postgres, aside from the visibility issues we have a separate
problem. In order to achieve high concurrency we allow splits to occur
without locking the index. And the new pages can be found anywhere in
the index, even to the left of the existing page. So a sequential scan
could miss some data if the page it's on is split and some of the data
is moved to be to the left of where our scan is.

It's possible this is a non-issue in the future due to large RAM sizes
and SSDs. Large amounts of RAM mean perhaps indexes will be in memory
much of the time and SSDs might mean that scanning the btree in index
order might not really be that bad.



--
greg


Re: We need index-only scans

От
Alvaro Herrera
Дата:
Excerpts from Greg Stark's message of vie nov 12 10:33:28 -0300 2010:

> In Postgres, aside from the visibility issues we have a separate
> problem. In order to achieve high concurrency we allow splits to occur
> without locking the index. And the new pages can be found anywhere in
> the index, even to the left of the existing page. So a sequential scan
> could miss some data if the page it's on is split and some of the data
> is moved to be to the left of where our scan is.

Eh?  If a transaction splits a page and put some new tuples to the
"left" of another transaction's scan (assuming a forward scan), then
necessarily the second transaction cannot see those tuples anyway -- the
inserter must have done the split after you got your snapshot.

A transaction cannot split a page that other transaction has a scan
stopped in, because there are buffer locks involved.  Therefore the scan
cannot miss any tuples.  Saith src/backend/access/nbtree/README:


Lehman and Yao don't require read locks, but assume that in-memory
copies of tree pages are unshared.  Postgres shares in-memory buffers
among backends.  As a result, we do page-level read locking on btree
pages in order to guarantee that no record is modified while we are
examining it.  This reduces concurrency but guaranteees correct
behavior.  An advantage is that when trading in a read lock for a
write lock, we need not re-read the page after getting the write lock.
Since we're also holding a pin on the shared buffer containing the
page, we know that buffer still contains the page and is up-to-date.

We support the notion of an ordered "scan" of an index as well as
insertions, deletions, and simple lookups.  A scan in the forward
direction is no problem, we just use the right-sibling pointers that
L&Y require anyway.  (Thus, once we have descended the tree to the
correct start point for the scan, the scan looks only at leaf pages
and never at higher tree levels.)  To support scans in the backward
direction, we also store a "left sibling" link much like the "right
sibling".  (This adds an extra step to the L&Y split algorithm: while
holding the write lock on the page being split, we also lock its former
right sibling to update that page's left-link.  This is safe since no
writer of that page can be interested in acquiring a write lock on our
page.)  A backwards scan has one additional bit of complexity: after
following the left-link we must account for the possibility that the
left sibling page got split before we could read it.  So, we have to
move right until we find a page whose right-link matches the page we
came from.  (Actually, it's even harder than that; see deletion discussion
below.)

Page read locks are held only for as long as a scan is examining a page.
To minimize lock/unlock traffic, an index scan always searches a leaf page
to identify all the matching items at once, copying their heap tuple IDs
into backend-local storage.  The heap tuple IDs are then processed while
not holding any page lock within the index.  We do continue to hold a pin
on the leaf page, to protect against concurrent deletions (see below).
In this state the scan is effectively stopped "between" pages, either
before or after the page it has pinned.  This is safe in the presence of
concurrent insertions and even page splits, because items are never moved
across pre-existing page boundaries --- so the scan cannot miss any items
it should have seen, nor accidentally return the same item twice.  The scan
must remember the page's right-link at the time it was scanned, since that
is the page to move right to; if we move right to the current right-link
then we'd re-scan any items moved by a page split.  We don't similarly
remember the left-link, since it's best to use the most up-to-date
left-link when trying to move left (see detailed move-left algorithm
below).


-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: We need index-only scans

От
Heikki Linnakangas
Дата:
On 12.11.2010 15:51, Alvaro Herrera wrote:
> Excerpts from Greg Stark's message of vie nov 12 10:33:28 -0300 2010:
>
>> In Postgres, aside from the visibility issues we have a separate
>> problem. In order to achieve high concurrency we allow splits to occur
>> without locking the index. And the new pages can be found anywhere in
>> the index, even to the left of the existing page. So a sequential scan
>> could miss some data if the page it's on is split and some of the data
>> is moved to be to the left of where our scan is.
>
> Eh?

It took me a while to understand what Greg meant as well. You can't scan 
a B-tree index in *physical order*, You have to first descend to the 
leftmost leaf, and follow the right pointers from there until you reach 
the rightmost leaf. That is a lot slower than seqscanning a file in 
physical order.

We solved that for VACUUM, taking advantage of the fact that there can 
only be one VACUUM on a table at a time. Maybe that mechanism could be 
generalized to all scans, but it would require some thinking..

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: We need index-only scans

От
Bruce Momjian
Дата:
Greg Stark wrote:
> On Wed, Nov 10, 2010 at 4:04 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > We last researched index-only scans, also called covering indexes, in
> > September of 2008, but have made little progress on it since. ?Many have
> > been waiting for Heikki to implement this but I talked to him and he
> > doesn't have time.
> >
> > I believe it is time for the community to move forward and I would like
> > to assemble a team to work on this feature. ?We might not be able to
> > implement it for Postgres 9.1, but hopefully we can make some progress
> > on this.
> 
> Just so everyone is on the same page.... Even once we have index-only
> scans they won't be anywhere near as useful with Postgres as they are
> with Oracle and other databases. At least not unless we find a
> solution for a different problem -- our inability to scan btree
> indexes sequentially.
> 
> In Oracle "Fast Full Index" scans are particularly useful for things
> like unconstrained select count(*). Since the scan can scan through
> the index sequentially and the index is much smaller than the table it
> can count all the values fairly quickly even on a very wide table.
> 
> In Postgres, aside from the visibility issues we have a separate
> problem. In order to achieve high concurrency we allow splits to occur
> without locking the index. And the new pages can be found anywhere in
> the index, even to the left of the existing page. So a sequential scan
> could miss some data if the page it's on is split and some of the data
> is moved to be to the left of where our scan is.
> 
> It's possible this is a non-issue in the future due to large RAM sizes
> and SSDs. Large amounts of RAM mean perhaps indexes will be in memory
> much of the time and SSDs might mean that scanning the btree in index
> order might not really be that bad.

Agreed.  I updated the index-only scans wiki for this:
http://wiki.postgresql.org/wiki/Index-only_scanstest speed improvement for scans of the entire index (this
involvesrandomI/O)    * we can't scan the index in physical order like vacuum does 
 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: We need index-only scans

От
Andrew Dunstan
Дата:

On 11/12/2010 09:17 AM, Bruce Momjian wrote:
> Greg Stark wrote:
>> On Wed, Nov 10, 2010 at 4:04 AM, Bruce Momjian<bruce@momjian.us>  wrote:
>>> We last researched index-only scans, also called covering indexes, in
>>> September of 2008, but have made little progress on it since. ?Many have
>>> been waiting for Heikki to implement this but I talked to him and he
>>> doesn't have time.
>>>
>>> I believe it is time for the community to move forward and I would like
>>> to assemble a team to work on this feature. ?We might not be able to
>>> implement it for Postgres 9.1, but hopefully we can make some progress
>>> on this.
>> Just so everyone is on the same page.... Even once we have index-only
>> scans they won't be anywhere near as useful with Postgres as they are
>> with Oracle and other databases. At least not unless we find a
>> solution for a different problem -- our inability to scan btree
>> indexes sequentially.
>>
>> In Oracle "Fast Full Index" scans are particularly useful for things
>> like unconstrained select count(*). Since the scan can scan through
>> the index sequentially and the index is much smaller than the table it
>> can count all the values fairly quickly even on a very wide table.
>>
>> In Postgres, aside from the visibility issues we have a separate
>> problem. In order to achieve high concurrency we allow splits to occur
>> without locking the index. And the new pages can be found anywhere in
>> the index, even to the left of the existing page. So a sequential scan
>> could miss some data if the page it's on is split and some of the data
>> is moved to be to the left of where our scan is.
>>
>> It's possible this is a non-issue in the future due to large RAM sizes
>> and SSDs. Large amounts of RAM mean perhaps indexes will be in memory
>> much of the time and SSDs might mean that scanning the btree in index
>> order might not really be that bad.
> Agreed.  I updated the index-only scans wiki for this:
>
>     http://wiki.postgresql.org/wiki/Index-only_scans
>     
>     test speed improvement for scans of the entire index (this involves
>     random I/O)
>         * we can't scan the index in physical order like vacuum does

For unconstrained select count(*), why does scanning in index order matter?

cheers

andrew


Re: We need index-only scans

От
Alvaro Herrera
Дата:
Excerpts from Heikki Linnakangas's message of vie nov 12 11:01:39 -0300 2010:

> It took me a while to understand what Greg meant as well. You can't scan 
> a B-tree index in *physical order*, You have to first descend to the 
> leftmost leaf, and follow the right pointers from there until you reach 
> the rightmost leaf. That is a lot slower than seqscanning a file in 
> physical order.

Oh, that makes more sense.  I'm not sure that can be supported sanely
(i.e. not locking the whole index)

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: We need index-only scans

От
Robert Haas
Дата:
On Fri, Nov 12, 2010 at 8:33 AM, Greg Stark <gsstark@mit.edu> wrote:
> On Wed, Nov 10, 2010 at 4:04 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> We last researched index-only scans, also called covering indexes, in
>> September of 2008, but have made little progress on it since.  Many have
>> been waiting for Heikki to implement this but I talked to him and he
>> doesn't have time.
>>
>> I believe it is time for the community to move forward and I would like
>> to assemble a team to work on this feature.  We might not be able to
>> implement it for Postgres 9.1, but hopefully we can make some progress
>> on this.
>
> Just so everyone is on the same page.... Even once we have index-only
> scans they won't be anywhere near as useful with Postgres as they are
> with Oracle and other databases. At least not unless we find a
> solution for a different problem -- our inability to scan btree
> indexes sequentially.

I have very little doubt that our first attempts to chip away at this
problem are going to be a bit rough around the edges.  Here's another
problem to mull over: a large insert-only table will never be
vacuumed; therefore, the visibility map bits will never become set;
therefore, the index-only scan optimization won't apply (and the user
may not realize it or understand why it's happening).

But the journey of a thousand miles begins with the first step.  I
think we need to focus our first effort on making the visibility map
crash-safe.  Then we can implement the basic feature, which I would
characterize this way: if performing an index-scan, and all the
attributes we need are available from the index tuple, then skip the
heap fetch when the visibility map bit is set.  This requires minimal
planner support - just an adjustment of the costing model for index
scans; although to do it right I think we're going to need statistics
on what fraction of pages in the heap have the visibility map bit set.Then, we can work on refinements, of which I
thinkthere will be 
many, including the one you listed.  Another is to bubble up heap
fetches in the plan tree - so for example if you eventually need to
return some attributes that aren't in the index tuple, you could
consider performing some other join based on the index columns and
then do the heap fetches for the remaining attributes (and visibility
checks) later.

I am not confident that we can get even a basic implementation of
index-only scans into 9.1 at this point, and we're certainly not going
to get all the kinks worked out.  So I agree with you that we
shouldn't set expectations above the level at which they can be met,
but, I'd be happy if we can make a start on it.

...Robert


Re: We need index-only scans

От
Kristian Nielsen
Дата:
Greg Stark <gsstark@mit.edu> writes:

> Just so everyone is on the same page.... Even once we have index-only
> scans they won't be anywhere near as useful with Postgres as they are
> with Oracle and other databases. At least not unless we find a
> solution for a different problem -- our inability to scan btree
> indexes sequentially.

True, however I would not be too pessimistic about this.

For OLTP like typical web applications, index-only scans are a killer feature
for being able to read N rows with 1 I/O (for some small N), when the data no
longer fits in the buffer pool, or after cold start.

Eg. read all account settings for one user account, or subjects of all
messages, etc. A composite index with user_id in the first column allows to
fetch all N rows from one (or a few) disk pages with an index-only scan, as
opposed to N disk pages.

So for this, index-only scans can make a _big_ difference, even without
support for Oracle-type index fast-full-scans.
- Kristian.


Re: We need index-only scans

От
MARK CALLAGHAN
Дата:
On Wed, Dec 1, 2010 at 3:53 AM, Kristian Nielsen
<knielsen@knielsen-hq.org> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>
>> Just so everyone is on the same page.... Even once we have index-only
>> scans they won't be anywhere near as useful with Postgres as they are
>> with Oracle and other databases. At least not unless we find a
>> solution for a different problem -- our inability to scan btree
>> indexes sequentially.
>
> True, however I would not be too pessimistic about this.
>
> For OLTP like typical web applications, index-only scans are a killer feature
> for being able to read N rows with 1 I/O (for some small N), when the data no
> longer fits in the buffer pool, or after cold start.
>
> Eg. read all account settings for one user account, or subjects of all
> messages, etc. A composite index with user_id in the first column allows to
> fetch all N rows from one (or a few) disk pages with an index-only scan, as
> opposed to N disk pages.
>
> So for this, index-only scans can make a _big_ difference, even without
> support for Oracle-type index fast-full-scans.

I am not trying start a MySQL vs PostgreSQL thread. I lurk on these
lists to learn more about PostgreSQL.

I know that PostgreSQL is good at OLTP and complex query processing
and that index fast-full scans can make a big difference for large
joins, but the workload that I care about is OLTP-only. PostgreSQL
will be more efficient on that workload with support for index-only
scans. The majority of the load is simple queries -- joins that touch
a few rows, short index range scans and index point lookups. With
covering indexes and InnoDB the queries do a few disk reads in the
worst case. Without covering indexes the queries do extra disk IO in
the worst case (buffer pool read miss) and this is much worse for the
range scans. I assume that the behavior with covering indexes but
without index-only scans is similar to not having index-only scans.

I collect 95th percentile response times for my popular queries and
they are much improved with the use of covering indexes.

-- 
Mark Callaghan
mdcallag@gmail.com