Обсуждение: Multiple column index usage question

Поиск
Список
Период
Сортировка

Multiple column index usage question

От
"Jan Muszynski"
Дата:
Rather simple question, of which I'm not sure of the answer.

If I have a multiple column index, say:
    Index index1 on tableA (foo,bar)

and I then:
    Select * from "tableA" where foo = <some value>

Will index1 be used, or am I looking at a seqscan in all circumstances?

TIA
-jan m

Re: Multiple column index usage question

От
Ron Johnson
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/19/07 15:53, Jan Muszynski wrote:
> Rather simple question, of which I'm not sure of the answer.
>
> If I have a multiple column index, say:
>     Index index1 on tableA (foo,bar)
>
> and I then:
>     Select * from "tableA" where foo = <some value>
>
> Will index1 be used, or am I looking at a seqscan in all circumstances?

Yes, it will use the index.

However, in earlier versions, the lvalue & rvalue needed to match in
type to use the index.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFsUzmS9HxQb37XmcRArY8AKDqzS5FeY1HwkSGeOlhQsjsdpAV5gCghiWj
R4e7rBWaAAGF25ZFhy1Elgc=
=Wkp8
-----END PGP SIGNATURE-----

Re: Multiple column index usage question

От
"Jeremy Haile"
Дата:
That's interesting.  So if you have a composite index on two columns, is
there much of a reason (usually) to create single indexes on each of the
two columns?  I guess the single indexes might be slightly faster
depending on the number of different values/combinations, so probably
"it depends" eh?


On Fri, 19 Jan 2007 16:57:42 -0600, "Ron Johnson"
<ron.l.johnson@cox.net> said:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 01/19/07 15:53, Jan Muszynski wrote:
> > Rather simple question, of which I'm not sure of the answer.
> >
> > If I have a multiple column index, say:
> >     Index index1 on tableA (foo,bar)
> >
> > and I then:
> >     Select * from "tableA" where foo = <some value>
> >
> > Will index1 be used, or am I looking at a seqscan in all circumstances?
>
> Yes, it will use the index.
>
> However, in earlier versions, the lvalue & rvalue needed to match in
> type to use the index.
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFFsUzmS9HxQb37XmcRArY8AKDqzS5FeY1HwkSGeOlhQsjsdpAV5gCghiWj
> R4e7rBWaAAGF25ZFhy1Elgc=
> =Wkp8
> -----END PGP SIGNATURE-----
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

Re: Multiple column index usage question

От
Ron Johnson
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Yes, it depends.

Given the example from OP, if you have queries that only reference
field bar, then the query optimizer will do a seqscan on the table.
 You would need a separate index on "bar"

And, given index1, you do not need another index on "foo" alone.


On 01/19/07 17:20, Jeremy Haile wrote:
> That's interesting.  So if you have a composite index on two columns, is
> there much of a reason (usually) to create single indexes on each of the
> two columns?  I guess the single indexes might be slightly faster
> depending on the number of different values/combinations, so probably
> "it depends" eh?
>
>
> On Fri, 19 Jan 2007 16:57:42 -0600, "Ron Johnson"
> <ron.l.johnson@cox.net> said:
> On 01/19/07 15:53, Jan Muszynski wrote:
>>>> Rather simple question, of which I'm not sure of the answer.
>>>>
>>>> If I have a multiple column index, say:
>>>>     Index index1 on tableA (foo,bar)
>>>>
>>>> and I then:
>>>>     Select * from "tableA" where foo = <some value>
>>>>
>>>> Will index1 be used, or am I looking at a seqscan in all circumstances?
> Yes, it will use the index.
>
> However, in earlier versions, the lvalue & rvalue needed to match in
> type to use the index.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFsVVbS9HxQb37XmcRAuB1AKDvMEzNgWVzYvwd6Z1OqAvZCOiD3gCg12Mo
vhk/F0f45VNzAn3sA2btrcQ=
=tZ8Z
-----END PGP SIGNATURE-----

Re: Multiple column index usage question

От
Tomas Vondra
Дата:
> Rather simple question, of which I'm not sure of the answer.
>
> If I have a multiple column index, say:
>     Index index1 on tableA (foo,bar)
>
> and I then:
>     Select * from "tableA" where foo = <some value>
>
> Will index1 be used, or am I looking at a seqscan in all circumstances?
>
> TIA
> -jan m

Rather difficult to answer this question without knowledge of the data,
especially it's statistical properties, but yes - the index might been
used if there's enough variability in the data (for the particular value).

But that's true for all indexes, single as well as multi-column ones.
For example imagine a query

   Select * from "tableA" where foo = 'x';

where 'x' is a very uncommon value (for example less than 1% of the rows
has this value). In that case the index definitely will be used (unless
some really stupid mistake - for example different data types - prevents
it's usage). On the other side, imagine the value 'x' is very common
(for example more than 10% has this value). In that case it's very
unlikely the index will be used as the sequential scan of the whole
table will most likely be more efficient).

In the new releases (definitely 8.1, I'm not sure about 8.0) the index
might be used even for queries related to 'bar' column alone, though it
would be a little less efficient as for 'foo' (or even both columns).

All that means you can replace several single-column indexes with one
multi-column index, and still use that index for queries with only some
of the indexed columns, but there are differences in efficiency. Generally:

  0) Multi-column indexes are most efficient when all the columns are
     used in the query.

  1) The more columns are used, the more efficient the index usage is.

  2) Columns 'from the beginning' are more efficiently processed than
     the columns 'from the end' (so the most often used column should
     be placed at the beginning).

  3) More variability in the data means more efficient index (so the
     most variable columns should be placed at the beginning).

This is somehow contradictory, especially the rules (2) and (3), and you
have to reason (and test) carefully about the order in the index, as you
want place the most often queried at the beginning of the list, but
there may be columns with more variability.

Another thing you have to take into account is sorting - that's another
area of indexing, especially with multi-column indexes.

Well, somehow long answer for a relatively short question ... sorry for
that.

Tomas

Re: Multiple column index usage question

От
Bruno Wolff III
Дата:
On Fri, Jan 19, 2007 at 18:20:47 -0500,
  Jeremy Haile <jhaile@fastmail.fm> wrote:
> That's interesting.  So if you have a composite index on two columns, is
> there much of a reason (usually) to create single indexes on each of the
> two columns?  I guess the single indexes might be slightly faster
> depending on the number of different values/combinations, so probably
> "it depends" eh?

You are normal going to want an index on just the second column in the
index or you do things where you are selecting a small subset of rows
based on the value of that column.

In some situations it may even make sense to have a separate index on just
the first column, because it will be more compact which will speed up
searches. However, you have to pay for maintaining the extra index when
changing the data so it often isn't worth it.