Обсуждение: Optimizer: ranges and partial indices? Or use partitioning?

Поиск
Список
Период
Сортировка

Optimizer: ranges and partial indices? Or use partitioning?

От
Adrian von Bidder
Дата:
Heyho!

Given a (big [1]) table

values ( ts timestamp, source integer, value float );

[under what conditions] will the opitmizer be smart enough to make use of a
partial index on "(source, ts) where ts > '2009-01-01'"?  (Queries will have
a date restriction but not necessarily the exact "> 2009-01-01".)

(A full index on source, ts is also built, but most queries are on values
within the last year.)

And related: what are the pro / contra of such a partial index versus table
partitioning?  Partitioning certainly brings added complexity; since values
are only ever appended to the table (with ts representing more or less
"now"), data is physically already grouped by ts.  Would we still benefit
from partitioning?  (partitioning by ts, that is.  Partitioning by source is
not realistic since there may be tens of thousands of sources.  Queries will
often be for values from multiple sources, so partitioning by groups of
sources would be very complicated to implement.)


thanks for your input.
-- vbi

[1] some of our databases are in the range of 20 to 50G, most of it in this
one table.

--
featured link: http://www.pool.ntp.org

Re: Optimizer: ranges and partial indices? Or use partitioning?

От
Tom Lane
Дата:
Adrian von Bidder <avbidder@fortytwo.ch> writes:
> Given a (big [1]) table

> values ( ts timestamp, source integer, value float );

> [under what conditions] will the opitmizer be smart enough to make use of a
> partial index on "(source, ts) where ts > '2009-01-01'"?  (Queries will have
> a date restriction but not necessarily the exact "> 2009-01-01".)

The planner is reasonably smart about deductions involving combinations
of btree-indexable operators.  It will be able to prove the index is
usable if the query includes restrictions like
    ts > '2009-01-02'
    ts >= '2009-01-02'
    ts = '2009-01-02'
where the comparison is to a constant that is >= the one in the index
predicate in the first case, or > the predicate in the others.

Whether it will think that using the index is a win is a different
question --- if the restriction is not reasonably selective it will
likely not want to use an index anyway.

> (A full index on source, ts is also built, but most queries are on values
> within the last year.)

If you have a full index on the same columns, I think that a partial
index like that is likely to be a complete waste.  It's just replicating
a subtree of the full index, and saving you probably not more than one
level of btree descent, at the cost of double the index update work and a
lot more pressure on cache memory.

            regards, tom lane

Re: Optimizer: ranges and partial indices? Or use partitioning?

От
Adrian von Bidder
Дата:
Heyho!

On Monday 01 March 2010 17.04:46 Tom Lane wrote:

table
> > values ( ts timestamp, source integer, value float );
> > [...] partial index on "(source, ts) where ts > '2009-01-01'"?

> The planner is reasonably smart about deductions involving combinations
> of btree-indexable operators.  It will be able to prove the index is
> usable if the query includes restrictions like
>     ts > '2009-01-02'
>     ts >= '2009-01-02'
>     ts = '2009-01-02'
> where the comparison is to a constant that is >= the one in the index
> predicate in the first case, or > the predicate in the others.

Ok.

>
> Whether it will think that using the index is a win is a different
> question --- if the restriction is not reasonably selective it will
> likely not want to use an index anyway.
>
> > (A full index on source, ts is also built, but most queries are on
> > values within the last year.)
>
> If you have a full index on the same columns, I think that a partial
> index like that is likely to be a complete waste.  It's just replicating
> a subtree of the full index, and saving you probably not more than one
> level of btree descent, at the cost of double the index update work and a
> lot more pressure on cache memory.

Ok, thanks.  That was exactly the kind of answer/explanation I was looking
for - I don't have enough experience to judge this. (And completely forgot
the fact that an additional index means that it'd need to be loaded to
memory at times, too.)  Thanks a lot.

Given the size of the table and given that query plans with small "test data
sets" will likely be different from what happens in a live system,
experimenting with various indices etc. is a bit time consuming, that's why
I asked first.

Currently, we're still in the green area, but as the table grows I'm quite
sure I'll have to look into optimizing this area within the next one or two
years...  (Luckily the application is fully within our control as well so we
can optimize on that side, too.)

cheers
-- vbi

--
The worst cliques are those which consist of one man.
        -- G. B. Shaw

Вложения

Re: Optimizer: ranges and partial indices? Or use partitioning?

От
"Davor J."
Дата:
I have the same table as yours with potential to grow over 50 billion of
records once operational. But our hardware is currently very limited (8GB
RAM).

I concur with Tom Lane about the fact that partial indexes aren't really an
option, but what about partitioning?

I read from the Postgres docs that "The exact point at which a table will
benefit from partitioning depends on the application, although a rule of
thumb is that the size of the table should exceed the physical memory of the
database server."
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

Now, a table with 500M records would exceed our RAM, so I wonder what impact
a table of 50G would have on simple lookup performance (i.e. source = fixed,
timestamp = range), taking into account that a global index would exceed our
RAM on some 1G records.

Did anyone do some testing? Is partitioning a viable option in such
scenario?

"Adrian von Bidder" <avbidder@fortytwo.ch> wrote in message
news:201003020849.19133@fortytwo.ch...