Re: Performace comparison of indexes over timestamp fields

Поиск
Список
Период
Сортировка
От Alexander Staubo
Тема Re: Performace comparison of indexes over timestamp fields
Дата
Msg-id 88daf38c0705220539q5807d2c2t1f20a03922bf8d33@mail.gmail.com
обсуждение исходный текст
Ответ на Performace comparison of indexes over timestamp fields  (Arnau <arnaulist@andromeiberica.com>)
Ответы Re: Performace comparison of indexes over timestamp fields  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Список pgsql-performance
On 5/22/07, Arnau <arnaulist@andromeiberica.com> wrote:
>    On older versions of PostgreSQL, at least in my experience, queries
> on timestamps fields even having indexes where performing quite bad
> mainly sequential scans where performed.

PostgreSQL uses B-tree indexes for scalar values. For an expression
such as "t between a and b", I believe it's going to match both sides
of the table independently (ie., t >= a and t <= b) and intersect
these subsets. This is inefficient.

You should get better performance by mapping timestamps to a
one-dimensional plane and indexing them using GiST. GiST implements an
R-tree-like structure that supports bounding-box searches.

This involves setting up a functional index:

  create index ... on payment_transactions using gist (
    box(point(extract(epoch from time), 0), point(extract(epoch from
time), 0)) box_ops)

I'm using box() here because GiST doesn't have a concept of points.

Then insert as usual, and then query with something like:

  select ... from payment_transactions
  where box(
    point(extract(epoch from '2006-04-01'::date), 0),
    point(extract(epoch from '2006-08-01'::date), 0)) && box(
    point(extract(epoch from time), 0),
    point(extract(epoch from time), 0));

PostgreSQL should be able to exploit the GiST index by recognizing
that the result of box() expression operand is already computed in the
index.

This much less inconvenient and portable -- I would love for
PostgreSQL to be provide syntactic sugar and special-casing to make
this transparent -- but worth it if you are dealing with a lot of
range searches.

>    Now I have a newer version of PostgreSQL and I've done some tests
> comparing the performance of an index over a timestamp field with a
> numeric field. To do so, I have the following table:
>
>                       Table "public.payment_transactions"
>      Column     |            Type             |            Modifiers
> ----------------+-----------------------------+---------------------------------
> transaction_id | character varying(32)       | not null
> timestamp_in   | timestamp without time zone | default now()
> credits        | integer                     |
> epoch_in       | bigint                      |
> epoch_in2      | double precision            |
[snip]

A timestamp is stored internally as an 8-byte double-precision float.
Therefore, timestamp_in and epoch_in2 should behave identically.

> While doing the tests this table has about 100.000 entries.

Make sure PostgreSQL is able to keep the entire table in memory by
setting shared_buffers; you don't want to be hitting to the disk.

Make sure you run "analyze" on the table before you execute the test.

> To test the diferent indexes I have executed the following:

Your query plans are roughly identical. The difference in the timings
implies that you only ran the queries once. I suggest you run each
query at least 10 times, and report the individual numbers (the "total
runtime" parts of the output) you get. Arithmetic means are not that
interesting.

Alexander.

В списке pgsql-performance по дате отправления:

Предыдущее
От: cedric
Дата:
Сообщение: Re: Key/Value reference table generation: INSERT/UPDATE performance
Следующее
От: "Steinar H. Gunderson"
Дата:
Сообщение: Re: Performace comparison of indexes over timestamp fields