Обсуждение: timestamp indexing

От:
Tobias Brox
Дата:

We have a production database with transaction-style data, in most of the
tables we have a timestamp attribute "created" telling the creation time of
the table row.  Naturally, this attribute is always increasing.

By now we are hitting the limit where the table data does not fit in caches
anymore.  We have a report section where there are constantly requests for
things like "sum up all transactions for the last two weeks", and those
requests seem to do a full table scan, even though only the last parts of
the table is needed - so by now those reports have started to cause lots of
iowait.

Is there any way to avoid this, apart from adding memory linearly with
database growth, make adjunct tables for historical rows, or build a
separate data warehousing system?  There must be some simpler solutions,
right?

--
Tobias Brox, Beijing

От:
Michael Fuhr
Дата:

On Mon, May 30, 2005 at 05:19:51PM +0800, Tobias Brox wrote:
>
> We have a production database with transaction-style data, in most of the
> tables we have a timestamp attribute "created" telling the creation time of
> the table row.  Naturally, this attribute is always increasing.

The message subject is "timestamp indexing" but you don't mention
whether you have an index on the timestamp column.  Do you?

> By now we are hitting the limit where the table data does not fit in caches
> anymore.  We have a report section where there are constantly requests for
> things like "sum up all transactions for the last two weeks", and those
> requests seem to do a full table scan, even though only the last parts of
> the table is needed - so by now those reports have started to cause lots of
> iowait.

Could you post an example query and its EXPLAIN ANALYZE output?  If
the query uses a sequential scan then it might also be useful to see
the EXPLAIN ANALYZE output with enable_seqscan turned off.  Since
caching can cause a query to be significantly faster after being run
several times, it might be a good idea to run EXPLAIN ANALYZE three
times and post the output of the last run -- that should put the
queries under comparison on a somewhat equal footing (i.e., we don't
want to be misled about how much faster one query is than another
simply because one query happened to use more cached data on a
particular run).

How many records are in the tables you're querying?  Are you regularly
vacuuming and analyzing the database or the individual tables?  Are
any of the tables clustered?  If so, on what indexes and how often
are you re-clustering them?  What version of PostgreSQL are you using?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

От:
Tobias Brox
Дата:

[Michael Fuhr - Mon at 07:54:29AM -0600]
> The message subject is "timestamp indexing" but you don't mention
> whether you have an index on the timestamp column.  Do you?

Yes. Sorry for not beeing explicit on that.

> Could you post an example query and its EXPLAIN ANALYZE output?  If
> the query uses a sequential scan then it might also be useful to see
> the EXPLAIN ANALYZE output with enable_seqscan turned off.  Since
> caching can cause a query to be significantly faster after being run
> several times, it might be a good idea to run EXPLAIN ANALYZE three
> times and post the output of the last run -- that should put the
> queries under comparison on a somewhat equal footing (i.e., we don't
> want to be misled about how much faster one query is than another
> simply because one query happened to use more cached data on a
> particular run).

The actual statement was with 6 or 7 joins and very lengthy.  I reduced
it to a simple single join query which still did a sequential scan
rather than an index scan (as expected), and I believe I already did a
follow-up mail including "explain analyze".  All "explain analyze" in my
previous mail was run until the resulting execution time had stabilized,
relatively.  I will try with "set enable_seqscan off" when I get back to
the office.

> How many records are in the tables you're querying?

Also answered on in my follow-up.

> Are you regularly
> vacuuming and analyzing the database or the individual tables?

Vacuum is run nightly, and I also did a manual "vacuum analyze table" on
the table in question.

> Are
> any of the tables clustered?  If so, on what indexes and how often
> are you re-clustering them?

Huh? :)

> What version of PostgreSQL are you using?

Also answered in my follow-up - "not yet pg8" :)


От:
Tom Lane
Дата:

Tobias Brox <> writes:
>> What version of PostgreSQL are you using?

> Also answered in my follow-up - "not yet pg8" :)

Your followup hasn't shown up here yet, but if the query is written like
    WHERE timestampcol >= now() - interval 'something'
then the pre-8.0 planner is not capable of making a good estimate of the
selectivity of the WHERE clause.  One solution is to fold the timestamp
computation to a constant on the client side.

            regards, tom lane

От:
Tobias Brox
Дата:

[Tom Lane - Mon at 01:57:54PM -0400]
> Your followup hasn't shown up here yet,

I'll check up on that and resend it.

> but if the query is written like
>     WHERE timestampcol >= now() - interval 'something'
> then the pre-8.0 planner is not capable of making a good estimate of the
> selectivity of the WHERE clause.

> One solution is to fold the timestamp
> computation to a constant on the client side.

I don't think there are any of that in the production; we always make the
timestamps on the client side.

As to my original problem, I looked up on table clustering on google.
Right, for report performance, we store some aggregates in the table which
are updated several times.  If I've understood it correctly, the row will
physically be moved to the tail of the table every time the attribute is
updated.  I understand that it may make sense to do a full table scan if a
random 10% of the rows should be selected.  Forcing the usage of the index
caused a tiny improvement of performance, but only after running it some few
times to be sure the index got buffered :-)

--
Tobias Brox, Beijing

От:
Tobias Brox
Дата:

[Tobias Brox - Tue at 10:06:25AM +0800]
> [Tom Lane - Mon at 01:57:54PM -0400]
> > Your followup hasn't shown up here yet,
>
> I'll check up on that and resend it.

Hrm ... messed-up mail configuration I suppose.  Here we go:

Paul McGarry unintentionally sent a request for more details off the list,
since it was intended for the list I'll send my reply here.

While writing up the reply, and doing research, I discovered that this is
not a problem with indexing timestamps per se, but more with a query of the
kind "give me 5% of the table"; it seems like it will often prefer to do a
full table scan instead of going via the index.

I think that when I had my university courses on databases, we also learned
about flat indexes, where the whole index has to be rebuilt whenever a field
is updated or inserted in the middle, and I also think we learned that the
table usually would be sorted physically by the primary key on the disk.  As
long as we have strictly incrementing primary keys and timestamps, such a
setup would probably be more efficient for queries of the kind "give me all
activity for the last two weeks"?

Here follows my reply to Paul, including some gory details:

[Paul McGarry - Mon at 07:59:35PM +1000]
> What version of postgresql are you using and what are the exact
> datatypes and queries?

We are still using 7.4.6, but I suppose that if our issues are completely or
partially solved in pg 8, that would make a good case for upgrading :-)

The datatypes I'm indexing are timestamp without time zone.

Actually I may be on the wrong hunting ground now - the production system
froze completely some days ago basically due to heavy iowait and load on the
database server, rendering postgresql completely unresponsive - and back
then we had too poor logging to find out what queries that was causing it to
grind to a halt, and since we've never such a bad problem before, we didn't
know how to handle the situation (we just restarted the entire postgresql;
if we had been just killing the processes running the rogue database
queries, we would have had very good tracks of it in the logs).

I digress.  The last days I've looked through profiling logs, and I'm
checking if the accumulatively worst queries can be tuned somehow.  Most of
them are big joins, but I'm a bit concerned of the amounts of "Seq Scan"
returned by "explain" despite the fact that only a small fraction of the
tables are queried.  I reduced the problem to a simple "select * from table
where created>xxx" and discovered that it still won't use index, and still
will be costly (though of course not much compared to the big joined query).

The "ticket" table have less than a million rows, around 50k made the last
ten days:

NBET=> explain analyze select * from ticket where created>'2005-05-20';
                                                    QUERY PLAN

------------------------------------------------------------------------------------------------------------------
 Seq Scan on ticket  (cost=0.00..19819.91 rows=89553 width=60) (actual time=535.884..1018.268 rows=53060 loops=1)
   Filter: (created > '2005-05-20 00:00:00'::timestamp without time zone)
 Total runtime: 1069.514 ms
(3 rows)

Anyway, it seems to me that "indexing on timestamp" is not the real issue
here, because when restricting by primary key (numeric, sequential ID) the
execution time is the same or worse, still doing a sequence scan:

NBET=> explain analyze select * from ticket  where id>711167;
                                                    QUERY PLAN

------------------------------------------------------------------------------------------------------------------
 Seq Scan on ticket  (cost=0.00..19819.91 rows=92273 width=60) (actual
time=550.855..1059.843 rows=53205 loops=1)
   Filter: (id > 711167)
 Total runtime: 1110.469 ms
(3 rows)


I've tried running equivalent queries on a table with twice as many rows and
width=180, it will pull from the index both when querying by ID and
timestamp, and it will usually spend less time.

Running "select * from ticket" seems to execute ~2x slower than when having
the restriction.

> I have a 7.3 database with a "timestamp with time zone" field and we
> have to be very careful to explicitly cast values as that in queries
> if it is to use the index correctly. I believe it's an issue that is
> cleared up in newer versions though.

I suppose so - as said, restricting by primary key didn't improve the
performance significantly, so I was clearly wrong indicating that this is a
special issue with indexing a timestamp.

--
Tobias Brox, Beijing

От:
"Jim C. Nasby"
Дата:

What does

SET enable_seqscan = false;
EXPLAIN ANALYZE SELECT * FROM ...

get you? Is it faster?

BTW, I suspect this behavior is because the estimates for the cost of an
index scan don't give an appropriate weight to the correlation of the
index. The 'sort and index' thread on this list from a few months ago
has more info.
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

От:
Tobias Brox
Дата:

[Jim C. Nasby - Thu at 01:04:53PM -0500]
> What does
>
> SET enable_seqscan = false;
> EXPLAIN ANALYZE SELECT * FROM ...
>
> get you? Is it faster?

I was experimenting with this some weeks ago, by now our database server has
quite low load numbers and I haven't gotten any complaints about anything
that is too slow, so I have temporary stopped working with this issue - so I
will not contribute with any more gory details at the moment. :-)

I concluded with that our "problem" is that we (for performance reasons)
store aggregated statistics in the "wrong" tables, and since updating a row
in pg effectively means creating a new physical row in the database, the
rows in the table are not in chronological order.  If "last months activity"
presents like 7% of the rows from the table is to be fetched, the planner
will usually think that a seq scan is better.  As time pass by and the table
grows, it will jump to index scans.

The "old" stuff in the database eventually grow historical, so the
aggregated statistics will not be updated for most of those rows.  Hence a
forced index scan will often be a bit faster than a suggested table scan.  I
experimented, and doing an index scan for the 3rd time would usually be
faster than doing a full table scan for the 3rd time, but with things not
beeing in cache, the planner was right to suggest that seq scan was faster
due to less disk seeks.

The long term solution for this problem is to build a separate data
warehouse system.  The short time solution is to not care at all
(eventually, buy more memory).

As long as the queries is on the form "give me everything since last
monday", it is at least theoretically possible to serve this through partial
indices, and have a cronjob dropping the old indices and creating new every
week.

Doing table clustering night time would probably also be a solution, but I
haven't cared to test it out yet.  I'm a bit concerned about
performance/locking issues.

--
Tobias Brox, +47-91700050
Tallinn, Europe