Re: Performance problems with large telemetric datasets on 7.4.2

From     Sven Clement
Subject  Re: Performance problems with large telemetric datasets on 7.4.2
Date
Msg-id 881035ea0708030857s621af020r4d3d7b994e2ca9a3@mail.gmail.com
In response to  Re: Performance problems with large telemetric datasets on 7.4.2  (Mark Lewis <mark.lewis@mir3.com>)
Responses  Re: Performance problems with large telemetric datasets on 7.4.2
List  pgsql-performance
Hi,

First of all, thank you for your answers. As we are working in an environment with NDAs, I first have to check all the queries before I may publish them here, but the structure of the DB is publishable:

2 Tables:
Table: "public.tmdata"

   Column    |            Type             |          Modifiers
-------------+-----------------------------+------------------------------
 timestamp   | timestamp without time zone |
 id          | integer                     | default -2147483684::bigint
 datapointid | integer                     | default 0
 value       | integer                     | default 0

Indexes:
    "tmdata_idx1" btree ("timestamp")

Legend:
-------
timestamp   = Time index of the event
id          = Hostname of the system that sent the event
datapointid = ID of the datapoint (less than 100)
value       = The value of the event

========================================================================

Table: "public.tmdataintervalsec"

   Column    |            Type             |          Modifiers
-------------+-----------------------------+------------------------------
 timestamp   | timestamp without time zone |
 id          | integer                     | default -2147483684::bigint
 datapointid | integer                     | default 0
 max         | integer                     | default 0
 min         | integer                     | default 0
 avg         | integer                     | default 0
 count       | integer                     | default 0

Indexes:
    "tmdataintervalsec_idx1" btree ("timestamp", id)

Legend:
-------
timestamp   = Sets the period
id          = Hostname of the system that sent the event
datapointid = ID of the datapoint (less than 100)
max         = Max value for the period
min         = Min value for the period
avg         = Average of all values for the period
count       = Number of rows used to generate the statistic

The data for the second table is generated by the daemon that receives the events and writes them to the database.
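
We cannot publish the daemon code either, but conceptually what it writes into tmdataintervalsec is roughly equivalent to the following per-second roll-up (illustration only, with placeholder timestamps; the real daemon aggregates in its own code before inserting):

-- Illustrative only: roughly what the daemon computes for one one-second period.
INSERT INTO tmdataintervalsec ("timestamp", id, datapointid, max, min, avg, count)
SELECT date_trunc('second', "timestamp"),
       id,
       datapointid,
       max(value),
       min(value),
       avg(value)::integer,
       count(*)
FROM tmdata
WHERE "timestamp" >= '2007-08-03 10:00:00'
  AND "timestamp" <  '2007-08-03 10:00:01'
GROUP BY date_trunc('second', "timestamp"), id, datapointid;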


And we also confirmed that the index is used by the queries.
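
Until the actual queries are cleared for publication, the pattern is essentially the one below (the timestamps are placeholders, not the real bounds); running it with EXPLAIN ANALYZE is also how we checked the plan:

-- Representative query pattern only; placeholder timestamps.
EXPLAIN ANALYZE
SELECT "timestamp", id, datapointid, value
FROM tmdata
WHERE "timestamp" > '2007-08-02 00:00:00'
  AND "timestamp" < '2007-08-02 01:00:00';
-- The resulting plan shows an index scan using tmdata_idx1.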

Regards,
Sven

P.S.: I hope the database layout is still readable when you receive it... ;)

2007/8/3, Mark Lewis <mark.lewis@mir3.com>:
On Fri, 2007-08-03 at 06:52 -0700, Sven Clement wrote:
> Hello everybody,
>
> as I'm new to this list I hope that it is the right place to post this
> and also the right format, so if I'm committing an error, I apologize
> in advance.
>
> First the background of my request:
>
> I'm currently employed by an enterprise which has approx. 250 systems
> distributed worldwide which are sending telemetric data to the main
> PostgreSQL.
> The remote systems are generating about 10 events per second per
> system, which accumulates to about 2,500 tps in total.
> The data is stored for about a month before it is exported and finally
> deleted from the database.
> On the PostgreSQL server we are running two databases: one with little
> traffic (about 750K per day) and the telemetric database with heavy
> write operations around the clock (over 20 million per day).
> We have already found that the VACUUM process takes excessively long,
> and as a consequence the database is being vacuumed permanently.
>
> The hardware is an IBM X306m server, 3.2 GHz HT (Pentium IV), 1 GB RAM
> and 2x 250 GB HDD (SATA-II) with ext3 fs; one of the HDDs is dedicated
> to the database.
> The OS is Debian 3.1 Sarge with PostgreSQL 7.4.7 (7.4.7-6sarge1) and
> the libpq frontend library.
>
> Now the problem:
>
> The problem we are experiencing is that our queries are slowing down
> continuously, even though we are querying on the indexed column (the
> timestamp of the event): a simple SELECT with only a simple WHERE
> clause (< or >) takes very long to complete. So the database becomes
> unusable for production, as the data has to be retrieved very quickly
> if we want to act on the telemetric data.
>
> So I'm asking myself whether it would be useful to upgrade to the
> current 8.2 version and whether we could see a performance improvement
> just by upgrading.
>
> Thank you for your answers,
> Sven Clement

Upgrading from 7.4.x to 8.2.x will probably give you a performance
benefit, yes.  There have been numerous changes since the days of 7.4.

But you didn't really give any information about why the query is
running slow.  Specifically, could you provide the query itself, some
information about the tables/indexes/foreign keys involved, and an
EXPLAIN ANALYZE for one of the problematic queries?

Also, what kind of vacuuming regimen are you using?  Just a daily cron
maybe?  Are you regularly analyzing the tables?
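
As a reference point, a bare-minimum regimen for tables with this much write traffic would be something like running the statements below from a nightly cron job (illustrative only; adjust table names and frequency to your workload):

-- Sketch of a nightly maintenance pass (table names taken from your schema).
VACUUM ANALYZE tmdata;
VACUUM ANALYZE tmdataintervalsec;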

-- Mark Lewis



--
DSIGN.LU
Sven Clement
+352 621 63 21 18
sven@dsign.lu

www.dsign.lu
