Обсуждение: Performance problems with large telemetric datasets on 7.4.2

Поиск
Список
Период
Сортировка

Performance problems with large telemetric datasets on 7.4.2

От
"Sven Clement"
Дата:
Hello everybody,

as I'm new to this list I hope that it is the right place to post this and also the right format, so if I'm committing an error, I apologize in advance.

First the background of my request:

I'm currently employed by an enterprise which has approx. 250 systems distributed worldwide which are sending telemetric data to the main PostgreSQL.
The remote systems are generating about 10 events per second per system which accumulates to about 2500/tps.
The data is stored for about a month before it is exported and finally deleted from the database.
On the PostgreSQL server are running to databases one with little traffic (about 750K per day) and the telemetric database with heavy write operations all around the day (over 20 million per day).
We already found that the VACUUM process takes excessively long and as consequence the database is Vacuumed permanently.

The hardware is a IBM X306m Server, 3.2 GHz HT (Pentium IV), 1 GB RAM and 2x 250 GB HDD (SATA-II) with ext3 fs, one of the HDD is dedicated to database.
OS is Debian 3.1 Sarge with PostgreSQL 7.4.7 (7.4.7-6sarge1) with the libpq frontend library.

Now the problem:

The problem we are experiencing is that our queries are slowing down continuously even if we are performing queries on the index which is the timestamp of the event, a simple SELECT query with only a simple WHERE clause (< or >) takes very long to complete. So the database becomes unusable for production use as the data has to be retrieved very quickly if we want to act based on the telemetric data.

So I'm asking me if it is useful to update to the actual 8.2 version and if we could experience performance improvement only by updating.

Thank you for your answers,
Sven Clement

Re: Performance problems with large telemetric datasets on 7.4.2

От
Mark Lewis
Дата:
On Fri, 2007-08-03 at 06:52 -0700, Sven Clement wrote:
> Hello everybody,
>
> as I'm new to this list I hope that it is the right place to post this
> and also the right format, so if I'm committing an error, I apologize
> in advance.
>
> First the background of my request:
>
> I'm currently employed by an enterprise which has approx. 250 systems
> distributed worldwide which are sending telemetric data to the main
> PostgreSQL.
> The remote systems are generating about 10 events per second per
> system which accumulates to about 2500/tps.
> The data is stored for about a month before it is exported and finally
> deleted from the database.
> On the PostgreSQL server are running to databases one with little
> traffic (about 750K per day) and the telemetric database with heavy
> write operations all around the day (over 20 million per day).
> We already found that the VACUUM process takes excessively long and as
> consequence the database is Vacuumed permanently.
>
> The hardware is a IBM X306m Server, 3.2 GHz HT (Pentium IV), 1 GB RAM
> and 2x 250 GB HDD (SATA-II) with ext3 fs, one of the HDD is dedicated
> to database.
> OS is Debian 3.1 Sarge with PostgreSQL 7.4.7 (7.4.7-6sarge1) with the
> libpq frontend library.
>
> Now the problem:
>
> The problem we are experiencing is that our queries are slowing down
> continuously even if we are performing queries on the index which is
> the timestamp of the event, a simple SELECT query with only a simple
> WHERE clause (< or >) takes very long to complete. So the database
> becomes unusable for production use as the data has to be retrieved
> very quickly if we want to act based on the telemetric data.
>
> So I'm asking me if it is useful to update to the actual 8.2 version
> and if we could experience performance improvement only by updating.
>
> Thank you for your answers,
> Sven Clement

Upgrading from 7.4.x to 8.2.x will probably give you a performance
benefit, yes.  There have been numerous changes since the days of 7.4.

But you didn't really give any information about why the query is
running slow.  Specifically, could you provide the query itself, some
information about the tables/indexes/foreign keys involved, and an
EXPLAIN ANALYZE for one of the problematic queries?

Also, what kind of vacuuming regimen are you using?  Just a daily cron
maybe?  Are you regularly analyzing the tables?

-- Mark Lewis

Re: Performance problems with large telemetric datasets on 7.4.2

От
"Dan Langille"
Дата:
On 3 Aug 2007 at 6:52, Sven Clement wrote:

> Hello everybody,
>
> as I'm new to this list I hope that it is the right place to post this
> and also the right format, so if I'm committing an error, I apologize
> in advance.
>
> First the background of my request:
>
> I'm currently employed by an enterprise which has approx. 250 systems
> distributed worldwide which are sending telemetric data to the main
> PostgreSQL. The remote systems are generating about 10 events per
> second per system which accumulates to about 2500/tps. The data is
> stored for about a month before it is exported and finally deleted
> from the database. On the PostgreSQL server are running to databases
> one with little traffic (about 750K per day) and the telemetric
> database with heavy write operations all around the day (over 20
> million per day). We already found that the VACUUM process takes
> excessively long and as consequence the database is Vacuumed
> permanently.
>
> The hardware is a IBM X306m Server, 3.2 GHz HT (Pentium IV), 1 GB RAM
> and 2x 250 GB HDD (SATA-II) with ext3 fs, one of the HDD is dedicated
> to database. OS is Debian 3.1 Sarge with PostgreSQL 7.4.7
> (7.4.7-6sarge1) with the libpq frontend library.
>
> Now the problem:
>
> The problem we are experiencing is that our queries are slowing down
> continuously even if we are performing queries on the index which is
> the timestamp of the event, a simple SELECT query with only a simple
> WHERE clause (< or >) takes very long to complete. So the database
> becomes unusable for production use as the data has to be retrieved
> very quickly if we want to act based on the telemetric data.

Have you confirmed via explain (or explain analyse) that the index is
being used?

> So I'm asking me if it is useful to update to the actual 8.2 version
> and if we could experience performance improvement only by updating.

There are other benefits from upgrading, but you may be able to solve
this problem without upgrading.

--
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php



Re: Performance problems with large telemetric datasets on 7.4.2

От
"Sven Clement"
Дата:
Hi,

First thank you already for your answers, as we are working in an environment with NDA's I have first to check all the queries before I may publish them here, but the structure of the DB is publishable:

2 Tables:
Table: "public.tmdata"

  Column    |        Type          |    Modifiers
------------+-----------------------------+-----------------------------
timestamp   | timestamp without time zone |
id        | integer              | default -2147483684::bigint
datapointid | integer              | default 0
value       | integer              | default 0

Indexes:
    "tmdata_idx1" btree ("timestamp")

Legend:
-------
timestamp    = Timeindex of the event
id        = Hostname of the system who sent the event
datapointid    = ID of the Datapoint ( less than 100 )
value        = The value of the event

========================================================================

Table: "public.tmdataintervalsec"

  Column    |        Type          |    Modifiers
------------+-----------------------------+-----------------------------
timestamp   | timestamp without time zone |
id        | integer              | default -2147483684::bigint
datapointid | integer              | default 0
max        | integer              | default 0
min         | integer              | default 0
avg        | integer              | default 0
count       | integer              | default 0

Indexes:
    "tmdataintervalsec_idx1" btree ("timestamp", id)

Legend:
-------
timestamp    = Sets the period
id        = Hostname of the system who sent the event
datapointid    = ID of the Datapoint ( less than 100 )
max        = Max value for the period
min        = Min value for the period
avg        = Average of all values for the period
count        = Number of rows used for generation of the statistic

The data for the second table is generated by the daemon who receives the data and writes it to the database.


And we also confirmed that the index is used by the queries.

Regards,
Sven

P.S: I hope the databse layout is tsill readable when you receive it... ;)

2007/8/3, Mark Lewis < mark.lewis@mir3.com>:
On Fri, 2007-08-03 at 06:52 -0700, Sven Clement wrote:
> Hello everybody,
>
> as I'm new to this list I hope that it is the right place to post this
> and also the right format, so if I'm committing an error, I apologize
> in advance.
>
> First the background of my request:
>
> I'm currently employed by an enterprise which has approx. 250 systems
> distributed worldwide which are sending telemetric data to the main
> PostgreSQL.
> The remote systems are generating about 10 events per second per
> system which accumulates to about 2500/tps.
> The data is stored for about a month before it is exported and finally
> deleted from the database.
> On the PostgreSQL server are running to databases one with little
> traffic (about 750K per day) and the telemetric database with heavy
> write operations all around the day (over 20 million per day).
> We already found that the VACUUM process takes excessively long and as
> consequence the database is Vacuumed permanently.
>
> The hardware is a IBM X306m Server, 3.2 GHz HT (Pentium IV), 1 GB RAM
> and 2x 250 GB HDD (SATA-II) with ext3 fs, one of the HDD is dedicated
> to database.
> OS is Debian 3.1 Sarge with PostgreSQL 7.4.7 (7.4.7-6sarge1) with the
> libpq frontend library.
>
> Now the problem:
>
> The problem we are experiencing is that our queries are slowing down
> continuously even if we are performing queries on the index which is
> the timestamp of the event, a simple SELECT query with only a simple
> WHERE clause (< or >) takes very long to complete. So the database
> becomes unusable for production use as the data has to be retrieved
> very quickly if we want to act based on the telemetric data.
>
> So I'm asking me if it is useful to update to the actual 8.2 version
> and if we could experience performance improvement only by updating.
>
> Thank you for your answers,
> Sven Clement

Upgrading from 7.4.x to 8.2.x will probably give you a performance
benefit, yes.  There have been numerous changes since the days of 7.4.

But you didn't really give any information about why the query is
running slow.  Specifically, could you provide the query itself, some
information about the tables/indexes/foreign keys involved, and an
EXPLAIN ANALYZE for one of the problematic queries?

Also, what kind of vacuuming regimen are you using?  Just a daily cron
maybe?  Are you regularly analyzing the tables?

-- Mark Lewis



--
DSIGN.LU
Sven Clement
+352 621 63 21 18
sven@dsign.lu

www.dsign.lu

Re: Performance problems with large telemetric datasets on 7.4.2

От
Josh Berkus
Дата:
Sven,

> The hardware is a IBM X306m Server, 3.2 GHz HT (Pentium IV), 1 GB RAM
> and 2x 250 GB HDD (SATA-II) with ext3 fs, one of the HDD is dedicated to
> database. OS is Debian 3.1 Sarge with PostgreSQL 7.4.7 (7.4.7-6sarge1)
> with the libpq frontend library.

Note that 7.4.7 is not the current bugfix version of 7.4.x.  It is 5 or 6
patches behind.

> So I'm asking me if it is useful to update to the actual 8.2 version and
> if we could experience performance improvement only by updating.

8.2 will give you a number of features which should greatly improve your
performance situation:

1) partitioning: break up your main data table into smaller
easier-to-maintain segments (this will require some application changes)

2) VACUUM delay, which lowers the impact of vacuum on concurrent queries

3) AUTOVACUUM, which helps keep your tables in reasonable maintenance.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: Performance problems with large telemetric datasets on 7.4.2

От
Andrew Kroeger
Дата:
Sven Clement wrote:
> Table: "public.tmdata"
...
> id        | integer              | default -2147483684::bigint
...
> Table: "public.tmdataintervalsec"
...
> id        | integer              | default -2147483684::bigint

Not that this directly addresses the performance issues you described,
but I have already seen 2 recommendations that you upgrade...

With the table definitions you posted, one of the first things I noticed
was that the default value for an integer column was a bigint value.  I
did some quick 32-bit math and found that the smallest legal 32-bit
integer value is -2147483648, not -2147483684 (notice the last 2 numbers
are transposed).

I checked your previous post and saw that you are currently running PG
7.4.2/7.4.7 (subject says 7.4.2, but you indicate 7.4.7 in the body of
your original post).  I did a quick check on my 8.1.9 box using the same
bigint default value for an integer column and received an "integer out
of range" error when I attempted to use the default value.

I don't know the exact workings of your system, but you'll need to watch
out for any cases where the default value for the id columns was used.
If that default value was used (and was allowed by your PG version) you
will probably have values in the id column that are not what you'd
expect.  I don't know how a bigint would be coerced into an integer, but
it would probably truncate in some form which would give you positive
values in the id column where you expected the smallest 32-bit integer
value (i.e. -2147483648).

I don't know if this was ever actually an issue (if you never rely on
the default value for the id column -- maybe version 7.4.7 would
generate the same error if you did), but if it was, you need to look at
a couple of things before upgrading (whether to a more recent 7.4.X or
8.2.4):

1. If you do rely on the default clause for the id column, you may
encounter the "integer out of range" errors with your existing codebase.

2. You may have values in the id column that are supposed to represent
the smallest 32-bit integer that may in fact be positive integers.

You will probably want to investigate these potential issues and perform
any necessary schema changes and data cleanup before attempting any upgrade.

Again, I'm not sure if this was ever an issue or if this issue has any
effects on your database.  I don't have any PG machines running anything
prior to 8.1.X, so I can't really test these.  I just saw the bigint
value as a default for an integer column and it caught my eye.

Hope this might help you avoid some problems when upgrading.

Andrew


Re: Performance problems with large telemetric datasets on 7.4.2

От
Tom Lane
Дата:
Andrew Kroeger <andrew@sprocks.gotdns.com> writes:
> With the table definitions you posted, one of the first things I noticed
> was that the default value for an integer column was a bigint value.  I
> did some quick 32-bit math and found that the smallest legal 32-bit
> integer value is -2147483648, not -2147483684 (notice the last 2 numbers
> are transposed).

Oooh, good catch, but 7.4 seems to notice the overflow all right:

regression=# create temp table foo(f1 int default -2147483684::bigint);
CREATE TABLE
regression=# insert into foo default values;
ERROR:  integer out of range
regression=# select version();
                            version
----------------------------------------------------------------
 PostgreSQL 7.4.17 on hppa-hp-hpux10.20, compiled by GCC 2.95.3
(1 row)

So I think we can conclude that the OP never actually uses this default.

            regards, tom lane

Re: Performance problems with large telemetric datasets on 7.4.2

От
"Sven Clement"
Дата:
Hi everybody,
 
The bigint problem was probably a typo because I had to type the entire definitions, as the server is on a vpn and I don't had access with my machine where I wrote the mail, and the 7.4.2 was surely a typo... ;) I apology...
 
OK so beginning on Monday I will test the config on a 8.2.x to verify the performance issues, but I also found some disturbing info's on the net, that the index may be corrupted because of the big difference between an index entry which is deleted and the new value inserted afterwards, which should not be an issue with a btree, but do you guys know something more about it, sorry I'm really good in SQL but in Postgre I'm still a beginner.
 
What the version belongs, so I know that it's not the actual bug fix, but as it is used in a running prod system and as my employer began considering the migration a year ago and so they froze the version waiting for an major update.
 
Thanks really to everybody here who already helped me a lot... Thanks!!!
Sven Clement

 
2007/8/4, Tom Lane <tgl@sss.pgh.pa.us>:
Andrew Kroeger <andrew@sprocks.gotdns.com> writes:
> With the table definitions you posted, one of the first things I noticed
> was that the default value for an integer column was a bigint value.  I
> did some quick 32-bit math and found that the smallest legal 32-bit
> integer value is -2147483648, not -2147483684 (notice the last 2 numbers
> are transposed).

Oooh, good catch, but 7.4 seems to notice the overflow all right:

regression=# create temp table foo(f1 int default -2147483684::bigint);
CREATE TABLE
regression=# insert into foo default values;
ERROR:  integer out of range
regression=# select version();
                           version
----------------------------------------------------------------
PostgreSQL 7.4.17 on hppa-hp-hpux10.20, compiled by GCC 2.95.3
(1 row)

So I think we can conclude that the OP never actually uses this default.

                       regards, tom lane



--
DSIGN.LU
Sven Clement
+352 621 63 21 18
sven@dsign.lu

www.dsign.lu

Re: Performance problems with large telemetric datasets on 7.4.2

От
Heikki Linnakangas
Дата:
Sven Clement wrote:
> OK so beginning on Monday I will test the config on a 8.2.x to verify the
> performance issues, but I also found some disturbing info's on the net, that
> the index may be corrupted because of the big difference between an index
> entry which is deleted and the new value inserted afterwards, which should
> not be an issue with a btree, but do you guys know something more about it,
> sorry I'm really good in SQL but in Postgre I'm still a beginner.

I don't remember a bug like that. Where did you read that from?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Performance problems with large telemetric datasets on 7.4.2

От
"Sven Clement"
Дата:


2007/8/5, Heikki Linnakangas <heikki@enterprisedb.com>:

I don't remember a bug like that. Where did you read that from?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Partially I found that one in the PostgreSQL Documentation for the 7.x.x versions under the command REINDEX where they claim that you should run a reindex under certain circumstances and for my comprehension this says that with some access pattern (as ours (major writes / one big delete per day)) the index may be corrupted or otherwise not really useful.

Sven Clement

Re: Performance problems with large telemetric datasets on 7.4.2

От
Ragnar
Дата:
On mán, 2007-08-06 at 00:10 -0700, Sven Clement wrote:
>
>
> 2007/8/5, Heikki Linnakangas <heikki@enterprisedb.com>:
>
>         I don't remember a bug like that. Where did you read that
>         from?
>
>         --
>           Heikki Linnakangas
>           EnterpriseDB   http://www.enterprisedb.com
>
> Partially I found that one in the PostgreSQL Documentation for the
> 7.x.x versions under the command REINDEX where they claim that you
> should run a reindex under certain circumstances and for my
> comprehension this says that with some access pattern (as ours (major
> writes / one big delete per day)) the index may be corrupted or
> otherwise not really useful.

you are probably talking about index bloat, not corruption.

when that happens, the index consumes more space that needed,
and its effectivity is reduced, but it is not corrupted and does
not cause wrong results.

i believe this is a lot less common now than in the 7.x days

gnari



Re: Performance problems with large telemetric datasets on 7.4.2

От
Heikki Linnakangas
Дата:
Sven Clement wrote:
> Partially I found that one in the PostgreSQL Documentation for the
> 7.x.xversions under the command REINDEX where they claim that you
> should run a
> reindex under certain circumstances and for my comprehension this says that
> with some access pattern (as ours (major writes / one big delete per day))
> the index may be corrupted or otherwise not really useful.

Up to 7.3, periodical REINDEX was needed to trim down bloated indexes.
Since 7.4, empty index pages are recycled so that's no longer necessary.
You can still end up with larger than necessary indexes in recent
versions under unusual access patterns, like if you delete all but a few
index tuples from each index page, but it's rare in practice. And it's
not unbounded growth like in <= 7.3.

In any case, the indexes won't become corrupt.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Performance problems with large telemetric datasets on 7.4.2

От
"Sven Clement"
Дата:
Ok thanks everybody for the calrification, after all now I allready learned something new... ;)

My employer is currently thinking about migration to 8.2.x because of your feedback, so I think that the problem could be resolved... ;)

Thanks to everyone...

Sven Clement

2007/8/6, Heikki Linnakangas <heikki@enterprisedb.com>:
Sven Clement wrote:
> Partially I found that one in the PostgreSQL Documentation for the
> 7.x.xversions under the command REINDEX where they claim that you
> should run a
> reindex under certain circumstances and for my comprehension this says that
> with some access pattern (as ours (major writes / one big delete per day))
> the index may be corrupted or otherwise not really useful.

Up to 7.3, periodical REINDEX was needed to trim down bloated indexes.
Since 7.4, empty index pages are recycled so that's no longer necessary.
You can still end up with larger than necessary indexes in recent
versions under unusual access patterns, like if you delete all but a few
index tuples from each index page, but it's rare in practice. And it's
not unbounded growth like in <= 7.3.

In any case, the indexes won't become corrupt.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



--
DSIGN.LU
Sven Clement
+352 621 63 21 18
sven@dsign.lu

www.dsign.lu

Re: Performance problems with large telemetric datasets on 7.4.2

От
"Scott Marlowe"
Дата:
On 8/6/07, Sven Clement <sven@dsign.lu> wrote:
> Ok thanks everybody for the calrification, after all now I allready learned
> something new... ;)
>
> My employer is currently thinking about migration to 8.2.x because of your
> feedback, so I think that the problem could be resolved... ;)

Note that whether the 8.2 migration is forthcoming, you should
definitely update to the latest patch version of 7.4.  The update is
pretty much painless, although I remember there being a change around
7.4.13 that might have changed db behaviour for security reasons.
Release notes here:

http://www.postgresql.org/docs/8.2/static/release-7-4-13.html

Note that there are settings to work around the change in behaviour
should it create a problem.