Thread: Large number of tables slow insert


Large number of tables slow insert

From:
"Loic Petit"
Date:
Hi,

I use PostgreSQL 8.3.1-1 to store a lot of data coming from a large number of sensors. In order to get good performance when querying by timestamp on each sensor, I partitioned my measures table per sensor. Thus I create a lot of tables.
I simulated a large sensor network with 3000 nodes, so I have ~3000 tables. It appears that each insert (in a separate transaction) takes about 300 ms (3-4 inserts per second), even in tables that hold just a few tuples (< 10). As you can understand, that is not efficient at all, because I need to handle a lot of inserts.

Do you have any idea why it is that slow, and how I can get good insert performance?

My test machine: Intel P4 2.4 GHz, 512 MB RAM, Ubuntu Server (ext3)
iostat tells me that I do about 0.5 MB/s reading and ~6-7 MB/s writing during constant inserts

Here is the DDL of the measures tables:
-------------------------------------------------------
CREATE TABLE measures_0
(
 "timestamp" timestamp without time zone,
 storedtime timestamp with time zone,
 count smallint,
 "value" smallint[]
)
WITH (OIDS=FALSE);
CREATE INDEX measures_0_1_idx
 ON measures_0
 USING btree
 ((value[1]));

-- Index: measures_0_2_idx
CREATE INDEX measures_0_2_idx
 ON measures_0
 USING btree
 ((value[2]));

-- Index: measures_0_3_idx
CREATE INDEX measures_0_3_idx
 ON measures_0
 USING btree
 ((value[3]));

-- Index: measures_0_count_idx
CREATE INDEX measures_0_count_idx
 ON measures_0
 USING btree
 (count);

-- Index: measures_0_timestamp_idx
CREATE INDEX measures_0_timestamp_idx
 ON measures_0
 USING btree
 ("timestamp");

-- Index: measures_0_value_idx
CREATE INDEX measures_0_value_idx
 ON measures_0
 USING btree
 (value);
-------------------------------------------------------

Regards

Loïc Petit

Re: Large number of tables slow insert

From:
DiezelMax
Date:
Each index adds a delay on INSERT. Try to measure performance without any
indexes.
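For illustration, a minimal psql sketch of such a test, reusing the measures_0 table from the first mail (the sample values are invented):

\timing
-- time an insert with all indexes still in place
INSERT INTO measures_0 ("timestamp", count, "value")
    VALUES (now(), 1, ARRAY[21, 55, 301]::smallint[]);
-- drop the indexes one by one and repeat the insert to compare timings
DROP INDEX measures_0_value_idx;
INSERT INTO measures_0 ("timestamp", count, "value")
    VALUES (now(), 2, ARRAY[21, 55, 299]::smallint[]);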


Re: Large number of tables slow insert

From:
tls.wydd@free.fr
Date:
Actually, I've got another test system with only a few sensors (thus few tables)
and it's working well (<10 ms per insert) with all the indexes.
I know they slow down my inserts, but I need them to query the big
tables (each one can reach millions of rows over time) really fast.

> Each index adds a delay on INSERT. Try to measure performance without any
> indexes.

Regards

Loïc




Re: Large number of tables slow insert

From:
"Scott Marlowe"
Date:
On Sat, Aug 23, 2008 at 1:35 PM,  <tls.wydd@free.fr> wrote:
> Actually, I've got another test system with only a few sensors (thus few tables)
> and it's working well (<10 ms per insert) with all the indexes.
> I know they slow down my inserts, but I need them to query the big
> tables (each one can reach millions of rows over time) really fast.

It's quite likely that on the smaller system the indexes all fit into
memory and only require writes, while on the bigger system they are
too large and have to be read from disk first, then written out.

A useful solution is to remove most of the indexes on the main server,
and set up a slony slave with the extra indexes on it to handle the
reporting queries.

Re: Large number of tables slow insert

From:
tls.wydd@free.fr
Date:
On this smaller test, the indexes are over the allowed memory size (I've got
over 400,000 readings per sensor), so they are mostly written to disk. And on the
big test, I had small indexes (< page_size) because I only had about 5-10 rows
per table, so that was 3000 * 8 kB = 24 MB, which is lower than the allowed memory.
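For what it's worth, the actual on-disk size of those indexes can also be checked directly rather than estimated; a sketch using the size functions available in 8.3:

SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS on_disk_size
FROM pg_class c
WHERE c.relkind = 'i'              -- indexes only
  AND c.relname LIKE 'measures%'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;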

btw, which conf parameter controls the cache that holds previously read indexes?

I cannot run tests this weekend because I do not have access to the machine, but I
will try some tests on Monday.

Thanks for your answers though

Scott Marlowe <scott.marlowe@gmail.com> wrote:

> On Sat, Aug 23, 2008 at 1:35 PM,  <tls.wydd@free.fr> wrote:
> > Actually, I've got another test system with only a few sensors (thus few
> > tables) and it's working well (<10 ms per insert) with all the indexes.
> > I know they slow down my inserts, but I need them to query the big
> > tables (each one can reach millions of rows over time) really fast.
>
> It's quite likely that on the smaller system the indexes all fit into
> memory and only require writes, while on the bigger system they are
> too large and have to be read from disk first, then written out.
>
> A useful solution is to remove most of the indexes on the main server,
> and set up a slony slave with the extra indexes on it to handle the
> reporting queries.



Re: Large number of tables slow insert

From:
"Scott Marlowe"
Date:
On Sat, Aug 23, 2008 at 6:09 PM,  <tls.wydd@free.fr> wrote:
> On this smaller test, the indexes are over the allowed memory size (I've got
> over 400,000 readings per sensor), so they are mostly written to disk.

They're always written to disk.  Just sometimes they're not read.
Note that the OS caches files as well as pgsql, and I'm not sure what
you mean by "allowed memory size" but only the shared_buffers would
hold onto something after it's been operated on.  work_mem won't.
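For reference, the relevant cache settings can be read straight from a running server (a sketch; the values shown will be whatever your configuration contains):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size');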

> And on the
> big test, I had small indexes (< page_size) because I only had about 5-10 rows
> per table, so that was 3000 * 8 kB = 24 MB, which is lower than the allowed memory.

> btw, which conf parameter controls the cache that holds previously read indexes?

Not sure what you're asking.  It's all automatic as far as OS and
postgresql caching goes.

Re: Large number of tables slow insert

From:
"Loic Petit"
Date:
I was a bit confused about the reads and writes, sorry! I understand what you mean...
But do you think the I/O cost (of only one page) needed to handle the index write could be greater than 300 ms? Because each insert in any of these tables is that slow.
NB: between my "small" and my "big" tests the only difference is the number of tables (10 vs. 3000); the amount of on-disk data is almost the same.

Re: Large number of tables slow insert

From:
"Scott Marlowe"
Date:
On Sat, Aug 23, 2008 at 6:47 PM, Loic Petit <tls.wydd@free.fr> wrote:
> I was a bit confused about the reads and writes, sorry! I understand what you
> mean...
> But do you think the I/O cost (of only one page) needed to handle the
> index write could be greater than 300 ms? Because each insert in any of these
> tables is that slow.
> NB: between my "small" and my "big" tests the only difference is the number
> of tables (10 vs. 3000); the amount of on-disk data is almost the same.

It could be that the tables or indexes are heavily bloated.  What's
the churn rate on these tables / indexes?
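The statistics collector can answer that kind of question; a sketch of the sort of query meant here (assuming default stats settings):

SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_tup_ins + n_tup_upd + n_tup_del DESC
LIMIT 10;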

Re: Large number of tables slow insert

From:
"Loic Petit"
Date:
Each table has about 5 indexes: one on timestamp, one for each of the 3 sensor value types, and one for packet counting (to measure packet dropping).
(I reckon that this is quite heavy, but at least the timestamp and the values are really useful.)

Re: Large number of tables slow insert

From:
"Scott Marlowe"
Date:
On Sat, Aug 23, 2008 at 6:59 PM, Loic Petit <tls.wydd@free.fr> wrote:
> Each table has about 5 indexes: one on timestamp, one for each of the 3 sensor
> value types, and one for packet counting (to measure packet dropping).
> (I reckon that this is quite heavy, but at least the timestamp and the values
> are really useful.)

But what's the update rate on these indexes and tables?  I'm wondering
if you're not vacuuming aggressively enough to keep up with bursty
update patterns.
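For reference, the knobs that control how aggressive autovacuum is can be inspected like this (8.3 parameter names; a sketch, not a tuning recommendation):

SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum',
               'autovacuum_naptime',
               'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor',
               'autovacuum_vacuum_cost_limit');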

Re: Large number of tables slow insert

From:
"Loic Petit"
Date:
One sensor (so one table) sends a packet each second (for 3000 sensors).
=> So we have: 1 insert per second into each of the 3000 tables (and their indexes). Fortunately there are no updates or deletes on them...

Re: Large number of tables slow insert

From:
"Scott Marlowe"
Date:
On Sat, Aug 23, 2008 at 7:31 PM, Loic Petit <tls.wydd@free.fr> wrote:
> One sensor (so one table) sends a packet each second (for 3000 sensors).
> => So we have: 1 insert per second into each of the 3000 tables (and their
> indexes). Fortunately there are no updates or deletes on them...

Wait, I'm confused, I thought you said earlier that for the big test
the tables only had a few rows.  I'd expect it to be quite a bit
bigger if you're inserting into a table once a second for any period.

Re: Large number of tables slow insert

From:
"Loic Petit"
Date:
What I described in the last mail is what I am trying to do.
But, as I said earlier, I only manage about 3-4 inserts per second because of my problem.
So in practice it's about one insert every 30 minutes for each table.

Re: Large number of tables slow insert

From:
dforum
Date:
Hello to all,

I have a question regarding the following comments.

How do you estimate vacuum aggressiveness?

It is very difficult for me to set up the autovacuum settings correctly. It
seems to me that it is not aggressive enough, but when I change the
settings the autovacuum process is almost always running.

So how should I set it up for around 40,000 inserts, updates and deletes per 5 minutes?

regards

david
Scott Marlowe wrote:
> On Sat, Aug 23, 2008 at 6:59 PM, Loic Petit <tls.wydd@free.fr> wrote:
>
>> Each table has about 5 indexes: one on timestamp, one for each of the 3 sensor
>> value types, and one for packet counting (to measure packet dropping).
>> (I reckon that this is quite heavy, but at least the timestamp and the values
>> are really useful.)
>>
>
> But what's the update rate on these indexes and tables?  I'm wondering
> if you're not vacuuming aggressively enough to keep up with bursty
> update patterns.
>
>


Re: Large number of tables slow insert

From:
DiezelMax
Date:
Are all INSERTs made by one application? Maybe prepared statements
will help you? What about optimization at the application level?

Re: Large number of tables slow insert

From:
"Scott Carey"
Date:
Just a guess, but have you tried increasing max_fsm_relations ?  This probably shouldn't matter but you'll want this to be larger than the sum of all your tables and indexes and it doesn't take that much memory to increase it.

My next suggestion would be to log in as the superuser and 'vacuum analyze' the system tables.  Perhaps it is simply the system table access that has gotten inefficient with this many tables / indexes.
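A rough sketch of both suggestions (max_fsm_relations itself has to be raised in postgresql.conf and needs a server restart):

SHOW max_fsm_relations;     -- should exceed the total number of tables + indexes
-- as superuser, vacuum/analyze the catalogs that grow with the number of relations
VACUUM ANALYZE pg_class;
VACUUM ANALYZE pg_attribute;
VACUUM ANALYZE pg_index;
VACUUM ANALYZE pg_statistic;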


On Sat, Aug 23, 2008 at 6:48 PM, Loic Petit <tls.wydd@free.fr> wrote:
> What I described in the last mail is what I am trying to do.
> But, as I said earlier, I only manage about 3-4 inserts per second because of my problem.
> So in practice it's about one insert every 30 minutes for each table.

Re: Large number of tables slow insert

From:
"H. Hall"
Date:
Loic Petit wrote:
> Hi,
>
> I use PostgreSQL 8.3.1-1 to store a lot of data coming from a large
> number of sensors. In order to get good performance when querying by
> timestamp on each sensor, I partitioned my measures table per sensor.
> Thus I create a lot of tables.
> I simulated a large sensor network with 3000 nodes, so I have ~3000
> tables. It appears that each insert (in a separate transaction) takes
> about 300 ms (3-4 inserts per second), even in tables that hold just a
> few tuples (< 10). As you can understand, that is not efficient at all,
> because I need to handle a lot of inserts.
Can you tell us what kind of application this is? It sounds like a
control systems application where you will write the current values of
the sensors with each scan of a PLC.  If so, is that a good idea?  Also
is 3,000 sensors realistic? That would be a lot of sensors for one
control system.
>
> Do you have any idea why it is that slow, and how I can get good insert performance?
How often do you write data for a sensor?
One write per sensor per second = 3,000 writes per second.
That would be an insert plus updates to each of your 6 indexes every
0.33 ms.

Is that a good idea?  Is there a better strategy? What are you measuring
with the instruments e.g. is this a process plant or manufacturing
facility? What will people do with this data?
>
> My test machine: Intel P4 2.4 GHz, 512 MB RAM, Ubuntu Server (ext3)
> iostat tells me that I do about 0.5 MB/s reading and ~6-7 MB/s writing
> during constant inserts
>
> Here is the DDL of the measures tables:
> -------------------------------------------------------
> CREATE TABLE measures_0
> (
>  "timestamp" timestamp without time zone,
>  storedtime timestamp with time zone,
>  count smallint,
>  "value" smallint[]
> )
> WITH (OIDS=FALSE);
> CREATE INDEX measures_0_1_idx
>  ON measures_0
>  USING btree
>  ((value[1]));
>
> -- Index: measures_0_2_idx
> CREATE INDEX measures_0_2_idx
>  ON measures_0
>  USING btree
>  ((value[2]));
>
> -- Index: measures_0_3_idx
> CREATE INDEX measures_0_3_idx
>  ON measures_0
>  USING btree
>  ((value[3]));
>
> -- Index: measures_0_count_idx
> CREATE INDEX measures_0_count_idx
>  ON measures_0
>  USING btree
>  (count);
>
> -- Index: measures_0_timestamp_idx
> CREATE INDEX measures_0_timestamp_idx
>  ON measures_0
>  USING btree
>  ("timestamp");
>
> -- Index: measures_0_value_idx
> CREATE INDEX measures_0_value_idx
>  ON measures_0
>  USING btree
>  (value);
> -------------------------------------------------------
>
> Regards
>
> Loïc Petit
>


--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com


Re: Large number of tables slow insert

From:
Peter Schuller
Date:
> I use PostgreSQL 8.3.1-1 to store a lot of data coming from a large number
> of sensors. In order to get good performance when querying by timestamp on
> each sensor, I partitioned my measures table per sensor. Thus I create a
> lot of tables.
> I simulated a large sensor network with 3000 nodes, so I have ~3000 tables.
> It appears that each insert (in a separate transaction) takes about 300 ms
> (3-4 inserts per second), even in tables that hold just a few tuples (< 10).
> As you can understand, that is not efficient at all, because I need to
> handle a lot of inserts.
>
> Do you have any idea why it is that slow, and how I can get good insert
> performance?
>
> My test machine: Intel P4 2.4 GHz, 512 MB RAM, Ubuntu Server (ext3)
> iostat tells me that I do about 0.5 MB/s reading and ~6-7 MB/s writing
> during constant inserts

Have you checked what you are bottlenecking on - CPU or disk? Try
iostat/top/etc during the inserts. Also check actual disk utilization
(iostat -x on Linux/FreeBSD; varies on others) to see what percentage
of time the disk/storage device is busy.

You say you have 3-4 inserts/second causing 6-7 MB/s writing. That
suggests to me the inserts are fairly large. Are they in the MB range,
which would account for the I/O?

My suspicion is that you are bottlenecking on CPU, since in my
experience there is definitely something surprisingly slow about
encoding/decoding data at the protocol level or somewhere else that is
involved in backend/client communication. I.e, I suspect your client
and/or server is spending a lot of CPU time with things not directly
related to the actual table inserts. If so, various suggested schemes
w.r.t. indexing, table bloat etc won't help at all.

In short, 6-7 MB/second would be fairly consistent with INSERT/COPY
operations being CPU bound on a modern CPU, in my experience. It may
be that this is entirely untrue in your case, but it sounds like a
reasonable thing to at least consider.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org



Re: Large number of tables slow insert

From:
"Loic Petit"
Date:
Quite a lot of answers!

> Are all INSERTs made by one application? Maybe prepared statements will help you? What about optimization at the application level?
Yes, it's all from one application (through JDBC). Optimizations (grouped transactions, prepared statements) will be done, but this was our very first test under hard conditions, and it scared us all :p.
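For reference, the kind of grouping meant here looks roughly like this on the SQL side (a sketch; table names and values are invented):

BEGIN;
INSERT INTO measures_0 ("timestamp", count, "value") VALUES
    ('2008-08-24 12:00:00', 10, ARRAY[21, 55, 301]::smallint[]),
    ('2008-08-24 12:00:01', 11, ARRAY[21, 55, 299]::smallint[]);
-- ... more INSERTs, one per sensor table, batched into the same transaction ...
COMMIT;

One commit then covers many inserts instead of paying the per-transaction cost for each row.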

> Can you tell us what kind of application this is? It sounds like a control systems application where you will write the current values of the sensors with each scan of a PLC.  If so, is that a good idea?  Also is 3,000 sensors realistic? That would be a lot of sensors for one control system.
Our research project is trying to manage large-scale sensor network deployments. 3,000 is quite a huge deployment, but it can be realistic for a large agricultural deployment, for example.

> That would be an insert plus updates to each of your 6 indexes every 0.33 ms. Is that a good idea?  Is there a better strategy? What are you measuring with the instruments e.g. is this a process plant or manufacturing facility? What will people do with this data?
I try to drop as many indexes as I can. Actually, I only really need the index on timestamp, to look at the last readings and to search historical data by period. The others (on the values) are more for queries like "when was this sensor over 45ºC", but those can be done without indexes (slower to query, but less heavy at insert time). I get the data from different TelosB motes that gather temperature, humidity and light.

> Have you checked what you are bottlenecking on - CPU or disk? Try iostat/top/etc during the inserts. Also check actual disk utilization (iostat -x on Linux/FreeBSD; varies on others) to see what percentage of time the disk/storage device is busy.
I looked at the iostat and top output: the postgres process was at 70% CPU. Yes, I know that my test machine is not brand new, but I have to find a good solution on this hardware.

OK, I just ran some tests. It seems that I hammered the database too much right after creating the tables, so vacuum/analyze could not run. I have better results now:

Average time to write 10 rows into each table:
ON 1000 TABLES
    Without indexes at all : ~1.5s
    With only the index on timestamp : ~2.5s
    With all indexes : ~30s

ON 3000 TABLES
    Without indexes at all : ~8s
    With only the index on timestamp : ~45s
    With all indexes : ~3min

I don't know why, but the difference with indexes is huge! When I ran the vacuum, the system warned me about "max_fsm_relations" (1000). Do you think increasing it will change something (as Scott said)? I didn't have time to run tests with VACUUM ANALYZE on the system tables; see you tomorrow for more news...

Re: Large number of tables slow insert

From:
"Scott Carey"
Date:
I don't know if the max_fsm_relations issue will solve your problem or not.  I do know that you definitely want to increase it to a number larger than the sum of all your tables and indexes -- preferably with room to grow.  Additionally the max_fsm_pages value will likely need to be increased as your data size grows.

I work with about 9000 tables at the moment (growing each day) and do not see your issue.  I do not have indexes on most of my tables, and max_fsm_relations is set to 30000.

Although this will increase the number of tables even more-- you may want to consider partitioning your tables by time:  day or week or month.
This way, you may not even need an index on the date, as it will only scan tables over the date range specified ( NOTE -- this is not true if you use prepared statements -- prepared statements + partitioned tables = performance disaster). 
In addition, this may allow you to add the indexes on the partitioned table at a later date.  For example:

Partitions by week -- the current week's table has no indexes and is thus fast to insert.  But once it becomes last week's table and you are only inserting into a new table, the old one can have indexes added to it -- it is now mostly a read-only table.  In this way, full scans will only be needed for the current week's table, which will most of the time be smaller than the others and more likely be cached in memory as well.  You may want to partition by day or month instead.
You may want to combine several sensors into one table, so that you can partition by day or even hour.  It all depends on how you expect to access the data later and how much you can afford to deal with managing all those tables -- postgres only does some of the partitioning work for you and you have to be very careful with your queries.  There are some query optimizer oddities with partitioned tables one has to be aware of.
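A rough sketch of the week-based variant described above (all names and dates are invented for illustration; in 8.3 the application or an insert trigger/rule must target the right child table itself):

CREATE TABLE measures
(
  sensor_id   integer NOT NULL,
  "timestamp" timestamp without time zone,
  count       smallint,
  "value"     smallint[]
);

-- current week's partition: no indexes, so inserts stay cheap
CREATE TABLE measures_2008w35
(
  CHECK ("timestamp" >= DATE '2008-08-25' AND "timestamp" < DATE '2008-09-01')
) INHERITS (measures);

-- once the week is over, the partition becomes read-mostly and can be indexed
CREATE INDEX measures_2008w35_sensor_ts_idx
  ON measures_2008w35 (sensor_id, "timestamp");

-- needed so the planner skips partitions outside the queried date range
SET constraint_exclusion = on;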


Re: Large number of tables slow insert

From:
"Loic Petit"
Date:
That's not a bad idea, at least for historical data.
But one of the most common things in sensor network monitoring is watching the last readings.
With the timestamp index, what I can do is: SELECT * FROM measures_xx ORDER BY timestamp DESC LIMIT 1 => and I get the very last reading in a blink (a single page read).
It would be complicated that way... it could only be done by a mass update on a table each time I receive a packet...
It's also very important to have the last 24h of readings aggregated by hour or minute (to plot a graph).
So I can't go that way... I think I must keep the timestamp index where it is, but I should probably get rid of the others.
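For reference, the 24-hour, per-hour aggregate mentioned above would look something like this against one of the per-sensor tables (an invented sketch):

SELECT date_trunc('hour', "timestamp") AS hour,
       avg(value[1]) AS avg_value_1,
       avg(value[2]) AS avg_value_2,
       avg(value[3]) AS avg_value_3,
       count(*)      AS readings
FROM measures_0
WHERE "timestamp" > now() - interval '24 hours'
GROUP BY 1
ORDER BY 1;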

Thank you again for your help people

Regards

Loic Petit

Re: Large number of tables slow insert

From:
Matthew Wakeling
Date:
On Sat, 23 Aug 2008, Loic Petit wrote:
> I use PostgreSQL 8.3.1-1 to store a lot of data coming from a large number of sensors. In order to get good
> performance when querying by timestamp on each sensor, I partitioned my measures table per sensor. Thus I create
> a lot of tables.

As far as I can see, you are having performance problems as a direct
result of this design decision, so it may be wise to reconsider. If you
have an index on both the sensor identifier and the timestamp, it should
perform reasonably well. It would scale a lot better with thousands of
sensors too.
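A sketch of what that single-table design might look like (column and index names invented):

CREATE TABLE measures
(
  sensor_id   integer NOT NULL,
  "timestamp" timestamp without time zone,
  count       smallint,
  "value"     smallint[]
);

CREATE INDEX measures_sensor_ts_idx ON measures (sensor_id, "timestamp");

-- per-sensor "last reading" lookups can then walk the composite index backwards:
SELECT * FROM measures
WHERE sensor_id = 42
ORDER BY "timestamp" DESC
LIMIT 1;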

Matthew

--
And why do I do it that way? Because I wish to remain sane. Um, actually,
maybe I should just say I don't want to be any worse than I already am.
         - Computer Science Lecturer

Re: Large number of tables slow insert

From:
"Scott Marlowe"
Date:
On Tue, Aug 26, 2008 at 6:50 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> On Sat, 23 Aug 2008, Loic Petit wrote:
>>
>> I use PostgreSQL 8.3.1-1 to store a lot of data coming from a large number
>> of sensors. In order to get good
>> performance when querying by timestamp on each sensor, I partitioned my
>> measures table per sensor. Thus I create
>> a lot of tables.
>
> As far as I can see, you are having performance problems as a direct result
> of this design decision, so it may be wise to reconsider. If you have an
> index on both the sensor identifier and the timestamp, it should perform
> reasonably well. It would scale a lot better with thousands of sensors too.

Properly partitioned, I'd expect one big table to outperform 3,000
sparsely populated tables.