Thread: how to handle a big table for data log

how to handle a big table for data log

From: kuopo
Date:
Hi,

I have to handle a log table that accumulates a large amount of log
data. The table only sees insert and query operations. To limit the
table size, I tried splitting it by date. However, the number of log
records is still large (46 million per day), so I also tried splitting
the log table further by log type. That did not improve performance;
it is actually much slower than the single big table. I guess this is
because of the extra auto-vacuum/analyze cost across all the split
tables.
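
For reference, "splitting the table by date" here means the usual
inheritance-based partitioning of this PostgreSQL era. A minimal
sketch, with illustrative table and column names rather than my real
ones:

CREATE TABLE log_parent
(
  log_type integer NOT NULL,
  log_time timestamp with time zone NOT NULL,
  payload  text
);

-- One child table per day; the CHECK constraint lets
-- constraint_exclusion skip irrelevant days at query time.
CREATE TABLE log_20100719
(
  CHECK (log_time >= '2010-07-19' AND log_time < '2010-07-20')
) INHERITS (log_parent);

CREATE INDEX log_20100719_type_time_idx
  ON log_20100719 (log_type, log_time);

-- Inserts go directly into the child table for that day.
INSERT INTO log_20100719 (log_type, log_time, payload)
VALUES (1, '2010-07-19 00:00:30+08', '121.0');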

Can anyone comment on this situation? Thanks in advance.


kuopo.

Re: how to handle a big table for data log

From: "Jorge Montero"
Date:
Large tables, by themselves, are not necessarily a problem. The problem is what you might be trying to do with them. Depending on the operations you are trying to do, partitioning the table might help performance or make it worse.
 
What kind of queries are you running? How many days of history are you keeping? Could you post an explain analyze output of a query that is being problematic?
Given the amount of data you hint at, your server configuration and any custom statistics targets on the big tables in question would also be useful.
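
For reference, explain analyze output can be captured like this (the
table name and predicate below are only placeholders):

EXPLAIN ANALYZE
SELECT *
  FROM your_log_table
 WHERE log_time >= '2010-07-18'
   AND log_time <  '2010-07-19';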

>>> kuopo <spkuo@cs.nctu.edu.tw> 7/19/2010 1:27 AM >>>
Hi,

I have to handle a log table that accumulates a large amount of log
data. The table only sees insert and query operations. To limit the
table size, I tried splitting it by date. However, the number of log
records is still large (46 million per day), so I also tried splitting
the log table further by log type. That did not improve performance;
it is actually much slower than the single big table. I guess this is
because of the extra auto-vacuum/analyze cost across all the split
tables.

Can anyone comment on this situation? Thanks in advance.


kuopo.


Re: how to handle a big table for data log

From: kuopo
Date:
Let me make my problem clearer. The requirement is to log data from a set of objects at regular intervals. For example, an object may be a mobile phone that reports its location every 30 seconds. To record its historical trace, I created a table like this:
CREATE TABLE log_table
(
  id         integer NOT NULL,
  data_type  integer NOT NULL,
  data_value double precision,
  ts         timestamp with time zone NOT NULL,
  CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
);
In my location log example, the field data_type could be longitude or latitude.

I created the primary key (id, data_type, ts) to make my queries more efficient. The main type of query asks for the latest data_value of a given data_type for a given id at or before a given timestamp. For this kind of query, I use the following SQL statement:
SELECT * FROM log_table
 WHERE id = [given id]
   AND data_type = 'longitude'
   AND ts = (SELECT max(ts) FROM log_table
              WHERE id = [given id]
                AND data_type = 'longitude'
                AND ts <= [given timestamp]);
According to my evaluation, its performance is acceptable.
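
An equivalent formulation, using the same placeholders, is to order by
ts and take the newest row; the (id, data_type, ts) primary key index
can serve it directly (a sketch only, not benchmarked here):

SELECT * FROM log_table
 WHERE id = [given id]
   AND data_type = 'longitude'
   AND ts <= [given timestamp]
 ORDER BY ts DESC
 LIMIT 1;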

However, I am more concerned about the performance of the insert operation. As I mentioned, log_table keeps growing, so I decided to partition it. Currently I partition it by date and keep only 60 days of data, and this partitioning helps. But when I also partition it by data_type (in my case the number of data_types is limited, say 10), insert performance degrades. I guess this is caused by multiple vacuum/analyze runs over these per-data_type partitions. On the other hand, even if I keep all data_types together, I expect insert performance to degrade as well once I expand the system to support more mobile phones or more data_types.
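
One knob related to the vacuum/analyze cost mentioned above: autovacuum
can be tuned per table through storage parameters rather than only
globally. A minimal sketch; the partition name and the values are
illustrative, not recommendations:

-- Relax the autovacuum/autoanalyze thresholds on one insert-only
-- partition so it is not re-analyzed after every small batch of rows.
ALTER TABLE log_table_20100720
  SET (autovacuum_vacuum_scale_factor = 0.4,
       autovacuum_analyze_scale_factor = 0.2);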

This is my current situation. Please give me some hints to improve the performance (especially for the insert part).


kuopo.


On Mon, Jul 19, 2010 at 11:37 PM, Jorge Montero <jorge_montero@homedecorators.com> wrote:
> Large tables, by themselves, are not necessarily a problem. The problem is
> what you might be trying to do with them. Depending on the operations you
> are trying to do, partitioning the table might help performance or make it
> worse.
>  
> What kind of queries are you running? How many days of history are you
> keeping? Could you post an explain analyze output of a query that is being
> problematic?
> Given the amount of data you hint at, your server configuration and any
> custom statistics targets on the big tables in question would also be
> useful.
>
>>>> kuopo <spkuo@cs.nctu.edu.tw> 7/19/2010 1:27 AM >>>
> Hi,
>
> I have to handle a log table that accumulates a large amount of log
> data. The table only sees insert and query operations. To limit the
> table size, I tried splitting it by date. However, the number of log
> records is still large (46 million per day), so I also tried splitting
> the log table further by log type. That did not improve performance;
> it is actually much slower than the single big table. I guess this is
> because of the extra auto-vacuum/analyze cost across all the split
> tables.
>
> Can anyone comment on this situation? Thanks in advance.
>
>
> kuopo.
>

Re: how to handle a big table for data log

From: Greg Spiegelberg
Date:
On Tue, Jul 20, 2010 at 9:51 PM, kuopo <spkuo@cs.nctu.edu.tw> wrote:
Let me make my problem clearer. The requirement is to log data from a set of objects at regular intervals. For example, an object may be a mobile phone that reports its location every 30 seconds. To record its historical trace, I created a table like this:
CREATE TABLE log_table
(
  id         integer NOT NULL,
  data_type  integer NOT NULL,
  data_value double precision,
  ts         timestamp with time zone NOT NULL,
  CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
);
In my location log example, the field data_type could be longitude or latitude.


Many moons ago I witnessed GridSQL managing a massive database log table. From memory, the configuration was 4 database servers holding a cumulative 500M+ records, and queries were running in under 5 ms. It may be worth a look.

http://www.enterprisedb.com/community/projects/gridsql.do

Greg

Re: how to handle a big table for data log

From: Josh Berkus
Date:
On 7/20/10 8:51 PM, kuopo wrote:
> Let me make my problem clearer. The requirement is to log data from a
> set of objects at regular intervals. For example, an object may be a
> mobile phone that reports its location every 30 seconds. To record its
> historical trace, I created a table like this:
> CREATE TABLE log_table
> (
>   id         integer NOT NULL,
>   data_type  integer NOT NULL,
>   data_value double precision,
>   ts         timestamp with time zone NOT NULL,
>   CONSTRAINT log_table_pkey PRIMARY KEY (id, data_type, ts)
> );
> In my location log example, the field data_type could be longitude or
> latitude.

If what you have is longitude and latitude, why this brain-dead EAV
table structure?  You're making the table twice as large and half as
useful for no particular reason.

Use the "point" datatype instead of anonymizing the data.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com