Обсуждение: indexes and big tables

Поиск
Список
Период
Сортировка

indexes and big tables

От
Robert Vojta
Дата:
Hi, I'm fighting with problem with indexes. I read documentation about
performance tips, about internal logic functions which are making decision
if to use or not use indexes, etc. and I'm still failed. I'm not SQL
guru and I don't know what to do now. My tables and indexes looks like
...

CREATE TABLE counters ( line         VARCHAR(64) NOT NULL, counterfrom  INT8 NOT NULL, counterto    INT8 NOT NULL,
counterstampTIMESTAMP NOT NULL, stamp        TIMESTAMP NOT NULL DEFAULT 'now'); 

CREATE INDEX counters_line_stamp ON counters (line, counterstamp);
 I have several other tables too with names static_counters_(hour|day|month).
Why? It's only for fast sumarization, so ...
 in counters - 5min counters for last hour, rows are moved into static_counters               after hour sumarization
incounters_hour table 
 in counters_hour - last day hour sums, rows are moved into static_counters_                    hour table after day
sumarizationin counters_day 
 in counters_day - last month days sums, rows are moved into static_counters_                   days table after month
sumarizationin counters_month 
 I'm inserting about 300 rows into counters table in 5min period (fetching
info from routers). Sumarization is doing everyhour with some internal logic
and decision are made by hour info. There are about 3 milions rows in
static_counters table and they are only for last month. It means, that when
next month begins, we moved this old data into tables counters_YYYY_MM, etc.
I'm running VACUUM ANALYZE two times a day. Everything works fine, but I'm drawing graphs from static_counters and
counterstables. For first graph I 
need about 33 hour old data and for second graph I need about a week old data.
I know, now there is a more data than I need in this table, but if I create a
table with needed values only, there is no indexes used too. Select for graphs
looks like ...

netacc=> EXPLAIN (SELECT SUM(counterfrom) AS from, SUM(counterto) AS to,
floor((985098900 - date_part('epoch', counterstamp)) / 300) AS sequence
FROM counters WHERE line='absolonll' AND date_part('epoch', counterstamp)
> 984978900 GROUP BY sequence, line) UNION (SELECT SUM(counterfrom) AS
from, SUM(counterto) AS to, floor((985098900 - date_part('epoch',
counterstamp)) / 300) AS sequence FROM static_counters WHERE
line='absolonll' AND date_part('epoch', counterstamp) > 984978900 GROUP BY
sequence, line); NOTICE:  QUERY PLAN:
Unique (cost=67518.73..67525.44 rows=89 width=36) -> Sort (cost=67518.73..67518.73 rows=895 width=36)       -> Append
(cost=1860.01..67474.87rows=895 width=36)               -> Aggregate (cost=1860.01..1870.90 rows=109 width=36)
          -> Group (cost=1860.01..1865.46 rows=1089 width=36)                           -> Sort (cost=1860.01..1860.01
rows=1089
width=36)                                 -> Seq Scan on counters
(cost=0.00..1805.10 rows=1089 width=36)               -> Aggregate (cost=65525.38..65603.97 rows=786 width=36)
          -> Group (cost=65525.38..65564.67 rows=7858 
width=36)                           -> Sort (cost=65525.38..65525.38 rows=7858
width=36)                                 -> Seq Scan on static_counters
(cost=0.00..65016.95 rows=7858 width=36)
EXPLAIN
netacc=>
 ... Indexes are used when I have a few rows in table only :( Result of this
select is about ~105 rows in every way. Now, I don't know what to do, because
drawing of this two graphs is about 30 seconds and it's too much. Please, how may I change my solution for fast graphs
drawings?May I 
split this table? Make table for each line? Upgrade HW?
 I'm running PostgreSQL 7.0.3 now on RedHat 6.2 linux box. HW of this box
is Duron 700MHz, 384MB RAM, SCSI disk. May I upgrade PostgreSQL? May I
upgrade RAM, CPU? I don't know what to do now and any help will be very
appreciated.
 Thank you very much,
king regards, Robert Vojta

--   _ |-|  __      Robert Vojta <vojta-at-ipex.cz>          -= Oo.oO =- |=| [Ll]     IPEX, s.r.o. "^" ====`o

Re: indexes and big tables

От
Robert Vojta
Дата:
> The index is only used for the line= part of the where clause
> with your query. With many rows the "line=" is not selective enough
> to justify the index.

Hi, I tried you suggestion about 'AND counterstamp > '2001-07-26 00:00:00.0' and
it works and index is used :) But, whole query run for 10 sec (was 30s) and
it's too much, I need about 1 sec. May I optimize my tables, queries or may I
upgrade something from my HW (duron 700, 384MB RAM, slow scsi disk :( )? I
do not want solution, some hint in which part may I focus and I will go through
documentation again, thank you very much.

Best regards, Robert

--   _ |-|  __      Robert Vojta <vojta-at-ipex.cz>          -= Oo.oO =- |=| [Ll]     IPEX, s.r.o. "^" ====`o

Re: indexes and big tables

От
Robert Vojta
Дата:
> The index is only used for the line= part of the where clause
> with your query. With many rows the "line=" is not selective enough
> to justify the index.
 I tried move only needed data into new table and change query into ...

netacc=> EXPLAIN SELECT counterfrom AS from, counterto AS to,
floor((980000000 - date_part('epoch', counterstamp)) / 300) AS sequence
FROM graphs_5m WHERE line='absolonll'; NOTICE:  QUERY PLAN:
Index Scan using graphs_5m_idx on graphs_5m (cost=0.00..58.38 rows=29
width=24)
EXPLAIN
 and query runs for 3-5 seconds. Any idea how to make it faster? I think,
that now it's ready to HW upgrade for faster result ...

Best regards, Robert

--   _ |-|  __      Robert Vojta <vojta-at-ipex.cz>          -= Oo.oO =- |=| [Ll]     IPEX, s.r.o. "^" ====`o

Re: indexes and big tables

От
Stephan Szabo
Дата:
On Fri, 27 Jul 2001, Robert Vojta wrote:
> netacc=> EXPLAIN (SELECT SUM(counterfrom) AS from, SUM(counterto) AS to,
> floor((985098900 - date_part('epoch', counterstamp)) / 300) AS sequence
> FROM counters WHERE line='absolonll' AND date_part('epoch', counterstamp)
> > 984978900 GROUP BY sequence, line) UNION (SELECT SUM(counterfrom) AS
> from, SUM(counterto) AS to, floor((985098900 - date_part('epoch',
> counterstamp)) / 300) AS sequence FROM static_counters WHERE
> line='absolonll' AND date_part('epoch', counterstamp) > 984978900 GROUP BY
> sequence, line); NOTICE:  QUERY PLAN:

Is there any possibility of overlapping rows between the parts of the
union?  If not, I'd suggest union all, since that might get rid of the top
level unique and sort steps (probably not a huge gain, but might help).

> Unique (cost=67518.73..67525.44 rows=89 width=36)
>   -> Sort (cost=67518.73..67518.73 rows=895 width=36)
>         -> Append (cost=1860.01..67474.87 rows=895 width=36)
>                 -> Aggregate (cost=1860.01..1870.90 rows=109 width=36)
>                       -> Group (cost=1860.01..1865.46 rows=1089 width=36)
>                             -> Sort (cost=1860.01..1860.01 rows=1089
> width=36)
>                                   -> Seq Scan on counters
> (cost=0.00..1805.10 rows=1089 width=36)
>                 -> Aggregate (cost=65525.38..65603.97 rows=786 width=36)
>                       -> Group (cost=65525.38..65564.67 rows=7858
> width=36)
>                             -> Sort (cost=65525.38..65525.38 rows=7858
> width=36)
>                                   -> Seq Scan on static_counters
> (cost=0.00..65016.95 rows=7858 width=36)
>  
> EXPLAIN
> netacc=>



Re: indexes and big tables

От
Robert Vojta
Дата:
> Is there any possibility of overlapping rows between the parts of the
> union?  If not, I'd suggest union all, since that might get rid of the top
> level unique and sort steps (probably not a huge gain, but might help).

Hi, thanx for the response, there is a little possibility of overlapping rows.
But, I removed date_part function and it use index now. It's too slow, so
I move all needed data into graphs_5m table and it works fine now, previous
time for graphs drawing was ~30-60seconds, today it's ~3-5 seconds. Thanx.

Best regards, Robert

--   _ |-|  __      Robert Vojta <vojta-at-ipex.cz>          -= Oo.oO =- |=| [Ll]     IPEX, s.r.o. "^" ====`o