Обсуждение: Help on indexing timestamps
Hi all, i have a little problem on indexing a table which contains about 4 millions of traffic-data. My problem is, that a want to select all data from a specific month from a specific ip and this select should use the index. I use the following select: db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip = '80.243.38.57'; NOTICE: QUERY PLAN: Aggregate (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) -> Index Scan using idx_ip_time_stampon tbl_traffic (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89 rows=5232 loops=1) Total runtime: 1620.94 msec But it takes a long time to select the traffic for all Ips. Is there a way to select these data with using the index correctly ? Thanks in advance BTW: Table-schema as follows db_test=# \d tbl_traffic Table "tbl_traffic" Column | Type | Modifiers ------------+--------------------------+-----------time_stamp | timestamp with time zone | not nullip | inet | local_up | bigint | not nulllocal_down | bigint | not nullinet_up | bigint | not nullinet_down | bigint | not null Indexes: idx_ip_time_stamp db_test=# \d idx_ip_time_stamp Index "idx_ip_time_stamp" Column | Type ------------+--------------------------ip | inettime_stamp | timestamp with time zone
On Fri, 7 Mar 2003, Andre Schubert wrote: > Hi all, > > i have a little problem on indexing a table which contains > about 4 millions of traffic-data. > My problem is, that a want to select all data from > a specific month from a specific ip and this select should use the index. > I use the following select: Did you try to use BETWEEN ?? E.g. ... and time_stamp between '2003-01-01 00:00:00'::timestamp and '2003-02-01 00:00:00'::timestamp > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip = '80.243.38.57'; > NOTICE: QUERY PLAN: > > Aggregate (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) > -> Index Scan using idx_ip_time_stamp on tbl_traffic (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89rows=5232 loops=1) > Total runtime: 1620.94 msec > > But it takes a long time to select the traffic for all Ips. > Is there a way to select these data with using the index correctly ? > > Thanks in advance > > BTW: Table-schema as follows > > db_test=# \d tbl_traffic > Table "tbl_traffic" > Column | Type | Modifiers > ------------+--------------------------+----------- > time_stamp | timestamp with time zone | not null > ip | inet | > local_up | bigint | not null > local_down | bigint | not null > inet_up | bigint | not null > inet_down | bigint | not null > Indexes: idx_ip_time_stamp > > db_test=# \d idx_ip_time_stamp > Index "idx_ip_time_stamp" > Column | Type > ------------+-------------------------- > ip | inet > time_stamp | timestamp with time zone > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-210-8981112 fax: +30-210-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr
On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2) Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > Hi all, > > > > i have a little problem on indexing a table which contains > > about 4 millions of traffic-data. > > My problem is, that a want to select all data from > > a specific month from a specific ip and this select should use the index. > > I use the following select: > > Did you try to use BETWEEN ?? > E.g. > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and > '2003-02-01 00:00:00'::timestamp > Yes and it works if i write the dates by hand, every new month. But the query is executed automatically and i dont want to write in the dates before the query is executed. Maybe the the start and enddate should also be alculated with sql, because i want to create a view from this statement and execute it every month. Or did i miss something. Regards, as > > > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip = '80.243.38.57'; > > NOTICE: QUERY PLAN: > > > > Aggregate (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) > > -> Index Scan using idx_ip_time_stamp on tbl_traffic (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89rows=5232 loops=1) > > Total runtime: 1620.94 msec > > > > But it takes a long time to select the traffic for all Ips. > > Is there a way to select these data with using the index correctly ? > > > > Thanks in advance > > > > BTW: Table-schema as follows > > > > db_test=# \d tbl_traffic > > Table "tbl_traffic" > > Column | Type | Modifiers > > ------------+--------------------------+----------- > > time_stamp | timestamp with time zone | not null > > ip | inet | > > local_up | bigint | not null > > local_down | bigint | not null > > inet_up | bigint | not null > > inet_down | bigint | not null > > Indexes: idx_ip_time_stamp > > > > db_test=# \d idx_ip_time_stamp > > Index "idx_ip_time_stamp" > > Column | Type > > ------------+-------------------------- > > ip | inet > > time_stamp | timestamp with time zone > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > > -- > ================================================================== > Achilleus Mantzios > S/W Engineer > IT dept > Dynacom Tankers Mngmt > Nikis 4, Glyfada > Athens 16610 > Greece > tel: +30-210-8981112 > fax: +30-210-8981877 > email: achill@matrix.gatewaynet.com > mantzios@softlab.ece.ntua.gr > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On Fri, 7 Mar 2003, Andre Schubert wrote: > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2) > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > Hi all, > > > > > > i have a little problem on indexing a table which contains > > > about 4 millions of traffic-data. > > > My problem is, that a want to select all data from > > > a specific month from a specific ip and this select should use the index. > > > I use the following select: > > > > Did you try to use BETWEEN ?? > > E.g. > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and > > '2003-02-01 00:00:00'::timestamp > > > > Yes and it works if i write the dates by hand, every new month. > But the query is executed automatically and i dont want > to write in the dates before the query is executed. Maybe the > the start and enddate should also be alculated with sql, > because i want to create a view from this statement and execute it every month. > Or did i miss something. You could have an index on the whole date_trunc('month',tbl_traffic.time_stamp),ip How does it perform? > > Regards, as > > > > > > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip = '80.243.38.57'; > > > NOTICE: QUERY PLAN: > > > > > > Aggregate (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) > > > -> Index Scan using idx_ip_time_stamp on tbl_traffic (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89rows=5232 loops=1) > > > Total runtime: 1620.94 msec > > > > > > But it takes a long time to select the traffic for all Ips. > > > Is there a way to select these data with using the index correctly ? > > > > > > Thanks in advance > > > > > > BTW: Table-schema as follows > > > > > > db_test=# \d tbl_traffic > > > Table "tbl_traffic" > > > Column | Type | Modifiers > > > ------------+--------------------------+----------- > > > time_stamp | timestamp with time zone | not null > > > ip | inet | > > > local_up | bigint | not null > > > local_down | bigint | not null > > > inet_up | bigint | not null > > > inet_down | bigint | not null > > > Indexes: idx_ip_time_stamp > > > > > > db_test=# \d idx_ip_time_stamp > > > Index "idx_ip_time_stamp" > > > Column | Type > > > ------------+-------------------------- > > > ip | inet > > > time_stamp | timestamp with time zone > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 3: if posting/reading through Usenet, please send an appropriate > > > subscribe-nomail command to majordomo@postgresql.org so that your > > > message can get through to the mailing list cleanly > > > > > > > -- > > ================================================================== > > Achilleus Mantzios > > S/W Engineer > > IT dept > > Dynacom Tankers Mngmt > > Nikis 4, Glyfada > > Athens 16610 > > Greece > > tel: +30-210-8981112 > > fax: +30-210-8981877 > > email: achill@matrix.gatewaynet.com > > mantzios@softlab.ece.ntua.gr > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-210-8981112 fax: +30-210-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr
On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2) Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2) > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > Hi all, > > > > > > > > i have a little problem on indexing a table which contains > > > > about 4 millions of traffic-data. > > > > My problem is, that a want to select all data from > > > > a specific month from a specific ip and this select should use the index. > > > > I use the following select: > > > > > > Did you try to use BETWEEN ?? > > > E.g. > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and > > > '2003-02-01 00:00:00'::timestamp > > > > > > > Yes and it works if i write the dates by hand, every new month. > > But the query is executed automatically and i dont want > > to write in the dates before the query is executed. Maybe the > > the start and enddate should also be alculated with sql, > > because i want to create a view from this statement and execute it every month. > > Or did i miss something. > > You could have an index on the > whole > date_trunc('month',tbl_traffic.time_stamp),ip > > How does it perform? > I'am not sure how to create such an index... First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as 'select date_trunc(''month'',$1)'language 'sql' with (iscachable); Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) ); Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where trunc(tbl_traffic.time_stamp) =trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56'; NOTICE: QUERY PLAN: Aggregate (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1) -> Index Scan using idx_teston tbl_traffic (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50 rows=5346 loops=1) Total runtime: 2278.62 msec Maybe the problem is, that the index is created without ip as the second column.... [schnipp]
On Fri, 7 Mar 2003, Andre Schubert wrote: > On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2) > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2) > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > > > Hi all, > > > > > > > > > > i have a little problem on indexing a table which contains > > > > > about 4 millions of traffic-data. > > > > > My problem is, that a want to select all data from > > > > > a specific month from a specific ip and this select should use the index. > > > > > I use the following select: > > > > > > > > Did you try to use BETWEEN ?? > > > > E.g. > > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and > > > > '2003-02-01 00:00:00'::timestamp > > > > > > > > > > Yes and it works if i write the dates by hand, every new month. > > > But the query is executed automatically and i dont want > > > to write in the dates before the query is executed. Maybe the > > > the start and enddate should also be alculated with sql, > > > because i want to create a view from this statement and execute it every month. > > > Or did i miss something. > > > > You could have an index on the > > whole > > date_trunc('month',tbl_traffic.time_stamp),ip > > > > How does it perform? > > > > I'am not sure how to create such an index... > > First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as > 'select date_trunc(''month'',$1)' language 'sql' with (iscachable); > Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) ); > > Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56'; > NOTICE: QUERY PLAN: > > Aggregate (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1) > -> Index Scan using idx_test on tbl_traffic (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50 rows=5346loops=1) > Total runtime: 2278.62 msec > > Maybe the problem is, that the index is created without ip as the second column.... Sorry, it sliped my mind that we cannot have compound indexes on functions. :( Anyway. Did the explicit BETWEEN gave you satisfactory performance? > > [schnipp] > -- ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-210-8981112 fax: +30-210-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr
On Fri, 7 Mar 2003 14:17:36 -0200 (GMT+2) Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2) > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2) > > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > > > > > Hi all, > > > > > > > > > > > > i have a little problem on indexing a table which contains > > > > > > about 4 millions of traffic-data. > > > > > > My problem is, that a want to select all data from > > > > > > a specific month from a specific ip and this select should use the index. > > > > > > I use the following select: > > > > > > > > > > Did you try to use BETWEEN ?? > > > > > E.g. > > > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and > > > > > '2003-02-01 00:00:00'::timestamp > > > > > > > > > > > > > Yes and it works if i write the dates by hand, every new month. > > > > But the query is executed automatically and i dont want > > > > to write in the dates before the query is executed. Maybe the > > > > the start and enddate should also be alculated with sql, > > > > because i want to create a view from this statement and execute it every month. > > > > Or did i miss something. > > > > > > You could have an index on the > > > whole > > > date_trunc('month',tbl_traffic.time_stamp),ip > > > > > > How does it perform? > > > > > > > I'am not sure how to create such an index... > > > > First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as > > 'select date_trunc(''month'',$1)' language 'sql' with (iscachable); > > Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) ); > > > > Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56'; > > NOTICE: QUERY PLAN: > > > > Aggregate (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1) > > -> Index Scan using idx_test on tbl_traffic (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50 rows=5346loops=1) > > Total runtime: 2278.62 msec > > > > Maybe the problem is, that the index is created without ip as the second column.... > > Sorry, it sliped my mind that we cannot have compound indexes on > functions. :( > > Anyway. > Did the explicit BETWEEN gave you satisfactory performance? > Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze. I think these is a good performance. But if i use now() instead of manually typed dates the query take about 1400ms :( I thought the somebody posted to this list, that now() is a function that is not cached, and thatswhy does not work pretty well with indexes. I created a cached function cached_now() which returns now() but is declared with "isCacheable". If i use cached_now() instead of now the query takes also about 200ms :) Is it safe, or better could it have any side-effects, if i use my "own" cached_now() in such a query ? > > > > [schnipp] > > > > -- > ================================================================== > Achilleus Mantzios > S/W Engineer > IT dept > Dynacom Tankers Mngmt > Nikis 4, Glyfada > Athens 16610 > Greece > tel: +30-210-8981112 > fax: +30-210-8981877 > email: achill@matrix.gatewaynet.com > mantzios@softlab.ece.ntua.gr > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org
On Fri, 7 Mar 2003, Andre Schubert wrote: > On Fri, 7 Mar 2003 14:17:36 -0200 (GMT+2) > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2) > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2) > > > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > > > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > i have a little problem on indexing a table which contains > > > > > > > about 4 millions of traffic-data. > > > > > > > My problem is, that a want to select all data from > > > > > > > a specific month from a specific ip and this select should use the index. > > > > > > > I use the following select: > > > > > > > > > > > > Did you try to use BETWEEN ?? > > > > > > E.g. > > > > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and > > > > > > '2003-02-01 00:00:00'::timestamp > > > > > > > > > > > > > > > > Yes and it works if i write the dates by hand, every new month. > > > > > But the query is executed automatically and i dont want > > > > > to write in the dates before the query is executed. Maybe the > > > > > the start and enddate should also be alculated with sql, > > > > > because i want to create a view from this statement and execute it every month. > > > > > Or did i miss something. > > > > > > > > You could have an index on the > > > > whole > > > > date_trunc('month',tbl_traffic.time_stamp),ip > > > > > > > > How does it perform? > > > > > > > > > > I'am not sure how to create such an index... > > > > > > First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as > > > 'select date_trunc(''month'',$1)' language 'sql' with (iscachable); > > > Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) ); > > > > > > Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56'; > > > NOTICE: QUERY PLAN: > > > > > > Aggregate (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1) > > > -> Index Scan using idx_test on tbl_traffic (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50 rows=5346loops=1) > > > Total runtime: 2278.62 msec > > > > > > Maybe the problem is, that the index is created without ip as the second column.... > > > > Sorry, it sliped my mind that we cannot have compound indexes on > > functions. :( > > > > Anyway. > > Did the explicit BETWEEN gave you satisfactory performance? > > > > Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze. > I think these is a good performance. > But if i use now() instead of manually typed dates the query take about 1400ms :( > I thought the somebody posted to this list, that now() is a function that is not cached, > and thatswhy does not work pretty well with indexes. > I created a cached function cached_now() which returns now() but is declared with "isCacheable". > If i use cached_now() instead of now the query takes also about 200ms :) > Is it safe, or better could it have any side-effects, if i use my "own" cached_now() in such a query ? I think you should use a high level language to wrap your PostgreSQL access (e.g. java, c, perl, ....). Since you are gonna be running this query as a batch job, you must use a high level lang that can handle dates in a UNIX-like way. (milisecs from 1970-01-01 00:00:00) > > > > > > > [schnipp] > > > > > > > -- > > ================================================================== > > Achilleus Mantzios > > S/W Engineer > > IT dept > > Dynacom Tankers Mngmt > > Nikis 4, Glyfada > > Athens 16610 > > Greece > > tel: +30-210-8981112 > > fax: +30-210-8981877 > > email: achill@matrix.gatewaynet.com > > mantzios@softlab.ece.ntua.gr > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 6: Have you searched our list archives? > > > > http://archives.postgresql.org > -- ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-210-8981112 fax: +30-210-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr
On Fri, 7 Mar 2003 15:03:01 -0200 (GMT+2) Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > On Fri, 7 Mar 2003 14:17:36 -0200 (GMT+2) > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2) > > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > > > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2) > > > > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > > > > > > > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > i have a little problem on indexing a table which contains > > > > > > > > about 4 millions of traffic-data. > > > > > > > > My problem is, that a want to select all data from > > > > > > > > a specific month from a specific ip and this select should use the index. > > > > > > > > I use the following select: > > > > > > > > > > > > > > Did you try to use BETWEEN ?? > > > > > > > E.g. > > > > > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and > > > > > > > '2003-02-01 00:00:00'::timestamp > > > > > > > > > > > > > > > > > > > Yes and it works if i write the dates by hand, every new month. > > > > > > But the query is executed automatically and i dont want > > > > > > to write in the dates before the query is executed. Maybe the > > > > > > the start and enddate should also be alculated with sql, > > > > > > because i want to create a view from this statement and execute it every month. > > > > > > Or did i miss something. > > > > > > > > > > You could have an index on the > > > > > whole > > > > > date_trunc('month',tbl_traffic.time_stamp),ip > > > > > > > > > > How does it perform? > > > > > > > > > > > > > I'am not sure how to create such an index... > > > > > > > > First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as > > > > 'select date_trunc(''month'',$1)' language 'sql' with (iscachable); > > > > Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) ); > > > > > > > > Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56'; > > > > NOTICE: QUERY PLAN: > > > > > > > > Aggregate (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1) > > > > -> Index Scan using idx_test on tbl_traffic (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50 rows=5346loops=1) > > > > Total runtime: 2278.62 msec > > > > > > > > Maybe the problem is, that the index is created without ip as the second column.... > > > > > > Sorry, it sliped my mind that we cannot have compound indexes on > > > functions. :( > > > > > > Anyway. > > > Did the explicit BETWEEN gave you satisfactory performance? > > > > > > > Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze. > > I think these is a good performance. > > But if i use now() instead of manually typed dates the query take about 1400ms :( > > I thought the somebody posted to this list, that now() is a function that is not cached, > > and thatswhy does not work pretty well with indexes. > > I created a cached function cached_now() which returns now() but is declared with "isCacheable". > > If i use cached_now() instead of now the query takes also about 200ms :) > > Is it safe, or better could it have any side-effects, if i use my "own" cached_now() in such a query ? > > I think you should use a high level language to wrap your PostgreSQL > access (e.g. java, c, perl, ....). > > Since you are gonna be running this query as a batch job, > you must use a high level lang that can handle dates in > a UNIX-like way. (milisecs from 1970-01-01 00:00:00) > Ok, i think you are right. Thanks for the quick advice. Regards, as
Andre Schubert <andre@km3.de> writes: > Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze. > I think these is a good performance. > But if i use now() instead of manually typed dates the query take about 1400ms :( > I thought the somebody posted to this list, that now() is a function that is not cached, > and thatswhy does not work pretty well with indexes. Update to 7.3, and that problem will go away. regards, tom lane
On Fri, 07 Mar 2003 10:05:03 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andre Schubert <andre@km3.de> writes: > > Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze. > > I think these is a good performance. > > But if i use now() instead of manually typed dates the query take about 1400ms :( > > I thought the somebody posted to this list, that now() is a function that is not cached, > > and thatswhy does not work pretty well with indexes. > > Update to 7.3, and that problem will go away. > What is the difference between 7.2 and 7.3 related to my problem ? Thanks, as
Andre Schubert wrote: > Hi all, > > i have a little problem on indexing a table which contains > about 4 millions of traffic-data. > My problem is, that a want to select all data from > a specific month from a specific ip and this select should use the index. > I use the following select: > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip = '80.243.38.57'; > NOTICE: QUERY PLAN: > > Aggregate (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) > -> Index Scan using idx_ip_time_stamp on tbl_traffic (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89rows=5232 loops=1) > Total runtime: 1620.94 msec > > But it takes a long time to select the traffic for all Ips. > Is there a way to select these data with using the index correctly ? > > Thanks in advance > I have one more solution - try to rewrite your where clause to NOT USE function on time_stamp. If your query will look like: select ... where time_stamp between (function with now() returning first day) and (function with now() returning last day); your index will work fine. Regards, Tomasz Myrta
On Mon, 10 Mar 2003 10:12:15 +0100 Tomasz Myrta <jasiek@klaster.net> wrote: > Andre Schubert wrote: > > Hi all, > > > > i have a little problem on indexing a table which contains > > about 4 millions of traffic-data. > > My problem is, that a want to select all data from > > a specific month from a specific ip and this select should use the index. > > I use the following select: > > > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip = '80.243.38.57'; > > NOTICE: QUERY PLAN: > > > > Aggregate (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) > > -> Index Scan using idx_ip_time_stamp on tbl_traffic (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89rows=5232 loops=1) > > Total runtime: 1620.94 msec > > > > But it takes a long time to select the traffic for all Ips. > > Is there a way to select these data with using the index correctly ? > > > > Thanks in advance > > > I have one more solution - try to rewrite your where clause to NOT USE > function on time_stamp. If your query will look like: > select ... where time_stamp between (function with now() returning first > day) and (function with now() returning last day); > your index will work fine. > Thanks for the hint, i will test this and report to this list. Thanks, as
Andre Schubert <andre@km3.de> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Update to 7.3, and that problem will go away. >> > What is the difference between 7.2 and 7.3 related to my problem ? 7.3 is willing to index comparisons to now(), that's what ;-) You may care to study the difference between cachable/noncachable functions in 7.2, versus the difference between immutable/stable/volatile functions in 7.3. 7.2 has to treat now() as noncachable and therefore unsafe to index. 7.3 treats it as stable and therefore safe to index. You can work around this in 7.2 by cheating with subselects or wrapper functions, but that's a poor substitute for a real solution. regards, tom lane
On Mon, 10 Mar 2003 15:24:09 +0100 Andre Schubert <andre@km3.de> wrote: > On Mon, 10 Mar 2003 10:12:15 +0100 > Tomasz Myrta <jasiek@klaster.net> wrote: > > > Andre Schubert wrote: > > > Hi all, > > > > > > i have a little problem on indexing a table which contains > > > about 4 millions of traffic-data. > > > My problem is, that a want to select all data from > > > a specific month from a specific ip and this select should use the index. > > > I use the following select: > > > > > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip = '80.243.38.57'; > > > NOTICE: QUERY PLAN: > > > > > > Aggregate (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) > > > -> Index Scan using idx_ip_time_stamp on tbl_traffic (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89rows=5232 loops=1) > > > Total runtime: 1620.94 msec > > > > > > But it takes a long time to select the traffic for all Ips. > > > Is there a way to select these data with using the index correctly ? > > > > > > Thanks in advance > > > > > I have one more solution - try to rewrite your where clause to NOT USE > > function on time_stamp. If your query will look like: > > select ... where time_stamp between (function with now() returning first > > day) and (function with now() returning last day); > > your index will work fine. > > > Thanks for the hint, i will test this and report to this list. > I have tested and it doesnt work as it should do. I think its the same problem posted by Tom Lane, and it should go away after an update to 7.3. Regards, as Thanks as