Обсуждение: Help on indexing timestamps

Поиск
Список
Период
Сортировка

Help on indexing timestamps

От
Andre Schubert
Дата:
Hi all,

i have a little problem on indexing a table which contains
about 4 millions of traffic-data.
My problem is, that a want to select all data from
a specific month from a specific ip and this select should use the index.
I use the following select:

db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip =
'80.243.38.57';
NOTICE:  QUERY PLAN:

Aggregate  (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1) ->  Index Scan using
idx_ip_time_stampon tbl_traffic  (cost=0.00..116.30 rows=1 width=16) (actual time=1216.79..1579.89 rows=5232 loops=1)
 
Total runtime: 1620.94 msec

But it takes a long time to select the traffic for all Ips.
Is there a way to select these data with using the index correctly ?

Thanks in advance

BTW: Table-schema as follows

db_test=# \d tbl_traffic               Table "tbl_traffic"  Column   |           Type           | Modifiers 
------------+--------------------------+-----------time_stamp | timestamp with time zone | not nullip         | inet
                | local_up   | bigint                   | not nulllocal_down | bigint                   | not
nullinet_up   | bigint                   | not nullinet_down  | bigint                   | not null
 
Indexes: idx_ip_time_stamp

db_test=# \d idx_ip_time_stamp      Index "idx_ip_time_stamp"  Column   |           Type           
------------+--------------------------ip         | inettime_stamp | timestamp with time zone


Re: Help on indexing timestamps

От
Achilleus Mantzios
Дата:
On Fri, 7 Mar 2003, Andre Schubert wrote:

> Hi all,
> 
> i have a little problem on indexing a table which contains
> about 4 millions of traffic-data.
> My problem is, that a want to select all data from
> a specific month from a specific ip and this select should use the index.
> I use the following select:

Did you try to use BETWEEN ??
E.g.
... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
'2003-02-01 00:00:00'::timestamp

> 
> db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip =
'80.243.38.57';
> NOTICE:  QUERY PLAN:
> 
> Aggregate  (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1)
>   ->  Index Scan using idx_ip_time_stamp on tbl_traffic  (cost=0.00..116.30 rows=1 width=16) (actual
time=1216.79..1579.89rows=5232 loops=1)
 
> Total runtime: 1620.94 msec
> 
> But it takes a long time to select the traffic for all Ips.
> Is there a way to select these data with using the index correctly ?
> 
> Thanks in advance
> 
> BTW: Table-schema as follows
> 
> db_test=# \d tbl_traffic
>                 Table "tbl_traffic"
>    Column   |           Type           | Modifiers 
> ------------+--------------------------+-----------
>  time_stamp | timestamp with time zone | not null
>  ip         | inet                     | 
>  local_up   | bigint                   | not null
>  local_down | bigint                   | not null
>  inet_up    | bigint                   | not null
>  inet_down  | bigint                   | not null
> Indexes: idx_ip_time_stamp
> 
> db_test=# \d idx_ip_time_stamp
>        Index "idx_ip_time_stamp"
>    Column   |           Type           
> ------------+--------------------------
>  ip         | inet
>  time_stamp | timestamp with time zone
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
> 

-- 
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-210-8981112
fax:    +30-210-8981877
email:  achill@matrix.gatewaynet.com       mantzios@softlab.ece.ntua.gr



Re: Help on indexing timestamps

От
Andre Schubert
Дата:
On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2)
Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:

> On Fri, 7 Mar 2003, Andre Schubert wrote:
> 
> > Hi all,
> > 
> > i have a little problem on indexing a table which contains
> > about 4 millions of traffic-data.
> > My problem is, that a want to select all data from
> > a specific month from a specific ip and this select should use the index.
> > I use the following select:
> 
> Did you try to use BETWEEN ??
> E.g.
> ... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
> '2003-02-01 00:00:00'::timestamp
> 

Yes and it works if i write the dates by hand, every new month.
But the query is executed automatically and i dont want
to write in the dates before the query is executed. Maybe the
the start and enddate should also be alculated with sql,
because i want to create a view from this statement and execute it every month.
Or did i miss something.

Regards, as

> > 
> > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip =
'80.243.38.57';
> > NOTICE:  QUERY PLAN:
> > 
> > Aggregate  (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1)
> >   ->  Index Scan using idx_ip_time_stamp on tbl_traffic  (cost=0.00..116.30 rows=1 width=16) (actual
time=1216.79..1579.89rows=5232 loops=1)
 
> > Total runtime: 1620.94 msec
> > 
> > But it takes a long time to select the traffic for all Ips.
> > Is there a way to select these data with using the index correctly ?
> > 
> > Thanks in advance
> > 
> > BTW: Table-schema as follows
> > 
> > db_test=# \d tbl_traffic
> >                 Table "tbl_traffic"
> >    Column   |           Type           | Modifiers 
> > ------------+--------------------------+-----------
> >  time_stamp | timestamp with time zone | not null
> >  ip         | inet                     | 
> >  local_up   | bigint                   | not null
> >  local_down | bigint                   | not null
> >  inet_up    | bigint                   | not null
> >  inet_down  | bigint                   | not null
> > Indexes: idx_ip_time_stamp
> > 
> > db_test=# \d idx_ip_time_stamp
> >        Index "idx_ip_time_stamp"
> >    Column   |           Type           
> > ------------+--------------------------
> >  ip         | inet
> >  time_stamp | timestamp with time zone
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo@postgresql.org so that your
> > message can get through to the mailing list cleanly
> > 
> 
> -- 
> ==================================================================
> Achilleus Mantzios
> S/W Engineer
> IT dept
> Dynacom Tankers Mngmt
> Nikis 4, Glyfada
> Athens 16610
> Greece
> tel:    +30-210-8981112
> fax:    +30-210-8981877
> email:  achill@matrix.gatewaynet.com
>         mantzios@softlab.ece.ntua.gr
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: Help on indexing timestamps

От
Achilleus Mantzios
Дата:
On Fri, 7 Mar 2003, Andre Schubert wrote:

> On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2)
> Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> 
> > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > 
> > > Hi all,
> > > 
> > > i have a little problem on indexing a table which contains
> > > about 4 millions of traffic-data.
> > > My problem is, that a want to select all data from
> > > a specific month from a specific ip and this select should use the index.
> > > I use the following select:
> > 
> > Did you try to use BETWEEN ??
> > E.g.
> > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
> > '2003-02-01 00:00:00'::timestamp
> > 
> 
> Yes and it works if i write the dates by hand, every new month.
> But the query is executed automatically and i dont want
> to write in the dates before the query is executed. Maybe the
> the start and enddate should also be alculated with sql,
> because i want to create a view from this statement and execute it every month.
> Or did i miss something.

You could have an index on the 
whole 
date_trunc('month',tbl_traffic.time_stamp),ip

How does it perform?

> 
> Regards, as
> 
> > > 
> > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip =
'80.243.38.57';
> > > NOTICE:  QUERY PLAN:
> > > 
> > > Aggregate  (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1)
> > >   ->  Index Scan using idx_ip_time_stamp on tbl_traffic  (cost=0.00..116.30 rows=1 width=16) (actual
time=1216.79..1579.89rows=5232 loops=1)
 
> > > Total runtime: 1620.94 msec
> > > 
> > > But it takes a long time to select the traffic for all Ips.
> > > Is there a way to select these data with using the index correctly ?
> > > 
> > > Thanks in advance
> > > 
> > > BTW: Table-schema as follows
> > > 
> > > db_test=# \d tbl_traffic
> > >                 Table "tbl_traffic"
> > >    Column   |           Type           | Modifiers 
> > > ------------+--------------------------+-----------
> > >  time_stamp | timestamp with time zone | not null
> > >  ip         | inet                     | 
> > >  local_up   | bigint                   | not null
> > >  local_down | bigint                   | not null
> > >  inet_up    | bigint                   | not null
> > >  inet_down  | bigint                   | not null
> > > Indexes: idx_ip_time_stamp
> > > 
> > > db_test=# \d idx_ip_time_stamp
> > >        Index "idx_ip_time_stamp"
> > >    Column   |           Type           
> > > ------------+--------------------------
> > >  ip         | inet
> > >  time_stamp | timestamp with time zone
> > > 
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 3: if posting/reading through Usenet, please send an appropriate
> > > subscribe-nomail command to majordomo@postgresql.org so that your
> > > message can get through to the mailing list cleanly
> > > 
> > 
> > -- 
> > ==================================================================
> > Achilleus Mantzios
> > S/W Engineer
> > IT dept
> > Dynacom Tankers Mngmt
> > Nikis 4, Glyfada
> > Athens 16610
> > Greece
> > tel:    +30-210-8981112
> > fax:    +30-210-8981877
> > email:  achill@matrix.gatewaynet.com
> >         mantzios@softlab.ece.ntua.gr
> > 
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

-- 
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-210-8981112
fax:    +30-210-8981877
email:  achill@matrix.gatewaynet.com       mantzios@softlab.ece.ntua.gr



Re: Help on indexing timestamps

От
Andre Schubert
Дата:
On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2)
Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:

> On Fri, 7 Mar 2003, Andre Schubert wrote:
> 
> > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2)
> > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > 
> > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > i have a little problem on indexing a table which contains
> > > > about 4 millions of traffic-data.
> > > > My problem is, that a want to select all data from
> > > > a specific month from a specific ip and this select should use the index.
> > > > I use the following select:
> > > 
> > > Did you try to use BETWEEN ??
> > > E.g.
> > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
> > > '2003-02-01 00:00:00'::timestamp
> > > 
> > 
> > Yes and it works if i write the dates by hand, every new month.
> > But the query is executed automatically and i dont want
> > to write in the dates before the query is executed. Maybe the
> > the start and enddate should also be alculated with sql,
> > because i want to create a view from this statement and execute it every month.
> > Or did i miss something.
> 
> You could have an index on the 
> whole 
> date_trunc('month',tbl_traffic.time_stamp),ip
> 
> How does it perform?
> 

I'am not sure how to create such an index...

First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as       'select
date_trunc(''month'',$1)'language 'sql' with (iscachable);
 
Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) );

Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where trunc(tbl_traffic.time_stamp)
=trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56';
 
NOTICE:  QUERY PLAN:

Aggregate  (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1) ->  Index Scan using
idx_teston tbl_traffic  (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50 rows=5346 loops=1)
 
Total runtime: 2278.62 msec

Maybe the problem is, that the index is created without ip as the second column....

[schnipp]


Re: Help on indexing timestamps

От
Achilleus Mantzios
Дата:
On Fri, 7 Mar 2003, Andre Schubert wrote:

> On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2)
> Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> 
> > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > 
> > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2)
> > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > > 
> > > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > > 
> > > > > Hi all,
> > > > > 
> > > > > i have a little problem on indexing a table which contains
> > > > > about 4 millions of traffic-data.
> > > > > My problem is, that a want to select all data from
> > > > > a specific month from a specific ip and this select should use the index.
> > > > > I use the following select:
> > > > 
> > > > Did you try to use BETWEEN ??
> > > > E.g.
> > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
> > > > '2003-02-01 00:00:00'::timestamp
> > > > 
> > > 
> > > Yes and it works if i write the dates by hand, every new month.
> > > But the query is executed automatically and i dont want
> > > to write in the dates before the query is executed. Maybe the
> > > the start and enddate should also be alculated with sql,
> > > because i want to create a view from this statement and execute it every month.
> > > Or did i miss something.
> > 
> > You could have an index on the 
> > whole 
> > date_trunc('month',tbl_traffic.time_stamp),ip
> > 
> > How does it perform?
> > 
> 
> I'am not sure how to create such an index...
> 
> First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as 
>        'select date_trunc(''month'',$1)' language 'sql' with (iscachable);
> Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) );
> 
> Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56';
 
> NOTICE:  QUERY PLAN:
> 
> Aggregate  (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1)
>   ->  Index Scan using idx_test on tbl_traffic  (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50
rows=5346loops=1)
 
> Total runtime: 2278.62 msec
> 
> Maybe the problem is, that the index is created without ip as the second column....

Sorry, it sliped my mind that we cannot have compound indexes on 
functions. :(

Anyway.
Did the explicit BETWEEN gave you satisfactory performance?

> 
> [schnipp]
> 

-- 
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-210-8981112
fax:    +30-210-8981877
email:  achill@matrix.gatewaynet.com       mantzios@softlab.ece.ntua.gr



Re: Help on indexing timestamps

От
Andre Schubert
Дата:
On Fri, 7 Mar 2003 14:17:36 -0200 (GMT+2)
Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:

> On Fri, 7 Mar 2003, Andre Schubert wrote:
> 
> > On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2)
> > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > 
> > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > 
> > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2)
> > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > > > 
> > > > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > > > 
> > > > > > Hi all,
> > > > > > 
> > > > > > i have a little problem on indexing a table which contains
> > > > > > about 4 millions of traffic-data.
> > > > > > My problem is, that a want to select all data from
> > > > > > a specific month from a specific ip and this select should use the index.
> > > > > > I use the following select:
> > > > > 
> > > > > Did you try to use BETWEEN ??
> > > > > E.g.
> > > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
> > > > > '2003-02-01 00:00:00'::timestamp
> > > > > 
> > > > 
> > > > Yes and it works if i write the dates by hand, every new month.
> > > > But the query is executed automatically and i dont want
> > > > to write in the dates before the query is executed. Maybe the
> > > > the start and enddate should also be alculated with sql,
> > > > because i want to create a view from this statement and execute it every month.
> > > > Or did i miss something.
> > > 
> > > You could have an index on the 
> > > whole 
> > > date_trunc('month',tbl_traffic.time_stamp),ip
> > > 
> > > How does it perform?
> > > 
> > 
> > I'am not sure how to create such an index...
> > 
> > First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as 
> >        'select date_trunc(''month'',$1)' language 'sql' with (iscachable);
> > Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) );
> > 
> > Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56';
 
> > NOTICE:  QUERY PLAN:
> > 
> > Aggregate  (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1)
> >   ->  Index Scan using idx_test on tbl_traffic  (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50
rows=5346loops=1)
 
> > Total runtime: 2278.62 msec
> > 
> > Maybe the problem is, that the index is created without ip as the second column....
> 
> Sorry, it sliped my mind that we cannot have compound indexes on 
> functions. :(
> 
> Anyway.
> Did the explicit BETWEEN gave you satisfactory performance?
> 

Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze.
I think these is a good performance.
But if i use now() instead of manually typed dates the query take about 1400ms :(
I thought the somebody posted to this list, that now() is a function that is not cached,
and thatswhy does not work pretty well with indexes.
I created a cached function cached_now() which returns now() but is declared with "isCacheable".
If i use cached_now() instead of now the query takes also about 200ms :)
Is it safe, or better could it have any side-effects, if i use my "own" cached_now() in such a query ?

> > 
> > [schnipp]
> > 
> 
> -- 
> ==================================================================
> Achilleus Mantzios
> S/W Engineer
> IT dept
> Dynacom Tankers Mngmt
> Nikis 4, Glyfada
> Athens 16610
> Greece
> tel:    +30-210-8981112
> fax:    +30-210-8981877
> email:  achill@matrix.gatewaynet.com
>         mantzios@softlab.ece.ntua.gr
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org


Re: Help on indexing timestamps

От
Achilleus Mantzios
Дата:
On Fri, 7 Mar 2003, Andre Schubert wrote:

> On Fri, 7 Mar 2003 14:17:36 -0200 (GMT+2)
> Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> 
> > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > 
> > > On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2)
> > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > > 
> > > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > > 
> > > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2)
> > > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > > > > 
> > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > > > > 
> > > > > > > Hi all,
> > > > > > > 
> > > > > > > i have a little problem on indexing a table which contains
> > > > > > > about 4 millions of traffic-data.
> > > > > > > My problem is, that a want to select all data from
> > > > > > > a specific month from a specific ip and this select should use the index.
> > > > > > > I use the following select:
> > > > > > 
> > > > > > Did you try to use BETWEEN ??
> > > > > > E.g.
> > > > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
> > > > > > '2003-02-01 00:00:00'::timestamp
> > > > > > 
> > > > > 
> > > > > Yes and it works if i write the dates by hand, every new month.
> > > > > But the query is executed automatically and i dont want
> > > > > to write in the dates before the query is executed. Maybe the
> > > > > the start and enddate should also be alculated with sql,
> > > > > because i want to create a view from this statement and execute it every month.
> > > > > Or did i miss something.
> > > > 
> > > > You could have an index on the 
> > > > whole 
> > > > date_trunc('month',tbl_traffic.time_stamp),ip
> > > > 
> > > > How does it perform?
> > > > 
> > > 
> > > I'am not sure how to create such an index...
> > > 
> > > First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as 
> > >        'select date_trunc(''month'',$1)' language 'sql' with (iscachable);
> > > Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) );
> > > 
> > > Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56';
 
> > > NOTICE:  QUERY PLAN:
> > > 
> > > Aggregate  (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1)
> > >   ->  Index Scan using idx_test on tbl_traffic  (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50
rows=5346loops=1)
 
> > > Total runtime: 2278.62 msec
> > > 
> > > Maybe the problem is, that the index is created without ip as the second column....
> > 
> > Sorry, it sliped my mind that we cannot have compound indexes on 
> > functions. :(
> > 
> > Anyway.
> > Did the explicit BETWEEN gave you satisfactory performance?
> > 
> 
> Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze.
> I think these is a good performance.
> But if i use now() instead of manually typed dates the query take about 1400ms :(
> I thought the somebody posted to this list, that now() is a function that is not cached,
> and thatswhy does not work pretty well with indexes.
> I created a cached function cached_now() which returns now() but is declared with "isCacheable".
> If i use cached_now() instead of now the query takes also about 200ms :)
> Is it safe, or better could it have any side-effects, if i use my "own" cached_now() in such a query ?

I think you should use a high level language to wrap your PostgreSQL 
access (e.g. java, c, perl, ....).

Since you are gonna be running this query as a batch job,
you must use a high level lang that can handle dates in 
a UNIX-like way. (milisecs from 1970-01-01 00:00:00)

> 
> > > 
> > > [schnipp]
> > > 
> > 
> > -- 
> > ==================================================================
> > Achilleus Mantzios
> > S/W Engineer
> > IT dept
> > Dynacom Tankers Mngmt
> > Nikis 4, Glyfada
> > Athens 16610
> > Greece
> > tel:    +30-210-8981112
> > fax:    +30-210-8981877
> > email:  achill@matrix.gatewaynet.com
> >         mantzios@softlab.ece.ntua.gr
> > 
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> > 
> > http://archives.postgresql.org
> 

-- 
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-210-8981112
fax:    +30-210-8981877
email:  achill@matrix.gatewaynet.com       mantzios@softlab.ece.ntua.gr



Re: Help on indexing timestamps

От
Andre Schubert
Дата:
On Fri, 7 Mar 2003 15:03:01 -0200 (GMT+2)
Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:

> On Fri, 7 Mar 2003, Andre Schubert wrote:
> 
> > On Fri, 7 Mar 2003 14:17:36 -0200 (GMT+2)
> > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > 
> > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > 
> > > > On Fri, 7 Mar 2003 13:48:04 -0200 (GMT+2)
> > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > > > 
> > > > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > > > 
> > > > > > On Fri, 7 Mar 2003 13:01:16 -0200 (GMT+2)
> > > > > > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote:
> > > > > > 
> > > > > > > On Fri, 7 Mar 2003, Andre Schubert wrote:
> > > > > > > 
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > i have a little problem on indexing a table which contains
> > > > > > > > about 4 millions of traffic-data.
> > > > > > > > My problem is, that a want to select all data from
> > > > > > > > a specific month from a specific ip and this select should use the index.
> > > > > > > > I use the following select:
> > > > > > > 
> > > > > > > Did you try to use BETWEEN ??
> > > > > > > E.g.
> > > > > > > ... and time_stamp between '2003-01-01 00:00:00'::timestamp and 
> > > > > > > '2003-02-01 00:00:00'::timestamp
> > > > > > > 
> > > > > > 
> > > > > > Yes and it works if i write the dates by hand, every new month.
> > > > > > But the query is executed automatically and i dont want
> > > > > > to write in the dates before the query is executed. Maybe the
> > > > > > the start and enddate should also be alculated with sql,
> > > > > > because i want to create a view from this statement and execute it every month.
> > > > > > Or did i miss something.
> > > > > 
> > > > > You could have an index on the 
> > > > > whole 
> > > > > date_trunc('month',tbl_traffic.time_stamp),ip
> > > > > 
> > > > > How does it perform?
> > > > > 
> > > > 
> > > > I'am not sure how to create such an index...
> > > > 
> > > > First: create or replace function trunc_ip(timestamp with time zone) returns timestamptz as 
> > > >        'select date_trunc(''month'',$1)' language 'sql' with (iscachable);
> > > > Then: create index idx_test on tbl_traffic using btree( trunc(time_stamp) );
> > > > 
> > > > Result: db_km3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
trunc(tbl_traffic.time_stamp)= trunc('2003-02-01'::timestamptz) and ip = '80.243.40.56';
 
> > > > NOTICE:  QUERY PLAN:
> > > > 
> > > > Aggregate  (cost=108.78..108.78 rows=1 width=16) (actual time=2278.48..2278.48 rows=1 loops=1)
> > > >   ->  Index Scan using idx_test on tbl_traffic  (cost=0.00..108.78 rows=1 width=16) (actual time=0.23..2240.50
rows=5346loops=1)
 
> > > > Total runtime: 2278.62 msec
> > > > 
> > > > Maybe the problem is, that the index is created without ip as the second column....
> > > 
> > > Sorry, it sliped my mind that we cannot have compound indexes on 
> > > functions. :(
> > > 
> > > Anyway.
> > > Did the explicit BETWEEN gave you satisfactory performance?
> > > 
> > 
> > Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze.
> > I think these is a good performance.
> > But if i use now() instead of manually typed dates the query take about 1400ms :(
> > I thought the somebody posted to this list, that now() is a function that is not cached,
> > and thatswhy does not work pretty well with indexes.
> > I created a cached function cached_now() which returns now() but is declared with "isCacheable".
> > If i use cached_now() instead of now the query takes also about 200ms :)
> > Is it safe, or better could it have any side-effects, if i use my "own" cached_now() in such a query ?
> 
> I think you should use a high level language to wrap your PostgreSQL 
> access (e.g. java, c, perl, ....).
> 
> Since you are gonna be running this query as a batch job,
> you must use a high level lang that can handle dates in 
> a UNIX-like way. (milisecs from 1970-01-01 00:00:00)
> 

Ok, i think you are right.
Thanks for the quick advice.

Regards, as



Re: Help on indexing timestamps

От
Tom Lane
Дата:
Andre Schubert <andre@km3.de> writes:
> Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze.
> I think these is a good performance.
> But if i use now() instead of manually typed dates the query take about 1400ms :(
> I thought the somebody posted to this list, that now() is a function that is not cached,
> and thatswhy does not work pretty well with indexes.

Update to 7.3, and that problem will go away.
        regards, tom lane


Re: Help on indexing timestamps

От
Andre Schubert
Дата:
On Fri, 07 Mar 2003 10:05:03 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andre Schubert <andre@km3.de> writes:
> > Yes if i use BETWEEN and type in the dates by hand the query takes about 200ms with explain analyze.
> > I think these is a good performance.
> > But if i use now() instead of manually typed dates the query take about 1400ms :(
> > I thought the somebody posted to this list, that now() is a function that is not cached,
> > and thatswhy does not work pretty well with indexes.
> 
> Update to 7.3, and that problem will go away.
> 
What is the difference between 7.2 and 7.3 related to my problem ?

Thanks, as


Re: Help on indexing timestamps

От
Tomasz Myrta
Дата:
Andre Schubert wrote:
> Hi all,
> 
> i have a little problem on indexing a table which contains
> about 4 millions of traffic-data.
> My problem is, that a want to select all data from
> a specific month from a specific ip and this select should use the index.
> I use the following select:
> 
> db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip =
'80.243.38.57';
> NOTICE:  QUERY PLAN:
> 
> Aggregate  (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1)
>   ->  Index Scan using idx_ip_time_stamp on tbl_traffic  (cost=0.00..116.30 rows=1 width=16) (actual
time=1216.79..1579.89rows=5232 loops=1)
 
> Total runtime: 1620.94 msec
> 
> But it takes a long time to select the traffic for all Ips.
> Is there a way to select these data with using the index correctly ?
> 
> Thanks in advance
> 
I have one more solution - try to rewrite your where clause to NOT USE 
function on time_stamp. If your query will look like:
select ... where time_stamp between (function with now() returning first 
day) and (function with now() returning last day);
your index will work fine.

Regards,
Tomasz Myrta




Re: Help on indexing timestamps

От
Andre Schubert
Дата:
On Mon, 10 Mar 2003 10:12:15 +0100
Tomasz Myrta <jasiek@klaster.net> wrote:

> Andre Schubert wrote:
> > Hi all,
> > 
> > i have a little problem on indexing a table which contains
> > about 4 millions of traffic-data.
> > My problem is, that a want to select all data from
> > a specific month from a specific ip and this select should use the index.
> > I use the following select:
> > 
> > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip =
'80.243.38.57';
> > NOTICE:  QUERY PLAN:
> > 
> > Aggregate  (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1)
> >   ->  Index Scan using idx_ip_time_stamp on tbl_traffic  (cost=0.00..116.30 rows=1 width=16) (actual
time=1216.79..1579.89rows=5232 loops=1)
 
> > Total runtime: 1620.94 msec
> > 
> > But it takes a long time to select the traffic for all Ips.
> > Is there a way to select these data with using the index correctly ?
> > 
> > Thanks in advance
> > 
> I have one more solution - try to rewrite your where clause to NOT USE 
> function on time_stamp. If your query will look like:
> select ... where time_stamp between (function with now() returning first 
> day) and (function with now() returning last day);
> your index will work fine.
> 
Thanks for the hint, i will test this and report to this list.

Thanks, as


Re: Help on indexing timestamps

От
Tom Lane
Дата:
Andre Schubert <andre@km3.de> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Update to 7.3, and that problem will go away.
>> 
> What is the difference between 7.2 and 7.3 related to my problem ?

7.3 is willing to index comparisons to now(), that's what ;-)

You may care to study the difference between cachable/noncachable
functions in 7.2, versus the difference between
immutable/stable/volatile functions in 7.3.  7.2 has to treat now()
as noncachable and therefore unsafe to index.  7.3 treats it as
stable and therefore safe to index.

You can work around this in 7.2 by cheating with subselects or wrapper
functions, but that's a poor substitute for a real solution.
        regards, tom lane


Re: Help on indexing timestamps

От
Andre Schubert
Дата:
On Mon, 10 Mar 2003 15:24:09 +0100
Andre Schubert <andre@km3.de> wrote:

> On Mon, 10 Mar 2003 10:12:15 +0100
> Tomasz Myrta <jasiek@klaster.net> wrote:
> 
> > Andre Schubert wrote:
> > > Hi all,
> > > 
> > > i have a little problem on indexing a table which contains
> > > about 4 millions of traffic-data.
> > > My problem is, that a want to select all data from
> > > a specific month from a specific ip and this select should use the index.
> > > I use the following select:
> > > 
> > > db_nmkm3aue=# explain analyze select sum(inet_up+inet_down) from tbl_traffic where
date_trunc('month',tbl_traffic.time_stamp)::timestamptz= date_trunc('month',now() - timespan('1 months')) and ip =
'80.243.38.57';
> > > NOTICE:  QUERY PLAN:
> > > 
> > > Aggregate  (cost=116.30..116.30 rows=1 width=16) (actual time=1620.79..1620.79 rows=1 loops=1)
> > >   ->  Index Scan using idx_ip_time_stamp on tbl_traffic  (cost=0.00..116.30 rows=1 width=16) (actual
time=1216.79..1579.89rows=5232 loops=1)
 
> > > Total runtime: 1620.94 msec
> > > 
> > > But it takes a long time to select the traffic for all Ips.
> > > Is there a way to select these data with using the index correctly ?
> > > 
> > > Thanks in advance
> > > 
> > I have one more solution - try to rewrite your where clause to NOT USE 
> > function on time_stamp. If your query will look like:
> > select ... where time_stamp between (function with now() returning first 
> > day) and (function with now() returning last day);
> > your index will work fine.
> > 
> Thanks for the hint, i will test this and report to this list.
> 
I have tested and it doesnt work as it should do.
I think its the same problem posted by Tom Lane,
and it should go away after an update to 7.3.

Regards, as

Thanks as