While profiling a COPY load of a table with timestamp columns, I
spotted that a lot of time is spent in pg_next_dst_boundary. It
linearly scans a sorted array of time segments to find the one that the
given timestamp falls within. That's slow when the array is large; the
GB timezone, for example, has > 200 segments.
Attached is a patch to use binary search instead of linear search. On my
laptop, it reduces the time required to load a million rows into this
test table:
CREATE TABLE timestamptable (
id int,
ts1 timestamp,
ts2 timestamp,
ts3 timestamp,
ts4 timestamp
);
from ~16.4s to ~11.9s.
It would be nice to slip this into 8.3...
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Index: src/timezone/localtime.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/timezone/localtime.c,v
retrieving revision 1.17
diff -c -r1.17 localtime.c
*** src/timezone/localtime.c 4 Aug 2007 19:29:25 -0000 1.17
--- src/timezone/localtime.c 18 Sep 2007 13:41:03 -0000
***************
*** 1008,1013 ****
--- 1008,1014 ----
int i;
int j;
const pg_time_t t = *timep;
+ int high, low;
sp = &tz->state;
if (sp->timecnt == 0)
***************
*** 1056,1064 ****
return 1;
}
/* Else search to find the containing segment */
! for (i = 1; i < sp->timecnt; ++i)
! if (t <= sp->ats[i])
! break;
j = sp->types[i - 1];
ttisp = &sp->ttis[j];
*before_gmtoff = ttisp->tt_gmtoff;
--- 1057,1074 ----
return 1;
}
/* Else search to find the containing segment */
! low = 1;
! high = sp->timecnt - 1;
! while (high > low)
! {
! int mid = low + ((high - low) / 2);
! if (t > sp->ats[mid])
! low = mid + 1;
! else
! high = mid;
! }
! i = high;
!
j = sp->types[i - 1];
ttisp = &sp->ttis[j];
*before_gmtoff = ttisp->tt_gmtoff;