Обсуждение: group by week (ww), plus -S performance
I thought I'd pass along a work-around I came up with for the limits
in 'ww' support (7.0 final). Maybe this would be a useful example for
the docs? They're a little lean on date/time examples, IMO. So is the
new book.
Task:
Select a count of records from a table, grouped by the week of the
record. The table is something like
CREATE table EVENTS
(event varchar(128) not null, stamp datetime default now());
And I want the output to use human-readable dates, like
week | count
------------+-------
2000-03-06 | 4
2000-03-13 | 5
2000-03-20 | 3
My immediate solution was something like
SELECT to_date(date_part('year',stamp)||'-'||date_part('week',stamp),
'yyyy-ww'),count(*) FROM EVENTS GROUP BY to_date;
but to_date() doesn't seem to support 'ww' format for text-to-date
translation (not documented, AFAIK).
The solution I eventually found was
SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp)
as week,count(*) FROM EVENTS GROUP BY week ORDER BY week;
I haven't tested the '-5' kludge very extensively. It seems to work
ok, for the dates I tested in 2000. I'm sure it would run into
trouble with calendar-change weirdness pre-1900.
I'd also love to hear any suggestions for performance improvements -
it's cpu-bound on my system, and takes about 70 sec to run with
86,000 rows.
The query plan is
Aggregate (cost=9155.76..9584.66 rows=8578 width=20)
-> Group (cost=9155.76..9370.21 rows=85780 width=20)
-> Sort (cost=9155.76..9155.76 rows=85780 width=20)
-> Seq Scan on events (cost=0.00..2126.80 rows=85780 width=20)
The order-by clause doesn't seem to add much overhead - the query
plan is the same with or without it.
I'm running with "-i -N 64 -B 1024 -o '-F'", and I've tried up to '-S
8192' without seeing any noticeable improvement. At higher values,
performance actually went down by almost 50% - something to do with
shmem segment sizes? This is on Solaris 2.6, and I compiled PG7.0
using gcc 2.95.
The -S does reduce disk I/O, but I think that's only about 5% of the
work that's going on (90% user time). An index on 'stamp' doesn't
seem to help, either.
thanks,
-- Mike
Michael Blakeley <mike@blakeley.com> writes:
> SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp)
> as week,count(*) FROM EVENTS GROUP BY week ORDER BY week;
Seems like the hard way. You are relying on an implicit conversion from
the float8 output of date_part() to the text input to_date expects,
which is kind of slow (and IMHO it's a bug in 7.0 that it will do such
a conversion silently, anyhow). Better to use date_trunc to save the
conversion step:
SELECT date_trunc('year',stamp)-5+7*date_part('week',stamp)
as week,count(*) FROM EVENTS GROUP BY week ORDER BY week;
But the real problem is that this is going to use a start-of-week day
that is offset five days from whatever day-of-the-week 1 January is.
If you'd tried a few other years than 2000 you'd likely have been
dissatisfied with the results...
It seems like the right answer is that date_trunc('week',stamp) ought
to do what you want, but it doesn't seem to be implemented. That's
definitely a shortcoming --- anyone want to fix it?
> The order-by clause doesn't seem to add much overhead - the query
> plan is the same with or without it.
Right, the GROUP BY requires a sort anyway, so the planner knows
there's no need to sort again on the same key.
regards, tom lane
At 12:54 AM -0400 5/29/2000, Tom Lane wrote:
>Michael Blakeley <mike@blakeley.com> writes:
>> SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp)
>> as week,count(*) FROM EVENTS GROUP BY week ORDER BY week;
>
>Seems like the hard way. You are relying on an implicit conversion from
>the float8 output of date_part() to the text input to_date expects,
>which is kind of slow (and IMHO it's a bug in 7.0 that it will do such
>a conversion silently, anyhow). Better to use date_trunc to save the
>conversion step:
>
>SELECT date_trunc('year',stamp)-5+7*date_part('week',stamp)
> as week,count(*) FROM EVENTS GROUP BY week ORDER BY week;
Thanks - that seems to be about 10% faster. I also got a 25% boost by
recompiling PG7 with -O3.
>But the real problem is that this is going to use a start-of-week day
>that is offset five days from whatever day-of-the-week 1 January is.
>If you'd tried a few other years than 2000 you'd likely have been
>dissatisfied with the results...
Here's something that works a bit better, at the expense of falling
in line with 'dow' and starting each week on Sunday, which I don't
mind. I'd only gone with Monday in the first place because that was
what 'ww' seemed to do.
$ cal 12 1999
December 1999
S M Tu W Th F S
1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31
$ cal 1 2000
January 2000
S M Tu W Th F S
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
$ cal 1 2001
January 2001
S M Tu W Th F S
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
$ D="'2000-01-01'"; psql -c "SELECT $D::date-date_part('dow',$D::date);"
?column?
------------
1999-12-26
(1 row)
$ D="'2001-01-01'"; psql -c "SELECT $D::date-date_part('dow',$D::date);"
?column?
------------
2000-12-31
(1 row)
$ D="'2000-01-15'"; psql -c "SELECT $D::date-date_part('dow',$D::date);"
?column?
------------
2000-01-09
(1 row)
$ D="'2000-01-16'"; psql -c "SELECT $D::date-date_part('dow',$D::date);"
?column?
------------
2000-01-16
(1 row)
It's also about 35% faster, and hits me with a healthy "duh!" factor
since it's so obvious and so much cleaner. The total time for my
original query has been cut by more than half - from 70 sec to 28
sec. Thanks!
>It seems like the right answer is that date_trunc('week',stamp) ought
>to do what you want, but it doesn't seem to be implemented. That's
>definitely a shortcoming --- anyone want to fix it?
You know what they say - if you want the right answer, just post the
wrong answer and wait to be corrected :-).
If you can't find fault with the query above, maybe it could be
plugged into date_trunc('week', stamp) as-is? I haven't even looked
at the source code, myself....
-- Mike
> Thanks - that seems to be about 10% faster. I also got a 25% boost > by recompiling PG7 with -O3. Do you mean you got a 25% boost by recompiling with -O3 instead of -O2, or instead of -g? If going from -O2 to -O3 gives a 25% boost, time for me to recompile. I'm just now moving from MySQL to PostgreSQL, and I'm also totally changing the data structures into something much less efficient, so I need all the speed boosts I can get. Recompiling is much cheaper than buying a new computer. e
On Sun, 28 May 2000, Michael Blakeley wrote:
> At 12:54 AM -0400 5/29/2000, Tom Lane wrote:
> >Michael Blakeley <mike@blakeley.com> writes:
> >> SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp)
> >> as week,count(*) FROM EVENTS GROUP BY week ORDER BY week;
> >
> >Seems like the hard way. You are relying on an implicit conversion from
> >the float8 output of date_part() to the text input to_date expects,
> >which is kind of slow (and IMHO it's a bug in 7.0 that it will do such
> >a conversion silently, anyhow). Better to use date_trunc to save the
> >conversion step:
> >
> >SELECT date_trunc('year',stamp)-5+7*date_part('week',stamp)
> > as week,count(*) FROM EVENTS GROUP BY week ORDER BY week;
>
> Thanks - that seems to be about 10% faster. I also got a 25% boost by
> recompiling PG7 with -O3.
Well, support for 'WW' in 'to_date()' is just in my TODO.
Probably will more faster
select to_date( to_char(now(), 'YYYY-WW'), 'YYYY-WW');
than 3 operators and 2 functions in one query.
Or best way will add support for 'start-date' per week / month / quarter to
'to_char()'. Hmm, I try it. But how format-pictires use for this?
SWW / SMM / SQ ?
IMHO, the postgreSQL will first DB with this feature :-)
Karel