Обсуждение: Two optimization questions

Поиск
Список
Период
Сортировка

Two optimization questions

От
Mezei Zoltán
Дата:
Hi,

I think it can be done better than I did and I want to learn...


1. I have a table that registers the history of messages:
output_message_history(id, event_type, event_time)
I need those ID-s from the table where there is one 'MESSAGE SENT' event 
and one 'MESSAGE SUBMITTED' event and there are no more events on that 
message.

select id
from output_message_history
group by content_id
having
sum(case when event_type='MESSAGE SENT' then 1 else 0 end) = 1
and
sum(case when event_type='MESSAGE SUBMITTED' then 1 else 0 end) = 1
and count(*) = 2

Can it be done another, more optimal way?


2. I have to create reports like this one:

time       count
12:00      3
12:01      2
12:02      4
12:03      0  <--- now this one is problematic for me
12:04      5

So I need something like this:

select date_trunc('minute', crd), count(*) from subscriber
where crd between '2006-09-08' and '2006-09-12'
group by date_trunc('minute', crd)

But the output of this query won't show minutes with 0 count. I searched 
the archives and found an example using a view to solve this problem, 
but creating such a view for grouping by minutes or seconds doesn't seem 
to be nice solution. Any ideas how to solve this problem?

Thanks for any answers,

Zizi


Re: Two optimization questions

От
"Aaron Bono"
Дата:
On 9/12/06, Mezei Zoltán <mezei.zoltan@telefor.hu> wrote:
Hi,

I think it can be done better than I did and I want to learn...


1. I have a table that registers the history of messages:
output_message_history(id, event_type, event_time)
I need those ID-s from the table where there is one 'MESSAGE SENT' event
and one 'MESSAGE SUBMITTED' event and there are no more events on that
message.

select id
from output_message_history
group by content_id
having
sum(case when event_type='MESSAGE SENT' then 1 else 0 end) = 1
and
sum(case when event_type='MESSAGE SUBMITTED' then 1 else 0 end) = 1
and count(*) = 2

Can it be done another, more optimal way?

You could split it into sub-queries but would that make the performance better or worse?  I guess it depends on how much data is there, and what frequency you have ot the event_type's but indexing the event_type column would help.  This may be worth a try - use EXPLAIN to see which is better.

SELECT id
FROM output_message_history
WHERE NOT content_id IN (
   SELECT content_id -- distinct(content_id)
   FROM output_message_history
   WHERE NOT event_type IN ('MESSAGE SENT', 'MESSAGE SUBMITTED')
)
GROUP BY content_id
HAVING
   sum(case when event_type='MESSAGE SENT' then 1 else 0 end) = 1
   AND
   sum(case when event_type='MESSAGE SUBMITTED' then 1 else 0 end) = 1

2. I have to create reports like this one:

time       count
12:00      3
12:01      2
12:02      4
12:03      0  <--- now this one is problematic for me
12:04      5

So I need something like this:

select date_trunc('minute', crd), count(*) from subscriber
where crd between '2006-09-08' and '2006-09-12'
group by date_trunc('minute', crd)

But the output of this query won't show minutes with 0 count. I searched
the archives and found an example using a view to solve this problem,
but creating such a view for grouping by minutes or seconds doesn't seem
to be nice solution. Any ideas how to solve this problem?

 
Questions like this come up frequently and there are some nice solutions:

See if this gives you some insight:

select
    '2006-01-15'::date + s.inc
from
    generate_series(0, ('2006-02-20'::date - '2006-01-15'::date)::integer) as s(inc)

See http://www.postgresql.org/docs/8.1/static/functions-srf.html and http://www.postgresql.org/docs/8.0/interactive/functions-datetime.html for more.

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: Two optimization questions

От
Mezei Zoltán
Дата:
Aaron Bono wrote:
> You could split it into sub-queries but would that make the 
> performance better or worse?  I guess it depends on how much data is 
> there, and what frequency you have ot the event_type's but indexing 
> the event_type column would help.  This may be worth a try - use 
> EXPLAIN to see which is better.
>
Yep, I tought of that, but the solution with having is faster now. I 
think it is because about 30% of the rows are selected in the subquery 
now. Later we will have more records, and the subselect should select 
only about 1-2% of the rows. At that time, I think your solution will be 
faster.

> Questions like this come up frequently and there are some nice solutions:
>
> See if this gives you some insight:
>
> select
>     '2006-01-15'::date + s.inc
> from
>     generate_series(0, ('2006-02-20'::date - 
> '2006-01-15'::date)::integer) as s(inc)
Sup, that's just what I needed. Thank you.

Zizi