Re: How pull

Поиск
Список
Период
Сортировка
От Matt Magoffin
Тема Re: How pull
Дата
Msg-id 3658.192.168.1.65.1190619473.squirrel@msqr.us
обсуждение исходный текст
Ответ на Re: How pull  ("Filip Rembiałkowski" <plk.zuber@gmail.com>)
Список pgsql-sql
> On 23/09/2007, Matt Magoffin <postgresql.org@msqr.us> wrote:
>> Hello,
>>
>> My SQL skills are limited and I'm struggling with a query where I want
>> to
>> return a single item of an aggregate join. The query looks like this:
>>
>> select
>>   (case
>>     when agg.avg_rating is null then 0.0
>>     when agg.avg_rating < 0.75 then 0.5
>>     when agg.avg_rating < 1.25 then 1.0
>>     when agg.avg_rating < 1.75 then 1.5
>>     when agg.avg_rating < 2.25 then 2.0
>>     when agg.avg_rating < 2.75 then 2.5
>>     when agg.avg_rating < 3.25 then 3.0
>>     when agg.avg_rating < 3.75 then 3.5
>>     when agg.avg_rating < 4.25 then 4.0
>>     when agg.avg_rating < 4.75 then 4.5
>>     else 5.0
>>   end) as avg_rating,
>>   count(item.itemid) as item_count
>> from media_item item
>> inner join (
>>   select rating.mediaitem_userrating_hjid as ritemid,
>>     avg(rating.rating) as avg_rating
>>   from media_item_rating rating, media_item item
>>   where rating.mediaitem_userrating_hjid = item.itemid
>>   group by rating.mediaitem_userrating_hjid
>> ) as agg
>> on item.itemid = agg.ritemid
>> group by avg_rating
>> order by avg_rating desc
>>
>> and a sample of results is this:
>>
>>  avg_rating | item_count
>> ------------+------------
>>         5.0 |         21
>>         4.0 |         33
>>         3.0 |         13
>>         2.0 |          4
>>         1.0 |          1
>>
>> What I want as well is the ID of the item (and possibly it's avg_rating
>> value) from the "agg" join with the highest avg_rating for each output
>> row... something like this
>>
>>  avg_rating | item_count | item_id | item_rating
>> ------------+-----------------------------------
>>         5.0 |         21 | 109890  | 4.9
>>         4.0 |         33 | 89201   | 4.1
>>         3.0 |         13 | 119029  | 2.8
>>         2.0 |          4 | 182999  | 2.2
>>         1.0 |          1 | 1929    | 1.0
>>
>> So the intention in this example is that item #109890 has an average
>> rating of 4.9 and that is the highest rating within the > 4.75 rating
>> group.
>>
>> If anyone had any tips I'd greatly appreciate it.
>>
>
> create ranking function to make queries look simpler:
>
> create or replace function ranking_group(numeric) returns numeric as
> $$ select case
>   when $1 < 0.3456 then 'quite small'
>   ...
> end $$ language sql immutable;
>    (I'd make it STRICT, but you allow null rankings)
>
> 1st way: DISTINCT ON + subquery
>
> select *, (select count(*) from rating where rating_group(rating) =
> subq.rating_group ) as rating_group_size
> from (
> select distinct on (rating_group)
>     rating_group(r.rating),
>     r.item_id as best_rated_item_id,
>     r.rating as best_rating
> from rating r
> order by rating_group desc, r.rating desc
> ) subq;
>
>
> 2nd way (faster - actually 2 x faster)
> using FIRST aggregate to calculate all in one pass
>
> create function first(numeric,numeric) returns numeric as 'select $1'
> language sql immutable strict;
> create function first(integer,integer) returns integer as 'select $1'
> language sql immutable strict;
> create aggregate first( integer ) ( SFUNC = first, STYPE = integer );
> create aggregate first ( numeric ) ( SFUNC = first, STYPE = numeric );
>
> select
>   rating_group(rating),
>   count(*) as num_ratings,
>   first(item_id) as best_rated_item_id,
>   first(rating) as best_rating
> from ( select * from rating order by rating desc ) ordered_ratings
> group by rating_group
> order by rating_group desc;
>
>
> note: if you can, get rid of null ratings. what are they supposed to
> mean? they make things a bit more complicated.


Thanks very much for the helpful tips, Filip, you understand perfectly
what I'm trying to do. For those nulls, they are not actually needed with
the query I gave, you are right. I have another version of this query that
I started with that uses a left outer join instead of the inner join to
produce another grouping for all items that have no rating set (thus the
null value).

An aggregate function like first() is precisely what I need, however I am
looking for a fairly portable solution I can also use in other databases
(Derby and MySQL at a minimum). Unfortunately I don't think either of
these suggestions will work for those systems because they don't support
DISTINCT ON or custom SQL aggregate functions as nicely as Postgres does.

Also, the "avg_rating" I'm using is itself an aggregate value, but I don't
see how your second solution handles this? The rating group values come
from  the avg() calculation on a rating table, like
ratinguser | rating | item_id | idx
------------+--------+---------+-----      1098 |      2 |   11016 |   1         2 |      3 |   11016 |   0        43 |
    5 |    9021 |   2         2 |      4 |    9021 |   1      1098 |      2 |    9021 |   0
 

So here the rating for item 11016 is 2.5 and for 9021 it is 3.66.

In addition, this is not my entire SQL statement, because I need another
join to narrow the item results first... so the full SQL I currently have
is like

select (case   when agg.avg_rating < 0.75 then 0.5   when agg.avg_rating < 1.25 then 1.0   when agg.avg_rating < 1.75
then1.5   when agg.avg_rating < 2.25 then 2.0   when agg.avg_rating < 2.75 then 2.5   when agg.avg_rating < 3.25 then
3.0  when agg.avg_rating < 3.75 then 3.5   when agg.avg_rating < 4.25 then 4.0   when agg.avg_rating < 4.75 then 4.5
else5.0 end) as avg_rating, count(item.itemid) as item_count
 
from media_item item
inner join ( select mi.itemid as sitemid from media_item mi inner join album_item ai on mi.itemid =
ai.mediaitem_item_hjchildidinner join album a on ai.album_item_hjid = a.albumid where a.owner_ = 2 and a.allowanonymous
=TRUE and a.allowbrowse = TRUE
 
) as shared
on item.itemid = shared.sitemid
left outer join ( -- change to inner for only items with a rating select rating.mediaitem_userrating_hjid as ritemid,
avg(rating.rating)as avg_rating from media_item_rating rating, media_item item where rating.mediaitem_userrating_hjid =
item.itemidgroup by rating.mediaitem_userrating_hjid
 
) as agg
on item.itemid = agg.ritemid
group by avg_rating
order by avg_rating desc

Thanks again for the helpful tips.


В списке pgsql-sql по дате отправления:

Предыдущее
От: Erik Jones
Дата:
Сообщение: Re: Many databases
Следующее
От: ivan marchesini
Дата:
Сообщение: foreign key problem