Обсуждение: How to get most frequent and least frequent values in a column?

Поиск
Список
Период
Сортировка

How to get most frequent and least frequent values in a column?

От
Matthew Wilson
Дата:
I'm a noob SQL user, crossing over from SAS.  I have a table with about
200k rows and one of the columns is empssn, which holds the employee
social security number.  The same empssn may appear in lots of different
rows.  I want to get a list of the 40 top empssns, sorted by the number
of times they appear in the table. I also want a list of the very rarest
empssns (ones that only appear once or twice).

Can anyone help me with this?  BTW, this isn't a homework problem.

TIA.

Matt

Re: How to get most frequent and least frequent values in a column?

От
Steve Atkins
Дата:
On Mon, Sep 20, 2004 at 02:27:41PM +0000, Matthew Wilson wrote:
> I'm a noob SQL user, crossing over from SAS.  I have a table with about
> 200k rows and one of the columns is empssn, which holds the employee
> social security number.  The same empssn may appear in lots of different
> rows.  I want to get a list of the 40 top empssns, sorted by the number
> of times they appear in the table. I also want a list of the very rarest
> empssns (ones that only appear once or twice).
>
> Can anyone help me with this?  BTW, this isn't a homework problem.

  select empssn, count(*) from table
         group by empssn
         order by count(*) desc limit 40;

and

  select empssn, count(*) from table
         group by empssn
         having count(*) < 3;

may be close to what you're looking for.

Cheers,
  Steve

Re: How to get most frequent and least frequent values in a column?

От
stig erikson
Дата:
Steve Atkins wrote:
> On Mon, Sep 20, 2004 at 02:27:41PM +0000, Matthew Wilson wrote:
>
>>I'm a noob SQL user, crossing over from SAS.  I have a table with about
>>200k rows and one of the columns is empssn, which holds the employee
>>social security number.  The same empssn may appear in lots of different
>>rows.  I want to get a list of the 40 top empssns, sorted by the number
>>of times they appear in the table. I also want a list of the very rarest
>>empssns (ones that only appear once or twice).
>>
>>Can anyone help me with this?  BTW, this isn't a homework problem.
>
>
>   select empssn, count(*) from table
>          group by empssn
>          order by count(*) desc limit 40;
>
> and
>
>   select empssn, count(*) from table
>          group by empssn
>          having count(*) < 3;
>
> may be close to what you're looking for.
>
> Cheers,
>   Steve
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>


to get the 10 rarest empssns, it should be fine to do:

SELECT empssn, count(*) from table
        GROUP BY empssn
        ORDER BY count(*) ASC LIMIT 10;


the only thing here change from the previous post is the ASC (ascending)
or DESC (descending) sort order.
the having count(*) < 3 is good but will only return rows if there are
empssn that only occur less then  3 times in the table.
however with a table with 200 000 records there is a chance/risk that
not any empssn occurs less then 3 times, in such a case the query with
the HAVING clause will return zero rows whereas the LIMIT 10 still will
give the 10 least frequent.