Обсуждение: N-tile function in postgres

Поиск
Список
Период
Сортировка

N-tile function in postgres

От
Rachel Owsley
Дата:

Hi,

 

Can anyone help me with an aggregate query I am having trouble with?

 

I want to get the top 5 or top 10 most frequently shopped in merchant categories for each account holder at a bank and put each of the quintiles/deciles into separate columns. I would also like to put the average transaction amount for each of those top 5-10 categories into separate columns, and the date of the last transaction in each of those 5 to 10 categories into separate columns. I am told that ntile may be an option for doing this, but can’t find any examples for using it in the documentation.

 

We use Postgresql 9.1.

 

Thank you so much for your help.

 

Rachel

Re: N-tile function in postgres

От
François Beausoleil
Дата:

Le 2012-09-24 à 12:32, Rachel Owsley a écrit :

Hi,
 
Can anyone help me with an aggregate query I am having trouble with?
 
I want to get the top 5 or top 10 most frequently shopped in merchant categories for each account holder at a bank and put each of the quintiles/deciles into separate columns. I would also like to put the average transaction amount for each of those top 5-10 categories into separate columns, and the date of the last transaction in each of those 5 to 10 categories into separate columns. I am told that ntile may be an option for doing this, but can’t find any examples for using it in the documentation.
 
We use Postgresql 9.1.
 
Thank you so much for your help.
 
Rachel

Hi!

Look at the tablefunc extension to do cross tabulation. The crosstab family of functions turn a series of rows into columns. Something like this:

a | 1
b | 2

a  |  b
1  |  2

It obviously works with more columns. That would take care of the final part of your query.

I've never used ntile() myself, but the docs say it returns 1 to the value. Then you may want the min/max amount per decile to extract the values you want. Something like this (untested, made up schema):

WITH raw_values(
SELECT

    account_id
  , merchant_category_id
  , amount

FROM transactions
  JOIN merchants USING (merchant_id))

, partitioned_sales AS (
SELECT
    account_id
  , merchant_category_id
  , ntile(10) over (partition by account_id, merchant_category_id order by amount) as "partition"
  , min(amount) over (partition by account_id, merchant_category_id order by amount) as amount
FROM raw_values)

SELECT *
FROM partitioned_sales
ORDER BY account_id, merchant_category_id, partition, amount

Hope that helps!
François

Re: N-tile function in postgres

От
Rachel Owsley
Дата:

Thank you, François! This is very helpful! I’ll give this query a try. I don’t know the cross-tab function, but that’s exactly what I want to do for the column output. Regarding the sample query, I see the min (amount), but how is the upper bound defined for each decile?

 

Thanks,

 

Rachel  

 

From: François Beausoleil [mailto:francois@teksol.info]
Sent: Monday, September 24, 2012 12:57 PM
To: Rachel Owsley
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] N-tile function in postgres

 

 

Le 2012-09-24 à 12:32, Rachel Owsley a écrit :



Hi,

 

Can anyone help me with an aggregate query I am having trouble with?

 

I want to get the top 5 or top 10 most frequently shopped in merchant categories for each account holder at a bank and put each of the quintiles/deciles into separate columns. I would also like to put the average transaction amount for each of those top 5-10 categories into separate columns, and the date of the last transaction in each of those 5 to 10 categories into separate columns. I am told that ntile may be an option for doing this, but can’t find any examples for using it in the documentation.

 

We use Postgresql 9.1.

 

Thank you so much for your help.

 

Rachel

 

Hi!

 

Look at the tablefunc extension to do cross tabulation. The crosstab family of functions turn a series of rows into columns. Something like this:

 

a | 1

b | 2

 

a  |  b

1  |  2

 

It obviously works with more columns. That would take care of the final part of your query.

 

I've never used ntile() myself, but the docs say it returns 1 to the value. Then you may want the min/max amount per decile to extract the values you want. Something like this (untested, made up schema):

 

WITH raw_values(

SELECT

 

    account_id

  , merchant_category_id

  , amount

 

FROM transactions

  JOIN merchants USING (merchant_id))

 

, partitioned_sales AS (

SELECT

    account_id

  , merchant_category_id

  , ntile(10) over (partition by account_id, merchant_category_id order by amount) as "partition"

  , min(amount) over (partition by account_id, merchant_category_id order by amount) as amount

FROM raw_values)

 

SELECT *

FROM partitioned_sales

ORDER BY account_id, merchant_category_id, partition, amount

 

Hope that helps!

François

Re: N-tile function in postgres

От
François Beausoleil
Дата:

Le 2012-09-24 à 14:12, Rachel Owsley a écrit :

Thank you, François! This is very helpful! I’ll give this query a try. I don’t know the cross-tab function, but that’s exactly what I want to do for the column output. Regarding the sample query, I see the min (amount), but how is the upper bound defined for each decile?

ntile() splits the output in as even partitions as possible. If you have 13 rows, and you want 10 output rows, then each row will receive something like this:

# select id, ntile(10) over () from generate_series(1, 13) as t1(id);
 id | ntile 
----+-------
  1 |     1
  2 |     1
  3 |     2
  4 |     2
  5 |     3
  6 |     3
  7 |     4
  8 |     5
  9 |     6
 10 |     7
 11 |     8
 12 |     9
 13 |    10

The ntile() function isn't tied to the values at all: only to the actual number of rows. I used min(amount) to get the minimal value per group, but you can use use max(amount) to get the other end as well.

Bye!
François

Re: N-tile function in postgres

От
Rachel Owsley
Дата:

Thank you, François!! Got it. J

 

 

From: François Beausoleil [mailto:francois@teksol.info]
Sent: Monday, September 24, 2012 3:37 PM
To: Rachel Owsley
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] N-tile function in postgres

 

 

Le 2012-09-24 à 14:12, Rachel Owsley a écrit :



Thank you, François! This is very helpful! I’ll give this query a try. I don’t know the cross-tab function, but that’s exactly what I want to do for the column output. Regarding the sample query, I see the min (amount), but how is the upper bound defined for each decile?

 

ntile() splits the output in as even partitions as possible. If you have 13 rows, and you want 10 output rows, then each row will receive something like this:

 

# select id, ntile(10) over () from generate_series(1, 13) as t1(id);

 id | ntile 

----+-------

  1 |     1

  2 |     1

  3 |     2

  4 |     2

  5 |     3

  6 |     3

  7 |     4

  8 |     5

  9 |     6

 10 |     7

 11 |     8

 12 |     9

 13 |    10

 

The ntile() function isn't tied to the values at all: only to the actual number of rows. I used min(amount) to get the minimal value per group, but you can use use max(amount) to get the other end as well.

 

Bye!

François