Re: how to effectively SELECT new "customers"

Поиск
Список
Период
Сортировка
От Jan Ostrochovsky
Тема Re: how to effectively SELECT new "customers"
Дата
Msg-id 976203811.595346.1393587926835.JavaMail.root@mobiletech.sk
обсуждение исходный текст
Ответ на Re: how to effectively SELECT new "customers"  (David Johnston <polobo@yahoo.com>)
Ответы Re: how to effectively SELECT new "customers"  (David Johnston <polobo@yahoo.com>)
Список pgsql-sql

From: "David Johnston" <polobo@yahoo.com>
To: pgsql-sql@postgresql.org
Sent: Thursday, February 27, 2014 4:09:34 PM
Subject: Re: [SQL] how to effectively SELECT new "customers"

Jan Ostrochovsky wrote
> Hello, I am solving following task and it seems hard to me to find
> effective solution. Maybe somebody knows how to help me: We have table
> "purchases" and each one record is identified by "customer_id". We want to
> know not only how many different customers did at least one purchase per
> time period, grouped by time periods (easy task: "COUNT(DISTINCT
> customer_id)" with "GROUP BY period"), but also to know how many NEW
> customers there were. We define new customer as customer_id, which had
> first record in table "purchases" after 12 month of inactivity (no record
> in table "purchases" previous 12 months). I have found one solution, but
> it is very slow and ugly. I tried several other concepts, but without
> success. Any hint could be helpful for me. Thanks in advance! Jano

Without incorporating additional meta-data about the purchases onto the
customer table the most basic solution would be:

SELECT DISTINCT customer_id FROM products WHERE date > (now() - '12
months'::interval)
EXCEPT
SELECT DISTINCT customer_id FROM products WHERE date <= (now() - '12
months'::interval)

---

Another solution:
WHERE ... >12 AND NOT EXISTS (SELECT ... WHERE <= 12)

---

Depending on the frequency that you need to run this query it may be
worthwhile to create a materialized view that captures the necessary data
and then whenever a new sale is generated you simply update that view by
changing the attributes of that single customer.  At any point you can
quickly determine, using the view, which customers were active at the time
of last purchases and which ones were dormant or non-existent.

David J.
Thank you David, your alternatives seem as smart alternatives of my solution, what is based on the difference between COUNT(DISTINCT customer_id) in e.g. 13 months (in case when reporting period is 1 month over previous 12 months) and COUNT(DISTINCT customer_id) in 12 months prior to reporting period (but without the reporting period). But I would also like to incorporate other data in the output, which should seem e.g.:

town, period, payment_channel_id, purchases_count, purchases_total_revenue, distinct_customer_count, distinct_NEW_customer_count
BA, 201401, A, 495, 847, 103, 25
BA, 201401, B, 345, 456, 99, 21
BA, 201312, A, 554, 1021, 105, 30
ZA, 201401, A, 323, 987, 95, 23
...

there is 12-month-long time-window for new customer determination, sliding according to particular period, back to the past

how to achieve this? e.g. by running your or my concept repeatedly (manually or in the higher level loop), but this is too slow... or to find more sophisticated SQL query to accomplish this in one step... any other ideas for such SQL query running once, not repeatedly (in loop) for each reporting period? (I was considering window functions, connectby, grouping sets, ..., but no success yet)

materialized view could be part of that solution, when we will have the right query and uprgade from 9.0 to 9.3 (as I see it: http://www.postgresql.org/docs/9.3/static/rules-materializedviews.html)... but it should be scalable to several to tens purchases every minute

thanks again for your contribution, David

Jano

В списке pgsql-sql по дате отправления:

Предыдущее
От: Jan Ostrochovsky
Дата:
Сообщение: Re: how to effectively SELECT new "customers"
Следующее
От: ALMA TAHIR
Дата:
Сообщение: Re: Function Issue