Re: people who buy A, also buy C, D, E

Поиск
Список
Период
Сортировка
От Achilleus Mantzios
Тема Re: people who buy A, also buy C, D, E
Дата
Msg-id Pine.LNX.4.44.0504261523390.7746-100000@matrix.gatewaynet.com
обсуждение исходный текст
Ответ на Re: people who buy A, also buy C, D, E  (Christoph Haller <ch@rodos.fzk.de>)
Список pgsql-sql
O Christoph Haller έγραψε στις Apr 26, 2005 :

> Dan Langille wrote:
> > 
> > The goal of my query is: given a book, what did other people who
> > bought this book also buy?  I plan the list the 5 most popular such
> > books.  In reality, this isn't about books, but that makes it easier
> > to understand I think.
> > 
> > We have a table of customer_id (watch_list_id) and book_id
> > (element_id).
> > 
> > freshports.org=# \d watch_list_element
> >   Table "public.watch_list_element"
> >     Column     |  Type   | Modifiers
> > ---------------+---------+-----------
> >  watch_list_id | integer | not null
> >  element_id    | integer | not null
> > Indexes:
> >     "watch_list_element_pkey" primary key, btree (watch_list_id,
> > element_id)
> >     "watch_list_element_element_id" btree (element_id)
> > Foreign-key constraints:
> >     "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON
> > UPDATE CASCADE ON DELETE CASCADE
> >     "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE
> > CASCADE ON DELETE CASCADE
> > 
> > freshports.org=#
> > 
> > I have a query which returns the needed results:
> > 
> >      SELECT W.element_id
> >        FROM watch_list_element W
> >       WHERE w.watch_list_id in (select watch_list_id from
> > watch_list_element where element_id = 54968)
> >    GROUP BY W.element_id
> >    ORDER BY count(W.watch_list_id) DESC
> >       LIMIT 5;
> > 
> > But performance is an issue here.  So I'm planning to calculate all
> > the possible values and cache them. That is, given each element_id in
> > a watch_list, what are the top 5 element_id values on all the lists
> > on which the original element_id appears?
> > 
> > I'm having trouble constructing the query.  I'm not even sure I can
> > do this in one select, but that would be nice.  Examples and clues
> > are appreciated.
> > 
> > Any ideas?
> > 
> > Thank you.
> > --
> 
> Just two ideas. 
> 
> 1) Older Postgres versions are notorious for being slow 
> on "IN" clauses. 
> Does this one (untested) perform better: 
> 
> SELECT W.element_id, count(W.watch_list_id)
>   FROM watch_list_element W
> WHERE EXISTS
> (SELECT * FROM watch_list_element E
>  WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id)
> GROUP BY W.element_id
> ORDER BY 2 DESC
> LIMIT 5;
> 
> 2) I suspect calculating all possible values would require time and 
> an enormous cache buffer in size as well as re-calculating pretty often. 
> So my approach would be trying to tune the query before introducing 
> cached results. 

AFAIK, problems like this fall into the "Data Mining" field,
and often their solution go beyond some DB arrangments.
A little research wouldn't hurt, IMO.

> 
> HTH
> 
> Regards, Christoph
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
> 

-- 
-Achilleus



В списке pgsql-sql по дате отправления:

Предыдущее
От: "Dan Langille"
Дата:
Сообщение: Re: people who buy A, also buy C, D, E
Следующее
От: Christoph Haller
Дата:
Сообщение: Re: people who buy A, also buy C, D, E