Re: people who buy A, also buy C, D, E
От | Achilleus Mantzios |
---|---|
Тема | Re: people who buy A, also buy C, D, E |
Дата | |
Msg-id | Pine.LNX.4.44.0504261523390.7746-100000@matrix.gatewaynet.com обсуждение исходный текст |
Ответ на | Re: people who buy A, also buy C, D, E (Christoph Haller <ch@rodos.fzk.de>) |
Список | pgsql-sql |
O Christoph Haller έγραψε στις Apr 26, 2005 : > Dan Langille wrote: > > > > The goal of my query is: given a book, what did other people who > > bought this book also buy? I plan the list the 5 most popular such > > books. In reality, this isn't about books, but that makes it easier > > to understand I think. > > > > We have a table of customer_id (watch_list_id) and book_id > > (element_id). > > > > freshports.org=# \d watch_list_element > > Table "public.watch_list_element" > > Column | Type | Modifiers > > ---------------+---------+----------- > > watch_list_id | integer | not null > > element_id | integer | not null > > Indexes: > > "watch_list_element_pkey" primary key, btree (watch_list_id, > > element_id) > > "watch_list_element_element_id" btree (element_id) > > Foreign-key constraints: > > "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON > > UPDATE CASCADE ON DELETE CASCADE > > "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE > > CASCADE ON DELETE CASCADE > > > > freshports.org=# > > > > I have a query which returns the needed results: > > > > SELECT W.element_id > > FROM watch_list_element W > > WHERE w.watch_list_id in (select watch_list_id from > > watch_list_element where element_id = 54968) > > GROUP BY W.element_id > > ORDER BY count(W.watch_list_id) DESC > > LIMIT 5; > > > > But performance is an issue here. So I'm planning to calculate all > > the possible values and cache them. That is, given each element_id in > > a watch_list, what are the top 5 element_id values on all the lists > > on which the original element_id appears? > > > > I'm having trouble constructing the query. I'm not even sure I can > > do this in one select, but that would be nice. Examples and clues > > are appreciated. > > > > Any ideas? > > > > Thank you. > > -- > > Just two ideas. > > 1) Older Postgres versions are notorious for being slow > on "IN" clauses. > Does this one (untested) perform better: > > SELECT W.element_id, count(W.watch_list_id) > FROM watch_list_element W > WHERE EXISTS > (SELECT * FROM watch_list_element E > WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id) > GROUP BY W.element_id > ORDER BY 2 DESC > LIMIT 5; > > 2) I suspect calculating all possible values would require time and > an enormous cache buffer in size as well as re-calculating pretty often. > So my approach would be trying to tune the query before introducing > cached results. AFAIK, problems like this fall into the "Data Mining" field, and often their solution go beyond some DB arrangments. A little research wouldn't hurt, IMO. > > HTH > > Regards, Christoph > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend > -- -Achilleus
В списке pgsql-sql по дате отправления: