Query performance
От | Joe Van Dyk |
---|---|
Тема | Query performance |
Дата | |
Msg-id | CACfv+pKyJWGcU9DjmL3QAkUcb2cUHb-0sqODmjFt-hC9=k2+Kg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Query performance
(Joe Van Dyk <joe@tanga.com>)
Re: Query performance (Marc Mamin <M.Mamin@intershop.de>) |
Список | pgsql-performance |
I have an events table that records page views and purchases (type = 'viewed' or type='purchased'). I have a query that figures out "people who bought/viewed this also bought/viewed that".
It worked fine, taking about 0.1 seconds to complete, until a few hours ago when it started taking hours to complete. Vacuum/analyze didn't help. Turned out there was one session_id that had 400k rows in the system. Deleting that made the query performant again.
Is there anything I can do to make the query work better in cases like that? Missing index, or better query?
This is on 9.3.5.
explain select e1.product_id, e2.site_id, e2.product_id, count(nullif(e2.type='viewed', false)) view_count, count(nullif(e2.type='purchased', false)) purchase_countfrom events e1join events e2 on e1.session_id = e2.session_id and e1.type = e2.typewhere e1.product_id = '82503' and e1.product_id != e2.product_idgroup by e1.product_id, e2.product_id, e2.site_id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------GroupAggregate (cost=828395.67..945838.90 rows=22110 width=19) -> Sort (cost=828395.67..840117.89 rows=4688885 width=19) Sort Key: e1.product_id, e2.product_id, e2.site_id -> Nested Loop (cost=11.85..20371.14 rows=4688885 width=19) -> Bitmap Heap Scan on events e1 (cost=11.29..1404.31 rows=369 width=49) Recheck Cond: (product_id = '82503'::citext) -> Bitmap Index Scan on events_product_id_site_id_idx (cost=0.00..11.20 rows=369 width=0) Index Cond: (product_id = '82503'::citext) -> Index Scan using events_session_id_type_product_id_idx on events e2 (cost=0.56..51.28 rows=12 width=51) Index Cond: ((session_id = e1.session_id) AND (type = e1.type)) Filter: (e1.product_id <> product_id) (11 rows) recommender_production=> \d events Table "public.events" Column | Type | Modifiers -------------+--------------------------+-----------------------------------------------------id | bigint | not null default nextval('events_id_seq'::regclass)user_id | citext |session_id | citext | not nullproduct_id | citext | not nullsite_id | citext | not nulltype | text | not nullhappened_at | timestamp with time zone | not nullcreated_at | timestamp with time zone | not null Indexes: "events_pkey" PRIMARY KEY, btree (id) "events_product_id_site_id_idx" btree (product_id, site_id) "events_session_id_type_product_id_idx" btree (session_id, type, product_id) Check constraints: "events_session_id_check" CHECK (length(session_id::text) < 255) "events_type_check" CHECK (type = ANY (ARRAY['purchased'::text, 'viewed'::text])) "events_user_id_check" CHECK (length(user_id::text) < 255)
В списке pgsql-performance по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: How to tell ANALYZE to collect statistics from the whole table?