Re: Query performance

Поиск
Список
Период
Сортировка
От Joe Van Dyk
Тема Re: Query performance
Дата
Msg-id CACfv+pJAMmk7kTFnCU6gVCZY4Q5Fz71T+o_CYw7Zha4v-pSO+w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Query performance  (Pavel Stehule <pavel.stehule@gmail.com>)
Ответы Re: Query performance
Список pgsql-performance
On Sat, Jan 24, 2015 at 11:14 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-01-25 7:38 GMT+01:00 Joe Van Dyk <joe@tanga.com>:


On Sat, Jan 24, 2015 at 10:12 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

this plan looks well

Regards

Pavel

Here's one that's not quite as well: http://explain.depesz.com/s/SgT

I see a possible issue

(product_id <> '81716'::citext) .. this operation is CPU expensive and maybe nonsense

product_id should be integer -- and if it isn't - it should not be on 4M rows extremly fast - mainly on citext

try to force a opposite cast - you will safe a case insensitive text comparation

product_id::int <> 81716

It might not always be an integer, just happens to be so here. Should I try text instead? I don't have to have the case-insensitive matching.

Joe
 

Regards

Pavel


 

Joe
 

2015-01-25 6:45 GMT+01:00 Joe Van Dyk <joe@tanga.com>:
Oops, didn't run vacuum analyze after deleting the events. Here is another 'explain analyze': http://explain.depesz.com/s/AviN

On Sat, Jan 24, 2015 at 9:43 PM, Joe Van Dyk <joe@tanga.com> wrote:
On Sat, Jan 24, 2015 at 9:41 PM, Joe Van Dyk <joe@tanga.com> wrote:
I have an events table that records page views and purchases (type = 'viewed' or type='purchased'). I have a query that figures out "people who bought/viewed this also bought/viewed that".

It worked fine, taking about 0.1 seconds to complete, until a few hours ago when it started taking hours to complete. Vacuum/analyze didn't help.  Turned out there was one session_id that had 400k rows in the system. Deleting that made the query performant again. 

Is there anything I can do to make the query work better in cases like that? Missing index, or better query?

This is on 9.3.5.

The below is reproduced at the following URL if it's not formatted correctly in the email. https://gist.githubusercontent.com/joevandyk/cb8f4afdb6c1b178c606/raw/9940bbe033ebd56d38caa46e33c1ddfd9df36eda/gistfile1.txt

explain  select  e1.product_id,  e2.site_id,  e2.product_id,  count(nullif(e2.type='viewed', false)) view_count,  count(nullif(e2.type='purchased', false)) purchase_countfrom events e1join events e2 on e1.session_id = e2.session_id and e1.type = e2.typewhere  e1.product_id = '82503' and  e1.product_id != e2.product_idgroup by e1.product_id, e2.product_id, e2.site_id;                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------GroupAggregate  (cost=828395.67..945838.90 rows=22110 width=19)  ->  Sort  (cost=828395.67..840117.89 rows=4688885 width=19)        Sort Key: e1.product_id, e2.product_id, e2.site_id        ->  Nested Loop  (cost=11.85..20371.14 rows=4688885 width=19)              ->  Bitmap Heap Scan on events e1  (cost=11.29..1404.31 rows=369 width=49)                    Recheck Cond: (product_id = '82503'::citext)                    ->  Bitmap Index Scan on events_product_id_site_id_idx  (cost=0.00..11.20 rows=369 width=0)                          Index Cond: (product_id = '82503'::citext)              ->  Index Scan using events_session_id_type_product_id_idx on events e2  (cost=0.56..51.28 rows=12 width=51)                    Index Cond: ((session_id = e1.session_id) AND (type = e1.type))                    Filter: (e1.product_id <> product_id)
(11 rows)

recommender_production=> \d events                                   Table "public.events"  Column    |           Type           |                      Modifiers
-------------+--------------------------+-----------------------------------------------------id          | bigint                   | not null default nextval('events_id_seq'::regclass)user_id     | citext                   |session_id  | citext                   | not nullproduct_id  | citext                   | not nullsite_id     | citext                   | not nulltype        | text                     | not nullhappened_at | timestamp with time zone | not nullcreated_at  | timestamp with time zone | not null
Indexes:   "events_pkey" PRIMARY KEY, btree (id)   "events_product_id_site_id_idx" btree (product_id, site_id)   "events_session_id_type_product_id_idx" btree (session_id, type, product_id)
Check constraints:   "events_session_id_check" CHECK (length(session_id::text) < 255)   "events_type_check" CHECK (type = ANY (ARRAY['purchased'::text, 'viewed'::text]))   "events_user_id_check" CHECK (length(user_id::text) < 255)



After removing the session with 400k events, I was able to do an explain analyze, here is one of them:





В списке pgsql-performance по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Query performance
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: Query performance