Re: Need help optimizing this query

Поиск
Список
Период
Сортировка
От Pat Maddox
Тема Re: Need help optimizing this query
Дата
Msg-id 810a540e0707181534w267ae41au570ed0ca1a4edf84@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Need help optimizing this query  (Michael Glaesemann <grzm@seespotcode.net>)
Ответы Re: Need help optimizing this query  (Michael Glaesemann <grzm@seespotcode.net>)
Список pgsql-general
On 7/18/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>
> On Jul 18, 2007, at 16:48 , Pat Maddox wrote:
>
> > For some reason the functions you wrote are giving me trouble (there's
> > a BIGINT involved, I tried changing the functions around but kept
> > having issues).
>
> You might try these, if you're interested.
>
> CREATE OR REPLACE FUNCTION ternary(BOOLEAN, BIGINT, BIGINT)
> RETURNS BIGINT
> LANGUAGE SQL AS $_$SELECT CASE WHEN $1 THEN $2 ELSE $3 END$_$;
>
> CREATE OR REPLACE FUNCTION value_when(BOOLEAN, BIGINT)
> RETURNS BIGINT
> LANGUAGE SQL AS $_$SELECT ternary($1,$2,0)$_$;
>
> CREATE OR REPLACE FUNCTION one_when(BOOLEAN)
> RETURNS BIGINT
> LANGUAGE SQL as $_$SELECT value_when($1,1)$_$;
>
> > So here's the full query, hopefully formatted better:
>
> I'm still curious about why the planner is choosing a hash join over
> using the indexes on the foreign keys, but that might be because the
> tables are relatively small.
>
> > That one runs reasonably fine, because there are only 20 videos being
> > returned and a handful of video views associated with them.  In the
> > real query there are about 1k videos and a couple million views.  That
> > took about 80 minutes to run, according to logs.
>
> The planner will choose different plans based on, among other things,
> what it estimates the size of the result to be, so while looking at a
> small example query might seem like a way to go about looking at
> what's going on, it's most likely not going to give you an accurate
> representation of the situation. Are you looking at two different
> systems (e.g., a development system versus a production system) or
> just choosing a smaller query on the same system? If you can't run
> the query on your production system, you may want to take a dump of
> the production system and set it up on another box. Even with a
> couple million rows in the video_views table, PostgreSQL shouldn't
> really blink too much, as long as the server is tuned properly, the
> hardware is adequate, and the database statistics are up to date.
>
> Michael Glaesemann
> grzm seespotcode net
>
>
>
>

Sorry, I mentioned that it took 90 seconds to run the query but I
didn't show that EXPLAIN ANALYZE output.

Here it is, same query just with a different company_id:

 Limit  (cost=879283.07..879283.32 rows=100 width=64) (actual
time=92486.858..92486.891 rows=100 loops=1)
   ->  Sort  (cost=879283.07..879297.15 rows=5632 width=64) (actual
time=92486.856..92486.867 rows=100 loops=1)
         Sort Key: sum(CASE WHEN ((video_views.hit IS TRUE) AND
(video_views.logged_in IS FALSE)) THEN 1 ELSE 0 END)
         ->  GroupAggregate  (cost=803054.95..878932.21 rows=5632
width=64) (actual time=67145.471..92484.408 rows=730 loops=1)
               ->  Sort  (cost=803054.95..809363.98 rows=2523610
width=64) (actual time=67076.407..75441.274 rows=5799447 loops=1)
                     Sort Key: videos.id, videos.title, videos.guid
                     ->  Hash Join  (cost=1220.63..237115.16
rows=2523610 width=64) (actual time=31.230..11507.406 rows=5799447
loops=1)
                           Hash Cond: (video_views.asset_id = assets.id)
                           ->  Hash Join  (cost=535.62..179627.88
rows=2525294 width=60) (actual time=13.286..7621.950 rows=5799447
loops=1)
                                 Hash Cond: (video_views.video_id = videos.id)
                                 ->  Seq Scan on video_views
(cost=0.00..101352.70 rows=5998470 width=12) (actual
time=0.023..2840.718 rows=5998341 loops=1)
                                 ->  Hash  (cost=465.23..465.23
rows=5632 width=52) (actual time=13.216..13.216 rows=5712 loops=1)
                                       ->  Seq Scan on videos
(cost=0.00..465.23 rows=5632 width=52) (actual time=0.038..9.060
rows=5712 loops=1)
                                             Filter: (company_id = 11)
                           ->  Hash  (cost=487.78..487.78 rows=15778
width=12) (actual time=17.876..17.876 rows=15778 loops=1)
                                 ->  Seq Scan on assets
(cost=0.00..487.78 rows=15778 width=12) (actual time=0.032..10.880
rows=15778 loops=1)
 Total runtime: 92548.006 ms
(17 rows)

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Update of table lags execution of statement by >1 minute?
Следующее
От: "Pg Coder"
Дата:
Сообщение: IN clause performance