Using indexes to speed up join?

Поиск
Список
Период
Сортировка
От James David Smith
Тема Using indexes to speed up join?
Дата
Msg-id CAMu32ADQcufCFfSsoeQ7rUVwscUEZD3dkO0-guL5CJznsQEqHA@mail.gmail.com
обсуждение исходный текст
Список pgsql-novice
Hi there,

The below query works exactly how I want it too, but it's too slow.
Running it on a few hundred rows (LIMIT 100) takes a second or two,
but when I run it against 4.2 rows it takes a long time. I stopped it
after about an hour. I'd appreciate some advice on the most effective
way to use indexes to improve it please? The query, as well as the
result of 'EXPLAIN ANALYZE' are shown below. I can see it's doing some
sequential scans which I know are bad... but?

Cheers

James

QUERY:
-------------------------------------------------------
SELECT            unique_hybrid_hour_locations.*,
            CASE WHEN    pm25_io_ratio IS NULL
            THEN         monthly_london_average_pm25_io_ratio
            ELSE        pm25_io_ratio
            END AS        pm25_io_ratio
FROM            unique_hybrid_hour_locations
-- This mini sub query joins the IO PM2.5 ratio from the ratios table.
The join is done by month and by location.
LEFT JOIN        io_postcode_ratios
            ON        EXTRACT(MONTH FROM time) = io_postcode_ratios.month_num
            AND        ST_Within(
st_transform(st_setsrid(st_makepoint(lon, lat),4326),27700),
                            io_postcode_ratios.the_geom
                            ) = true
-- This mini sub query calculates the monthly averages for London
-- and joins them depending on the month. They're used if can't get
the proper one.
LEFT JOIN        (
            SELECT        month_num,
                    avg(pm25_io_ratio) as monthly_london_average_pm25_io_ratio
            FROM        io_postcode_ratios
            GROUP BY    month_num
            ORDER BY    month_num) a
            ON        a.month_num = EXTRACT(MONTH FROM time)
-------------------------------------------------------

EXPLAIN ANALYZE:
"Hash Left Join  (cost=902914.76..70538171.09 rows=4760537 width=98)"
"  Hash Cond: (date_part('month'::text,
unique_hybrid_hour_locations."time") = (a.month_num)::double
precision)"
"  ->  Merge Left Join  (cost=900483.64..70503130.29 rows=4760537 width=66)"
"        Merge Cond: ((date_part('month'::text,
unique_hybrid_hour_locations."time")) =
((io_postcode_ratios.month_num)::double precision))"
"        Join Filter:
((st_transform(st_setsrid(st_makepoint(unique_hybrid_hour_locations.lon,
unique_hybrid_hour_locations.lat), 4326), 27700) &&
io_postcode_ratios.the_geom) AND
_st_contains(io_postcode_ratios.the_geom,
st_transform(st_setsrid(st_makepoin (...)"
"        ->  Sort  (cost=848686.52..860587.86 rows=4760537 width=58)"
"              Sort Key: (date_part('month'::text,
unique_hybrid_hour_locations."time"))"
"              ->  Seq Scan on unique_hybrid_hour_locations
(cost=0.00..141692.37 rows=4760537 width=58)"
"        ->  Materialize  (cost=51797.11..51847.93 rows=10164 width=11199)"
"              ->  Sort  (cost=51797.11..51822.52 rows=10164 width=11199)"
"                    Sort Key: ((io_postcode_ratios.month_num)::double
precision)"
"                    ->  Seq Scan on io_postcode_ratios
(cost=0.00..2379.64 rows=10164 width=11199)"
"  ->  Hash  (cost=2430.98..2430.98 rows=12 width=36)"
"        ->  Subquery Scan on a  (cost=2430.83..2430.98 rows=12 width=36)"
"              ->  Sort  (cost=2430.83..2430.86 rows=12 width=12)"
"                    Sort Key: io_postcode_ratios_1.month_num"
"                    ->  HashAggregate  (cost=2430.46..2430.61 rows=12
width=12)"
"                          ->  Seq Scan on io_postcode_ratios
io_postcode_ratios_1  (cost=0.00..2379.64 rows=10164 width=12)"


В списке pgsql-novice по дате отправления:

Предыдущее
От: Matthew Foster - NOAA Federal
Дата:
Сообщение: Re: Logging behavior not changing on reload
Следующее
От: Michael Rowan
Дата:
Сообщение: Case insensitive unique constraint