Re: PostgreSQL and Linux 2.6 kernel.

Поиск
Список
Период
Сортировка
От Gary Doades
Тема Re: PostgreSQL and Linux 2.6 kernel.
Дата
Msg-id E7C9E6AB2B6A@gpdnet.co.uk
обсуждение исходный текст
Ответ на PostgreSQL and Linux 2.6 kernel.  ("Gary Doades" <gpd@gpdnet.co.uk>)
Список pgsql-performance
Following on from Josh's response and my previous reply on SQLServer planning.

The main problem query is this one:

SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS,
SC.MIN_HOURS,
 (SELECT COUNT(*) FROM TIMESHEET_DETAIL JOIN MAIN_ORDER ON
(MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID AND
MAIN_ORDER.CLIENT_ID = 6) WHERE TIMESHEET_DETAIL.CONTRACT_ID =
SC.CONTRACT_ID) AS VISITS,
(SELECT (SUM(R.DURATION+1))/60.0 FROM ORDER_REQT R
 JOIN STAFF_BOOKING B ON (B.REQT_ID = R.REQT_ID)
 JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) WHERE
B.CONTRACT_ID = SC.CONTRACT_ID
 AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18') AS RHOURS
FROM VSTAFF VS
JOIN STAFF_CONTRACT SC ON (SC.STAFF_ID = VS.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
JOIN SEARCH_REQT_RESULT SR ON (SR.STAFF_ID = VS.STAFF_ID)
WHERE SR.SEARCH_ID = 1 AND SC.CONTRACT_ID IN
(SELECT C.CONTRACT_ID FROM STAFF_PRODUCT P,STAFF_CONTRACT C
WHERE P.CONTRACT_ID=C.CONTRACT_ID AND C.STAFF_ID = VS.STAFF_ID AND
P.PRODUCT_ID IN (SELECT PRODUCT_ID FROM SEARCH_ORDER_REQT WHERE
SEARCH_ID = 1)  AND C.AVAIL_DATE_FROM <= '2004-06-12' AND
C.AVAIL_DATE_TO >= '2004-06-18'  GROUP BY C.CONTRACT_ID
 HAVING (COUNT(C.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID)
FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1)))

The explain analyze is:
QUERY PLAN
Nested Loop  (cost=101.54..1572059.57 rows=135 width=152) (actual
time=13749.100..1304586.501 rows=429 loops=1)
  InitPlan
    ->  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1
width=4) (actual time=0.944..0.944 rows=1 loops=1)
          Index Cond: (area_id = 1)
          Filter: (uid = $4)
          InitPlan
            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual
time=0.686..0.691 rows=1 loops=1)
                  Filter: ((username)::name = "current_user"())
  ->  Hash Join  (cost=95.02..3701.21 rows=215 width=138) (actual
time=100.476..1337.392 rows=429 loops=1)
        Hash Cond: ("outer".staff_id = "inner".staff_id)
        Join Filter: (subplan)
        ->  Seq Scan on staff_contract sc  (cost=0.00..33.24 rows=1024 width=37) (actual
time=0.114..245.366 rows=1024 loops=1)
        ->  Hash  (cost=93.95..93.95 rows=430 width=109) (actual time=38.563..38.563
rows=0 loops=1)
              ->  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual
time=15.502..36.627 rows=429 loops=1)
                    Hash Cond: ("outer".staff_id = "inner".staff_id)
                    ->  Seq Scan on staff  (cost=34.61..66.48 rows=1030 width=105) (actual
time=9.655..15.264 rows=1030 loops=1)
                          Filter: ((hashed subplan) OR $5)
                          SubPlan
                            ->  Seq Scan on staff_area  (cost=10.73..33.38 rows=493 width=4)
(actual time=8.452..8.452 rows=0 loops=1)
                                  Filter: ((hashed subplan) OR (area_id = 1))
                                  SubPlan
                                    ->  Seq Scan on wruserarea  (cost=3.26..10.72 rows=5 width=4)
(actual time=0.977..1.952 rows=1 loops=1)
                                          Filter: (uid = $1)
                                          InitPlan
                                            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4)
(actual time=0.921..0.926 rows=1 loops=1)
                                                  Filter: ((username)::name = "current_user"())
                    ->  Hash  (cost=11.79..11.79 rows=430 width=4) (actual time=5.705..5.705
rows=0 loops=1)
                          ->  Index Scan using fk_idx_search_reqt_result on search_reqt_result
sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.470..4.482 rows=429 loops=1)
                                Index Cond: (search_id = 1)
        SubPlan
          ->  HashAggregate  (cost=8.32..8.32 rows=1 width=4) (actual time=2.157..2.157
rows=1 loops=429)
                Filter: (count(contract_id) = $9)
                InitPlan
                  ->  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual time=0.172..0.173
rows=1 loops=1)
                        ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4)
(actual time=0.022..0.038 rows=1 loops=1)
                              Filter: (search_id = 1)
                ->  Hash IN Join  (cost=1.04..7.27 rows=1 width=4) (actual time=2.064..2.117
rows=1 loops=429)
                      Hash Cond: ("outer".product_id = "inner".product_id)
                      ->  Nested Loop  (cost=0.00..6.19 rows=7 width=8) (actual
time=1.112..2.081 rows=8 loops=429)
                            ->  Index Scan using fk_idx_staff_contract_2 on staff_contract c
(cost=0.00..3.03 rows=1 width=4) (actual time=0.206..0.245 rows=1 loops=429)
                                  Index Cond: (staff_id = $8)
                                  Filter: ((avail_date_from <= '2004-06-12'::date) AND (avail_date_to
>= '2004-06-18'::date))
                            ->  Index Scan using fk_idx_staff_product on staff_product p
(cost=0.00..3.08 rows=6 width=8) (actual time=0.873..1.764 rows=8 loops=429)
                                  Index Cond: (p.contract_id = "outer".contract_id)
                      ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.086..0.086
rows=0 loops=1)
                            ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4)
(actual time=0.037..0.050 rows=1 loops=1)
                                  Filter: (search_id = 1)
  ->  Index Scan using location_pkey on "location"  (cost=0.00..12.66 rows=1 width=18)
(actual time=0.876..0.887 rows=1 loops=429)
        Index Cond: ("location".location_id = "outer".location_id)
        Filter: ((area_id = 1) OR (subplan))
        SubPlan
          ->  Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on wruserarea
(cost=3.26..9.64 rows=1 width=4) (never executed)
                Index Cond: ((area_id = 1) OR (area_id = $7))
                Filter: (uid = $6)
                InitPlan
                  ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (never executed)
                        Filter: ((username)::name = "current_user"())
  SubPlan
    ->  Aggregate  (cost=11233.28..11233.29 rows=1 width=2) (actual
time=3036.814..3036.815 rows=1 loops=429)
          ->  Nested Loop  (cost=10391.71..11233.21 rows=30 width=2) (actual
time=2817.923..3036.516 rows=34 loops=429)
                ->  Hash Join  (cost=10391.71..11142.43 rows=30 width=4) (actual
time=2813.349..3007.936 rows=34 loops=429)
                      Hash Cond: ("outer".booking_id = "inner".booking_id)
                      ->  Index Scan using booking_plan_idx2 on booking_plan bp
(cost=0.00..572.52 rows=23720 width=4) (actual time=0.070..157.028 rows=24613
loops=429)
                            Index Cond: ((booking_date >= '2004-06-12'::date) AND
(booking_date <= '2004-06-18'::date))
                      ->  Hash  (cost=10382.78..10382.78 rows=3571 width=8) (actual
time=2746.122..2746.122 rows=0 loops=429)
                            ->  Index Scan using fk_idx_staff_booking on staff_booking b
(cost=0.00..10382.78 rows=3571 width=8) (actual time=14.168..2733.315 rows=3815
loops=429)
                                  Index Cond: (contract_id = $0)
                ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..3.01
rows=1 width=6) (actual time=0.826..0.832 rows=1 loops=14401)
                      Index Cond: ("outer".reqt_id = r.reqt_id)
    ->  Aggregate  (cost=363.94..363.94 rows=1 width=0) (actual time=0.057..0.058
rows=1 loops=429)
          ->  Nested Loop  (cost=0.00..363.94 rows=1 width=0) (actual time=0.034..0.034
rows=0 loops=429)
                ->  Index Scan using fk_idx_main_order on main_order  (cost=0.00..4.99
rows=1 width=4) (actual time=0.031..0.031 rows=0 loops=429)
                      Index Cond: (client_id = 6)
                ->  Index Scan using fk_idx_timesheet_detail_3 on timesheet_detail
(cost=0.00..358.93 rows=1 width=4) (never executed)
                      Index Cond: ("outer".order_id = timesheet_detail.order_id)
                      Filter: (contract_id = $0)
Total runtime: 1304591.861 ms

Long Time! The main issue here is that the RHOURS subselect is executed as a nested
join 429 times. unfortunately this is an expensive subquery.

SQLServer executed this in just over 1 second on comparable hardware. Looking at its
execution plan it flattens out the two subselects with a merge join. So I manually rewrote
the query using derived tables and joins as:

SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS,
SC.MIN_HOURS, TBOOK.RHOURS, TVIS.VISITS FROM SEARCH_REQT_RESULT
SR
JOIN STAFF_CONTRACT SC ON (SR.STAFF_ID = SC.STAFF_ID) AND
SC.AVAIL_DATE_FROM <= '2004-06-12' AND SC.AVAIL_DATE_TO >= '2004-06-18'
JOIN VSTAFF VS ON (VS.STAFF_ID = SC.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
LEFT OUTER JOIN (SELECT B.CONTRACT_ID, SUM(R.DURATION+1)/60.0 AS
RHOURS FROM STAFF_BOOKING B
JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) AND
BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18'
JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID)
 GROUP BY B.CONTRACT_ID) AS TBOOK
ON (SC.CONTRACT_ID = TBOOK.CONTRACT_ID)
LEFT OUTER JOIN (SELECT CONTRACT_ID,COUNT(*) AS VISITS FROM
TIMESHEET_DETAIL
JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID =
TIMESHEET_DETAIL.ORDER_ID) WHERE MAIN_ORDER.CLIENT_ID = 6
GROUP BY CONTRACT_ID) AS TVIS ON (TVIS.CONTRACT_ID = SC.CONTRACT_ID)
JOIN (SELECT P.CONTRACT_ID FROM STAFF_PRODUCT P,
SEARCH_ORDER_REQT SR
WHERE P.PRODUCT_ID = SR.PRODUCT_ID AND SR.SEARCH_ID = 1
GROUP BY P.CONTRACT_ID
HAVING (COUNT(P.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID)
FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1))) AS TCONT ON
(TCONT.CONTRACT_ID = SC.CONTRACT_ID)
WHERE SR.SEARCH_ID = 1

With the explain analyze as:
QUERY PLAN
Hash Join  (cost=137054.42..137079.74 rows=159 width=192) (actual
time=6228.354..6255.058 rows=429 loops=1)
  Hash Cond: ("outer".contract_id = "inner".contract_id)
  InitPlan
    ->  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1
width=4) (actual time=0.850..0.850 rows=1 loops=1)
          Index Cond: (area_id = 1)
          Filter: (uid = $3)
          InitPlan
            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual
time=0.670..0.675 rows=1 loops=1)
                  Filter: ((username)::name = "current_user"())
  ->  Subquery Scan tcont  (cost=152.63..161.81 rows=612 width=4) (actual
time=36.312..42.268 rows=612 loops=1)
        ->  HashAggregate  (cost=152.63..155.69 rows=612 width=4) (actual
time=36.301..40.040 rows=612 loops=1)
              Filter: (count(contract_id) = $7)
              InitPlan
                ->  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual time=0.107..0.108
rows=1 loops=1)
                      ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4)
(actual time=0.025..0.037 rows=1 loops=1)
                            Filter: (search_id = 1)
              ->  Hash Join  (cost=1.04..148.53 rows=612 width=4) (actual
time=0.419..32.284 rows=612 loops=1)
                    Hash Cond: ("outer".product_id = "inner".product_id)
                    ->  Seq Scan on staff_product p  (cost=0.00..109.91 rows=6291 width=8)
(actual time=0.117..17.943 rows=6291 loops=1)
                    ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.190..0.190
rows=0 loops=1)
                          ->  Seq Scan on search_order_reqt sr  (cost=0.00..1.04 rows=1
width=4) (actual time=0.165..0.177 rows=1 loops=1)
                                Filter: (search_id = 1)
  ->  Hash  (cost=136894.61..136894.61 rows=266 width=192) (actual
time=6191.923..6191.923 rows=0 loops=1)
        ->  Merge Left Join  (cost=136886.03..136894.61 rows=266 width=192) (actual
time=6143.315..6189.685 rows=429 loops=1)
              Merge Cond: ("outer".contract_id = "inner".contract_id)
              ->  Merge Left Join  (cost=136517.64..136525.04 rows=266 width=184) (actual
time=6142.896..6171.676 rows=429 loops=1)
                    Merge Cond: ("outer".contract_id = "inner".contract_id)
                    ->  Sort  (cost=5529.68..5530.34 rows=266 width=152) (actual
time=129.548..130.027 rows=429 loops=1)
                          Sort Key: sc.contract_id
                          ->  Nested Loop  (cost=88.35..5518.96 rows=266 width=152) (actual
time=33.213..121.666 rows=429 loops=1)
                                ->  Hash Join  (cost=88.35..143.88 rows=424 width=138) (actual
time=32.739..76.357 rows=429 loops=1)
                                      Hash Cond: ("outer".staff_id = "inner".staff_id)
                                      ->  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual
time=15.232..40.040 rows=429 loops=1)
                                            Hash Cond: ("outer".staff_id = "inner".staff_id)
                                            ->  Seq Scan on staff  (cost=34.61..66.48 rows=1030
width=105) (actual time=9.412..16.105 rows=1030 loops=1)
                                                  Filter: ((hashed subplan) OR $4)
                                                  SubPlan
                                                    ->  Seq Scan on staff_area  (cost=10.73..33.38
rows=493 width=4) (actual time=8.380..8.380 rows=0 loops=1)
                                                          Filter: ((hashed subplan) OR (area_id = 1))
                                                          SubPlan
                                                            ->  Seq Scan on wruserarea  (cost=3.26..10.72
rows=5 width=4) (actual time=0.953..1.941 rows=1 loops=1)
                                                                  Filter: (uid = $0)
                                                                  InitPlan
                                                                    ->  Seq Scan on wruser  (cost=0.00..3.26
rows=1 width=4) (actual time=0.902..0.908 rows=1 loops=1)
                                                                          Filter: ((username)::name =
"current_user"())
                                            ->  Hash  (cost=11.79..11.79 rows=430 width=4) (actual
time=5.670..5.670 rows=0 loops=1)
                                                  ->  Index Scan using fk_idx_search_reqt_result on
search_reqt_result sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.448..4.516
rows=429 loops=1)
                                                        Index Cond: (search_id = 1)
                                      ->  Hash  (cost=38.36..38.36 rows=1008 width=37) (actual
time=17.386..17.386 rows=0 loops=1)
                                            ->  Seq Scan on staff_contract sc  (cost=0.00..38.36
rows=1008 width=37) (actual time=0.222..14.063 rows=1008 loops=1)
                                                  Filter: ((avail_date_from <= '2004-06-12'::date) AND
(avail_date_to >= '2004-06-18'::date))
                                ->  Index Scan using location_pkey on "location"  (cost=0.00..12.66
rows=1 width=18) (actual time=0.043..0.050 rows=1 loops=429)
                                      Index Cond: ("location".location_id = "outer".location_id)
                                      Filter: ((area_id = 1) OR (subplan))
                                      SubPlan
                                        ->  Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on
wruserarea  (cost=3.26..9.64 rows=1 width=4) (never executed)
                                              Index Cond: ((area_id = 1) OR (area_id = $6))
                                              Filter: (uid = $5)
                                              InitPlan
                                                ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4)
(never executed)
                                                      Filter: ((username)::name = "current_user"())
                    ->  Sort  (cost=130987.97..130989.96 rows=797 width=36) (actual
time=6013.254..6014.112 rows=746 loops=1)
                          Sort Key: tbook.contract_id
                          ->  Subquery Scan tbook  (cost=130933.62..130949.56 rows=797
width=36) (actual time=5993.070..6007.677 rows=746 loops=1)
                                ->  HashAggregate  (cost=130933.62..130941.59 rows=797
width=6) (actual time=5993.055..6004.099 rows=746 loops=1)
                                      ->  Merge Join  (cost=74214.90..130815.02 rows=23720
width=6) (actual time=4950.951..5807.985 rows=24613 loops=1)
                                            Merge Cond: ("outer".reqt_id = "inner".reqt_id)
                                            ->  Index Scan using order_reqt_pkey on order_reqt r
(cost=0.00..50734.20 rows=2206291 width=6) (actual time=0.444..2753.374
rows=447439 loops=1)
                                            ->  Sort  (cost=74214.90..74274.20 rows=23720 width=8)
(actual time=1822.405..1856.081 rows=24613 loops=1)
                                                  Sort Key: b.reqt_id
                                                  ->  Nested Loop  (cost=0.00..72491.19 rows=23720
width=8) (actual time=1.955..1633.124 rows=24613 loops=1)
                                                        ->  Index Scan using booking_plan_idx2 on
booking_plan bp  (cost=0.00..572.52 rows=23720 width=4) (actual time=1.468..243.827
rows=24613 loops=1)
                                                              Index Cond: ((booking_date >= '2004-06-
12'::date) AND (booking_date <= '2004-06-18'::date))
                                                        ->  Index Scan using staff_booking_pkey on
staff_booking b  (cost=0.00..3.02 rows=1 width=12) (actual time=0.037..0.042 rows=1
loops=24613)
                                                              Index Cond: ("outer".booking_id = b.booking_id)
              ->  Sort  (cost=368.38..368.55 rows=68 width=12) (actual time=0.338..0.338
rows=0 loops=1)
                    Sort Key: tvis.contract_id
                    ->  Subquery Scan tvis  (cost=365.46..366.31 rows=68 width=12) (actual
time=0.307..0.307 rows=0 loops=1)
                          ->  HashAggregate  (cost=365.46..365.63 rows=68 width=4) (actual
time=0.302..0.302 rows=0 loops=1)
                                ->  Nested Loop  (cost=0.00..365.12 rows=68 width=4) (actual
time=0.290..0.290 rows=0 loops=1)
                                      ->  Index Scan using fk_idx_main_order on main_order
(cost=0.00..4.99 rows=1 width=4) (actual time=0.286..0.286 rows=0 loops=1)
                                            Index Cond: (client_id = 6)
                                      ->  Index Scan using fk_idx_timesheet_detail_3 on
timesheet_detail  (cost=0.00..358.63 rows=120 width=8) (never executed)
                                            Index Cond: ("outer".order_id = timesheet_detail.order_id)
Total runtime: 6266.205 ms

This now gives me the same results, but with orders of magnitude better execution
times!

Oddly enough, SQLServer really struggles with the second query, taking longer then
PostgreSQL!!!!

Regards,
Gary.


On 3 Apr 2004 at 10:59, Josh Berkus wrote:

Gary,

> There are no indexes on the columns involved in the update, they are
> not required for my usual select statements. This is an attempt to
> slightly denormalise the design to get the performance up comparable
> to SQL Server 2000. We hope to move some of our databases over to
> PostgreSQL later in the year and this is part of the ongoing testing.
> SQLServer's query optimiser is a bit smarter that PostgreSQL's (yet)
> so I am hand optimising some of the more frequently used
> SQL and/or tweaking the database design slightly.

Hmmm ... that hasn't been my general experience on complex queries.   However,
it may be due to a difference in ANALYZE statistics.   I'd love to see you
increase your default_stats_target, re-analyze, and see if PostgreSQL gets
"smarter".

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly


--
Incoming mail is certified Virus Free.
Checked by AVG Anti-Virus (http://www.grisoft.com).
Version: 7.0.230 / Virus Database: 262.6.5 - Release Date: 31/03/2004


В списке pgsql-performance по дате отправления:

Предыдущее
От: Gary Doades
Дата:
Сообщение: Re: PostgreSQL and Linux 2.6 kernel.
Следующее
От: Gary Doades
Дата:
Сообщение: Re: PostgreSQL and Linux 2.6 kernel.