Analyzer is clueless

Поиск
Список
Период
Сортировка
От David Brown
Тема Analyzer is clueless
Дата
Msg-id 20041117160824.B694D3A3ECC@svr1.postgresql.org
обсуждение исходный текст
Ответы Re: Analyzer is clueless  (Josh Berkus <josh@agliodbs.com>)
Re: Analyzer is clueless  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: Analyzer is clueless  (Simon Riggs <simon@2ndquadrant.com>)
Список pgsql-performance
I'm doing some performance profiling with a simple two-table query:

SELECT L."ProductID", sum(L."Amount")
FROM "drinv" H
JOIN "drinvln" L ON L."OrderNo" = H."OrderNo"
WHERE
("OrderDate" between '2003-01-01' AND '2003-04-30')
GROUP BY L."ProductID"

drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual data size in the large table is 500-600MB.
OrderNois indexed in both tables, as is OrderDate. 

The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 from Mammoth). I've tried tweaking
variousconf parameters, but apart from using up memory, nothing seems to have had a tangible effect - the Analyzer
doesn'tseem to take resources into account like some of the doco suggests. 

The date selection represents about 5% of the range. Here's the plan summaries:

Three months (2003-01-01 to 2003-03-30) = 1 second

HashAggregate  (cost=119365.53..119368.74 rows=642 width=26)
  ->  Nested Loop  (cost=0.00..118791.66 rows=114774 width=26)
        ->  Index Scan using "drinv_OrderDate" on drinv h  (cost=0.00..200.27 rows=3142 width=8)
              Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-03-30'::date))
        ->  Index Scan using "drinvln_OrderNo" on drinvln l  (cost=0.00..28.73 rows=721 width=34)
              Index Cond: (l."OrderNo" = "outer"."OrderNo")


Four months (2003-01-01 to 2003-04-30) = 60 seconds

HashAggregate  (cost=126110.53..126113.74 rows=642 width=26)
  ->  Hash Join  (cost=277.55..125344.88 rows=153130 width=26)
        Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo")
        ->  Seq Scan on drinvln l  (cost=0.00..106671.35 rows=3372935 width=34)
        ->  Hash  (cost=267.07..267.07 rows=4192 width=8)
              ->  Index Scan using "drinv_OrderDate" on drinv h  (cost=0.00..267.07 rows=4192 width=8)
                    Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date))


Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds


HashAggregate  (cost=130565.83..130569.04 rows=642 width=26)
  ->  Merge Join  (cost=519.29..129800.18 rows=153130 width=26)
        Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo")
        ->  Sort  (cost=519.29..529.77 rows=4192 width=8)
              Sort Key: h."OrderNo"
              ->  Index Scan using "drinv_OrderDate" on drinv h  (cost=0.00..267.07 rows=4192 width=8)
                    Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date))
        ->  Index Scan using "drinvln_OrderNo" on drinvln l  (cost=0.00..119296.29 rows=3372935 width=34)

Statistics were run on each table before query execution. The random page cost was lowered to 2, but as you can see,
theestimated costs are wild anyway. 

As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster.

MySQL (InnoDB) took 2 seconds, which is 30 times faster.

The query looks straightforward to me (it might be clearer with a subselect), so what on earth is wrong?

В списке pgsql-performance по дате отправления:

Предыдущее
От: "David Parker"
Дата:
Сообщение: Re: query plan question
Следующее
От: Mike Rylander
Дата:
Сообщение: Re: memcached and PostgreSQL