Re: Performance Optimization for Dummies 2 - the SQL

От: Carlo Stonebanks
Тема: Re: Performance Optimization for Dummies 2 - the SQL
Дата: ,
Msg-id: egua6n$1fv4$1@news.hub.org
(см: обсуждение, исходный текст)
Ответ на: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks")
Ответы: Re: Performance Optimization for Dummies 2 - the SQL  (Tom Lane)
Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure")
Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure")
Список: pgsql-performance

Скрыть дерево обсуждения

Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Alex Stapleton, )
   Re: Performance Optimization for Dummies 2 - the SQL  (Markus Schaber, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  index growth problem  (Graham Davis, )
   Re: index growth problem  ("Jim C. Nasby", )
    Re: index growth problem  (Graham Davis, )
     Re: index growth problem  ("Jim C. Nasby", )
    Re: index growth problem  (Tom Lane, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Tom Lane, )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
   Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
    Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
   Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Tom Lane, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Scott Marlowe, )
   Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
    Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Tom Lane, )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Merlin Moncure", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  ("Jim C. Nasby", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
  Re: Performance Optimization for Dummies 2 - the SQL  (Shaun Thomas, )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )
 Re: Performance Optimization for Dummies 2 - the SQL  ("Carlo Stonebanks", )

Hi Merlin,

Well, I'm back. first of all, thanks for your dogged determination to help
me out - it is much appreciated. I owe you a beer or twelve.

The import has been running for a week. The import program got faster as I
tuned things. I capture the dynamic SQL statements generated by the app, as
well as an accompanying EXPLAIN - and put it out to an XML file. I turned
off seq scan in the config, and ran a trial import. I knew that with seq
scan off that if I saw a seq scan in my log, it's because there were no
indexes available to satisfy the query - I adjusted accordingly and this
worked really well.

When the import runs against an empty or small db, it's blisteringly fast
(considering that it's a heauristically based process). This proved that it
wasn't the app or the SQL connection that was slow. Once again, though, as
the data db grows, it slows down. Now it's crawling again. All of the
queries appear to be fine, taking advantage of the indexes. There is ONE
query, though, that seems to be the troublemaker - the same one I had
brought up before. I believe that it is one sub-query that is causing the
problem, taking what appears to be 500 to 1000+ms to run every time. (See
below).

Curiously, it's using index scans, and it really looks like a simple query
to me. I am completely baffled. The two tables in question have about 800K
rows each - not exactly an incredible number. The EXPLAIN is simple, but the
performance is dreadful. All the other queries run much faster than this -
does ANYTHING about this query strike you as odd?

Carlo

/*
Find all facilities that do not have full address information
but do have default location information that indicates
its the facilitiy's US zip code.
NULL values cast as columns are placeholders to allow
this sub-query to be unioned with another subquery
that contains full address data
*/
select
    f.facility_id,
    null as facility_address_id,
    null as address_id,
    f.facility_type_code,
    f.name,
    null as address,
    f.default_city as city,
    f.default_state_code as state_code,
    f.default_postal_code as postal_code,
    f.default_country_code as country_code,
    null as parsed_unit
from
    mdx_core.facility as f
left outer join mdx_core.facility_address as fa
    on fa.facility_id = f.facility_id
where
     facility_address_id is null
     and f.default_country_code = 'US'
     and (f.default_postal_code = '14224-1945' or f.default_postal_code =
'14224')

"Nested Loop Left Join  (cost=22966.70..23594.84 rows=93 width=71) (actual
time=662.075..662.075 rows=0 loops=1)"
"  Filter: ("inner".facility_address_id IS NULL)"
"  ->  Bitmap Heap Scan on facility f  (cost=22966.70..23231.79 rows=93
width=71) (actual time=661.907..661.929 rows=7 loops=1)"
"        Recheck Cond: (((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224-1945'::text)) OR
((default_country_code = 'US'::bpchar) AND ((default_postal_code)::text =
'14224'::text)))"
"        ->  BitmapOr  (cost=22966.70..22966.70 rows=93 width=0) (actual
time=661.891..661.891 rows=0 loops=1)"
"              ->  Bitmap Index Scan on
facility_country_state_postal_code_idx  (cost=0.00..11483.35 rows=47
width=0) (actual time=374.284..374.284 rows=7 loops=1)"
"                    Index Cond: ((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224-1945'::text))"
"              ->  Bitmap Index Scan on
facility_country_state_postal_code_idx  (cost=0.00..11483.35 rows=47
width=0) (actual time=287.599..287.599 rows=0 loops=1)"
"                    Index Cond: ((default_country_code = 'US'::bpchar) AND
((default_postal_code)::text = '14224'::text))"
"  ->  Index Scan using facility_address_facility_address_address_type_idx
on facility_address fa  (cost=0.00..3.89 rows=1 width=8) (actual
time=0.014..0.016 rows=1 loops=7)"
"        Index Cond: (fa.facility_id = "outer".facility_id)"
"Total runtime: 662.203 ms"
>




В списке pgsql-performance по дате сообщения:

От: "Craig A. James"
Дата:
Сообщение: Re: Hints proposal
От: "Harald Armin Massa"
Дата:
Сообщение: measuring shared memory usage on Windows