reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)

Поиск
Список
Период
Сортировка
От Mark Stosberg
Тема reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)
Дата
Msg-id eqvdbm$1c8p$1@news.hub.org
обсуждение исходный текст
Ответ на Re: cube operations slower than geo_distance() on production server  ("Merlin Moncure" <mmoncure@gmail.com>)
Ответы Re: reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)
Список pgsql-performance
Merlin Moncure wrote:
> On 2/14/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are two things wrong here: first, that the estimated row
>> count is only 20% of actual; it should certainly not be that far
>> off for such a simple condition. I wonder if your vacuum/analyze
>> procedures are actually working. Second, you mentioned somewhere
>> along the line that 'available' pets are about 10% of all the
>> entries, which means that this indexscan is more than likely
>> entirely counterproductive: it would be cheaper to ignore this
>> index altogether.

Tom,

Thanks for the generosity of your time. We are using  8.1.3 currently. I
have read there are some performance improvements in 8.2, but we have
not started evaluating that yet.

Your suggestion about the pet_state index was right on. I tried
"Analyze" on it, but still got the same bad estimate. However, I then
used "reindex" on that index, and that fixed the estimate accuracy,
which made the query run faster! The cube search now benchmarks faster
than the old search in production, taking about 2/3s of the time of the
old one.

Any ideas why the manual REINDEX did something that "analyze" didn't? It
makes me wonder if there is other tuning like this to do.

Attached is the EA output from the most recent run, after the "re-index".

> I think switching the index on pet_state to a composite on (pet_state,
> species_id) might help too.
>
> or even better:
>
> create function is_pet_available(text) returns bool as
> $$
>  select $1='available';
> $$ language sql immutable;
>
> create index pets_available_species_idx on
> pets(is_pet_available(pet_state), species_id);

Merlin,

Thanks for this suggestion. It is not an approach I had used before, and
I was interested to try it. However, the new index didn't get chosen.
(Perhaps I would need to drop the old one?) However, Tom's suggestions
did help. I'll follow up on that in just a moment.

>
> refactor your query something similar to:
>
> SELECT * FROM
> (
> SELECT
> earth_coords(q.earth_coords, s.earth_coords)/1609.344 as radius
> FROM pets
> JOIN shelters_active as shelters USING (shelter_id)
> JOIN zipcodes s ON shelters.postal_code_for_joining = zipcodes.zipcode
> JOIN zipcodes q ON q.zipcode = '90210'
> WHERE
>   is_pet_available(pet_state)
>   AND species_id = 1
>   AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
> ) p order by radius
>
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Sort  (cost=5276.93..5277.00 rows=28 width=64) (actual time=1981.830..1984.415 rows=1344 loops=1)
   Sort Key: (cube_distance(public.zipcodes.earth_coords, public.zipcodes.earth_coords) / 1609.344::double precision)
   ->  Nested Loop  (cost=291.32..5276.26 rows=28 width=64) (actual time=24.080..1976.479 rows=1344 loops=1)
         ->  Nested Loop  (cost=2.15..575.79 rows=11 width=68) (actual time=2.637..34.067 rows=131 loops=1)
               ->  Nested Loop  (cost=2.15..153.48 rows=42 width=73) (actual time=1.939..3.972 rows=240 loops=1)
                     ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual
time=0.283..0.287rows=1 loops=1) 
                           Index Cond: ((zipcode)::text = '90210'::text)
                     ->  Bitmap Heap Scan on zipcodes  (cost=2.15..149.84 rows=42 width=41) (actual time=1.403..2.323
rows=240loops=1) 
                           Recheck Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double
precision,3) @ zipcodes.earth_coords) 
                           ->  Bitmap Index Scan on zip_earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual
time=1.377..1.377rows=240 loops=1) 
                                 Index Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double
precision,3) @ zipcodes.earth_coords) 
               ->  Index Scan using shelters_postal_code_for_joining_idx on shelters  (cost=0.00..10.03 rows=2
width=12)(actual time=0.064..0.118 rows=1 loops=240) 
                     Index Cond: ((shelters.postal_code_for_joining)::text = ("outer".zipcode)::text)
                     Filter: ((shelter_state)::text = 'active'::text)
         ->  Bitmap Heap Scan on pets  (cost=289.17..426.86 rows=35 width=4) (actual time=14.362..14.746 rows=10
loops=131)
               Recheck Cond: ((pets.shelter_id = "outer".shelter_id) AND ((pets.pet_state)::text = 'available'::text))
               Filter: (species_id = 1)
               ->  BitmapAnd  (cost=289.17..289.17 rows=35 width=0) (actual time=14.219..14.219 rows=0 loops=131)
                     ->  Bitmap Index Scan on pets_shelter_id_idx  (cost=0.00..3.89 rows=256 width=0) (actual
time=0.188..0.188rows=168 loops=131) 
                           Index Cond: (pets.shelter_id = "outer".shelter_id)
                     ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual
time=22.043..22.043rows=40397 loops=82) 
                           Index Cond: ((pet_state)::text = 'available'::text)
 Total runtime: 1988.962 ms

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Merlin Moncure"
Дата:
Сообщение: Re: Benchmarking PGSQL?
Следующее
От: Kenji Morishige
Дата:
Сообщение: Re: quad or dual core Intel CPUs