Inaccurate Explain Cost

Поиск
Список
Период
Сортировка
От Robert Sosinski
Тема Inaccurate Explain Cost
Дата
Msg-id ED7ED14D0D304B2FB38B191F3C160A22@ticketevolution.com
обсуждение исходный текст
Ответы Re: Inaccurate Explain Cost  (Shaun Thomas <sthomas@optionshouse.com>)
Re: Inaccurate Explain Cost  (hubert depesz lubaczewski <depesz@depesz.com>)
Список pgsql-performance
Hey Everyone,

I seem to be getting an inaccurate cost from explain.  Here are two examples for one query with two different query plans:

exchange_prod=# set enable_nestloop = on;
SET
exchange_prod=#
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
                                                                                            QUERY PLAN                                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=190169.54..190169.55 rows=1 width=4) (actual time=199806.806..199806.807 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..190162.49 rows=2817 width=4) (actual time=163.293..199753.548 rows=43904 loops=1)
         ->  Nested Loop  (cost=0.00..151986.53 rows=2817 width=4) (actual time=163.275..186869.844 rows=43904 loops=1)
               ->  Index Scan using upload_destinations_office_id_idx on upload_destinations  (cost=0.00..29.95 rows=4 width=8) (actual time=0.060..0.093 rows=6 loops=1)
                     Index Cond: (office_id = 6)
               ->  Index Scan using index_exchange_uploads_on_upload_destination_id on exchange_uploads  (cost=0.00..37978.21 rows=875 width=12) (actual time=27.197..31140.375 rows=7317 loops=6)
                     Index Cond: (upload_destination_id = upload_destinations.id)
                     Filter: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
         ->  Index Scan using index_import_errors_on_exchange_upload_id on import_errors  (cost=0.00..8.49 rows=405 width=4) (actual time=0.291..0.291 rows=0 loops=43904)
               Index Cond: (exchange_upload_id = exchange_uploads.id)
 Total runtime: 199806.951 ms
(11 rows)

exchange_prod=# 
exchange_prod=# set enable_nestloop = off;
SET
exchange_prod=# 
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
                                                                                                QUERY PLAN                                                                                                 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2535992.33..2535992.34 rows=1 width=4) (actual time=133447.507..133447.507 rows=1 loops=1)
   ->  Hash Right Join  (cost=1816553.69..2535985.56 rows=2708 width=4) (actual time=133405.326..133417.078 rows=43906 loops=1)
         Hash Cond: (import_errors.exchange_upload_id = exchange_uploads.id)
         ->  Seq Scan on import_errors  (cost=0.00..710802.71 rows=2300471 width=4) (actual time=0.006..19199.569 rows=2321888 loops=1)
         ->  Hash  (cost=1816519.84..1816519.84 rows=2708 width=4) (actual time=112938.606..112938.606 rows=43906 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1544kB
               ->  Hash Join  (cost=28.25..1816519.84 rows=2708 width=4) (actual time=42.957..112892.689 rows=43906 loops=1)
                     Hash Cond: (exchange_uploads.upload_destination_id = upload_destinations.id)
                     ->  Index Scan using index_upload_destinations_on_created_at on exchange_uploads  (cost=0.00..1804094.96 rows=3298545 width=12) (actual time=17.686..111649.272 rows=3303488 loops=1)
                           Index Cond: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
                     ->  Hash  (cost=28.20..28.20 rows=4 width=8) (actual time=0.043..0.043 rows=6 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 1kB
                           ->  Bitmap Heap Scan on upload_destinations  (cost=6.28..28.20 rows=4 width=8) (actual time=0.026..0.036 rows=6 loops=1)
                                 Recheck Cond: (office_id = 6)
                                 ->  Bitmap Index Scan on upload_destinations_office_id_idx  (cost=0.00..6.28 rows=4 width=0) (actual time=0.020..0.020 rows=6 loops=1)
                                       Index Cond: (office_id = 6)
 Total runtime: 133447.790 ms
(17 rows)



The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.  When I disable nested loop, I get a cost of 2,535,992.34 which runs in only 133,447.790 ms.  We have run queries on our database with a cost of 200K cost before and they ran less then a few seconds, which makes me wonder if the first query plan is inaccurate.  The other issue is understanding why a query plan with a much higher cost is taking less time to run.

I do not think these queries are cached differently, as we have gotten the same results ran a couple of times at across a few days.  We also analyzed the tables that we are querying before trying the explain analyze again, and were met with the same statistics.  Any help on how Postgres comes up with a query plan like this, and why there is a difference would be very helpful.

Thanks!

-- 
Robert Sosinski

В списке pgsql-performance по дате отправления:

Предыдущее
От: Nikolay Ulyanitsky
Дата:
Сообщение: Re: Same query doing slow then quick
Следующее
От: Shaun Thomas
Дата:
Сообщение: Re: Inaccurate Explain Cost