Re: Bitmap scan is undercosted?

Поиск
Список
Период
Сортировка
От Vitaliy Garnashevich
Тема Re: Bitmap scan is undercosted?
Дата
Msg-id d06c555e-6710-2dd7-d019-4d7bc40750b4@gmail.com
обсуждение исходный текст
Ответ на Re: Bitmap scan is undercosted?  (Vitaliy Garnashevich <vgarnashevich@gmail.com>)
Список pgsql-performance
> What seems odd to me is that in different kinds of tests (with 
> different frequency of column values):
>
> i1 Rows Removed by Filter = 900156, 179792, 89762 (decreased a lot)
> i1 buffers = 46983, 44373, 39928 (decreased, but not a lot)
> i1 best case time = 756.045, 127.814, 79.492 (decreased a lot, as well 
> as probably average case too)
> i1 cost estimates = 67358.15, 48809.34, 46544.80 (did not decrease a lot)
>
> i2 Rows Removed by Filter = 900835, 980350, 991389
> i2 buffers = 46985, 46985, 46987
> i2 best case time = 377.554, 346.481, 387.874
> i2 cost estimates = 39166.34, 39247.83, 40167.34
>
> It's odd that increase in actual execution time for "i1" was not 
> reflected enough in cost estimates. The cost even didn't go below "i2" 
> costs.

I've added some logging, in order to get the actual numbers which were 
used for estimation.

--drop table if exists aaa;
--create table aaa as select floor(random()*100)::int num, (random()*10 
< 1)::int flag from generate_series(1, 10000000) id;
--analyze aaa;

--set enable_bitmapscan = off; set enable_indexscan = on;  set 
enable_seqscan = off;
--set seq_page_cost = 1.0; set random_page_cost = 1.0; set 
cpu_tuple_cost = 0.01; set cpu_index_tuple_cost = 0.005; set 
cpu_operator_cost = 0.0025;

--create index i1 on aaa  (num);
--drop index if exists i2;
--explain (analyze,verbose,costs,buffers) select * from aaa where num = 
1 and flag = 1;

Index Scan using i1 on public.aaa  (cost=0.43..46697.59 rows=10641 
width=8) (actual time=0.047..153.521 rows=9826 loops=1)
   Rows Removed by Filter: 89948

--drop index if exists i1;
--create index i2 on aaa  (flag);
--explain (analyze,verbose,costs,buffers) select * from aaa where num = 
1 and flag = 1;

Index Scan using i2 on public.aaa  (cost=0.43..39583.11 rows=10641 
width=8) (actual time=0.098..351.454 rows=9826 loops=1)
   Rows Removed by Filter: 990249


LOG:  cost_index:
         seq_page_cost=1.00, random_page_cost=1.00, 
cpu_tuple_cost=0.0100, cpu_index_tuple_cost=0.0050, 
cpu_operator_cost=0.0025, effective_cache_size=131072
         indexStartupCost=0.43, indexTotalCost=1103.94, 
indexSelectivity=0.01076667, indexCorrelation=0.00208220
         baserel->tuples=10000033.00, baserel->pages=44248.00, 
baserel->allvisfrac=0.00000000
         tuples_fetched=107667.00, pages_fetched=477.00
         max_IO_cost=44248.0000, min_IO_cost=477.0000, csquared=0.0000
         qpqual_cost.startup=0.0000, qpqual_cost.per_tuple=0.0025, 
cpu_per_tuple=0.0125
         spc_seq_page_cost=1.00, spc_random_page_cost=1.00
         startup_cost=0.43, total_cost=46697.59

LOG:  cost_index:
         seq_page_cost=1.00, random_page_cost=1.00, 
cpu_tuple_cost=0.0100, cpu_index_tuple_cost=0.0050, 
cpu_operator_cost=0.0025, effective_cache_size=131072
         indexStartupCost=0.43, indexTotalCost=10123.93, 
indexSelectivity=0.09883333, indexCorrelation=0.82505685
         baserel->tuples=10000000.00, baserel->pages=44248.00, 
baserel->allvisfrac=0.00000000
         tuples_fetched=988333.00, pages_fetched=4374.00
         max_IO_cost=44248.0000, min_IO_cost=4374.0000, csquared=0.6807
         qpqual_cost.startup=0.0000, qpqual_cost.per_tuple=0.0025, 
cpu_per_tuple=0.0125
         spc_seq_page_cost=1.00, spc_random_page_cost=1.00
         startup_cost=0.43, total_cost=39583.11


Here is a break down of the total_cost into components, for i1 query and 
for i2 query (some rounding was removed from the formula for brevity):

path->path.total_cost =
   (indexTotalCost + qpqual_cost.startup) +
   (max_IO_cost + csquared * (min_IO_cost - max_IO_cost)) +
   (cpu_tuple_cost + qpqual_cost.per_tuple) * (indexSelectivity * 
baserel->tuples);
path->path.total_cost =
   1103.94 + 0.0000 +                                // 1103.94 +
   44248.0000 + 0.0000 * (477.0000 - 44248.0000) +   // 44248.00 +
   (0.0100 + 0.0025) * (0.01076667 * 10000033.00)    // 1345.84
   = 46697.78;                                       // = 46697.78;

path->path.total_cost =
   (indexTotalCost + qpqual_cost.startup) +
   (max_IO_cost + csquared * (min_IO_cost - max_IO_cost)) +
   (cpu_tuple_cost + qpqual_cost.per_tuple) * (indexSelectivity * 
baserel->tuples);
path->path.total_cost =
   10123.93 + 0.0000 +                               // 10123.93 +
   44248.0000 + 0.6807 * (4374.0000 - 44248.0000) +  // 17105.77 +
   (0.0100 + 0.0025) * (0.09883333 * 10000000.00)    // 12354.17
   = 39583.86;                                       // = 39583.86;


PS.
The code used for logging:
/postgresql-9.3.1/src/backend/optimizer/path/costsize.c : cost_index()

     ereport(LOG,
             (errmsg("cost_index: \n"
                     "seq_page_cost=%.2f, random_page_cost=%.2f, 
cpu_tuple_cost=%.4f, cpu_index_tuple_cost=%.4f, cpu_operator_cost=%.4f, 
effective_cache_size=%.0f\n"
                     "indexStartupCost=%.2f, indexTotalCost=%.2f, 
indexSelectivity=%.8f, indexCorrelation=%.8f\n"
                     "baserel->tuples=%.2f, baserel->pages=%.2f, 
baserel->allvisfrac=%.8f\n"
                     "tuples_fetched=%.2f, pages_fetched=%.2f\n"
                     "max_IO_cost=%.4f, min_IO_cost=%.4f, csquared=%.4f\n"
                     "qpqual_cost.startup=%.4f, 
qpqual_cost.per_tuple=%.4f, cpu_per_tuple=%.4f\n"
                     "spc_seq_page_cost=%.2f, spc_random_page_cost=%.2f\n"
                     "startup_cost=%.2f, total_cost=%.2f\n",

                     seq_page_cost, random_page_cost, cpu_tuple_cost, 
cpu_index_tuple_cost, cpu_operator_cost, (double)effective_cache_size,
                     indexStartupCost, indexTotalCost, indexSelectivity, 
indexCorrelation,
                     baserel->tuples, (double) baserel->pages, 
baserel->allvisfrac,
                     tuples_fetched, pages_fetched,
                     max_IO_cost, min_IO_cost, csquared,
                     qpqual_cost.startup, qpqual_cost.per_tuple, 
cpu_per_tuple,
                     spc_seq_page_cost, spc_random_page_cost,
                     startup_cost, startup_cost + run_cost
                     )));

Regards,
Vitaliy


В списке pgsql-performance по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: Bitmap scan is undercosted? - overestimated correlation andcost_index
Следующее
От: Matthew Hall
Дата:
Сообщение: Re: insert and query performance on big string table with pg_trgm