Обсуждение: Cluster vs. non-cluster query planning
I'm running postgres 8.0.7, and I've got a table of orders with about
100,000 entries. I want to just look at the new orders, right now 104 of
them.
EXPLAIN ANALYZE
SELECT
order_id
FROM
orders
WHERE
order_statuses_id = (SELECT id FROM order_statuses WHERE id_name
= 'new');
Seq Scan on orders o (cost=1.20..11395.51 rows=7029 width=8) (actual
time=286.038..287.662 rows=104 loops=1)
Filter: (order_statuses_id = $0)
InitPlan
-> Seq Scan on order_statuses (cost=0.00..1.20 rows=1 width=4)
(actual time=0.030..0.039 rows=1 loops=1)
Filter: ((id_name)::text = 'new'::text)
Total runtime: 288.102 ms
The dreaded sequential scan. I've got an index on order_statuses_id and
I've VACUUM ANALYZEd the table, but I'm currently clustered on the
primary key (order_id).
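For reference, a rough sketch of that setup (reconstructed from the plans in this thread, not the actual DDL; the primary key index name orders_pkey is a guess, the status index name comes from the plans below):

-- Sketch only; names reconstructed from the plan output.
CREATE INDEX orders_status_btree_idx ON orders (order_statuses_id);
-- Table currently clustered on the primary key (8.0-era CLUSTER syntax;
-- "orders_pkey" is an assumed index name).
CLUSTER orders_pkey ON orders;
VACUUM ANALYZE orders;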
With enable_seqscan = off, I get:
-------------------------------------------------
Index Scan using orders_status_btree_idx on orders o
(cost=4.64..12457.14 rows=7031 width=8) (actual time=0.164..0.664
rows=104 loops=1)
Index Cond: (order_statuses_id = $0)
InitPlan
-> Index Scan using order_statuses_id_name_key on order_statuses
(cost=0.00..4.64 rows=1 width=4) (actual time=0.128..0.134 rows=1 loops=1)
Index Cond: ((id_name)::text = 'new'::text)
Total runtime: 1.108 ms
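For anyone reproducing this, the plan above was obtained with a session-local setting, roughly:

-- Disable sequential scans for this session only, run the test, then reset.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT order_id
FROM orders
WHERE order_statuses_id = (SELECT id FROM order_statuses WHERE id_name = 'new');
RESET enable_seqscan;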
If I hard-code the 'new' status ID, I get:
-------------------------------------------------
EXPLAIN ANALYZE
SELECT
order_id
FROM
orders
WHERE
order_statuses_id = 1;
Index Scan using orders_status_btree_idx on orders o
(cost=0.00..4539.65 rows=1319 width=8) (actual time=0.132..1.883
rows=104 loops=1)
Index Cond: (order_statuses_id = 1)
Total runtime: 2.380 ms
Here is the pg_stats entry for orders.order_statuses_id:
schemaname        | public
tablename         | orders
attname           | order_statuses_id
null_frac         | 0.000208333
avg_width         | 4
n_distinct        | 14
most_common_vals  | {8,24,10,25}
most_common_freqs | {0.385417,0.242083,0.230625,0.07875}
histogram_bounds  | {1,7,7,9,9,9,9,9,23,26}
correlation       | 0.740117
This is with SET STATISTICS = 16 on the column, since that's how many
different values the column can currently take.
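For clarity, that was done with something along these lines (the new target only takes effect after re-gathering statistics):

-- Set the per-column statistics target, then re-analyze the table.
ALTER TABLE orders ALTER COLUMN order_statuses_id SET STATISTICS 16;
ANALYZE orders;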
Now, here's the thing - if I cluster on the index on order_statuses_id,
the original query produces:
Index Scan using orders_status_btree_idx on orders o
(cost=1.20..978.94 rows=8203 width=8) (actual time=0.097..0.598 rows=104
loops=1)
Index Cond: (order_statuses_id = $0)
InitPlan
-> Seq Scan on order_statuses (cost=0.00..1.20 rows=1 width=4)
(actual time=0.056..0.065 rows=1 loops=1)
Filter: ((id_name)::text = 'new'::text)
Total runtime: 1.042 ms
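The re-cluster itself is roughly the following (8.0-era CLUSTER syntax; note that CLUSTER takes an exclusive lock and rewrites the table, and the statistics need to be regathered afterwards so the planner sees the new correlation):

-- Physically reorder the table by the status index, then refresh stats.
CLUSTER orders_status_btree_idx ON orders;
ANALYZE orders;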
Estimated cost went way down. The pg_stats entry becomes:
schemaname        | public
tablename         | orders
attname           | order_statuses_id
null_frac         | 0
avg_width         | 4
n_distinct        | 12
most_common_vals  | {8,24,10,25}
most_common_freqs | {0.386458,0.244167,0.238333,0.0720833}
histogram_bounds  | {1,7,7,9,9,9,22,26}
correlation       | 1
I'm hesitant to cluster on the order_statuses_id index, because there
are a lot of other queries using this table, many of which join on
order_id. I also feel like I ought to be able to get the planner to do
an index scan without hard-coding the order_statuses_id value.
Questions:
* What can I do to reduce the estimated row count on the query?
* Why does clustering drive down the estimated cost for the index scan
so much? Does a change in correlation from .74 to 1 make that much of a
difference?
* Can I convince my query planner to index scan without clustering on
the order_statuses_id index, or setting enable_seqscan = off?
Potential note of interest: This is a very wide, monolithic table - no
less than 100 columns, with several check constraints, foreign key
constraints, and indexes, including three functional indexes.
Side question: Sometimes, when I VACUUM ANALYZE the table, the pg_stats
entry for order_statuses_id has almost all of the possible values in
most_common_vals, instead of just a handful. Example:
schemaname        | public
tablename         | orders
attname           | order_statuses_id
null_frac         | 0
avg_width         | 4
n_distinct        | 13
most_common_vals  | {8,24,10,25,9,7,23,26,1,22,2,5,4}
most_common_freqs | {0.393125,0.240208,0.226042,0.07875,0.0275,0.0145833,0.0110417,0.00291667,0.00229167,0.001875,0.000625,0.000625,0.000416667}
histogram_bounds  |
correlation       | 1
This doesn't appear to influence whether the index scan is chosen, but I
am curious as to why this happens.
> Questions:
> * What can I do to reduce the estimated row count on the query?
> * Why does clustering drive down the estimated cost for the index scan
> so much? Does a change in correlation from .74 to 1 make that much of
> a difference?
> * Can I convince my query planner to index scan without clustering on
> the order_statuses_id index, or setting enable_seqscan = off?
After some more digging on the mailing list, I found some comments on
effective_cache_size. Bringing it up from the default of 1000 does push
the estimated cost for the index scan below that of the sequential scan,
but not by much.
With SET effective_cache_size = 1000:
Seq Scan on orders o (cost=1.20..11395.53 rows=7029 width=8) (actual
time=280.148..281.512 rows=105 loops=1)
Filter: (order_statuses_id = $0)
InitPlan
-> Seq Scan on order_statuses (cost=0.00..1.20 rows=1 width=4)
(actual time=0.012..0.020 rows=1 loops=1)
Filter: ((id_name)::text = 'new'::text)
Total runtime: 281.700 ms
With SET effective_cache_size = 10000:
Index Scan using orders_status_btree_idx on orders o
(cost=1.20..9710.91 rows=7029 width=8) (actual time=0.050..0.372
rows=105 loops=1)
Index Cond: (order_statuses_id = $0)
InitPlan
-> Seq Scan on order_statuses (cost=0.00..1.20 rows=1 width=4)
(actual time=0.016..0.024 rows=1 loops=1)
Filter: ((id_name)::text = 'new'::text)
The ratios between estimated costs are still nowhere near the ratio of
actual costs.
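For reference, these were session-local experiments; in 8.0, effective_cache_size is measured in 8 kB pages, so a sketch of the two ways to apply it looks like this:

-- Per-session experiment (what was done above):
SET effective_cache_size = 10000;   -- 10000 x 8 kB pages, roughly 80 MB
-- To make it permanent, set effective_cache_size = 10000 in postgresql.conf
-- and reload the server configuration.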
Nolan Cafferky <Nolan.Cafferky@rbsinteractive.com> writes:
> After some more digging on the mailing list, I found some comments on
> effective_cache_size. Bringing it up from the default of 1000 does push
> the estimated cost for the index scan below that of the sequential scan,
> but not by much.
The first-order knob for tuning indexscan vs seqscan costing is
random_page_cost. What have you got that set to?
regards, tom lane
Tom Lane wrote:
> The first-order knob for tuning indexscan vs seqscan costing is
> random_page_cost. What have you got that set to?
This is currently at the default of 4. All of my other planner cost
constants are at default values as well. Dropping it to 1 drops the
estimated cost by a comparable ratio:
Index Scan using orders_status_btree_idx on orders o
(cost=1.20..3393.20 rows=7026 width=8) (actual time=0.050..0.314
rows=105 loops=1)
Index Cond: (order_statuses_id = $0)
InitPlan
-> Seq Scan on order_statuses (cost=0.00..1.20 rows=1 width=4)
(actual time=0.017..0.025 rows=1 loops=1)
Filter: ((id_name)::text = 'new'::text)
Total runtime: 0.498 ms
But, I'm guessing that random_page_cost = 1 is not a realistic value.
Nolan Cafferky <Nolan.Cafferky@rbsinteractive.com> writes:
> But, I'm guessing that random_page_cost = 1 is not a realistic value.
Well, that depends. If all your data can be expected to fit in memory
then it is a realistic value. (If not, you should be real careful not
to make performance decisions on the basis of test cases that *do* fit
in RAM...)
In any case, if I recall your numbers correctly you shouldn't need to
drop it nearly that far to get the thing to make the right choice.
A lot of people run with random_page_cost set to 2 or so.
regards, tom lane
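Acting on that advice, a less drastic session-local experiment might look like this (a sketch, not something actually run in this thread):

-- Try a more moderate value for the session before touching postgresql.conf.
SET random_page_cost = 2;
EXPLAIN ANALYZE
SELECT order_id
FROM orders
WHERE order_statuses_id = (SELECT id FROM order_statuses WHERE id_name = 'new');
RESET random_page_cost;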
Thanks for the advice. I will check what changing random_page_cost does for the rest of the queries on our system.
I did learn why the estimated row count was so high. This is new knowledge to me, so I'm going to share it.
SELECT reltuples FROM pg_class WHERE relname = 'orders'; -> produces 98426.
SELECT n_distinct FROM pg_stats WHERE tablename = 'orders' and attname = 'order_statuses_id'; -> currently 13.
Seq Scan on orders o (cost=1.20..11395.53 rows=7570 width=8) (actual time=283.599..285.031 rows=105 loops=1)
Filter: (order_statuses_id = $0)
InitPlan
-> Seq Scan on order_statuses (cost=0.00..1.20 rows=1 width=4) (actual time=0.031..0.038 rows=1 loops=1)
Filter: ((id_name)::text = 'new'::text)
Total runtime: 285.225 ms
(98426 / 13)::integer = 7571 ~= 7570, the estimated row count.
So the query planner isn't able to combine the knowledge of the id value from order_statuses with most_common_vals, most_common_freqs, or histogram_bounds from pg_stats. That seems a little odd to me, but maybe it makes sense. I suppose the planner can't start executing parts of the query to aid in the planning process.
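That fallback estimate can be reproduced straight from the catalogs; a sketch (it assumes a single table named orders is visible in pg_stats):

-- Reproduce the planner's fallback selectivity of 1/n_distinct for an
-- equality test against an unknown value ($0 in the plans above).
-- With the numbers above: 98426 / 13 ~= 7571, close to the plan's estimate.
SELECT c.reltuples,
       s.n_distinct,
       round(c.reltuples / s.n_distinct) AS estimated_rows
FROM pg_class c, pg_stats s
WHERE c.relname = 'orders'
  AND s.tablename = 'orders'
  AND s.attname = 'order_statuses_id';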
In the future, I will probably pre-select from order_statuses before executing this query.
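A sketch of that two-step approach (the literal in the second query is whatever id the first query returns; 1 is just the value from the earlier hard-coded example):

-- Step 1: look up the status id once, e.g. at application start-up.
SELECT id FROM order_statuses WHERE id_name = 'new';
-- Step 2: run the main query with the literal filled in, so the planner can
-- use most_common_vals/most_common_freqs for order_statuses_id.
SELECT order_id FROM orders WHERE order_statuses_id = 1;  -- assuming 'new' = 1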
Thanks!
On Mon, May 01, 2006 at 07:35:02PM -0400, Tom Lane wrote:
> Nolan Cafferky <Nolan.Cafferky@rbsinteractive.com> writes:
> > But, I'm guessing that random_page_cost = 1 is not a realistic value.
>
> Well, that depends. If all your data can be expected to fit in memory
> then it is a realistic value. (If not, you should be real careful not
> to make performance decisions on the basis of test cases that *do* fit
> in RAM...)
>
> In any case, if I recall your numbers correctly you shouldn't need to
> drop it nearly that far to get the thing to make the right choice.
> A lot of people run with random_page_cost set to 2 or so.
Also, the index scan cost estimator comments indicate that it does a linear
interpolation between the estimated cost for a perfectly correlated table and
a table with 0 correlation, but in fact the interpolation is exponential, or
rather linear in the *square* of the correlation, which just doesn't make much
sense.
I did some investigating on this some time ago, but never got very far with
it. http://stats.distributed.net/~decibel/summary.txt has some info, and
http://stats.distributed.net/~decibel/ has the raw data. Graphing that data,
if you only include correlations between 0.36 and 0.5, it appears that there
is a linear relationship between correlation and index scan time. Of course
this is very coarse data, and it would be great if someone did more research
in this area, preferably using pg_bench or other tools to generate the data so
that others can test this stuff as well. But even with data as rough as this,
it seems to provide a decent indication that it would be better to interpolate
linearly based on the correlation, rather than correlation^2. This is a
production machine, so I'd rather not go mucking about with testing such a
change here.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf      cell: 512-569-9461
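To illustrate the interpolation described above, here is a sketch in SQL with placeholder numbers (min_io stands for the index I/O cost at perfect correlation, max_io for the cost at zero correlation; 1000, 12000, and 0.74 are made-up values, not measurements from this thread):

-- Hypothetical illustration of weighting by correlation^2 vs. plain correlation:
-- io_cost = max_io + weight * (min_io - max_io)
SELECT round(max_io + (corr * corr) * (min_io - max_io)) AS squared_interpolation,
       round(max_io + corr * (min_io - max_io))          AS linear_interpolation
FROM (SELECT 1000.0 AS min_io, 12000.0 AS max_io, 0.74 AS corr) AS placeholder;

With these placeholders, weighting by correlation^2 leaves the estimate around 5976, while weighting linearly by the correlation itself brings it down to about 3860, which is the gap being discussed.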