Обсуждение: Planner's choice
Ok, I'm not sure why the planner is making the choices it is.
Given:
From pg_stats:
 tablename |   attname   | null_frac | avg_width | n_distinct |
most_common_vals                                                       |
most_common_freqs                                          |
                                                                      histogram_bounds
                                                                                                | correlation 
-----------+-------------+-----------+-----------+------------+--------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 chat_post | poster_id   |         0 |         2 |        341 | {2149,1130,731,2595,1879,2473,1842,688,521,1656}
                                                                      |
{0.066,0.039,0.0276667,0.0256667,0.0253333,0.023,0.0226667,0.022,0.021,0.021}                        |
{4,252,582,896,1162,1526,1747,1907,2114,2472,2946}
                                  |    0.036261 
 chat_post | time        |         0 |         8 |  -0.417335 | {"1998-07-08 15:09:00-04","1999-02-26
19:31:00-05","2000-01-2712:07:00-05","2001-05-24 14:30:00-04","2002-01-22 10:04:00-05"} |
{0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}                                        | {"1998-03-05
22:47:00-05","1998-08-2911:49:00-04","1999-03-19 19:39:00-05","1999-08-23 09:26:00-04","2000-03-07
13:45:00-05","2000-11-2915:48:00-05","2001-04-11 11:32:00-04","2001-08-31 15:43:00-04","2002-02-01
12:40:00-05","2002-06-1906:32:00-04","2002-11-12 04:44:00-05"} |           1 
with the table definition:
                 Table "chat_post"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 session_id  | smallint                 | not null
 poster_id   | smallint                 | not null
 time        | timestamp with time zone | not null
 post_number | smallint                 | not null
 fts         | txtidx                   |
Indexes: chat_post_text_idx,
         chat_post_time_idx,
         chat_post_timeuser_idx,
         chat_post_user_idx,
         chat_post_usertime_idx
Primary key: chat_post_pkey
Triggers: RI_ConstraintTrigger_23712080,
          RI_ConstraintTrigger_23706474
where chat_post_timeuser_idx is defined on the columns (time,poster_id)
and chat_post_usertime_idx is defined on the columns (poster_id,time)
Why is the planner not choosing the user_time index? For the first of these I
suspect it's the sky high correlation number on the time column but for the
other two?
avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc limit 2;
NOTICE:  QUERY PLAN:
Limit  (cost=0.00..32.40 rows=2 width=46) (actual time=96204.53..96204.71 rows=2 loops=1)
  ->  Index Scan Backward using chat_post_time_idx on chat_post  (cost=0.00..42370.93 rows=2616 width=46) (actual
time=96204.49..96204.64rows=3 loops=1) 
Total runtime: 96205.18 msec
EXPLAIN
avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by user, time desc limit 2;
NOTICE:  QUERY PLAN:
Limit  (cost=10262.07..10262.07 rows=2 width=46) (actual time=17400.89..17400.95 rows=2 loops=1)
  ->  Sort  (cost=10262.07..10262.07 rows=2616 width=46) (actual time=17400.85..17400.88 rows=3 loops=1)
        ->  Index Scan using chat_post_user_idx on chat_post  (cost=0.00..10113.60 rows=2616 width=46) (actual
time=99.53..16327.97rows=3372 loops=1) 
Total runtime: 17450.35 msec
EXPLAIN
avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc, user limit 2;
NOTICE:  QUERY PLAN:
Limit  (cost=10262.07..10262.07 rows=2 width=46) (actual time=1109.68..1109.73 rows=2 loops=1)
  ->  Sort  (cost=10262.07..10262.07 rows=2616 width=46) (actual time=1109.65..1109.67 rows=3 loops=1)
        ->  Index Scan using chat_post_user_idx on chat_post  (cost=0.00..10113.60 rows=2616 width=46) (actual
time=1.47..598.56rows=3372 loops=1) 
Total runtime: 1166.65 msec
               (don't forget this is a bogus time due to the caching)
EXPLAIN
This is part of a larger query/issue. By redoing a query sprinkling 'limit's
throughout I managed to make a >90s query complete in 1s (with the loss of the
total number of results that would be returned). However, I then noticed that
was a feature of the particular poster_id I'd been testing with. Picking a far
less frequent poster_id pushed the time up to >300s. The reason for this was
the planner's choice of chat_post_time_idx which for low numbers of occurances
of poster_id was effectively kicking off a seqscan through an indexscan. My
plan now is to maintain my own set of poster_id stats and use one of several
query variants depending on what they say but this requires at least some
understanding of the choices made by the planner.
BTW, the largest number of occurances of a single poster_id is still only 6% of
the entire table. So changing the stats gathering shouldn't make any
difference? There's also >2000 distinct values of poster_id, not the 341
estimated by the stats, but again this shouldn't matter?
--
Nigel J. Andrews
			
		"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> where chat_post_timeuser_idx is defined on the columns (time,poster_id)
> and chat_post_usertime_idx is defined on the columns (poster_id,time)
> Why is the planner not choosing the user_time index [for]
> avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc limit 2;
> NOTICE:  QUERY PLAN:
> Limit  (cost=0.00..32.40 rows=2 width=46) (actual time=96204.53..96204.71 rows=2 loops=1)
>   ->  Index Scan Backward using chat_post_time_idx on chat_post  (cost=0.00..42370.93 rows=2616 width=46) (actual
time=96204.49..96204.64rows=3 loops=1) 
> Total runtime: 96205.18 msec
If you'd said "order by poster_id desc, time desc" then that index would be
considered to match the ORDER BY clause, and so would be usable in this
same type of plan.  As-is, the index is only useful for matching
poster_id and not for obtaining the required order, so the only plan
type considered for it involves an explicit sort step, which isn't
considered a win for the estimated number of rows matching the poster_id.
> My plan now is to maintain my own set of poster_id stats and use one
> of several query variants depending on what they say but this requires
> at least some understanding of the choices made by the planner.
Rather than maintaining your own stats, consider boosting the statistics
target for the poster_id column.  You probably want the pg_stats info to
cover all the poster_ids that account for more than 1% of the entries.
The n_distinct value should improve too, producing a better estimate for
the infrequent poster_ids even though they're not explicitly stored.
            regards, tom lane
			
		On Wed, 13 Nov 2002, Tom Lane wrote: > "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > > where chat_post_timeuser_idx is defined on the columns (time,poster_id) > > and chat_post_usertime_idx is defined on the columns (poster_id,time) > > > Why is the planner not choosing the user_time index [for] > > > avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc limit 2; > > NOTICE: QUERY PLAN: > > > Limit (cost=0.00..32.40 rows=2 width=46) (actual time=96204.53..96204.71 rows=2 loops=1) > > -> Index Scan Backward using chat_post_time_idx on chat_post (cost=0.00..42370.93 rows=2616 width=46) (actual time=96204.49..96204.64rows=3 loops=1) > > Total runtime: 96205.18 msec > > If you'd said "order by poster_id desc, time desc" then that index would be > considered to match the ORDER BY clause, and so would be usable in this > same type of plan. As-is, the index is only useful for matching > poster_id and not for obtaining the required order, so the only plan > type considered for it involves an explicit sort step, which isn't > considered a win for the estimated number of rows matching the poster_id. Ok, I see. Thinking about this a similar question may have been on the list several months ago. I was just thinking in terms that the first column selects out of a btree on only that column and that each leaf node in that tree contains a btree of the second column. Given that structure I thought it would be reasonable for the planner to choose what I was expecting it to, but then I was guessing. > > > My plan now is to maintain my own set of poster_id stats and use one > > of several query variants depending on what they say but this requires > > at least some understanding of the choices made by the planner. > > Rather than maintaining your own stats, consider boosting the statistics > target for the poster_id column. You probably want the pg_stats info to > cover all the poster_ids that account for more than 1% of the entries. > The n_distinct value should improve too, producing a better estimate for > the infrequent poster_ids even though they're not explicitly stored. > For completeness I've attached the real [rewritten] query. The original was much more simply of the form: select collist from chat_post p, chat_user u where p.poster_id = u.id and p.time > X and p.time < Y and u.name = 'someone' order by p.time, p.post_number I tried increasing the stats gathering. 2000 and 1000 targets ran out of memory and 500 made no difference. The lowest of the top 500 targets contained way less than 1%. The problem being that the original or rewritten forms either start at time X and scan the index until time Y or do a seqscan. In the rewritten form with the limit this is a big win where the result 'quota' is fill early in the time interval but a huge loss when not. This is because the index scan results in the entire table being scanned anyway, with the exists subselect running for each tuple. So what I am trying to achieve is a query where both high and low occuring poster_ids can be retrieved in a reasonable time and I can't see how to do that without having more than one form of the query and submitting which ever one is going to force use of whatever index I think most appropiate. This I think is reasonable, despite requiring an extra query to get the stats, since as the DBA I know more about the data than the general usage optimser does. I'm happy to take other suggests of course and you've already made me think of something I haven't tried. -- Nigel J. Andrews