Re: The standard 'why does it take so long' question
| От | Nigel J. Andrews |
|---|---|
| Тема | Re: The standard 'why does it take so long' question |
| Дата | |
| Msg-id | Pine.LNX.4.21.0208091136240.3235-100000@ponder.fairway2k.co.uk обсуждение исходный текст |
| Ответ на | Re: The standard 'why does it take so long' question (Tom Lane <tgl@sss.pgh.pa.us>) |
| Ответы |
Re: The standard 'why does it take so long' question
|
| Список | pgsql-general |
First I'll just say thanks to Joe Conway for the suggestion of an index on the
poster_id column. Although I didn't show it I do have an index on that field as
well, however, as I understand it the index on (poster_id,time) should fulfil
this same function.
On Fri, 9 Aug 2002, Tom Lane wrote:
> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> > The first question is why would the index scan on chat_user take
> > significantly longer than before?
>
> I think the indexscan is having to skip over more irrelevant data in the
> second case. Think about it: you can scan an index range consisting of
> a single user's posts between times T1 and T2, or you can scan an index
> range consisting of all posts between times T1 and T2. The second
> column of the index will save you from actually going to the heap for
> posts from other users, but you'll still have to pass over those index
> entries, because the contiguous range of index entries that covers the
> data you want will include a a lot of posts from other users.
That was the idea that popped into my head while I was writing the questions so
that certainly seems reasonable to me.
>
> What interests me is why the planner chose the second index when it
> had a choice; I'd have thought its cost models were good enough to
> handle this subtlety. Can you post the pg_stats rows for the columns
> in question?
I really don't know sufficient to be able to see why the stats would favour one
index over the other. Although looking at the pg_stats entries below now I
notice that the correlation for the time column is 1, compared to 0.058 for the
poster_id. As there is a natural order to the data, i.e. it's loaded in time
order, I am not entirely surprised that there's a high correlation, just
slightly surprised the stats gatherer picked up on it.
Note that the poster_ids for the example query are also shown below and are not
listed in the most_common_vals array. I've also shown the human understandable
values for the time limits in the query and due to the nature/distribution of
this column it's unsurprising to find that I managed to pick a time period that
didn't include one of the more common times.
So, what's making the planner choose the slower index is the correlation figure
right?
archive=> select * from pg_stats where attname = 'time' or attname = 'poster_id' and tablename = 'chat_post';
tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals |
most_common_freqs |
histogram_bounds
| correlation
-----------+-----------+-----------+-----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
chat_post | poster_id | 0 | 2 | 338 | {2149,731,1130,2595,1879,1656,688,1842,2473,1747}
|
{0.0603333,0.0306667,0.0306667,0.03,0.0286667,0.0276667,0.0273333,0.025,0.02,0.0196667}|
{4,252,521,819,1112,1423,1639,1820,2096,2427,2879}
| 0.0583958
chat_post | time | 0 | 8 | -0.350654 | {"1998-03-26 11:09:00-05","1999-05-13
00:49:00-04","2000-12-1413:05:00-05","2001-01-25 12:18:00-05","2002-04-04 11:37:00-05","2002-05-28
15:21:00-04","2002-07-1009:53:00-04"} |
{0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667} | {"1998-03-06
01:26:00-05","1998-08-2111:00:00-04","1999-02-22 15:15:00-05","1999-07-22 05:37:00-04","2000-01-24
09:53:00-05","2000-10-2317:45:00-04","2001-02-26 16:22:00-05","2001-06-28 20:19:00-04","2001-12-06
14:06:00-05","2002-04-2616:05:00-04","2002-08-09 00:10:00-04"} | 1
(2 rows)
archive=> select * from chat_user where name in ('thrifty', 'hope1');
id | name | lower_name
------+---------+------------
2494 | thrifty | thrifty
1112 | hope1 | hope1
(2 rows)
archive=> select 'epoch'::timestamptz + '959904000 seconds'::interval as from, 'epoch'::timestamptz + '1023667200
seconds'::intervalas to;
from | to
------------------------+------------------------
2000-06-02 01:00:00+01 | 2002-06-10 01:00:00+01
(1 row)
Thanks for the input Tom.
--
Nigel J. Andrews
Director
---
Logictree Systems Limited
Computer Consultants
В списке pgsql-general по дате отправления: