On 1/22/08, Richard Huxton <dev@archonet.com> wrote:
> Although the row-estimate still seems quite high. You might want to
> increase it even further (maximum is 1000). If this is a common query,
> I'd look at an index on (user,id) rather than just (user) perhaps.
Actually that index (with the same statistics setting as before)
yields slightly worse performance:
# explain analyze select max(id) from user_messages where user_id = 13604;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------------
 Result  (cost=3.86..3.87 rows=1 width=0) (actual time=0.051..0.052 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..3.86 rows=1 width=4) (actual time=0.045..0.045 rows=0 loops=1)
           ->  Index Scan Backward using user_messages_user_id_id on user_messages  (cost=0.00..1486.79 rows=385 width=4) (actual time=0.042..0.042 rows=0 loops=1)
                 Index Cond: (user_id = 13604)
                 Filter: (id IS NOT NULL)
 Total runtime: 0.128 ms
Compare with the plain index on the one attribute:
# explain analyze select max(id) from user_messages where user_id = 13604;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate  (cost=1388.34..1388.35 rows=1 width=4) (actual time=0.034..0.035 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages  (cost=0.00..1387.40 rows=374 width=4) (actual time=0.030..0.030 rows=0 loops=1)
         Index Cond: (user_id = 13604)
 Total runtime: 0.085 ms
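For reference, the setup being compared looks roughly like this (index and column names are taken from the plans above; the exact statistics target value was not stated in this thread, so 100 here is an assumption):

```sql
-- Hypothetical recreation of the two indexes compared above
-- (names from the EXPLAIN output):
CREATE INDEX user_messages_user ON user_messages (user_id);
CREATE INDEX user_messages_user_id_id ON user_messages (user_id, id);

-- Raise the per-column statistics target (assumed value) and
-- re-gather statistics so the planner sees the new target:
ALTER TABLE user_messages ALTER COLUMN user_id SET STATISTICS 100;
ANALYZE user_messages;
```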
> > But it's worrying that PostgreSQL should be so off in planning the
> > query. Does this behaviour qualify as a bug, or is this -- that is,
> > the need to tweak statistics parameters -- just your garden-variety
> > application-specific optimization?
>
> Well, it's data-specific rather than application specific I suppose. The
> issue is that there is a cost to tracking 100 values and you don't
> want to pay that on every column in every table. If user 13604 isn't in
> the list of most-common users then all it can really do is fix an upper
> bound on how many matches it can have. Of course you and I can reason
> outside of the data and guess that many users won't have more than a
> handful of messages, but that's not something PG can do.
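Whether a given value made it into the most-common-values list can be checked directly in the `pg_stats` view; a sketch (table and column names as in this thread):

```sql
-- Inspect the statistics the planner has for user_id:
-- if 13604 is absent from most_common_vals, the planner falls
-- back to an estimate based on n_distinct and the MCV residue.
SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE tablename = 'user_messages'
  AND attname = 'user_id';
```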
Absolutely. Thanks for the pointers.
Alexander.