Thanks to everybody's input -- as a first-time poster to this listserv,
I wasn't sure how long it would take to get a response. ;)
I was frankly astonished to see that the composite index on (a,b) was
used when I searched for (a), but Chris' response makes total sense.
In this case, I don't want to go with a MAP due to the fact that I'm
actually using Java Hibernate to generate this schema and access it.
My sample query of using RANDOM() to select a random subset of the
overall outlets was actually to try and defeat any prior caching of
results, and give a more reasonable measurement -- but I didn't realize
the implications. I had thought that coupled with a MAX clause at the
end it would simply randomize and then bail out early instead of a full
table scan - so thanks to Merlin for pointing that out.
I'll go with a 2nd index on MEDIA_ID and do some measurements of speed
increase, but it makes a lot more sense now.
thank you Postgres gurus! :D
regards
Bill
On 5/21/12 5:11 PM, Merlin Moncure wrote:
> On Mon, May 21, 2012 at 3:36 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
>>> So you can get fully index lookups on all of a, b, ab, and ba. the
>>> primary key can't optimize ba because indexes only fully match if
>>> candidate fields are supplied from left to right order. They can
>>> still help somewhat, but to a lesser degree.
>> BTW, I would like to know is it worth it to create 3rd index on map(a)
>> to reduce the size of the index which will be used by the planer
>> to save some server's RAM (obviously, at the cost of extra disk space) ?
> What Dmitriy is talking about here is that even though an index on
> (a,b) can efficiently (in terms of searching through the tree) match
> terms on just 'a', you still pay a price because the entries on the
> index have to store the data for b as well, So even though it's
> algorithmically efficient you have to browse more data to do it which
> pressures RAM. In other words, an index on just 'a' is ideal for
> searches on just 'a', although a,b is much better than (b,a) or no
> index at all.
>
> I personally think that generally it's better not to do that in most
> cases especially if you're indexing integer keys since you're not
> making *that* much difference on the overall index size. Also,
> primary key indexes are much more likely to have to stay 'hot' in the
> cache anyways since they will be serving fkey reference lookups and
> stuff like that so in the end you might be consuming *more* ram, not
> less.
>
> An exception might be if your key on a,b has a very small 'a' and a
> very large 'b'. But that's pretty rare in practice and it's usually a
> good idea to avoid indexing large fields if you can help it. It
> really depends on the workload.
>
> merlin