On Wed, 2 Oct 2013 11:19:58 -0400, Kevin Grittner <kgrittn@ymail.com> wrote:
> David Rysdam <drysdam@ll.mit.edu> wrote:
>
> > We have a by-our-standards large table (about 40e6 rows). Since it is
> > the bottleneck in some places, I thought I'd experiment with
> > partitioning.
>
> In my personal experience I have gone into hundreds of millions of
> rows with good performance without partitioning. It's all about
> designing good indexes for the workload.
Well, our performance is still good. Certainly better than a lot of
projects I've seen even with less data. But it's still our "worst" table
and I have some free time to experiment...
> > Because object number figure so prominently, I thought I'd partition on
> > that. To me, it makes the most sense from a load-balancing perspective
>
> Load balancing? Hitting a single partition more heavily improves
> your cache hit ratio. What sort of benefit are you expecting from
> spreading the reads across all the partitions? *Maybe* that could
> help if you carefully placed each partition table on a separate set
> of spindles, but usually you are better off having one big RAID so
> that every partition is spread across all the spindles
> automatically.
Now that you spell it out, I guess that does make more sense. I had some
vague notion of tables "doing work" but really if it can load one
partition into RAM and get most of my hits from there, it'd be a big
win.
> > Lower numbers are going to be queried much less often than higher
> > numbers.
>
> This suggests to me that you *might* get a performance boost if you
> define partitions on object number *ranges*. It still seems a bit
> dubious, but it has a chance.
Would the planner be smart enough to figure out ranges without me having
to "hint" my queries?
In any case, my speed tests are coming out the opposite what I
expected. Within-partition queries are taking longer than the whole
table did while across-partition queries are faster. I'll have to do
more thinking on that.