On Thu, Apr 10, 2008 at 8:52 AM, Adam Gundy <adam@starsilk.net> wrote:
> Richard Huxton wrote:
> > How many distinct values do you have in groups.groupid and
> group_access.group_id?
> >
>
> for the small database (since it shows the same problem):
>
> group_access: 280/268
> groups: 2006/139
>
> for the large database:
>
> group_access: same
> groups: 1712647/140
>
> the groupid key is an MD5 hash, so it should be uniformly distributed.
> maybe that throws the stats? but, again, char works, varchar doesn't :-(
OK, I'm thinking the varchar/char part is not the issue.
the database is very unbalanced: most of the groups have 1000 records or
fewer, with one group occupying 95% of the records.
I *think* that when I analyze using char instead of varchar, it records
a stat for the large group, but for some reason with varchar it doesn't
record one for that group.
so, the real question is, how do I fix this? I can turn the stats way
up to 1000, but that doesn't guarantee that I'll get a stat for the
large group :-(
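for reference, the target can be raised on just that one column rather than globally (a sketch, again assuming the names above):

```sql
-- Raise the statistics target for this column only, then re-sample.
-- With a sample this large, a group covering ~95% of the rows should
-- almost certainly land in the most-common-values list, though as
-- noted it isn't strictly guaranteed.
ALTER TABLE groups ALTER COLUMN groupid SET STATISTICS 1000;
ANALYZE groups;
```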
can I turn the statistics off completely for this column? I'm guessing
that if I can, the planner will fall back to a guess based on the number
of distinct values in the groups table, which is still a large number of
records - possibly enough to trigger the seqscan anyway.
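for what it's worth, I believe a statistics target of zero tells ANALYZE to skip the column entirely (same name assumptions as above):

```sql
-- A target of 0 should make ANALYZE skip this column, leaving the
-- planner with only default selectivity estimates; -1 reverts to
-- the system default target.
ALTER TABLE groups ALTER COLUMN groupid SET STATISTICS 0;
ANALYZE groups;
```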
does postgres have a way of building a 'counted index' that the
planner can use for its row counts? some way of forcibly
maintaining a stat for every group?
the groups are not related to one another - is it possible to
partition them into their own indexes somehow?
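partial indexes are probably the closest thing - a sketch, where '<big-group-md5>' is a placeholder for the actual hash of the dominant group (I don't know the real value):

```sql
-- One partial index covering the dominant group, and one covering
-- everything else, so scans on the small groups stay cheap.
CREATE INDEX groups_big_idx ON groups (groupid)
  WHERE groupid = '<big-group-md5>';
CREATE INDEX groups_rest_idx ON groups (groupid)
  WHERE groupid <> '<big-group-md5>';
```

the planner can only use these when the query's WHERE clause implies the index predicate, so this helps most if queries name the group explicitly.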
ahh. lots of questions, no (obvious to me) answers from googling around.