The data is not distributed well...
Top 20 occurrences of fmin and fmax:
fmin | count
----------+--------
0 | 214476
19281576 | 2870
2490005 | 2290
1266332 | 2261
15539680 | 2086
11022233 | 2022
25559658 | 1923
3054411 | 1906
10237885 | 1890
13827272 | 1876
19187021 | 1847
18101335 | 1845
1518230 | 1843
21199488 | 1842
1922518 | 1826
1216144 | 1798
25802126 | 1762
8307335 | 1745
21271866 | 1736
8361667 | 1721
fmax | count
----------+--------
25 | 197551
21272002 | 547
21271988 | 335
21271969 | 321
6045781 | 247
1339301 | 243
21669151 | 235
7779506 | 232
2571422 | 229
7715946 | 228
27421323 | 222
7048089 | 221
87364 | 219
13656535 | 217
26034147 | 214
19184612 | 213
7048451 | 213
21668877 | 213
6587492 | 212
9484598 | 212
Also, out of 5.7 million rows there are 1.6 million unique fmin and 1.6
million unique fmax values.
Thanks,
Tom
-----Original Message-----
From: Craig James [mailto:craig_james@emolecules.com]
Sent: Friday, June 29, 2007 12:14 PM
To: Dolafi, Tom
Cc: Tom Lane; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space
in 8.2.3
Dolafi, Tom wrote:
> min(fmin) | max(fmin) | avg(fmin)
> 1 | 55296469 | 11423945
>
> min(fmax) | max(fmax) | avg(fmax)
> 18 | 55553288 | 11424491
>
> There are 5,704,211 rows in the table.
When you're looking for weird index problems, it's more interesting to
know if there are certain numbers that occur a LOT. From your
statistics above, each number occurs about 10 times in the table. But
do some particular numbers occur thousands, or even millions, of times?
Here is a query that will print a list of the highest-occuring values.
You might expect a few occurances of 20, and maybe 30, but if you have
thousands or millions of occurances of certain numbers, then that can
screw up an index.
select fmax, c from
(select fmax, count(fmax) as c from your_table group by fmax) as foo
where c > 3 order by c desc;
Craig