On Fri, Oct 8, 2010 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> IMO, what's needed is to fix GIN so it doesn't go insane for empty
>>> values or non-restrictive queries, by ensuring there's at least one
>>> index entry for every row. This has been discussed before; see the TODO
>>> section for GIN.
>
>> That seems like it could waste an awful lot of disk space (and
>> therefore I/O, etc.). No?
>
> How so? In a typical application, there would not likely be very many
> such rows --- we're talking about cases like documents containing zero
> indexable words. In any case, the problem right now is that GIN has
> significant functional limitations because it fails to make any index
> entry at all for such rows. Even if there are in fact no such rows
> in a particular table, it has to fail on some queries because there
> *might* be such rows. There is no way to fix those limitations
> unless it undertakes to have some index entry for every row. That
> will take disk space, but it's *necessary*. (To adapt the old saw,
> I can make this index arbitrarily small if it doesn't have to give
> the right answers.)
>
> In any case, I would expect that GIN could actually do this quite
> efficiently. What we'd probably want is a concept of a "null word",
> with empty indexable rows entered in the index as if they contained the
> null word. So there'd be just one index entry with a posting list of
> however many such rows there are.

<thinks about it more>
Yeah, I think you're right.
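
For the archives, here is a toy sketch of the "null word" idea in Python.
It is only an illustration under my own assumptions, not GIN's actual
code or on-disk layout: NULL_WORD, ToyInvertedIndex and its methods are
hypothetical names. Rows with no indexable words all share one posting
list, so the index covers every row at the cost of a single extra entry.

```python
from collections import defaultdict

# Hypothetical sentinel standing in for the "null word"; it can never
# collide with a real word because it is not a string.
NULL_WORD = object()


class ToyInvertedIndex:
    """Toy inverted index: maps each word to the set of row ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def insert(self, row_id, words):
        # A row with no indexable words is entered as if it contained
        # the null word, so every row has at least one index entry.
        keys = set(words) or {NULL_WORD}
        for key in keys:
            self.postings[key].add(row_id)

    def all_rows(self):
        # Answerable from the index alone, because no row is missing from it.
        rows = set()
        for posting in self.postings.values():
            rows |= posting
        return rows

    def rows_without(self, word):
        # A non-restrictive query: correct only if empty rows are indexed too.
        return self.all_rows() - self.postings.get(word, set())

    def entry_count(self):
        # Number of distinct index entries (words plus the null word, if used).
        return len(self.postings)


if __name__ == "__main__":
    idx = ToyInvertedIndex()
    idx.insert(1, ["gin", "index"])
    idx.insert(2, [])            # no indexable words
    idx.insert(3, [])            # no indexable words
    idx.insert(4, ["index"])
    print(sorted(idx.rows_without("gin")))
    print(idx.entry_count())
```

However many empty rows there are, they add one entry to the toy index,
and a query such as "rows not containing 'gin'" no longer has to fail or
fall back to a heap scan just because such rows might exist.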
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company