Обсуждение: Prefix compression of B-Tree keys
On Thu, Jan 30, 2014 at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@heroku.com> writes: >> On more occasions than I care to recall, someone has suggested that it >> would be valuable to do something with strxfrm() blobs in order to >> have cheaper locale-aware text comparisons. One obvious place to do so >> would be in indexes, but in the past that has been dismissed on the >> following grounds: > >> * Index-only scans need fully formed datums to work, and strxfrm() is >> a one-way function (or so I believe). There is no reason to think that >> the original string can be reassembled from the blob, so clearly that >> won't fly. > >> * There is a high cost to be paid in storage overhead. Even for >> collations like "en_US.UTF-8", that can mean that the blob is as much >> as 3-4 times larger than the original text string. Who is to say that >> we'll come out ahead even with the savings of just doing a strcmp() >> rather than a strcoll()? > > Quite aside from the index bloat risk, this effect means a 3-4x reduction > in the maximum string length that can be indexed before getting the > dreaded "Values larger than 1/3 of a buffer page cannot be indexed" error. > Worse, a value insertion might well succeed, with the failure happening > only (much?) later when that entry is chosen as a page split boundary. > > It's possible that TOAST compression of the strings would save you, but > I'm not convinced of that; it certainly doesn't seem like we could > guarantee no post-insertion failures that way. Maybe not TOAST compression, but prefix compression. I've been wondering lately, whether a format change in the B-Tree could be worth the effort: store values with prefix compression. It would certainly help indexes of many kinds (text-valued, multi-column, etc...). Now, I haven't checked if it's already done. Sorry if it is. I did mock around btree code a lot and don't remember any of this, but I do remember stuff that could be used to achieve it (specifically, all the index-only-scan machinery, which allows for implicit info). Granted, this isn't strxfrm, but it may provide some of the benefits (faster comparisons because less bytes are compared?). On Fri, Jan 31, 2014 at 1:51 AM, Peter Geoghegan <pg@heroku.com> wrote: > We're already suffering from the fact > that strcoll() mandates a NULL-terminated string, since we have to > copy text datums to a buffer to deal with the impedance mismatch. > Furthermore, multi-column comparisons probably have a lot of overhead > at present from all the indirection alone. Which makes the above even more likely to help.
On Thu, Jan 30, 2014 at 9:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote: > Maybe not TOAST compression, but prefix compression. I've thought about it as well. It's totally feasible, and likely worthwhile, but a little more tricky. > I've been wondering lately, whether a format change in the B-Tree > could be worth the effort: store values with prefix compression. It > would certainly help indexes of many kinds (text-valued, multi-column, > etc...). You can play tricks with the AM version, generally stored in the meta page, as in the recent commit 36a35c550ac114caa423bcbe339d3515db0cd957, without breaking pg_upgrade. Although a lot of these techniques may not actually require that kind of additional effort. > Now, I haven't checked if it's already done. Sorry if it is. I did > mock around btree code a lot and don't remember any of this, but I do > remember stuff that could be used to achieve it (specifically, all the > index-only-scan machinery, which allows for implicit info). I think it would make sense to do it on leaf pages, where there is frequently a lot of redundancy. The reason that I myself wouldn't do it first is that offhand I think that it'd be harder to come up with a very generic infrastructure to make it work across diverse types, and it isn't that useful where there isn't much redundancy in prefixes. The B-Tree code doesn't really know anything about the type indexed, and that's a useful property. But it's still definitely a useful goal. -- Peter Geoghegan
On Fri, Jan 31, 2014 at 3:51 AM, Peter Geoghegan <pg@heroku.com> wrote: >> Now, I haven't checked if it's already done. Sorry if it is. I did >> mock around btree code a lot and don't remember any of this, but I do >> remember stuff that could be used to achieve it (specifically, all the >> index-only-scan machinery, which allows for implicit info). > > I think it would make sense to do it on leaf pages, where there is > frequently a lot of redundancy. The reason that I myself wouldn't do > it first is that offhand I think that it'd be harder to come up with a > very generic infrastructure to make it work across diverse types, and > it isn't that useful where there isn't much redundancy in prefixes. > The B-Tree code doesn't really know anything about the type indexed, > and that's a useful property. But it's still definitely a useful goal. Well, for multi-column it should be really simple. You already have all the necessary information to decide on the prefix (the common prefix columns of leftmost and rightmost keys). It's harder for text columns, since you must abstract out the "common prefix check","append prefix", and "remove prefix" operations. That would probably require extending opclasses.