Обсуждение: itemptr_encode/itemptr_decode
What on God's green earth are these functions doing in src/include/catalog/index.h? They don't have any obvious connection to indexes, let alone catalog operations on indexes, which is what that file is for. They weren't there before 2a96909a4, either. regards, tom lane
Hi, On 2019-04-17 18:57:00 -0400, Tom Lane wrote: > What on God's green earth are these functions doing in > src/include/catalog/index.h? > They don't have any obvious connection to indexes, let alone > catalog operations on indexes, which is what that file is for. Well, they were previously declared & defined in src/backend/catalog/index.c - that's where the location is coming from (and where they still are defined). And they're currently only used to implement the index validation scans, which requires the validation scan to decode item pointers stored in the tuplesort presented to it. I'm happy to move them elsewhere, but I'm not sure there's really a good location. I guess we could move them to itemptr.h - but they're not really something particularly generally usable. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2019-04-17 18:57:00 -0400, Tom Lane wrote: >> What on God's green earth are these functions doing in >> src/include/catalog/index.h? > I'm happy to move them elsewhere, but I'm not sure there's really a good > location. I guess we could move them to itemptr.h - but they're not > really something particularly generally usable. I don't have a better idea than that either, but I sure feel that they don't belong in index.h. Is it worth inventing a whole new header for these? If we stick 'em in itemptr.h, they'll be getting compiled by a whole lot of files :-( As for the general usability argument, I'm not sure --- as we start to look at alternate AMs, we might have more use for them. When I first saw the functions, I thought maybe they were part of sort acceleration for TIDs; evidently they're not (yet), but that seems like another possible use-case. regards, tom lane
Hi, On 2019-04-17 19:22:08 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2019-04-17 18:57:00 -0400, Tom Lane wrote: > >> What on God's green earth are these functions doing in > >> src/include/catalog/index.h? > > > I'm happy to move them elsewhere, but I'm not sure there's really a good > > location. I guess we could move them to itemptr.h - but they're not > > really something particularly generally usable. > > I don't have a better idea than that either, but I sure feel that they > don't belong in index.h. Is it worth inventing a whole new header > for these? If we stick 'em in itemptr.h, they'll be getting compiled > by a whole lot of files :-( itemptr_utils.h? I don't have an opinion on whether we ought to move them in v12 or v13. Don't think there's a beta1 pressure. > As for the general usability argument, I'm not sure --- as we start > to look at alternate AMs, we might have more use for them. When I first > saw the functions, I thought maybe they were part of sort acceleration > for TIDs; evidently they're not (yet), but that seems like another > possible use-case. We ought to use them in a few more places. E.g. nodeTidscan.c's sorting would likely be faster if we used something of that kind. And, what'd probably substantially beneficial, for the junk ctid columns - where they're currently IIRC transported as a by-ref datum, even on 64bit machines. Mildly related: Is there a reason we don't optimize fixed-length !byval datum copies for typlen < sizeof(Datum) to something better than a full palloc? I guess it'd be somewhat of a breaking change? Greetings, Andres Freund
On Wed, Apr 17, 2019 at 4:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > As for the general usability argument, I'm not sure --- as we start > to look at alternate AMs, we might have more use for them. When I first > saw the functions, I thought maybe they were part of sort acceleration > for TIDs; evidently they're not (yet), but that seems like another > possible use-case. There is also your join-or-to-union patch, which I thought might make use of this for its TID sort. Maybe it would make sense to put this infrastructure in tuplesort.c, but probably not. TIDs are 6 bytes, which as you once pointed out, is not something that we have appropriate infrastructure for (there isn't a DatumGet*() macro, and so on). The encoding scheme (which you originally suggested as an alternative to my first idea, sort support for item pointers) works particularly well as these things go -- it was about 3x faster when everything fit in memory, and faster still with external sorts. It allowed us to resolve comparisons at the SortTuple level within tuplesort.c, but also allowed tuplesort.c to use the pass-by-value datum qsort specialization. It even allowed sorted array entries (TIDs/int8s) to be fetched without extra pointer chasing -- that can be a big bottleneck these days. The encoding scheme is a bit ugly, but I suspect it would be simpler to stick to the same approach elsewhere than to try and hide all the details within tuplesort.c, or something like that. Unless we're willing to treat TIDs as a whole new type of tuple with its own set of specialized functions in tuplesort.c, which has problems of its own, then it's kind of awkward to do it some other way. -- Peter Geoghegan