Обсуждение: itemptr_encode/itemptr_decode

Поиск
Список
Период
Сортировка

itemptr_encode/itemptr_decode

От
Tom Lane
Дата:
What on God's green earth are these functions doing in
src/include/catalog/index.h?

They don't have any obvious connection to indexes, let alone
catalog operations on indexes, which is what that file is for.

They weren't there before 2a96909a4, either.

            regards, tom lane



Re: itemptr_encode/itemptr_decode

От
Andres Freund
Дата:
Hi,

On 2019-04-17 18:57:00 -0400, Tom Lane wrote:
> What on God's green earth are these functions doing in
> src/include/catalog/index.h?

> They don't have any obvious connection to indexes, let alone
> catalog operations on indexes, which is what that file is for.

Well, they were previously declared & defined in
src/backend/catalog/index.c - that's where the location is coming from
(and where they still are defined). And they're currently only used to
implement the index validation scans, which requires the validation scan
to decode item pointers stored in the tuplesort presented to it.

I'm happy to move them elsewhere, but I'm not sure there's really a good
location. I guess we could move them to itemptr.h - but they're not
really something particularly generally usable.

Greetings,

Andres Freund



Re: itemptr_encode/itemptr_decode

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-17 18:57:00 -0400, Tom Lane wrote:
>> What on God's green earth are these functions doing in
>> src/include/catalog/index.h?

> I'm happy to move them elsewhere, but I'm not sure there's really a good
> location. I guess we could move them to itemptr.h - but they're not
> really something particularly generally usable.

I don't have a better idea than that either, but I sure feel that they
don't belong in index.h.  Is it worth inventing a whole new header
for these?  If we stick 'em in itemptr.h, they'll be getting compiled
by a whole lot of files :-(

As for the general usability argument, I'm not sure --- as we start
to look at alternate AMs, we might have more use for them.  When I first
saw the functions, I thought maybe they were part of sort acceleration
for TIDs; evidently they're not (yet), but that seems like another
possible use-case.

            regards, tom lane



Re: itemptr_encode/itemptr_decode

От
Andres Freund
Дата:
Hi,

On 2019-04-17 19:22:08 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-17 18:57:00 -0400, Tom Lane wrote:
> >> What on God's green earth are these functions doing in
> >> src/include/catalog/index.h?
> 
> > I'm happy to move them elsewhere, but I'm not sure there's really a good
> > location. I guess we could move them to itemptr.h - but they're not
> > really something particularly generally usable.
> 
> I don't have a better idea than that either, but I sure feel that they
> don't belong in index.h.  Is it worth inventing a whole new header
> for these?  If we stick 'em in itemptr.h, they'll be getting compiled
> by a whole lot of files :-(

itemptr_utils.h?  I don't have an opinion on whether we ought to move
them in v12 or v13. Don't think there's a beta1 pressure.


> As for the general usability argument, I'm not sure --- as we start
> to look at alternate AMs, we might have more use for them.  When I first
> saw the functions, I thought maybe they were part of sort acceleration
> for TIDs; evidently they're not (yet), but that seems like another
> possible use-case.

We ought to use them in a few more places. E.g. nodeTidscan.c's sorting
would likely be faster if we used something of that kind. And, what'd
probably substantially beneficial, for the junk ctid columns - where
they're currently IIRC transported as a by-ref datum, even on 64bit
machines.

Mildly related: Is there a reason we don't optimize fixed-length !byval
datum copies for typlen < sizeof(Datum) to something better than a full
palloc?  I guess it'd be somewhat of a breaking change?

Greetings,

Andres Freund



Re: itemptr_encode/itemptr_decode

От
Peter Geoghegan
Дата:
On Wed, Apr 17, 2019 at 4:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As for the general usability argument, I'm not sure --- as we start
> to look at alternate AMs, we might have more use for them.  When I first
> saw the functions, I thought maybe they were part of sort acceleration
> for TIDs; evidently they're not (yet), but that seems like another
> possible use-case.

There is also your join-or-to-union patch, which I thought might make
use of this for its TID sort.

Maybe it would make sense to put this infrastructure in tuplesort.c,
but probably not. TIDs are 6 bytes, which as you once pointed out, is
not something that we have appropriate infrastructure for (there isn't
a DatumGet*() macro, and so on). The encoding scheme (which you
originally suggested as an alternative to my first idea, sort support
for item pointers) works particularly well as these things go -- it
was about 3x faster when everything fit in memory, and faster still
with external sorts. It allowed us to resolve comparisons at the
SortTuple level within tuplesort.c, but also allowed tuplesort.c to
use the pass-by-value datum qsort specialization. It even allowed
sorted array entries (TIDs/int8s) to be fetched without extra pointer
chasing -- that can be a big bottleneck these days.

The encoding scheme is a bit ugly, but I suspect it would be simpler
to stick to the same approach elsewhere than to try and hide all the
details within tuplesort.c, or something like that. Unless we're
willing to treat TIDs as a whole new type of tuple with its own set of
specialized functions in tuplesort.c, which has problems of its own,
then it's kind of awkward to do it some other way.


--
Peter Geoghegan