Re: B-Tree support function number 3 (strxfrm() optimization)

From: Peter Geoghegan
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Msg-id: CAM3SWZSoE3Do7Edm07xpSDL6soYx1yYQ1K5G5=jmRXwARFGxYQ@mail.gmail.com
In reply to: Re: B-Tree support function number 3 (strxfrm() optimization)  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses: Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Re: B-Tree support function number 3 (strxfrm() optimization)  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List: pgsql-hackers
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> The comment in tuplesort_begin_datum that abbreviation can't be used
> seems wrong to me; why is the copy of the original value pointed to by
> stup->tuple (in the case of by-reference types, and abbreviation is
> obviously not needed for by-value types) not sufficient?

We haven't formalized the idea that pass-by-value types are not
targets for abbreviation (it's just that the practical application of
abbreviated keys is likely to be limited to pass-by-reference types,
for which abbreviation generates a compact pass-by-value representation). That
could be a useful restriction to formalize, and certainly seems likely
to be a harmless one, but for now that's the way it is.
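To make the pass-by-reference/pass-by-value distinction concrete, here is a minimal Python sketch of abbreviated keys. It assumes plain byte-wise (C locale) ordering so that no strxfrm() call is needed; under a real collation the abbreviated bytes would instead come from strxfrm() output. The names abbreviate(), compare_items() and abbreviated_sort() are hypothetical, not PostgreSQL functions. Each by-reference string gets a compact integer key standing in for a pass-by-value Datum, and the full comparison only runs when two keys tie.

```python
from functools import cmp_to_key

ABBREV_BYTES = 8  # assumption: the abbreviated key must fit in a 64-bit Datum


def abbreviate(value: bytes) -> int:
    """Pack the first ABBREV_BYTES bytes of value (zero-padded) into an int.

    Under byte-wise ordering this mapping never inverts the order of two
    values: if abbreviate(a) < abbreviate(b) then a < b. Equal keys prove
    nothing, because the key is lossy.
    """
    prefix = value[:ABBREV_BYTES].ljust(ABBREV_BYTES, b"\x00")
    return int.from_bytes(prefix, "big")


def compare_items(x, y) -> int:
    """Compare (abbreviated key, original value) pairs."""
    (key_x, val_x), (key_y, val_y) = x, y
    if key_x != key_y:
        # A cheap integer comparison settles the order.
        return -1 if key_x < key_y else 1
    # Keys tie: fall back to the authoritative comparison of the originals.
    return (val_x > val_y) - (val_x < val_y)


def abbreviated_sort(values):
    """Sort byte strings using abbreviated keys with a full-value tie-break."""
    items = [(abbreviate(v), v) for v in values]
    items.sort(key=cmp_to_key(compare_items))
    return [v for _, v in items]


words = [b"strxfrm_b", b"abbreviation", b"strxfrm_a", b"btree"]
print(abbreviated_sort(words))
# [b'abbreviation', b'btree', b'strxfrm_a', b'strxfrm_b']
```

Most comparisons here are resolved by the integer keys alone; only b"strxfrm_a" and b"strxfrm_b", which share their first 8 bytes, fall back to comparing the original values.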

It might be sufficient for some tuplesort_begin_datum() callers. Datum
tuple sorts require the original values. Aside from formalizing that
abbreviation only applies to pass-by-reference types, you'd have to
teach tuplesort_getdatum() to reconstruct the non-abbreviated
representation transparently from each SortTuple's "tuple proper".
However, the actual tuplesort_getdatum() calls could be the dominant
cost, not the sort (I'm not sure of that offhand; that's total
speculation).
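The tuplesort_getdatum() requirement can be illustrated with a small Python model. This is a hypothetical sketch, not tuplesort.c: AbbrevDatumSort, put_datum(), perform_sort() and get_datum() are invented names, and the SortTuple class is only loosely modeled on PostgreSQL's SortTuple, keeping an abbreviated key in datum1 next to a copy of the original value (the "tuple proper"). It assumes byte-wise ordering. What it shows is that comparisons use datum1 first, but anything returning values to the caller must go back to the stored original, since the abbreviated key is lossy.

```python
from dataclasses import dataclass
from functools import cmp_to_key

ABBREV_BYTES = 8  # assumption: the abbreviated key fits in a 64-bit Datum


def abbreviate(value: bytes) -> int:
    """Lossy key: first ABBREV_BYTES bytes, zero-padded, as an unsigned int."""
    return int.from_bytes(value[:ABBREV_BYTES].ljust(ABBREV_BYTES, b"\x00"), "big")


@dataclass
class SortTuple:
    datum1: int          # abbreviated key, compared first
    tuple_proper: bytes  # copy of the original by-reference value


class AbbrevDatumSort:
    """Hypothetical Datum sort that abbreviates by-reference values."""

    def __init__(self):
        self._tuples = []
        self._pos = 0

    def put_datum(self, value: bytes) -> None:
        # Store the lossy key next to a copy of the original value.
        self._tuples.append(SortTuple(abbreviate(value), bytes(value)))

    def perform_sort(self) -> None:
        def cmp(x: SortTuple, y: SortTuple) -> int:
            if x.datum1 != y.datum1:
                return -1 if x.datum1 < y.datum1 else 1
            # Tie on the abbreviated key: compare the originals.
            a, b = x.tuple_proper, y.tuple_proper
            return (a > b) - (a < b)

        self._tuples.sort(key=cmp_to_key(cmp))
        self._pos = 0

    def get_datum(self):
        """Return the next original value in sorted order, or None at the end.

        datum1 cannot be handed back to the caller because it is only an
        abbreviation; the non-abbreviated value comes from tuple_proper.
        """
        if self._pos >= len(self._tuples):
            return None
        st = self._tuples[self._pos]
        self._pos += 1
        return st.tuple_proper


sorter = AbbrevDatumSort()
for v in [b"strxfrm_b", b"btree", b"strxfrm_a"]:
    sorter.put_datum(v)
sorter.perform_sort()

out = []
while (v := sorter.get_datum()) is not None:
    out.append(v)
print(out)
```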

Basically, the intersection of the datum sort case with abbreviated
keys seems complicated. I tended to think that the solution was to
force a heaptuple sort instead (where abbreviation naturally can be
used), since clearly that could work in some important cases like
nodeAgg.c, iff the gumption to do it that way was readily available.
Rightly or wrongly, I preferred that idea to the idea of teaching the
Datum case to handle abbreviation across the board. Maybe that's the
wrong way of fixing that, but for now I don't think it's acceptable
that abbreviation isn't used in certain cases where it could make
sense (e.g. simple GroupAggregates on a single attribute don't get it;
only multiple-attribute GroupAggregates do). After all, most
sort cases (e.g. B-Tree builds) didn't use SortSupport for several
years, simply because no one got around to it until I finally did a
few months back.

Note that most tuplesort callers that don't use abbreviation have
sensible reasons for not doing so. For example, abbreviation simply
doesn't make sense for Top-N heap sorts, or MJExamineQuals(). The
single-attribute GroupAggregate/nodeAgg.c case seems bad, but I
don't have a good sense of how bad orderedsetaggs.c's non-use
is; it might matter less than the other case.

-- 
Peter Geoghegan


