On Mon, Sep 2, 2019 at 6:53 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v10, which fixes the Valgrind issue.
Attached is v11, which makes the kill_prior_tuple optimization work
with posting list tuples. The catch is that it can only work when
all "logical tuples" within a posting list are known dead, since of
course there is only one LP_DEAD bit available per posting list
tuple.
The hardest part of this kill_prior_tuple work was writing the new
_bt_killitems() code, which I'm still not 100% happy with. Still, it
seems to work well -- new pageinspect LP_DEAD status info was added to
the second patch to verify that we're setting LP_DEAD bits as needed
for posting list tuples. I also had to add a new nbtree-specific,
posting-list-aware version of index_compute_xid_horizon_for_tuples()
-- _bt_compute_xid_horizon_for_tuples(). Finally, it was necessary to
avoid splitting a posting list with the LP_DEAD bit set. I took a
naive approach to avoiding that problem, adding code to
_bt_findinsertloc() to prevent it. Posting list splits are generally
assumed to be rare, so the fact that this is slightly inefficient
should be fine IMV.
I also refactored deduplication itself in anticipation of making the
WAL logging more efficient and incremental. So, the structure of the
code within _bt_dedup_one_page() was simplified, without really
changing it very much (I think). I also fixed a bug in
_bt_dedup_one_page(): the check for dead items was broken in previous
versions because the loop examined the high key tuple in every
iteration.
Making _bt_dedup_one_page() more efficient and incremental is still
the most important open item for the patch.
--
Peter Geoghegan