GiST seems to drop left-branch leaf tuples

Поиск
Список
Период
Сортировка
От Peter Tanski
Тема GiST seems to drop left-branch leaf tuples
Дата
Msg-id 5D31F5EB-A473-4176-B4BD-77DF184409FD@raditaz.com
обсуждение исходный текст
Ответы Re: GiST seems to drop left-branch leaf tuples  (Peter Tanski <ptanski@raditaz.com>)
Re: GiST seems to drop left-branch leaf tuples  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Список pgsql-hackers
I have been working on a plugin for GiST that has some unusual features:

* The data type for both Node and Leaf keys is large (typically 4222  
bytes on 32-bit; 4230 bytes on 64-bit).

* Due to the large size the storage class is EXTENDED (main would only  
degrade to EXTENDED in any case).  Tests using EXTERNAL show the same  
results so I do not believe compression is an issue.

* This is a hash-type index: the values are large hashes for an audio  
fingerprinting application and the sole relationship between keys is a  
float match probability between two values.  For example, if A matches  
B with a score of 0.18 but A matches C with a score of 0.61, A is  
closer to C than to B.  The distance metric is calculated in reverse  
(1.0 - match_score) so the penalty for inserting A under the same Node  
as B would be 0.82 (scaled by 1000 to 820).

* The Leaf Keys contain the same data as the rows themselves.

* The Node (union) Keys are:
  - a bitwise OR of the lower-level Leaf Keys
  - a penalty or consistency comparison of Leaf or Query (L) value  
against a Node union-key (N) is effectively a scaled hamming distance  
of:      L ^ (L & N)    so if L is under N the check will always return 1.0 (true for  
consistency; 0.0 for penalty);    by the same operation, newly inserted values compare closer to  
Node (union) keys where the corresponding Leaf keys are closer
  - Comparisons between two Nodes, N1 and N2 (used in Same(), for  
example)  have used a:    -- Tanimoto bit distance popcount(N1 & N2) / popcount(N1 | N2);    -- the same scaled-hamming
distancecheck used for a Leaf against  
 
another Leaf;    -- simple memcmp for identity.

Whatever test I use for Same(),  Penalty() and Consistent() does not  
seem to affect the problem significantly.  For now I am only using  
Consistent() as a check for retrieval.

I have developed this under debug source-builds of postgresql 8.4.5  
and 9.0.1 on Mac OS X 10.6 (Snow Leopard, x86 64-bit) and Ubuntu Linux  
10.04 (Lucid, x86 64-bit and 32-bit).  There is no difference between  
the platforms or architectures.  I am using the standard PostgreSQL  
Makefile build setup so compiler flags are the same as used by  
PostgreSQL.

The problem may be my own understanding of Picksplit(), Penalty() and  
Same().  Using gevel, on a table with 500 newly-inserted rows:

postgres=# \d fps2           Table "public.fps2"   Column    |     Type      | Modifiers
-------------+---------------+----------- soid        | character(24) | not null fingerprint | fprint        | not
null
Indexes:    "fps2_fingerprint_ix" gist (fingerprint)

postgres=# select gist_stat('fps2_fingerprint_ix');                gist_stat
----------------------------------------- Number of levels:          4           + Number of pages:           61
 + Number of leaf pages:      42          + Number of tuples:          193         + Number of invalid tuples:  0
   + Number of leaf tuples:     133         + Total size of tuples:      271704 bytes+ Total size of leaf tuples:
187236bytes+ Total size of index:       499712 bytes+
 


Note that there are only 133 leaf tuples -- for 500 rows.  If the  
index process were operating correctly, there should have been 500  
leaf tuples there.  If I REINDEX the table the number of leaf tuples  
may change slightly but not by much.  This closely corresponds to a  
query:

postgres=# select count(*) from fps2 f1 join fps2 f2 on f1.fingerprint  
= f2.fingerprint; count
-------   133

where = is a match operator that returns rows where the Leaf key  
comparison is > .98 (on the scaled probability score pretty much  
exactly equal).  The above query is using the index:

postgres=# explain select count(*) from fps2 f1 join fps2 f2 on  
f1.fingerprint ~= f2.fingerprint;                                           QUERY PLAN
------------------------------------------------------------------------------------------------ Aggregate
(cost=1173.67..1173.68rows=1 width=0)   ->  Nested Loop  (cost=0.00..1170.54 rows=1250 width=0)         ->  Seq Scan on
fps2f1  (cost=0.00..105.00 rows=500 width=32)         ->  Index Scan using fps2_fingerprint_ix on fps2 f2   
 
(cost=0.00..2.11 rows=2 width=32)               Index Cond: (f1.fingerprint ~= f2.fingerprint)


If I use a table scan instead of the index, the query would return:

postgres=# select count(*) from fps2 f1 join fps2 f2 on  
fprint_cmp(f1.fingerprint, f2.fingerprint) > .98; count
-------   500


The GiST tree looks like:

postgres=# select gist_tree('fps2_fingerprint_ix');                                          gist_tree
---------------------------------------------------------------------------------------------- 0(l:0) blk: 0 numTuple:
4free: 2532b(68.97%) rightlink:4294967295  
 
(InvalidBlockNumber)     +     1(l:1) blk: 37 numTuple: 2 free: 5340b(34.56%) rightlink:105  
(OK)                       +         1(l:2) blk: 90 numTuple: 3 free: 3936b(51.76%) rightlink:109  
(OK)                   +             1(l:3) blk: 9 numTuple: 5 free: 1128b(86.18%) rightlink: 
94 (OK)                 +             2(l:3) blk: 114 numTuple: 5 free: 1128b(86.18%)  
rightlink:89 (OK)               +             3(l:3) blk: 106 numTuple: 5 free: 1128b(86.18%)  
rightlink:114 (OK)              +         2(l:2) blk: 109 numTuple: 3 free: 3936b(51.76%) rightlink:24  
(OK)                   +             1(l:3) blk: 55 numTuple: 5 free: 1128b(86.18%) rightlink: 
45 (OK)                +             2(l:3) blk: 94 numTuple: 1 free: 6744b(17.35%) rightlink: 
121 (OK)               +             3(l:3) blk: 121 numTuple: 5 free: 1128b(86.18%)  
rightlink:108 (OK)              +     2(l:1) blk: 38 numTuple: 4 free: 2532b(68.97%) rightlink:77  
(OK)                        +         1(l:2) blk: 84 numTuple: 1 free: 6744b(17.35%) rightlink:120  
(OK)                   +             1(l:3) blk: 88 numTuple: 4 free: 2532b(68.97%) rightlink: 
110 (OK)               +         2(l:2) blk: 120 numTuple: 3 free: 3936b(51.76%) rightlink:42  
(OK)                   +             1(l:3) blk: 5 numTuple: 3 free: 3936b(51.76%) rightlink: 
88 (OK)                 +             2(l:3) blk: 110 numTuple: 4 free: 2532b(68.97%)  
rightlink:71 (OK)               +             3(l:3) blk: 72 numTuple: 4 free: 2532b(68.97%) rightlink: 
119 (OK)               +         3(l:2) blk: 96 numTuple: 5 free: 1128b(86.18%) rightlink:84  
(OK)                    +             1(l:3) blk: 93 numTuple: 4 free: 2532b(68.97%) rightlink: 
95 (OK)                +             2(l:3) blk: 87 numTuple: 1 free: 6744b(17.35%) rightlink: 
99 (OK)                +             3(l:3) blk: 102 numTuple: 3 free: 3936b(51.76%)  
rightlink:93 (OK)               +             4(l:3) blk: 91 numTuple: 3 free: 3936b(51.76%) rightlink: 
102 (OK)               +             5(l:3) blk: 99 numTuple: 5 free: 1128b(86.18%) rightlink: 
91 (OK)                +         4(l:2) blk: 12 numTuple: 3 free: 3936b(51.76%) rightlink:96  
(OK)                    +             1(l:3) blk: 85 numTuple: 1 free: 6744b(17.35%) rightlink: 
98 (OK)                +             2(l:3) blk: 98 numTuple: 2 free: 5340b(34.56%) rightlink: 
125 (OK)               +             3(l:3) blk: 125 numTuple: 3 free: 3936b(51.76%)  
rightlink:87 (OK)               +     3(l:1) blk: 105 numTuple: 3 free: 3936b(51.76%) rightlink:38  
(OK)                       +         1(l:2) blk: 8 numTuple: 2 free: 5340b(34.56%) rightlink:70  
(OK)                     +             1(l:3) blk: 66 numTuple: 1 free: 6744b(17.35%) rightlink: 
97 (OK)                +             2(l:3) blk: 97 numTuple: 5 free: 1128b(86.18%) rightlink: 
67 (OK)                +         2(l:2) blk: 70 numTuple: 3 free: 3936b(51.76%) rightlink:104  
(OK)                   +             1(l:3) blk: 59 numTuple: 1 free: 6744b(17.35%) rightlink: 
115 (OK)               +             2(l:3) blk: 115 numTuple: 1 free: 6744b(17.35%)
rightlink:116 (OK)              +             3(l:3) blk: 116 numTuple: 4 free: 2532b(68.97%)  
rightlink:64 (OK)               +         3(l:2) blk: 46 numTuple: 5 free: 1128b(86.18%) rightlink:90  
(OK)                    +             1(l:3) blk: 65 numTuple: 5 free: 1128b(86.18%) rightlink: 
107 (OK)               +             2(l:3) blk: 112 numTuple: 4 free: 2532b(68.97%)  
rightlink:113 (OK)              +             3(l:3) blk: 107 numTuple: 5 free: 1128b(86.18%)  
rightlink:112 (OK)              +             4(l:3) blk: 113 numTuple: 1 free: 6744b(17.35%)  
rightlink:126 (OK)              +             5(l:3) blk: 126 numTuple: 4 free: 2532b(68.97%)  
rightlink:57 (OK)               +     4(l:1) blk: 77 numTuple: 5 free: 1128b(86.18%) rightlink: 
4294967295 (InvalidBlockNumber)+         1(l:2) blk: 53 numTuple: 5 free: 1128b(86.18%) rightlink:76  
(OK)                    +             1(l:3) blk: 81 numTuple: 2 free: 5340b(34.56%) rightlink: 
82 (OK)                +             2(l:3) blk: 92 numTuple: 4 free: 2532b(68.97%) rightlink: 
81 (OK)                +             3(l:3) blk: 82 numTuple: 1 free: 6744b(17.35%) rightlink: 
100 (OK)               +             4(l:3) blk: 100 numTuple: 4 free: 2532b(68.97%)  
rightlink:74 (OK)               +             5(l:3) blk: 73 numTuple: 3 free: 3936b(51.76%) rightlink: 
92 (OK)                +         2(l:2) blk: 24 numTuple: 1 free: 6744b(17.35%) rightlink:118  
(OK)                   +             1(l:3) blk: 39 numTuple: 4 free: 2532b(68.97%) rightlink: 
47 (OK)                +         3(l:2) blk: 118 numTuple: 4 free: 2532b(68.97%) rightlink:36  
(OK)                   +             1(l:3) blk: 34 numTuple: 4 free: 2532b(68.97%) rightlink: 
39 (OK)                +             2(l:3) blk: 47 numTuple: 1 free: 6744b(17.35%) rightlink: 
111 (OK)               +             3(l:3) blk: 111 numTuple: 1 free: 6744b(17.35%)  
rightlink:122 (OK)              +             4(l:3) blk: 122 numTuple: 4 free: 2532b(68.97%)  
rightlink:117 (OK)              +         4(l:2) blk: 7 numTuple: 1 free: 6744b(17.35%) rightlink:124  
(OK)                    +             1(l:3) blk: 54 numTuple: 1 free: 6744b(17.35%) rightlink: 
86 (OK)                +         5(l:2) blk: 124 numTuple: 3 free: 3936b(51.76%) rightlink:53  
(OK)                   +             1(l:3) blk: 80 numTuple: 4 free: 2532b(68.97%) rightlink: 
54 (OK)                +             2(l:3) blk: 79 numTuple: 2 free: 5340b(34.56%) rightlink: 
80 (OK)                +             3(l:3) blk: 86 numTuple: 4 free: 2532b(68.97%) rightlink: 
123 (OK)               +

There are no memory errors in the code and everything seems to work  
correctly but GiST seems to discard Leaf tuples.  In implementing  
Picksplit() I have followed the code in the contrib/intarray/ 
_int_gist.c, contrib/hstore/hstore_gist.c modules and used Guttman's  
poly-time split algorithm.  Note that a raw split (just split the page  
in half) does not seem to make a difference in the number of Leaf keys  
that are discarded.  The assignments are correct, such that on return  
from Picksplit() the GIST_SPLITVEC *v, contains something like:

v->spl_left = {1, 2, 3}
v->spl_nleft = 3
v->spl_ldatum = (Datum)union_of_1_2_3

v->spl_right = {4,5}
v->spl_nright = 2
v->spl_rdatum = (Datum)union_of_4_5

Since the default page size is 4, splits are generally when inserting  
a 5th element and a split is 3/2, left or right.  I used a simple  
example above but when using Guttman's poly-time split algorithm the  
output might be:

v->spl_left = {5,3}
v->spl_right = {2,4,1}

As I understand it, the union process for the parent Node key should  
"contain" each Leaf or Node key under it.  The bitwise OR and  
comparison by (L ^ (L & N)) shows that is the case.  By the same  
token, Same() on two Leaf keys should never return true (indeed,  
changing Same() to always return false makes no difference in testing).

I stepped through code from Picksplit() through gistplacetopage() in  
gcc but could not detect any errors or where the Leaf tuples may be  
discarded.  If you have any ideas or would like test code, please let  
me know.

Cheers,
Pete Tanski




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dimitri Fontaine
Дата:
Сообщение: Re: Extensions, this time with a patch
Следующее
От: Peter Tanski
Дата:
Сообщение: Re: GiST seems to drop left-branch leaf tuples