On 11/29/2013 09:54 AM, Antonin Houska wrote:
> On 11/29/2013 01:13 AM, Andreas Karlsson wrote:
>
>> When doing partial matching, the code needs to be able to return the union
>> of all TIDs in all the matching posting trees in TID order (to be able
>> to do AND and OR operations with multiple search keys later). It does
>> this by traversing the matching posting trees one after another and
>> collecting all their TIDs in a TIDBitmap, which is later iterated over.
>
> I think it's not a plain union. My understanding is that - to evaluate a
> single key (typically an array) - you first need to get all the TID streams
> for that key (i.e. one posting list/tree per element of the key array)
> and then iterate over all these streams in parallel and 'merge' them using
> the consistent() function. That's how I understand ginget.c:keyGetItem().
For partial matches the merging is done in two steps: first, a simple
union of all the streams for each key, and second, a merge of those
per-key union streams using the consistent() function.
It is the first step that can be lossy.
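A rough sketch of the two steps, in Python for brevity rather than PostgreSQL's C. It assumes TIDs are modelled as (block, offset) tuples, that each posting list/tree has already been read into a sorted list, and that consistent() takes only a list of booleans, one per key (the real GIN support function takes more arguments). union_streams and merge_consistent are hypothetical names, not functions in ginget.c.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

# Hypothetical TID model: (block number, offset number); tuples compare in TID order.
Tid = Tuple[int, int]


def union_streams(streams: Iterable[Iterable[Tid]]) -> Iterator[Tid]:
    """Step 1: union of one key's sorted posting streams, in TID order, duplicates dropped."""
    last: Optional[Tid] = None
    for tid in heapq.merge(*streams):
        if tid != last:
            yield tid
            last = tid


def merge_consistent(
    key_streams: Sequence[Iterable[Tid]],
    consistent: Callable[[List[bool]], bool],
) -> Iterator[Tid]:
    """Step 2: walk the per-key union streams in parallel; emit TIDs consistent() accepts."""
    iters = [iter(s) for s in key_streams]
    heads: List[Optional[Tid]] = [next(it, None) for it in iters]
    while any(h is not None for h in heads):
        current = min(h for h in heads if h is not None)
        # check[i] is True when key i's stream is positioned on the current TID.
        check = [h == current for h in heads]
        if consistent(check):
            yield current
        # Advance only the streams positioned on the current TID.
        for i, hit in enumerate(check):
            if hit:
                heads[i] = next(iters[i], None)


# Key 1 is a partial match whose matching entries have two posting lists;
# key 2 matches a single entry. Passing consistent=all behaves like AND.
key1 = list(union_streams([[(1, 1), (1, 5), (2, 9)], [(1, 3), (1, 5)]]))
key2 = list(union_streams([[(1, 3), (2, 9), (3, 12)]]))
print(list(merge_consistent([key1, key2], all)))  # [(1, 3), (2, 9)]
```

heapq.merge keeps one head per input stream, so in this sketch the cost of step 1 grows with the number of matching posting lists, which is the partial-match concern raised below.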
> So the problem of partial match is (IMO) that there can be too many TID
> streams to merge - many more than the number of elements of the key array.
Agreed.
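On why the union step can be lossy: as I understand it, the TIDBitmap used for the union has a memory budget (derived from work_mem) and, once that is exceeded, degrades some pages to page-level entries, so the scan only knows that "some tuple on this page may match" and has to recheck those pages. The toy class below models only that idea; ToyTidBitmap, max_entries and its eviction rule (demote the fullest page until the exact entries fit the budget) are assumptions for illustration, not tidbitmap.c's actual algorithm.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

# Hypothetical TID model: (block number, offset number).
Tid = Tuple[int, int]


class ToyTidBitmap:
    """Simplified model of a TID bitmap with a memory budget (not PostgreSQL's tidbitmap.c)."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.exact: Dict[int, Set[int]] = {}  # page -> exact offsets
        self.lossy: Set[int] = set()  # pages known only as "some tuple may match"

    def _exact_count(self) -> int:
        return sum(len(offsets) for offsets in self.exact.values())

    def add(self, tid: Tid) -> None:
        page, offset = tid
        if page in self.lossy:
            return  # page is already lossy; there is nothing more to record
        self.exact.setdefault(page, set()).add(offset)
        # Over budget: demote the fullest page (lowest page number on ties) to lossy.
        while self._exact_count() > self.max_entries:
            victim = max(self.exact, key=lambda p: (len(self.exact[p]), -p))
            del self.exact[victim]
            self.lossy.add(victim)

    def iterate(self) -> Iterator[Tuple[int, Optional[Tuple[int, ...]]]]:
        """Yield pages in order; None means every tuple on the page must be rechecked."""
        for page in sorted(set(self.exact) | self.lossy):
            if page in self.lossy:
                yield page, None
            else:
                yield page, tuple(sorted(self.exact[page]))


# Union three posting streams into a bitmap that can hold only 4 exact TIDs.
bm = ToyTidBitmap(max_entries=4)
for stream in [[(1, 1), (1, 2)], [(1, 3), (2, 1)], [(1, 4), (3, 7)]]:
    for tid in stream:
        bm.add(tid)
print(list(bm.iterate()))  # [(1, None), (2, (1,)), (3, (7,))]
```

The more streams a partial match unions, the more TIDs compete for the same budget, so the chance of pages going lossy rises with the number of matching posting trees.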
--
Andreas Karlsson