Re: Sparse bit set data structure

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Sparse bit set data structure
Дата
Msg-id 1035d8e6-cfd1-0c27-8902-40d8d45eb6e8@iki.fi
обсуждение исходный текст
Ответ на Re: Sparse bit set data structure  (Julien Rouhaud <rjuju123@gmail.com>)
Ответы Re: Sparse bit set data structure
Re: Sparse bit set data structure
Список pgsql-hackers
On 14/03/2019 17:37, Julien Rouhaud wrote:
> On Wed, Mar 13, 2019 at 8:18 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> I started to consider rewriting the data structure into something more
>> like B-tree. Then I remembered that I wrote a data structure pretty much
>> like that last year already! We discussed that on the "Vacuum: allow
>> usage of more than 1GB of work mem" thread [2], to replace the current
>> huge array that holds the dead TIDs during vacuum.
>>
>> So I dusted off that patch, and made it more general, so that it can be
>> used to store arbitrary 64-bit integers, rather than ItemPointers or
>> BlockNumbers. I then added a rudimentary form of compression to the leaf
>> pages, so that clusters of nearby values can be stored as an array of
>> 32-bit integers, or as a bitmap. That would perhaps be overkill, if it
>> was just to conserve some memory in GiST vacuum, but I think this will
>> turn out to be a useful general-purpose facility.
> 
> I had a quick look at it, so I thought first comments could be helpful.

Thanks!

> +       if (newitem <= sbs->last_item)
> +           elog(ERROR, "cannot insert to sparse bitset out of order");
> 
> Is there any reason to disallow inserting duplicates?  AFAICT nothing
> prevents that in the current code.  If that's intended, that probably
> should be documented.

Yeah, we could easily allow setting the last item again. It would be a 
no-op, though, which doesn't seem very useful. It would be useful to 
lift the limitation that the values have to be added in ascending order, 
but current users that we're thinking of don't need it. Let's add that 
later, if the need arises.

Or did you mean that the structure would be a "bag" rather than a "set", 
so that it would keep the duplicates? I don't think that would be good. 
I guess the vacuum code that this will be used in wouldn't care either 
way, but "set" seems like a more clean concept.

On 13/03/2019 21:18, I wrote:
> I'll do some more performance testing on this, to make sure it performs
> well enough on random lookups, to also replace VACUUM's dead item
> pointer array.

Turns out, it didn't perform very well for that use case. I tested with 
distributions where you have clusters of 1-200 integers, at 2^16 
intervals. That's close to the distribution of ItemPointers in a VACUUM, 
where you have 1-200 (dead) items per page, and the offset number is 
stored in the low 16 bits.  It used slightly less memory than the plain 
array of ItemPointers that we use today, but the code to use a bitmap at 
the leaf level hardly ever kicks in, because there just isn't ever 
enough set bits for that to win. In order to get the dense packing, it 
needs to be done at a much more fine-grained fashion.

So I rewrote the way the leaf nodes work, so that the leaf nodes no 
longer use a bitmap, but a simple array of items, like on internal 
nodes. To still get the dense packing, the leaf items are packed using 
an algorithm called Simple-8b, which can encode between 1-240 integers 
in a single 64-bit word, depending on how far the integers are from each 
other. That works much better, and actually makes the code simpler, too.

I renamed this thing to IntegerSet. That seems like a more accurate name 
than the "sparse bitset" that I used call it. There aren't any "bits" 
visible in the public interface of this, after all.

I improved the regression tests, so that it tests all the interface 
functions, and covers various corner-cases. It tests the set with 
different patterns of integers, and it can print the memory usage and 
execution times of adding values to the set, probing random values, and 
iterating through the set. That is a useful micro-benchmark. The speed 
of all the operations seem to be in the same ballpark as with a simple 
sorted array, but it uses much less memory.

I'm now pretty satisfied with this. Barring objections, I'll commit this 
in the next few days. Please review, if you have a chance.

- Heikki

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: speeding up planning with partitions
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: [survey] New "Stable" QueryId based on normalized query text