Re: Bloom Indexes - bit array length and the total number of bits(or hash functions ?? ) !
От | Fabien COELHO |
---|---|
Тема | Re: Bloom Indexes - bit array length and the total number of bits(or hash functions ?? ) ! |
Дата | |
Msg-id | alpine.DEB.2.21.1906100758230.9922@lancre обсуждение исходный текст |
Ответ на | Re: Bloom Indexes - bit array length and the total number of bits (orhash functions ?? ) ! (Avinash Kumar <avinash.vallarapu@gmail.com>) |
Список | pgsql-hackers |
> But the 2 direct questions i have are : > > 1. What is the structure of the Bloom Index ? Can you please let me know > what are the fields of a Bloom Index ? Is it just the Item Pointer > and BloomSignatureWord ? I'm not sure of Postgres actual implementation, I have just looked at the underlying hash functions implementation. > When i describe my bloom index it looks like following. > > postgres=# \d+ foo.idx_bloom_bar > Index "foo.idx_bloom_bar" > Column | Type | Key? | Definition | Storage | Stats target > ---------+---------+------+------------+---------+-------------- > id | integer | yes | id | plain | > dept | integer | yes | dept | plain | > id2 | integer | yes | id2 | plain | > id3 | integer | yes | id3 | plain | > id4 | integer | yes | id4 | plain | > id5 | integer | yes | id5 | plain | > id6 | integer | yes | id6 | plain | > zipcode | integer | yes | zipcode | plain | > bloom, for table "foo.bar" The bloom index associates a signature, i.e. a bitfield the size of which is specified by the parameter "length", to the tuple location. The bitfield is computed by hashing the value of columns which are listed above in the index definition. The many values are somehow compressed into the small signature. > Options: length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4, > col8=4 I doubt that these parameters are good. The is no point to include a unique column into a bloom index. If you look at my blog, the number of bits associated to each field should depend on the expected selectivity, which depends on the entropy of each field and the signature size. The column entropy can be computed with a query. > 2. If we are storing just one signature word per row, how is this > aggregated for all column values of that row into one signature in high > level ? The aggregation, if performed, is not very useful in practice because it can only be effective on a few (first) bits, which are randomly computed anyway and the value of the query is not likely to hit them. Fundamentally all bitfields are scanned to extract which tuples are possibly of interest, i.e. are not excluded by the index. The "full scan" of the bloom index is not a bad thing if it is much smaller than the table itself. > For example, if length = 64, does it mean that a bit array of 64 bits is > generated per each row ? Yes. > If col1=4, does it mean the value of col1 is passed to 4 hash functions > that generate 4 positions that can be set to 1 in the bit array ? Yes. Try to apply the formula in the blog to see what you get for your parameters, but it is likely to be < 4. -- Fabien.
В списке pgsql-hackers по дате отправления: