Here comes part 2 of 2 of the review.
inet-gist
---------
In general the code looks good and I think your approach makes sense.
Not an expert on GiST though so I would like a second opinion on this.
Maybe there is some clever trick which would make the index more
efficient, but the design I see is simple and logical. Like this much
more than the incorrect GiST index for inet in btree_gist.
The only thing is that I think your index would do poorly on the case
where all values share a prefix before the netmask but have different
values after the netmask, since gist_union and gist_penalty does not
care about the bits after the netmask. Am I correct? Have you thought
anything about this?
To create such data:
CREATE TABLE a (ip inet);
INSERT INTO a SELECT '127.0.0.0/16'::inet + s FROM generate_series(1,
pow(2, 16)::int) s;
CREATE INDEX aidx ON a USING gist (ip);
Saw also some minor things too.
Typo in comment, weird sentence:
* The main question to calculate the union is that they have how many * bits in common. [...]
I do not like how you called the result key i inet_union_gist() "tmp".
Something like "result" or "result_ip" would be better.
Typo in comment, "reverse" should be "inverse":
* Penalty is reverse of the common IP bits of the two addresses.
Typo in comment, missing "be":
* Values bigger than 1 are used when the common IP bits cannot * calculated.
Language can be improved in this comment. Both ways to split are by the
union of the keys, it is more of a split by address prefix.
* The second and the important way is to split by the union of the keys. * Union of the keys calculated same way with
theinet_gist_union function. * The first and the last biggest subnets created from the calculated * union. In this case
addressescontained by the first subnet will be put * to the left bucket, address contained by the last subnet will be
putto * the right bucket.
Could you not just use memcmp in inet_gist_same() instead of bitncmp()
since it only cares about equality?
There is an extra empty line at the end of network_gist.c
inet-selfuncs
-------------
Here I have to honestly admit I am not sure if I can tell if your
selectivity function is correct. Your explanation sounds convincing and
the general idea of looking at the histogram is correct. I am just have
no idea if the details are right.
DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.
Typo in comment, "constrant" -> "constant".
There is an extra empty line at the end of network_selfuncs.c
--
Andreas Karlsson