Re: GiST support for inet datatypes

Поиск
Список
Период
Сортировка
От Emre Hasegeli
Тема Re: GiST support for inet datatypes
Дата
Msg-id CAE2gYzzgY96OPrU7nn2jrh2G0_57Qj_XZRkrVyOaGVwRv+Qarg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GiST support for inet datatypes  (Andreas Karlsson <andreas@proxel.se>)
Ответы Re: GiST support for inet datatypes  (Andreas Karlsson <andreas@proxel.se>)
Список pgsql-hackers
2014/1/20 Andreas Karlsson <andreas@proxel.se>:
> Here comes part 2 of 2 of the review.

Second versions of the patches attached. They address both of your reviews.

>
> inet-gist
> ---------
>
> In general the code looks good and I think your approach makes sense. Not an
> expert on GiST though so I would like a second opinion on this. Maybe there
> is some clever trick which would make the index more efficient, but the
> design I see is simple and logical. Like this much more than the incorrect
> GiST index for inet in btree_gist.

I realized that create extension btree_gist fails with the patch.

ERROR:  could not make operator class "gist_inet_ops" be default for type inet
DETAIL:  Operator class "inet_ops" already is the default.

I think making the new operator class default makes more sense. Name conflict
can be a bigger problem. Should I rename the new operator class?

>
> The only thing is that I think your index would do poorly on the case where
> all values share a prefix before the netmask but have different values after
> the netmask, since gist_union and gist_penalty does not care about the bits
> after the netmask. Am I correct? Have you thought anything about this?

Yes, I have tried to construct the index with ip_maxbits() instead of ip_bits()
but I could not make it work with the cidr type. It can be done by adding
operator as a parameter for union, penalty and picksplit functions on the
GiST framework. I am not sure it worths the trouble. It would only help
the basic comparison operators and it would slow down other operators because
network length comparison before network address would not be possible for
the intermediate nodes. I mentioned this behavior of the operator class on
the paragraph added to the documentation.

> Saw also some minor things too.
>
> Typo in comment, weird sentence:
>
>  * The main question to calculate the union is that they have how many
>  * bits in common. [...]

I rewrite it. I am sorry about my English.

>
> I do not like how you called the result key i inet_union_gist() "tmp".
> Something like "result" or "result_ip" would be better.
>
> Typo in comment, "reverse" should be "inverse":
>
>  * Penalty is reverse of the common IP bits of the two addresses.
>
> Typo in comment, missing "be":
>
>  * Values bigger than 1 are used when the common IP bits cannot
>  * calculated.
>
> Language can be improved in this comment. Both ways to split are by the
> union of the keys, it is more of a split by address prefix.
>
>  * The second and the important way is to split by the union of the keys.
>  * Union of the keys calculated same way with the inet_gist_union function.
>  * The first and the last biggest subnets created from the calculated
>  * union. In this case addresses contained by the first subnet will be put
>  * to the left bucket, address contained by the last subnet will be put to
>  * the right bucket.
>
> Could you not just use memcmp in inet_gist_same() instead of bitncmp() since
> it only cares about equality?
>
> There is an extra empty line at the end of network_gist.c

I changed them.

>
> inet-selfuncs
> -------------
>
> Here I have to honestly admit I am not sure if I can tell if your
> selectivity function is correct. Your explanation sounds convincing and the
> general idea of looking at the histogram is correct. I am just have no idea
> if the details are right.

I tried to improve it in this version. I hope it is more readable now. I used
the word inclusion instead of overlap, made some changes on the algorithm
to make it more suitable to the other operators.

Using the histogram for inclusion which is originally for basic comparison,
is definitely not correct. It is an empirical approach. I have tried several
versions of it, tested them with different data sets and thought it is better
than nothing.

>
> DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
> DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.
>
> Typo in comment, "constrant" -> "constant".
>
> There is an extra empty line at the end of network_selfuncs.c

I changed them.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Add %z support to elog/ereport?
Следующее
От: "MauMau"
Дата:
Сообщение: Re: [bug fix] pg_ctl always uses the same event source