Обсуждение: [HACKERS] GiST API Adancement

Поиск
Список
Период
Сортировка

[HACKERS] GiST API Adancement

От
Yuan Dong
Дата:

Hi hackers,

I want to hack a little. I applied for GSoC project (https://wiki.postgresql.org/wiki/GSoC_2017#GiST_API_advancement), but now I'm going to hack on my own. With the help of Andrew Borodin, I want to start the project with adding a third state to collision check. The third state is that:
 subtree is totally within the query. In this case, GiST scan can scan down subtree without key checks.

After reading some code, I get this plan to modify the code:

1. Modify the consistent function of datatypes supported by GiST. 

1.1 Start with cube, g_cube_consistent should return a third state when calling g_cube_internal_consistent. Inspired by Andrew Borodin, I have two solutions, 1)modify return value, 2) modify a reference type parameter. 

2. Modify the gistindex_keytest in gistget.c to return multiple states 

2.1 Need declare a new enum type(or define values) in gist_private.h or somewhere 

3. Modify the gitsScanPage in gistget.c to deal with the extra state 

4. Add a state to mark the nodes under this GISTSearchItem are all with in the query 

4.1 Two ways to do this: 1) add a state to GISTSearchItem (prefered) 2) Somewhere else to record all this kind of items 

4.2 We may use the block number as the key 

4.3 Next time when the gistScanPage met this item, just find all the leaves directly(need a new function)

After this, I shall start to benchmark the improvement and edit the code of other datatypes.

Hope you hackers can give me some suggestions~

--
Dong



Re: [HACKERS] GiST API Adancement

От
Andrew Borodin
Дата:
Hi, Dong!

2017-06-15 21:19 GMT+05:00 Yuan Dong <Doffery@hotmail.com>:
> I'm going to hack on my own. With the help of Andrew Borodin, I want to
> start the project with adding a third state to collision check. The third
> state is that:
>  subtree is totally within the query. In this case, GiST scan can scan down
> subtree without key checks.

That's fine!

> After reading some code, I get this plan to modify the code:
>
> 1. Modify the consistent function of datatypes supported by GiST.
>
> 1.1 Start with cube, g_cube_consistent should return a third state when
> calling g_cube_internal_consistent. Inspired by Andrew Borodin, I have two
> solutions, 1)modify return value, 2) modify a reference type parameter.
>
> 2. Modify the gistindex_keytest in gistget.c to return multiple states
>
> 2.1 Need declare a new enum type(or define values) in gist_private.h or
> somewhere
>
> 3. Modify the gitsScanPage in gistget.c to deal with the extra state
>
> 4. Add a state to mark the nodes under this GISTSearchItem are all with in
> the query
>
> 4.1 Two ways to do this: 1) add a state to GISTSearchItem (prefered) 2)
> Somewhere else to record all this kind of items
>
> 4.2 We may use the block number as the key
>
> 4.3 Next time when the gistScanPage met this item, just find all the leaves
> directly(need a new function)
>
> After this, I shall start to benchmark the improvement and edit the code of
> other datatypes.
>
> Hope you hackers can give me some suggestions~

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

Though for benchmarking purposes backward compatibility can be omitted.

Best regards, Andrey Borodin



[HACKERS] 答复: GiST API Adancement

От
Yuan Dong
Дата:

Hi Andrey,


Thank you for your suggestion.

 

I should still maintain original API of GiST after modification.


--

Dong




发件人: Andrew Borodin <borodin@octonica.com>
发送时间: 2017年6月16日 6:24:03
收件人: Yuan Dong
抄送: pgsql-hackers@postgresql.org
主题: Re: GiST API Adancement
 
Hi, Dong!

2017-06-15 21:19 GMT+05:00 Yuan Dong <Doffery@hotmail.com>:
> I'm going to hack on my own. With the help of Andrew Borodin, I want to
> start the project with adding a third state to collision check. The third
> state is that:
>  subtree is totally within the query. In this case, GiST scan can scan down
> subtree without key checks.

That's fine!

> After reading some code, I get this plan to modify the code:
>
> 1. Modify the consistent function of datatypes supported by GiST.
>
> 1.1 Start with cube, g_cube_consistent should return a third state when
> calling g_cube_internal_consistent. Inspired by Andrew Borodin, I have two
> solutions, 1)modify return value, 2) modify a reference type parameter.
>
> 2. Modify the gistindex_keytest in gistget.c to return multiple states
>
> 2.1 Need declare a new enum type(or define values) in gist_private.h or
> somewhere
>
> 3. Modify the gitsScanPage in gistget.c to deal with the extra state
>
> 4. Add a state to mark the nodes under this GISTSearchItem are all with in
> the query
>
> 4.1 Two ways to do this: 1) add a state to GISTSearchItem (prefered) 2)
> Somewhere else to record all this kind of items
>
> 4.2 We may use the block number as the key
>
> 4.3 Next time when the gistScanPage met this item, just find all the leaves
> directly(need a new function)
>
> After this, I shall start to benchmark the improvement and edit the code of
> other datatypes.
>
> Hope you hackers can give me some suggestions~

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

Though for benchmarking purposes backward compatibility can be omitted.

Best regards, Andrey Borodin

Re: [HACKERS] 答复: GiST API Adancement

От
Tom Lane
Дата:
Yuan Dong <Doffery@hotmail.com> writes:
> ·¢¼þÈË: Andrew Borodin <borodin@octonica.com>
>> I think there is one more aspect of development: backward
>> compatibility: it's impossible to update all existing extensions. This
>> is not that major feature to ignore them.

> I should still maintain original API of GiST after modification.

To put a little context on that: we have always felt that it's okay
to require extension authors to make small adjustments when going to
a new major release.  For instance, adding a new parameter to a
globally visible function is fine, especially if callers can just
pass NULL or some such to get the old behavior.  So in the context
here, you shouldn't feel compelled to come up with a bizarre API
design just to preserve exact compatibility of old code.  You should
indeed think about reducing the amount of work that extension
authors have to do to update, but that doesn't have to mean "zero".
Also, it's wise to make sure that any places where code changes
have to be made will result in compile errors if the change isn't
made.
        regards, tom lane



[HACKERS] Re: [HACKERS] 答复: GiST API Adancement

От
Andrew Borodin
Дата:
2017-06-16 17:06 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
> Yuan Dong <Doffery@hotmail.com> writes:
>> ·¢¼þÈË: Andrew Borodin <borodin@octonica.com>
>>> I think there is one more aspect of development: backward
>>> compatibility: it's impossible to update all existing extensions. This
>>> is not that major feature to ignore them.
>
>> I should still maintain original API of GiST after modification.
>
> To put a little context on that: we have always felt that it's okay
> to require extension authors to make small adjustments when going to
> a new major release.

Then maybe it worth to consider more general API advancement at once?
Here's my wish list:
1. Choose subtree function as an alternative for penalty
2. Bulk load support
3. Better split framework: now many extensions peek two seeds and
spread split candidates among them, some with subtle but noisy
mistakes
4. Better way of key compares (this thread, actually)
5. Split in many portions

Best regards, Andrey Borodin.



[HACKERS] 答复: [HACKERS] 答复: GiST API Adancement

От
Yuan Dong
Дата:

2017-06-16 17:06 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
> Yuan Dong <Doffery@hotmail.com> writes:
>> ·¢¼þÈË: Andrew Borodin <borodin@octonica.com>
>>> I think there is one more aspect of development: backward
>>> compatibility: it's impossible to update all existing extensions. This
>>> is not that major feature to ignore them.
>
>> I should still maintain original API of GiST after modification.
>
> To put a little context on that: we have always felt that it's okay
> to require extension authors to make small adjustments when going to
> a new major release.


Tom's comment is very helpful, I will keep it in mind.


>Then maybe it worth to consider more general API advancement at once?
>Here's my wish list:
>1. Choose subtree function as an alternative for penalty
>2. Bulk load support
>3. Better split framework: now many extensions peek two seeds and
>spread split candidates among them, some with subtle but noisy
>mistakes
>4. Better way of key compares (this thread, actually)
>5. Split in many portions

Okay, I would try to do more advancements step by step.

I do not understand the problem of choosing subtree as an alternative for penalty. What do yo mean by choosing subtree function?


There will be more questions coming, hope you will be comfortable with that~


Best wishes.

--

Dong



发件人: Andrew Borodin <borodin@octonica.com>
发送时间: 2017年6月16日 16:48:04
收件人: Tom Lane
抄送: Yuan Dong; pgsql-hackers@postgresql.org
主题: Re: [HACKERS] 答复: GiST API Adancement
 
2017-06-16 17:06 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
> Yuan Dong <Doffery@hotmail.com> writes:
>> ·¢¼þÈË: Andrew Borodin <borodin@octonica.com>
>>> I think there is one more aspect of development: backward
>>> compatibility: it's impossible to update all existing extensions. This
>>> is not that major feature to ignore them.
>
>> I should still maintain original API of GiST after modification.
>
> To put a little context on that: we have always felt that it's okay
> to require extension authors to make small adjustments when going to
> a new major release.

Then maybe it worth to consider more general API advancement at once?
Here's my wish list:
1. Choose subtree function as an alternative for penalty
2. Bulk load support
3. Better split framework: now many extensions peek two seeds and
spread split candidates among them, some with subtle but noisy
mistakes
4. Better way of key compares (this thread, actually)
5. Split in many portions

Best regards, Andrey Borodin.