Обсуждение: proposal: searching in array function - array_position
Hi
I am proposing a simple function, that returns a position of element in array.CREATE OR REPLACE FUNCTION array_position(anyarray, anyelement)
RETURNS int AS $$
DECLARE i int := 0;
BEGIN
FOREACH a IN ARRAY $1
LOOP
IF a = $1 THEN
RETURN i;
END IF;
i := i + 1;
END LOOP;
RETURN NULL;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
On 1/16/15 3:39 AM, Pavel Stehule wrote: > I am proposing a simple function, that returns a position of element in array. Yes please! > FUNCTION array_position(anyarray, anyelement) RETURNS int That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier forthe slice. This wouldn't be so bad if we had an easier way to extract subsets of an array, but even that is really ugly because whateveryou extract keeps the original number of dimensions. > Implementation is simple (plpgsql code) This would actually be written in C though, yes? Otherwise it's not really any better than just doing an extension... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-16 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 3:39 AM, Pavel Stehule wrote:I am proposing a simple function, that returns a position of element in array.
Yes please!FUNCTION array_position(anyarray, anyelement) RETURNS int
That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier for the slice.
It is question, what is a result - probably, there can be a multidimensional variant, where result will be a array
array_position([1,2,3],2) --> 2
array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter should to have N-1 dimension of first parameter */
array_position_md([1,2,3],2) --> [2]
array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]
another question is how to solve more than one occurrence on one value - probably two sets of functions - first returns first occurrence of value, second returns set of occurrence
This wouldn't be so bad if we had an easier way to extract subsets of an array, but even that is really ugly because whatever you extract keeps the original number of dimensions.Implementation is simple (plpgsql code)
This would actually be written in C though, yes? Otherwise it's not really any better than just doing an extension...
Sure, I expect a C implementation
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/16/15 11:16 AM, Pavel Stehule wrote: > > > 2015-01-16 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>: > > On 1/16/15 3:39 AM, Pavel Stehule wrote: > > I am proposing a simple function, that returns a position of element in array. > > > Yes please! > > FUNCTION array_position(anyarray, anyelement) RETURNS int > > > That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifierfor the slice. > > > It is question, what is a result - probably, there can be a multidimensional variant, where result will be a array > > array_position([1,2,3],2) --> 2 > array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter should to have N-1 dimension of first parameter */ The problem with that is you can't actually use '2' to get [2,3] back: select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL; ?column? ---------- t (1 row) I think the bigger problem here is we need something better than slices for handling subsets of arrays. Even if the functionreturned [2:2] it's still going to behave differently than it will in the non-array case because you won't be gettingthe expected number of dimensions back. :( > array_position_md([1,2,3],2) --> [2] > array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1] > > another question is how to solve more than one occurrence on one value - probably two sets of functions - first returnsfirst occurrence of value, second returns set of occurrence Gee, if only way had some way to return multiple elements of something... ;P In other words, I think all of these should actually return an array of positions. I think it's OK for someone that onlycares about the first instance to just do [1]. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-16 18:37 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 11:16 AM, Pavel Stehule wrote:
2015-01-16 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 3:39 AM, Pavel Stehule wrote:
I am proposing a simple function, that returns a position of element in array.
Yes please!
FUNCTION array_position(anyarray, anyelement) RETURNS int
That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier for the slice.
It is question, what is a result - probably, there can be a multidimensional variant, where result will be a array
array_position([1,2,3],2) --> 2
array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter should to have N-1 dimension of first parameter */
The problem with that is you can't actually use '2' to get [2,3] back:
select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL;
?column?
----------
t
(1 row)
yes, but when you are searching a array in array you can use a full slice selection:
postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a constant every time in this case -- so it should not be returned
array
---------
{{1,2}}
(1 row)
postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a constant every time in this case -- so it should not be returned
array
---------
{{1,2}}
(1 row)
I think the bigger problem here is we need something better than slices for handling subsets of arrays. Even if the function returned [2:2] it's still going to behave differently than it will in the non-array case because you won't be getting the expected number of dimensions back. :(
you cannot to return a slice and I don't propose it, although we can return a range type or array of range type - but still we cannot to use range for a arrays.
array_position_md([1,2,3],2) --> [2]
array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]
another question is how to solve more than one occurrence on one value - probably two sets of functions - first returns first occurrence of value, second returns set of occurrence
Gee, if only way had some way to return multiple elements of something... ;P
In other words, I think all of these should actually return an array of positions. I think it's OK for someone that only cares about the first instance to just do [1].
there can be two functions - "position" - returns first and "positions" returns all as a array
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-16 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 3:39 AM, Pavel Stehule wrote:I am proposing a simple function, that returns a position of element in array.
Yes please!FUNCTION array_position(anyarray, anyelement) RETURNS int
That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier for the slice.
theoretically you can use this function for md arrays too. This function returns offset, and you can calculate a Nd possition
so maybe better name -- array_offset or some similar
Regards
Pavel
This wouldn't be so bad if we had an easier way to extract subsets of an array, but even that is really ugly because whatever you extract keeps the original number of dimensions.Implementation is simple (plpgsql code)
This would actually be written in C though, yes? Otherwise it's not really any better than just doing an extension...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi
here is a proof concept of array_offset functionPavel
2015-01-16 19:12 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2015-01-16 18:37 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:On 1/16/15 11:16 AM, Pavel Stehule wrote:
2015-01-16 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 3:39 AM, Pavel Stehule wrote:
I am proposing a simple function, that returns a position of element in array.
Yes please!
FUNCTION array_position(anyarray, anyelement) RETURNS int
That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier for the slice.
It is question, what is a result - probably, there can be a multidimensional variant, where result will be a array
array_position([1,2,3],2) --> 2
array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter should to have N-1 dimension of first parameter */
The problem with that is you can't actually use '2' to get [2,3] back:
select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL;
?column?
----------
t
(1 row)yes, but when you are searching a array in array you can use a full slice selection:
postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a constant every time in this case -- so it should not be returned
array
---------
{{1,2}}
(1 row)
I think the bigger problem here is we need something better than slices for handling subsets of arrays. Even if the function returned [2:2] it's still going to behave differently than it will in the non-array case because you won't be getting the expected number of dimensions back. :(you cannot to return a slice and I don't propose it, although we can return a range type or array of range type - but still we cannot to use range for a arrays.array_position_md([1,2,3],2) --> [2]
array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]
another question is how to solve more than one occurrence on one value - probably two sets of functions - first returns first occurrence of value, second returns set of occurrence
Gee, if only way had some way to return multiple elements of something... ;P
In other words, I think all of these should actually return an array of positions. I think it's OK for someone that only cares about the first instance to just do [1].there can be two functions - "position" - returns first and "positions" returns all as a array
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
Hi
I am sending updated version - it allow third optional argument that specify where searching should to start. With it is possible repeatably call this function.2015-01-17 23:43 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
RegardsIn this initial proof concept I used "IS NOT DISTINCT FROM" operator - but my opinion is not strong in this question. Both has some advantages and disadvantages.* used comparation "=" or "IS NOT DISTINCT FROM"possible question:Hihere is a proof concept of array_offset function
Pavel2015-01-16 19:12 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:2015-01-16 18:37 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:On 1/16/15 11:16 AM, Pavel Stehule wrote:
2015-01-16 17:57 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 3:39 AM, Pavel Stehule wrote:
I am proposing a simple function, that returns a position of element in array.
Yes please!
FUNCTION array_position(anyarray, anyelement) RETURNS int
That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier for the slice.
It is question, what is a result - probably, there can be a multidimensional variant, where result will be a array
array_position([1,2,3],2) --> 2
array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter should to have N-1 dimension of first parameter */
The problem with that is you can't actually use '2' to get [2,3] back:
select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL;
?column?
----------
t
(1 row)yes, but when you are searching a array in array you can use a full slice selection:
postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a constant every time in this case -- so it should not be returned
array
---------
{{1,2}}
(1 row)
I think the bigger problem here is we need something better than slices for handling subsets of arrays. Even if the function returned [2:2] it's still going to behave differently than it will in the non-array case because you won't be getting the expected number of dimensions back. :(you cannot to return a slice and I don't propose it, although we can return a range type or array of range type - but still we cannot to use range for a arrays.array_position_md([1,2,3],2) --> [2]
array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]
another question is how to solve more than one occurrence on one value - probably two sets of functions - first returns first occurrence of value, second returns set of occurrence
Gee, if only way had some way to return multiple elements of something... ;P
In other words, I think all of these should actually return an array of positions. I think it's OK for someone that only cares about the first instance to just do [1].there can be two functions - "position" - returns first and "positions" returns all as a array
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
On 1/20/15 11:12 AM, Pavel Stehule wrote: > I am sending updated version - it allow third optional argument that specify where searching should to start. With it ispossible repeatably call this function. What happened to returning an array of offsets? I think that would be both easier to use than this version as well as performingbetter. I see you dropped multi-dimension support, but I think that's fine. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-20 21:32 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/20/15 11:12 AM, Pavel Stehule wrote:I am sending updated version - it allow third optional argument that specify where searching should to start. With it is possible repeatably call this function.
What happened to returning an array of offsets? I think that would be both easier to use than this version as well as performing better.
I have still thinking about this idea. It needs a different function and I didn't start with this.
Implementation a optional start parameter to array_offset is cheap - and I am thinking so it can be useful for some use cases.
I see you dropped multi-dimension support, but I think that's fine.
It can be implemented later. There is no any barriers to later implementation.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi
with array_offsets - returns a array of offsets
Regardswith array_offsets - returns a array of offsets
2015-01-20 21:32 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/20/15 11:12 AM, Pavel Stehule wrote:I am sending updated version - it allow third optional argument that specify where searching should to start. With it is possible repeatably call this function.
What happened to returning an array of offsets? I think that would be both easier to use than this version as well as performing better.
I see you dropped multi-dimension support, but I think that's fine.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
On 1/24/15 2:48 AM, Pavel Stehule wrote: > with array_offsets - returns a array of offsets + <entry>returns a offset of first occurrence of some element in a array. It uses should be + <entry>returns the offset of the first occurrence of some element in an array. It uses + <entry>returns a array of offset of all occurrences some element in a array. It uses should be + <entry>returns an array of the offsets of all occurrences of some element in an array. It uses Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? You should remove the array_length() from the last array_offsets test; I don't see that it buys anything. I think there should be tests for what happens when you feed these functions a multi-dimensional array. Other than that, looks good. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-26 23:01 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/24/15 2:48 AM, Pavel Stehule wrote:with array_offsets - returns a array of offsets
+ <entry>returns a offset of first occurrence of some element in a array. It uses
should be
+ <entry>returns the offset of the first occurrence of some element in an array. It uses
+ <entry>returns a array of offset of all occurrences some element in a array. It uses
should be
+ <entry>returns an array of the offsets of all occurrences of some element in an array. It uses
Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code?
I though about it - but there is different checks, different result processing, different result type.
I didn't find any readable and reduced form :(
You should remove the array_length() from the last array_offsets test; I don't see that it buys anything.
ok
I think there should be tests for what happens when you feed these functions a multi-dimensional array.
I can do it. Result should be expected - it searching row by row due MD format
Other than that, looks good.
Thank you
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/26/15 4:17 PM, Pavel Stehule wrote: > Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator cachingcode? > > > I though about it - but there is different checks, different result processing, different result type. > > I didn't find any readable and reduced form :( Yeah, that's why I was thinking specifically of the operator caching code... isn't that identical? That would at least removea dozen lines... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-26 23:29 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/26/15 4:17 PM, Pavel Stehule wrote:Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code?
I though about it - but there is different checks, different result processing, different result type.
I didn't find any readable and reduced form :(
Yeah, that's why I was thinking specifically of the operator caching code... isn't that identical? That would at least remove a dozen lines...
It is only partially identical - I would to use cache for array_offset, but it is not necessary for array_offsets .. depends how we would to modify current API to support externally cached data.
Regards
Pavel
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/27/15 4:36 AM, Pavel Stehule wrote: > 2015-01-26 23:29 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>: > > On 1/26/15 4:17 PM, Pavel Stehule wrote: > > Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operatorcaching code? > > > I though about it - but there is different checks, different result processing, different result type. > > I didn't find any readable and reduced form :( > > > Yeah, that's why I was thinking specifically of the operator caching code... isn't that identical? That would at leastremove a dozen lines... > > > It is only partially identical - I would to use cache for array_offset, but it is not necessary for array_offsets .. dependshow we would to modify current API to support externally cached data. Externally cached data? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-28 0:01 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/27/15 4:36 AM, Pavel Stehule wrote:2015-01-26 23:29 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/26/15 4:17 PM, Pavel Stehule wrote:
Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code?
I though about it - but there is different checks, different result processing, different result type.
I didn't find any readable and reduced form :(
Yeah, that's why I was thinking specifically of the operator caching code... isn't that identical? That would at least remove a dozen lines...
It is only partially identical - I would to use cache for array_offset, but it is not necessary for array_offsets .. depends how we would to modify current API to support externally cached data.
Externally cached data?
Some from these functions has own caches for minimize access to typcache (array_map, array_cmp is example). And in first case, I am trying to push these information from fn_extra, in second case I don't do it, because I don't expect a repeated call (and I am expecting so type cache will be enough).
I plan to do some benchmark to calculate if we have to do, or we can use type cache only.
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 28/01/15 08:15, Pavel Stehule wrote: > > > 2015-01-28 0:01 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com > <mailto:Jim.Nasby@bluetreble.com>>: > > On 1/27/15 4:36 AM, Pavel Stehule wrote: > > > It is only partially identical - I would to use cache for > array_offset, but it is not necessary for array_offsets .. > depends how we would to modify current API to support externally > cached data. > > > Externally cached data? > > > Some from these functions has own caches for minimize access to typcache > (array_map, array_cmp is example). And in first case, I am trying to > push these information from fn_extra, in second case I don't do it, > because I don't expect a repeated call (and I am expecting so type cache > will be enough). > You actually do caching via fn_extra in both case and I think that's the correct way, and yes that part can be moved common function. I also see that the documentation does not say what is returned by array_offset if nothing is found (it's documented in code but not in sgml). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
2015-02-22 3:00 GMT+01:00 Petr Jelinek <petr@2ndquadrant.com>:
On 28/01/15 08:15, Pavel Stehule wrote:
2015-01-28 0:01 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com
<mailto:Jim.Nasby@bluetreble.com>>:
On 1/27/15 4:36 AM, Pavel Stehule wrote:
It is only partially identical - I would to use cache for
array_offset, but it is not necessary for array_offsets ..
depends how we would to modify current API to support externally
cached data.
Externally cached data?
Some from these functions has own caches for minimize access to typcache
(array_map, array_cmp is example). And in first case, I am trying to
push these information from fn_extra, in second case I don't do it,
because I don't expect a repeated call (and I am expecting so type cache
will be enough).
You actually do caching via fn_extra in both case and I think that's the correct way, and yes that part can be moved common function.
I also see that the documentation does not say what is returned by array_offset if nothing is found (it's documented in code but not in sgml).
rebased + fixed docs
Regards
Pavel
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On 22/02/15 12:19, Pavel Stehule wrote: > > > 2015-02-22 3:00 GMT+01:00 Petr Jelinek <petr@2ndquadrant.com > <mailto:petr@2ndquadrant.com>>: > > On 28/01/15 08:15, Pavel Stehule wrote: > > > > 2015-01-28 0:01 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com > <mailto:Jim.Nasby@bluetreble.com>>>: > > On 1/27/15 4:36 AM, Pavel Stehule wrote: > > > It is only partially identical - I would to use cache for > array_offset, but it is not necessary for array_offsets .. > depends how we would to modify current API to support > externally > cached data. > > > Externally cached data? > > > Some from these functions has own caches for minimize access to > typcache > (array_map, array_cmp is example). And in first case, I am trying to > push these information from fn_extra, in second case I don't do it, > because I don't expect a repeated call (and I am expecting so > type cache > will be enough). > > > You actually do caching via fn_extra in both case and I think that's > the correct way, and yes that part can be moved common function. > > I also see that the documentation does not say what is returned by > array_offset if nothing is found (it's documented in code but not in > sgml). > > > rebased + fixed docs > You still duplicate the type cache code, but many other array functions do that too so I will not hold that against you. (Maybe somebody should write separate patch that would put all that duplicate code into common function?) I think this patch is ready for committer so I am going to mark it as such. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Mar 7, 2015 at 1:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: > You still duplicate the type cache code, but many other array functions do > that too so I will not hold that against you. (Maybe somebody should write > separate patch that would put all that duplicate code into common function?) > > I think this patch is ready for committer so I am going to mark it as such. The documentation in this patch needs some improvements to the English. Can anyone help with that? The documentation should discuss what happens if the array is multi-dimensional. The code for array_offset and for array_offset_start appear to be byte-for-byte identical. There's no comment explaining why, but I bet it's to make the opr_sanity test pass. How about adding a comment? The comment for array_offset_common refers to array_offset_start as array_offset_startpos. I am sure in agreement with the idea that it would be good to factor out the common typecache code (for setting up my_extra). Any chance we get a preliminary patch that does that refactoring, and then rebase the main patch on top of it? I agree that it's not really this patch's job to solve that problem, but it would be nice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/10/15 9:30 AM, Robert Haas wrote: > On Sat, Mar 7, 2015 at 1:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: >> You still duplicate the type cache code, but many other array functions do >> that too so I will not hold that against you. (Maybe somebody should write >> separate patch that would put all that duplicate code into common function?) >> >> I think this patch is ready for committer so I am going to mark it as such. > > The documentation in this patch needs some improvements to the > English. Can anyone help with that? I'll take a look at it. > The documentation should discuss what happens if the array is multi-dimensional. > > The code for array_offset and for array_offset_start appear to be > byte-for-byte identical. There's no comment explaining why, but I bet > it's to make the opr_sanity test pass. How about adding a comment? > > The comment for array_offset_common refers to array_offset_start as > array_offset_startpos. > > I am sure in agreement with the idea that it would be good to factor > out the common typecache code (for setting up my_extra). Any chance > we get a preliminary patch that does that refactoring, and then rebase > the main patch on top of it? I agree that it's not really this > patch's job to solve that problem, but it would be nice. Since this patch is here and ready to go I would prefer that we commit it and refactor later. I can tackle that unless Pavel specifically wants to. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-03-10 16:53 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 3/10/15 9:30 AM, Robert Haas wrote:On Sat, Mar 7, 2015 at 1:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:You still duplicate the type cache code, but many other array functions do
that too so I will not hold that against you. (Maybe somebody should write
separate patch that would put all that duplicate code into common function?)
I think this patch is ready for committer so I am going to mark it as such.
The documentation in this patch needs some improvements to the
English. Can anyone help with that?
I'll take a look at it.The documentation should discuss what happens if the array is multi-dimensional.
The code for array_offset and for array_offset_start appear to be
byte-for-byte identical. There's no comment explaining why, but I bet
it's to make the opr_sanity test pass. How about adding a comment?
The comment for array_offset_common refers to array_offset_start as
array_offset_startpos.
I am sure in agreement with the idea that it would be good to factor
out the common typecache code (for setting up my_extra). Any chance
we get a preliminary patch that does that refactoring, and then rebase
the main patch on top of it? I agree that it's not really this
patch's job to solve that problem, but it would be nice.
Since this patch is here and ready to go I would prefer that we commit it and refactor later. I can tackle that unless Pavel specifically wants to.
I'll look on this part this evening - but I don't have any idea how to find some common pattern yet. So I am with Jim - we can do it later.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-03-10 15:30 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
if (my_extra->element_type != element_type)
{
get_typlenbyvalalign(element_type,
&my_extra->typlen,
&my_extra->typbyval,
&my_extra->typalign);
my_extra->element_type = element_type;
}
On Sat, Mar 7, 2015 at 1:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> You still duplicate the type cache code, but many other array functions do
> that too so I will not hold that against you. (Maybe somebody should write
> separate patch that would put all that duplicate code into common function?)
>
> I think this patch is ready for committer so I am going to mark it as such.
The documentation in this patch needs some improvements to the
English. Can anyone help with that?
The documentation should discuss what happens if the array is multi-dimensional.
The code for array_offset and for array_offset_start appear to be
byte-for-byte identical. There's no comment explaining why, but I bet
it's to make the opr_sanity test pass. How about adding a comment?
The comment for array_offset_common refers to array_offset_start as
array_offset_startpos.
yes, it is a reason. I'll comment it.
I am sure in agreement with the idea that it would be good to factor
out the common typecache code (for setting up my_extra). Any chance
we get a preliminary patch that does that refactoring, and then rebase
the main patch on top of it? I agree that it's not really this
patch's job to solve that problem, but it would be nice.
The common part is following code:
my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
if (my_extra == NULL)
{
fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
sizeof(ArrayMetaState));
my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
my_extra->element_type = ~element_type;
}
my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
if (my_extra == NULL)
{
fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
sizeof(ArrayMetaState));
my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
my_extra->element_type = ~element_type;
}
and
if (my_extra->element_type != element_type)
{
get_typlenbyvalalign(element_type,
&my_extra->typlen,
&my_extra->typbyval,
&my_extra->typalign);
my_extra->element_type = element_type;
}
so we can design function like
(ArrayMetaState *)
GetCachedArrayMetaState(FunctionCallInfo fcinfo, Oid element_type, bool *reused)
{
{
ArrayMetaState *state = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
if (state == NULL)
{
fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
sizeof(ArrayMetaState));
state = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
state->element_type = ~element_type;
}
{
fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
sizeof(ArrayMetaState));
state = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
state->element_type = ~element_type;
}
if (state->element_type != element_type)
{
get_typlenbyvalalign(element_type,
&my_extra->typlen,
&my_extra->typbyval,
&my_extra->typalign);
state->element_type = element_type;
{
get_typlenbyvalalign(element_type,
&my_extra->typlen,
&my_extra->typbyval,
&my_extra->typalign);
state->element_type = element_type;
*resused = false;
}
else
*reused = true;
}
Usage in code:
array_state = GetCachedArrayMetaState(fceinfo, element_type, &reused);
if (!reused)
{
{
// some other initialization
}
The content is relatively clean, but the most big problem is naming (as usual)
Comments?
Regards
Pavel
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2015-03-10 15:30 GMT+01:00 Robert Haas <robertmhaas@gmail.com>: >> I am sure in agreement with the idea that it would be good to factor >> out the common typecache code (for setting up my_extra). Any chance >> we get a preliminary patch that does that refactoring, and then rebase >> the main patch on top of it? I agree that it's not really this >> patch's job to solve that problem, but it would be nice. > The common part is following code: There is not all that much commonality; many functions don't bother to populate all of the ArrayMetaState fields because they don't need all of them. (The ones you quote don't, in fact.) You are either going to end up with a subroutine that does extra syscache lookups to populate the whole struct every time, or a confusing collection of ad-hoc subroutines. I'm not convinced that there's a lot of mileage to be gained here. regards, tom lane
2015-03-10 19:50 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2015-03-10 15:30 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
>> I am sure in agreement with the idea that it would be good to factor
>> out the common typecache code (for setting up my_extra). Any chance
>> we get a preliminary patch that does that refactoring, and then rebase
>> the main patch on top of it? I agree that it's not really this
>> patch's job to solve that problem, but it would be nice.
> The common part is following code:
There is not all that much commonality; many functions don't bother to
populate all of the ArrayMetaState fields because they don't need all of
them. (The ones you quote don't, in fact.) You are either going to end
up with a subroutine that does extra syscache lookups to populate the
whole struct every time, or a confusing collection of ad-hoc subroutines.
I'm not convinced that there's a lot of mileage to be gained here.
I have not good feeling about it too. If we would to enhance this are, we probably need a specific flinfo field and flags to specify more precious the context of cached informations. my_extra should be reserved for generic usage. But still there is relative big space for settings some less common fields like "proc".
With extra field in flinfo we can have interface like
bool set_flinfo_type_cache(fcinfo, type, flags);
and usage fcinfo->flinfo->typecache->typlen, ..
I agree with Robert, this can be nice, but it needs more time for design :(
Regards
Pavel
regards, tom lane
On Tue, Mar 10, 2015 at 3:43 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I have not good feeling about it too. If we would to enhance this are, we > probably need a specific flinfo field and flags to specify more precious the > context of cached informations. my_extra should be reserved for generic > usage. But still there is relative big space for settings some less common > fields like "proc". > > With extra field in flinfo we can have interface like > > bool set_flinfo_type_cache(fcinfo, type, flags); > and usage fcinfo->flinfo->typecache->typlen, .. > > I agree with Robert, this can be nice, but it needs more time for design :( OK. If I'm in the minority, I'll desist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/10/15 10:53 AM, Jim Nasby wrote: > On 3/10/15 9:30 AM, Robert Haas wrote: >> On Sat, Mar 7, 2015 at 1:06 PM, Petr Jelinek <petr@2ndquadrant.com> >> wrote: >>> You still duplicate the type cache code, but many other array >>> functions do >>> that too so I will not hold that against you. (Maybe somebody should >>> write >>> separate patch that would put all that duplicate code into common >>> function?) >>> >>> I think this patch is ready for committer so I am going to mark it as >>> such. >> >> The documentation in this patch needs some improvements to the >> English. Can anyone help with that? > > I'll take a look at it. > >> The documentation should discuss what happens if the array is >> multi-dimensional. This is the wording I have to describe what happens with a multi-dimensional array. I'm not thrilled with it; anyone have better ideas? Note: multi-dimensional arrays are squashed to one dimension before searching. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2/22/15 5:19 AM, Pavel Stehule wrote: > > > 2015-02-22 3:00 GMT+01:00 Petr Jelinek <petr@2ndquadrant.com > <mailto:petr@2ndquadrant.com>>: > > On 28/01/15 08:15, Pavel Stehule wrote: > > > > 2015-01-28 0:01 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com > <mailto:Jim.Nasby@bluetreble.com> > <mailto:Jim.Nasby@bluetreble.__com > <mailto:Jim.Nasby@bluetreble.com>>>: > > On 1/27/15 4:36 AM, Pavel Stehule wrote: > > > It is only partially identical - I would to use cache for > array_offset, but it is not necessary for array_offsets .. > depends how we would to modify current API to support > externally > cached data. > > > Externally cached data? > > > Some from these functions has own caches for minimize access to > typcache > (array_map, array_cmp is example). And in first case, I am trying to > push these information from fn_extra, in second case I don't do it, > because I don't expect a repeated call (and I am expecting so > type cache > will be enough). > > > You actually do caching via fn_extra in both case and I think that's > the correct way, and yes that part can be moved common function. > > I also see that the documentation does not say what is returned by > array_offset if nothing is found (it's documented in code but not in > sgml). > > > rebased + fixed docs I don't think we need both array_offset and array_offset_start; can't both SQL functions just call one C function? It might be worth combining the array and non-array versions of this, by having a _common function that accepts a boolean and then just run one or the other of the while loops. Most of the code seems to be shared between the two versions. What is this comment supposed to mean? There is no 'width_array'... * Biggest difference against width_array is unsorted input array. I've attached my doc changes, both alone and with the code. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
On 3/10/15 2:43 PM, Pavel Stehule wrote: > > There is not all that much commonality; many functions don't bother to > populate all of the ArrayMetaState fields because they don't need all of > them. (The ones you quote don't, in fact.) You are either going to end > up with a subroutine that does extra syscache lookups to populate the > whole struct every time, or a confusing collection of ad-hoc > subroutines. > I'm not convinced that there's a lot of mileage to be gained here. > > > I have not good feeling about it too. If we would to enhance this are, > we probably need a specific flinfo field and flags to specify more > precious the context of cached informations. my_extra should be reserved > for generic usage. But still there is relative big space for settings > some less common fields like "proc". > > With extra field in flinfo we can have interface like > > bool set_flinfo_type_cache(fcinfo, type, flags); > and usage fcinfo->flinfo->typecache->typlen, .. I'm not following what you intended there, but in any case I don't think we'd need to change all that much, because there's only 3 cases: 1) Get only the base type info 2) Get base type info and equality operator 3) Get IO info for one IOFunc Passing the function an enum (and perhaps keeping it in ArrayMetaState) would be enough to distinguish between the 3 cases. You'd also need to pass in IOFuncSelector. That said, this pattern with fn_extra is repeated a lot, even just in the backend (not counting contrib or extensions). It would be nice if there was generic support for this. decibel@decina:[17:15]~/pgsql/HEAD/src/backend (array_offset_v4 $)$pg_grep fn_extra|cut -d: -f1|uniq -c 14 access/gist/gistscan.c 7 executor/functions.c 1 executor/nodeWindowAgg.c 14 utils/adt/array_userfuncs.c 31 utils/adt/arrayfuncs.c 4 utils/adt/domains.c 2 utils/adt/enum.c 1 utils/adt/int.c 6 utils/adt/jsonfuncs.c 1 utils/adt/oid.c 4 utils/adt/orderedsetaggs.c 7 utils/adt/rangetypes.c 24 utils/adt/rowtypes.c 8 utils/adt/varlena.c (utils/fmgr/* doesn't count) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@bluetreble.com> writes: > That said, this pattern with fn_extra is repeated a lot, even just in > the backend (not counting contrib or extensions). It would be nice if > there was generic support for this. What do you mean by "generic support"? Most of those functions are doing quite different things with fn_extra --- granted, nearly all of them are caching something there, but I don't see very much that a "generic" infrastructure could bring to the table. regards, tom lane
On Tue, Mar 10, 2015 at 5:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > I don't think we need both array_offset and array_offset_start; can't both > SQL functions just call one C function? Not if you want the opr_sanity tests to pass. (But I'm seriously starting to wonder if that's actually a smart rule for us to be enforcing. It seems to be something of a pain in the neck, and I'm unclear as to whether it is preventing any real problem.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2015-03-11 2:57 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Tue, Mar 10, 2015 at 5:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I don't think we need both array_offset and array_offset_start; can't both
> SQL functions just call one C function?
Not if you want the opr_sanity tests to pass.
(But I'm seriously starting to wonder if that's actually a smart rule
for us to be enforcing. It seems to be something of a pain in the
neck, and I'm unclear as to whether it is preventing any real
problem.)
It is simple protection against some oversights. I am not against this check - this rule cleans a interface between C and SQL. More, the additional C code is usually very short and trivial.
But it should be commented well.
Pavel
2015-03-10 22:53 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 2/22/15 5:19 AM, Pavel Stehule wrote:
2015-02-22 3:00 GMT+01:00 Petr Jelinek <petr@2ndquadrant.com
<mailto:petr@2ndquadrant.com>>:
On 28/01/15 08:15, Pavel Stehule wrote:
2015-01-28 0:01 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com
<mailto:Jim.Nasby@bluetreble.com>
<mailto:Jim.Nasby@bluetreble.__com
<mailto:Jim.Nasby@bluetreble.com>>>:
On 1/27/15 4:36 AM, Pavel Stehule wrote:
It is only partially identical - I would to use cache for
array_offset, but it is not necessary for array_offsets ..
depends how we would to modify current API to support
externally
cached data.
Externally cached data?
Some from these functions has own caches for minimize access to
typcache
(array_map, array_cmp is example). And in first case, I am trying to
push these information from fn_extra, in second case I don't do it,
because I don't expect a repeated call (and I am expecting so
type cache
will be enough).
You actually do caching via fn_extra in both case and I think that's
the correct way, and yes that part can be moved common function.
I also see that the documentation does not say what is returned by
array_offset if nothing is found (it's documented in code but not in
sgml).
rebased + fixed docs
I don't think we need both array_offset and array_offset_start; can't both SQL functions just call one C function?
There is a rule about unique mapping C functions to SQL space - and I don't think so this rule is bad.
It might be worth combining the array and non-array versions of this, by having a _common function that accepts a boolean and then just run one or the other of the while loops. Most of the code seems to be shared between the two versions.
What is this comment supposed to mean? There is no 'width_array'...
It is typo (I am sorry) - should be width_bucket(, array)
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e80252d424278abf65b624669c8e6b3fe8587cac
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e80252d424278abf65b624669c8e6b3fe8587cac
The code is similar, but it expect large **sorted** input. array_offset works on unsorted (alphabetical unsorted) data sets - like days of week ..
* Biggest difference against width_array is unsorted input array.
I've attached my doc changes, both alone and with the code.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 3/11/15 1:19 AM, Pavel Stehule wrote: > 2015-03-11 2:57 GMT+01:00 Robert Haas <robertmhaas@gmail.com > <mailto:robertmhaas@gmail.com>>: > > On Tue, Mar 10, 2015 at 5:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com > <mailto:Jim.Nasby@bluetreble.com>> wrote: > > I don't think we need both array_offset and array_offset_start; can't both > > SQL functions just call one C function? > > Not if you want the opr_sanity tests to pass. > > (But I'm seriously starting to wonder if that's actually a smart rule > for us to be enforcing. It seems to be something of a pain in the > neck, and I'm unclear as to whether it is preventing any real > problem.) > > > It is simple protection against some oversights. I am not against this > check - this rule cleans a interface between C and SQL. More, the > additional C code is usually very short and trivial. > > But it should be commented well. Ahh, ok, makes more sense now. If the separate C functions are serving a purpose that's fine. I think the comment should mention it though, as it's not exactly the most obvious thing. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 3/10/15 5:25 PM, Tom Lane wrote: > Jim Nasby <Jim.Nasby@bluetreble.com> writes: >> That said, this pattern with fn_extra is repeated a lot, even just in >> the backend (not counting contrib or extensions). It would be nice if >> there was generic support for this. > > What do you mean by "generic support"? Most of those functions are doing > quite different things with fn_extra --- granted, nearly all of them are > caching something there, but I don't see very much that a "generic" > infrastructure could bring to the table. At a glance, almost all the use under src/backend is doing some combination of 3 things: get_typlenbyvalalign(), get_type_io_data() or getting some operator for a type. This is most notable for records, arrays and ranges (though, records actually need an array of this info, so maybe we're out of luck there). That pattern exists outside of backend too; I think it's used in contrib, and I know at least one extension does this. So my thought was something that accepted fcinfo, IOFuncSelector, and TypeCacheEntry.flags. If IOFuncSelector was set, use get_type_io_data; else use get_typlenbyvalalign. If TypeCacheEntry.flags is set also look up the operator. Hmm... now that I look at it, the only use of get_type_io_data in src/backend seems to be to support arrays. Ranges make use of it too, but there's a comment in there that it returns more than what's needed. Even if a generic support function doesn't make sense, there's a lot of repeated fn_extra code for arrays and records. It would be good to at least refactor that like what was done for rangetypes. That process might provide a more obvious answer to how this could be done generically. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 3/11/15 1:29 AM, Pavel Stehule wrote: > > What is this comment supposed to mean? There is no 'width_array'... > > > It is typo (I am sorry) - should be width_bucket(, array) > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e80252d424278abf65b624669c8e6b3fe8587cac > > The code is similar, but it expect large **sorted** input. array_offset > works on unsorted (alphabetical unsorted) data sets - like days of week .. The functions are serving rather different purposes, so I'm not sure it's worth mentioning. If we do want to mention it, then something like the following should be added to *both* functions: * This code is similar to width_bucket() and * This code is similar to array_offset() Incidentally, isn't it bad that we're doing all these static assignments inside the loop in width_bucket? Or can we count on the compiler to recognize this? http://lnk.nu/github.com/1dvrr.c -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-03-11 22:14 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 3/11/15 1:29 AM, Pavel Stehule wrote:
What is this comment supposed to mean? There is no 'width_array'...
It is typo (I am sorry) - should be width_bucket(, array)
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e80252d424278abf65b624669c8e6b3fe8587cac
The code is similar, but it expect large **sorted** input. array_offset
works on unsorted (alphabetical unsorted) data sets - like days of week ..
The functions are serving rather different purposes, so I'm not sure it's worth mentioning. If we do want to mention it, then something like the following should be added to *both* functions:
ok, I removed this note.
I added comment about wrapping and I simplified a code there - this method is used more time in pg for same purposes.
Merged Jim's changes in doc
Pavel
* This code is similar to width_bucket()
and
* This code is similar to array_offset()
Incidentally, isn't it bad that we're doing all these static assignments inside the loop in width_bucket? Or can we count on the compiler to recognize this?
http://lnk.nu/github.com/1dvrr.c
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
On 3/11/15 4:37 PM, Pavel Stehule wrote: + /* + * array_offset - returns the offset of a value in an array (array_offset and + * array_offset_start are wrappers for safe call (look on opr_sanity test) a + * array_offset_common function. + * + * Returns NULL when value is not found. It uses a "NOT DISTINCT FROM" operator + * for comparation to be safe against NULL. + */ would be better as... + /* + * array_offset - returns the offset of a value in an array. array_offset and + * array_offset_start are wrappers for the sake of the opr_sanity test. + * + * Returns NULL when value is not found. It uses a "NOT DISTINCT FROM" operator + * for comparation to be safe against NULL. + */ -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-03-11 22:50 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 3/11/15 4:37 PM, Pavel Stehule wrote:
+ /*
+ * array_offset - returns the offset of a value in an array (array_offset and
+ * array_offset_start are wrappers for safe call (look on opr_sanity test) a
+ * array_offset_common function.
+ *
+ * Returns NULL when value is not found. It uses a "NOT DISTINCT FROM" operator
+ * for comparation to be safe against NULL.
+ */
would be better as...
+ /*
+ * array_offset - returns the offset of a value in an array. array_offset and
+ * array_offset_start are wrappers for the sake of the opr_sanity test.
+ *
+ * Returns NULL when value is not found. It uses a "NOT DISTINCT FROM" operator
+ * for comparation to be safe against NULL.
+ */
fixed
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
My main question regarding this patch is whether the behavior with MD arrays is useful at all. Suppose I give it this: alvherre=# select array_offset('{{{1,2},{3,4},{5,6}},{{2,3},{4,5},{6,7}}}', 3); array_offset -------------- 3 (1 fila) What can I do with the "3" value it returned? Certainly not use it as an offset to get a slice of the original array. The only thing that seems sensible to me here is to reject the whole thing with an error, that is, only accept 1-D arrays here. We can later extend the function by allowing higher dimensionality as long as the second argument is an array one dimension less than the first argument. But if we allow the case on its appearance, it's going to be difficult to change the behavior later. Has a case been made for the current behavior? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On 3/17/15 8:06 PM, Alvaro Herrera wrote: > My main question regarding this patch is whether the behavior with MD > arrays is useful at all. Suppose I give it this: > > alvherre=# select array_offset('{{{1,2},{3,4},{5,6}},{{2,3},{4,5},{6,7}}}', 3); > array_offset > -------------- > 3 > (1 fila) > > What can I do with the "3" value it returned? Certainly not use it as > an offset to get a slice of the original array. The only thing that > seems sensible to me here is to reject the whole thing with an error, > that is, only accept 1-D arrays here. We can later extend the function > by allowing higher dimensionality as long as the second argument is an > array one dimension less than the first argument. But if we allow the > case on its appearance, it's going to be difficult to change the > behavior later. +1 > Has a case been made for the current behavior? Not that I remember. There was discussion about how this should properly support MD arrays. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-03-18 3:45 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 3/17/15 8:06 PM, Alvaro Herrera wrote:My main question regarding this patch is whether the behavior with MD
arrays is useful at all. Suppose I give it this:
alvherre=# select array_offset('{{{1,2},{3,4},{5,6}},{{2,3},{4,5},{6,7}}}', 3);
array_offset
--------------
3
(1 fila)
What can I do with the "3" value it returned? Certainly not use it as
an offset to get a slice of the original array. The only thing that
seems sensible to me here is to reject the whole thing with an error,
that is, only accept 1-D arrays here. We can later extend the function
by allowing higher dimensionality as long as the second argument is an
array one dimension less than the first argument. But if we allow the
case on its appearance, it's going to be difficult to change the
behavior later.
This behave is consistent with "unnest" function, when all multidimensional arrays are reduced to 1ND arrays.
Other argument for this behave is impossibility to design other behave. array_offset function have to returns integer always. You cannot to returns a array of integers, what is necessary for MD position. And one integer can be only position in flatted 1ND array. I agree, so this is not user friendly, but there is not any other possible solution - we have not anyarray and anymdarray types. I designed this possibility (use ND arrays) mainly for info, if some value exists or not.
I am thinking, so this behave is correct (there is no other possible), but it is only corner case for this functionality - and if you are thinking, so better to disallow it, I am not against. My main focus is 1ND array.
Regards
Pavel
+1Has a case been made for the current behavior?
Not that I remember. There was discussion about how this should properly support MD arrays.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 3/18/15 12:27 PM, Pavel Stehule wrote: >> On 3/17/15 8:06 PM, Alvaro Herrera wrote: >> >>> My main question regarding this patch is whether the behavior with MD >>> arrays is useful at all. Suppose I give it this: >>> >>> alvherre=# select array_offset('{{{1,2},{3,4},{5,6}},{{2,3},{4,5},{6,7}}}', >>> 3); >>> array_offset >>> -------------- >>> 3 >>> (1 fila) >>> >>> What can I do with the "3" value it returned? Certainly not use it as >>> an offset to get a slice of the original array. The only thing that >>> seems sensible to me here is to reject the whole thing with an error, >>> that is, only accept 1-D arrays here. We can later extend the function >>> by allowing higher dimensionality as long as the second argument is an >>> array one dimension less than the first argument. But if we allow the >>> case on its appearance, it's going to be difficult to change the >>> behavior later. >>> > I designed this possibility (use ND arrays) > mainly for info, if some value exists or not. Why not use =ANY() for that? > I am thinking, so this behave is correct (there is no other possible), but > it is only corner case for this functionality - and if you are thinking, so > better to disallow it, I am not against. My main focus is 1ND array. A nonsensical answer for multi-dimensional arrays is worse than no answer at all. I think raising an exception is better. .m
2015-03-18 12:41 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
On 3/18/15 12:27 PM, Pavel Stehule wrote:On 3/17/15 8:06 PM, Alvaro Herrera wrote:I designed this possibility (use ND arrays)My main question regarding this patch is whether the behavior with MD
arrays is useful at all. Suppose I give it this:
alvherre=# select array_offset('{{{1,2},{3,4},{5,6}},{{2,3},{4,5},{6,7}}}',
3);
array_offset
--------------
3
(1 fila)
What can I do with the "3" value it returned? Certainly not use it as
an offset to get a slice of the original array. The only thing that
seems sensible to me here is to reject the whole thing with an error,
that is, only accept 1-D arrays here. We can later extend the function
by allowing higher dimensionality as long as the second argument is an
array one dimension less than the first argument. But if we allow the
case on its appearance, it's going to be difficult to change the
behavior later.
mainly for info, if some value exists or not.
Why not use =ANY() for that?
It is only partial solution - array_offset use "IS NOT DISTINCT FROM" operator.
I am thinking, so this behave is correct (there is no other possible), but
it is only corner case for this functionality - and if you are thinking, so
better to disallow it, I am not against. My main focus is 1ND array.
A nonsensical answer for multi-dimensional arrays is worse than no answer at all. I think raising an exception is better.
I have not problem with it.
Regards
Pavel
.m
Pavel Stehule wrote: > 2015-03-18 12:41 GMT+01:00 Marko Tiikkaja <marko@joh.to>: > >> I am thinking, so this behave is correct (there is no other > >> possible), but it is only corner case for this functionality - and > >> if you are thinking, so better to disallow it, I am not against. My > >> main focus is 1ND array. > > > > A nonsensical answer for multi-dimensional arrays is worse than no answer > > at all. I think raising an exception is better. > > > > I have not problem with it. Pushed after adding error checks there and fixing the docs to match. Please verify. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2015-03-18 20:03 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2015-03-18 12:41 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
> >> I am thinking, so this behave is correct (there is no other
> >> possible), but it is only corner case for this functionality - and
> >> if you are thinking, so better to disallow it, I am not against. My
> >> main focus is 1ND array.
> >
> > A nonsensical answer for multi-dimensional arrays is worse than no answer
> > at all. I think raising an exception is better.
> >
>
> I have not problem with it.
Pushed after adding error checks there and fixing the docs to match.
Please verify.
it is looking well, thank you.
small issue - there is not documented, so multidimensional arrays are not supported,
Regards
Pavel
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > 2015-03-18 20:03 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > Pushed after adding error checks there and fixing the docs to match. > > Please verify. > > > > it is looking well, thank you. Thanks for checking. > small issue - there is not documented, so multidimensional arrays are not > supported, I added a parenthised comment in the table, "(array must be one-dimensional)". I copied this from the entry for the array_remove function IIRC; it seemed enough ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 18 March 2015 at 19:03, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Pushed after adding error checks there and fixing the docs to match. > Please verify. > There's an issue when the array's lower bound isn't 1: select array_offset('[2:4]={1,2,3}'::int[], 1);array_offset -------------- 1 (1 row) whereas I would expect this to return 2. Similarly for array_offsets(), so the offsets can be used as indexes into the original array. Regards, Dean
2015-03-20 17:49 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
On 18 March 2015 at 19:03, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Pushed after adding error checks there and fixing the docs to match.
> Please verify.
>
There's an issue when the array's lower bound isn't 1:
select array_offset('[2:4]={1,2,3}'::int[], 1);
array_offset
--------------
1
(1 row)
whereas I would expect this to return 2. Similarly for
array_offsets(), so the offsets can be used as indexes into the
original array.
I am thinking, so it is ok - it returns a offset, not position.
Regards
Pavel
Regards,
Dean
Pavel Stehule wrote: > 2015-03-20 17:49 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>: > > > There's an issue when the array's lower bound isn't 1: > > > > select array_offset('[2:4]={1,2,3}'::int[], 1); > > array_offset > > -------------- > > 1 > > (1 row) > > > > whereas I would expect this to return 2. Similarly for > > array_offsets(), so the offsets can be used as indexes into the > > original array. > > > > I am thinking, so it is ok - it returns a offset, not position. So you can't use it as a subscript? That sounds unfriendly. Almost every function using this will be subtly broken. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2015-03-20 18:29 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2015-03-20 17:49 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
>
> > There's an issue when the array's lower bound isn't 1:
> >
> > select array_offset('[2:4]={1,2,3}'::int[], 1);
> > array_offset
> > --------------
> > 1
> > (1 row)
> >
> > whereas I would expect this to return 2. Similarly for
> > array_offsets(), so the offsets can be used as indexes into the
> > original array.
> >
>
> I am thinking, so it is ok - it returns a offset, not position.
So you can't use it as a subscript? That sounds unfriendly. Almost
every function using this will be subtly broken.
depends what you want. It means - it is on Nth position from start. So it is useful when iterate over array, because it is safe against different array start dimensions. it works, if you use it as "offset". It is named "array_offset"
It can be changed and renamed to array_position - it is simple fix. But I am not sure, if it is better.
Regards
Pavel
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Pavel Stehule wrote: >> I am thinking, so it is ok - it returns a offset, not position. > So you can't use it as a subscript? That sounds unfriendly. Almost > every function using this will be subtly broken. I concur; perhaps "offset" was the design intention, but it's wrong. The result should be a subscript. regards, tom lane
2015-03-20 18:47 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Pavel Stehule wrote:
>> I am thinking, so it is ok - it returns a offset, not position.
> So you can't use it as a subscript? That sounds unfriendly. Almost
> every function using this will be subtly broken.
I concur; perhaps "offset" was the design intention, but it's wrong.
The result should be a subscript.
do you have any idea about name for this function? array_position is ok?
Regards
Pavel
regards, tom lane
On 3/20/15 2:48 PM, Pavel Stehule wrote: > > > 2015-03-20 18:47 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>>: > > Alvaro Herrera <alvherre@2ndquadrant.com > <mailto:alvherre@2ndquadrant.com>> writes: > > Pavel Stehule wrote: > >> I am thinking, so it is ok - it returns a offset, not position. > > > So you can't use it as a subscript? That sounds unfriendly. Almost > > every function using this will be subtly broken. > > I concur; perhaps "offset" was the design intention, but it's wrong. > The result should be a subscript. > > > do you have any idea about name for this function? array_position is ok? +1 on array_position. It's possible at some point we'll actually want array_offset that does what it claims. On another note, you mentioned elsewhere that it's not possible to return anything other than an integer. Why can't there be a variation of this function that returns an array of ndims-1 that is the slice where a value was found? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-03-21 0:27 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 3/20/15 2:48 PM, Pavel Stehule wrote:
2015-03-20 18:47 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>>:
Alvaro Herrera <alvherre@2ndquadrant.com
<mailto:alvherre@2ndquadrant.com>> writes:
> Pavel Stehule wrote:
>> I am thinking, so it is ok - it returns a offset, not position.
> So you can't use it as a subscript? That sounds unfriendly. Almost
> every function using this will be subtly broken.
I concur; perhaps "offset" was the design intention, but it's wrong.
The result should be a subscript.
do you have any idea about name for this function? array_position is ok?
+1 on array_position. It's possible at some point we'll actually want array_offset that does what it claims.
additional implementation of array_position needs few lines more
On another note, you mentioned elsewhere that it's not possible to return anything other than an integer. Why can't there be a variation of this function that returns an array of ndims-1 that is the slice where a value was found?
We talked about it, when we talked about MD searching - and we moved it to next stage.
I am thinking so array_postions can support MD arrays due returning a array
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
>>> do you have any idea about name for this function? array_position is ok? >> >> +1 on array_position. It's possible at some point we'll actually want >> array_offset that does what it claims. > +1 for array_position. -1 for keeping array_offset. I'm not convinced that there are sufficient use cases for it. No other array functions deal in offsets relative to the first element, and if you do want that, it is trivial to achieve with array_position() and array_lower(). IMO having 2 functions for searching in an array will just increase the risk of users picking the wrong one by accident. Regards, Dean
Hi
here is updated patch with array_position, array_positions implementation.2015-03-20 18:29 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2015-03-20 17:49 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
>
> > There's an issue when the array's lower bound isn't 1:
> >
> > select array_offset('[2:4]={1,2,3}'::int[], 1);
> > array_offset
> > --------------
> > 1
> > (1 row)
> >
> > whereas I would expect this to return 2. Similarly for
> > array_offsets(), so the offsets can be used as indexes into the
> > original array.
> >
>
> I am thinking, so it is ok - it returns a offset, not position.
So you can't use it as a subscript? That sounds unfriendly. Almost
every function using this will be subtly broken.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On 22 March 2015 at 06:11, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hi > > here is updated patch with array_position, array_positions implementation. > > It is based on committed code - so please, revert commit > 13dbc7a824b3f905904cab51840d37f31a07a9ef and apply this patch > I checked this and the changes look good, except that you missed a couple of places in src/include/utils/array.h that need updating. In the public docs, you should s/position/subscript because that's the term used throughout the docs for an index into an array. I still like the name array_position() for the function though, because it's consistent with the existing position() functions. Regards, Dean
2015-03-22 11:30 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
On 22 March 2015 at 06:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> here is updated patch with array_position, array_positions implementation.
>
> It is based on committed code - so please, revert commit
> 13dbc7a824b3f905904cab51840d37f31a07a9ef and apply this patch
>
I checked this and the changes look good, except that you missed a
couple of places in src/include/utils/array.h that need updating.
In the public docs, you should s/position/subscript because that's the
term used throughout the docs for an index into an array. I still like
the name array_position() for the function though, because it's
consistent with the existing position() functions.
updated patch
Regards,
Dean
Вложения
Pavel Stehule wrote: > 2015-03-22 11:30 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>: > > In the public docs, you should s/position/subscript because that's the > > term used throughout the docs for an index into an array. I still like > > the name array_position() for the function though, because it's > > consistent with the existing position() functions. > > updated patch Thanks, pushed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2015-03-30 21:30 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2015-03-22 11:30 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
> > In the public docs, you should s/position/subscript because that's the
> > term used throughout the docs for an index into an array. I still like
> > the name array_position() for the function though, because it's
> > consistent with the existing position() functions.
>
> updated patch
Thanks, pushed.
Thank you very much
Pavel
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services