Обсуждение: proposal: function parse_ident
Hi
I miss a functionality that helps with parsing any identifier to basic three parts - database, schema, objectname. We have this function internally, but it is not available for SQL layer. Pavel Stehule <pavel.stehule@gmail.com> writes: > I miss a functionality that helps with parsing any identifier to basic > three parts - database, schema, objectname. We have this function > internally, but it is not available for SQL layer. > FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text, > OUT objectname text) What exactly would you do with this that would not be better done with, for example, regclass? Don't say "parse names for things other than tables". Only a minority of the types of objects used in the database have names that meet this specification. regards, tom lane
Hi
2015-08-19 21:33 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I miss a functionality that helps with parsing any identifier to basic
> three parts - database, schema, objectname. We have this function
> internally, but it is not available for SQL layer.
> FUNCTION parse_ident(IN ident text, OUT dbname text, OUT schemaname text,
> OUT objectname text)
What exactly would you do with this that would not be better done with,
for example, regclass?
Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.
I see one important reason and one minor reason:
Important - cast to regclass is possible only for existing objects - parse_ident doesn't check validity of parsed ident.
minor - cast to regclass depends on search_path - but parse_ident not - with this function I am able to detect if ident depends (or not) on search_path.
Regards
Pavel
regards, tom lane
On 8/19/15 2:44 PM, Pavel Stehule wrote: > Don't say "parse names for things other than tables". Only a minority > of the types of objects used in the database have names that meet this > specification. Really? My impression is that almost everything that's not a shared object allows for a schema... > I see one important reason and one minor reason: > > Important - cast to regclass is possible only for existing objects - > parse_ident doesn't check validity of parsed ident. > minor - cast to regclass depends on search_path - but parse_ident not - > with this function I am able to detect if ident depends (or not) on > search_path. I've been forced to write this several times. I'd really like to expose this functionality. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@bluetreble.com> writes: >> Don't say "parse names for things other than tables". Only a minority >> of the types of objects used in the database have names that meet this >> specification. > Really? My impression is that almost everything that's not a shared > object allows for a schema... Tables meet this naming spec. Columns, functions, operators, operator classes/families, collations, constraints, and conversions do not (you need more data to name them). Schemas, databases, languages, extensions, and some other things also do not, because you need *less* data to name them. Types also don't really meet this naming spec, because you need to contend with special cases like "int[]" or "timestamp with time zone". So this proposal doesn't seem very carefully thought-through to me, or at least the use case is much narrower than it could be. Also, if "object does not exist" isn't supposed to be an error case, what of "name is not correctly formatted"? It seems a bit arbitrary to me to throw an error in one case but not the other. regards, tom lane
2015-08-20 2:22 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
>> Don't say "parse names for things other than tables". Only a minority
>> of the types of objects used in the database have names that meet this
>> specification.
> Really? My impression is that almost everything that's not a shared
> object allows for a schema...
Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.
Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.
When I would to work with living object, then behave of cast to regclass is correct, but I can work with object, that will be created in future, and I need to take some other information about this future object - and then cast has to fail.
Regards
Pavel
regards, tom lane
On 8/19/15 7:22 PM, Tom Lane wrote: > Jim Nasby <Jim.Nasby@bluetreble.com> writes: >>> Don't say "parse names for things other than tables". Only a minority >>> of the types of objects used in the database have names that meet this >>> specification. > >> Really? My impression is that almost everything that's not a shared >> object allows for a schema... > > Tables meet this naming spec. Columns, functions, operators, operator > classes/families, collations, constraints, and conversions do not (you > need more data to name them). Schemas, databases, languages, extensions, > and some other things also do not, because you need *less* data to name > them. Types also don't really meet this naming spec, because you need to > contend with special cases like "int[]" or "timestamp with time zone". > So this proposal doesn't seem very carefully thought-through to me, > or at least the use case is much narrower than it could be. > > Also, if "object does not exist" isn't supposed to be an error case, > what of "name is not correctly formatted"? It seems a bit arbitrary > to me to throw an error in one case but not the other. I think the important point here is this is *parse*_ident(). It's not meant to guarantee an object actually exists, what kind of object it is, or whether it's syntactically correct. It's meant only to separate an identifier into it's 3 (or in some cases 2) components. If this was as simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's obviously a lot more complex than that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-08-20 21:16 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 8/19/15 7:22 PM, Tom Lane wrote:Jim Nasby <Jim.Nasby@bluetreble.com> writes:Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.Really? My impression is that almost everything that's not a shared
object allows for a schema...
Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.
Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.
I think the important point here is this is *parse*_ident(). It's not meant to guarantee an object actually exists, what kind of object it is, or whether it's syntactically correct. It's meant only to separate an identifier into it's 3 (or in some cases 2) components. If this was as simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's obviously a lot more complex than that.
parsing composite identifier is pretty complex - and almost all work is done - it just need SQL envelope only
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi
2015-08-21 7:15 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
2015-08-20 21:16 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:On 8/19/15 7:22 PM, Tom Lane wrote:Jim Nasby <Jim.Nasby@bluetreble.com> writes:Don't say "parse names for things other than tables". Only a minority
of the types of objects used in the database have names that meet this
specification.Really? My impression is that almost everything that's not a shared
object allows for a schema...
Tables meet this naming spec. Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them). Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them. Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.
Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"? It seems a bit arbitrary
to me to throw an error in one case but not the other.
I think the important point here is this is *parse*_ident(). It's not meant to guarantee an object actually exists, what kind of object it is, or whether it's syntactically correct. It's meant only to separate an identifier into it's 3 (or in some cases 2) components. If this was as simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's obviously a lot more complex than that.parsing composite identifier is pretty complex - and almost all work is done - it just need SQL envelope only
here is the patch
It is really trivial - all heavy work was done done before.
Regards
Pavel
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
On 2015-08-23 17:46:36 +0200, Pavel Stehule wrote: > here is the patch > > It is really trivial - all heavy work was done done before. This seems to entirely disregard the comments in http://archives.postgresql.org/message-id/29030.1440030152%40sss.pgh.pa.us about the fact that this approach only works for a few object types? Also, for the umpteenst time: Start actually quoting in a sane manner. Greetings, Andres Freund
2015-09-03 13:11 GMT+02:00 Andres Freund <andres@anarazel.de>:
On 2015-08-23 17:46:36 +0200, Pavel Stehule wrote:
> here is the patch
>
> It is really trivial - all heavy work was done done before.
This seems to entirely disregard the comments in
http://archives.postgresql.org/message-id/29030.1440030152%40sss.pgh.pa.us
about the fact that this approach only works for a few object types?
The alghoritm for parsing identifiers is same - the differences are in a names of levels, and in ending symbols.
I don't would to write totally generic parser - more without access to system catalog or without external hint, you cannot to decide if identifier is schema.table or table.column. But the rules for parsing is exactly same.
The function can be redesigned little bit:
FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT specific text)
so it can parse function myschema.myfunc(xx int)
level1: NULL
level2: myschema
level3: myfunc
specific: (xx int)
Is it acceptable?
Regards
Pavel
Also, for the umpteenst time: Start actually quoting in a sane manner.
Greetings,
Andres Freund
On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > The alghoritm for parsing identifiers is same - the differences are in a > names of levels, and in ending symbols. > > I don't would to write totally generic parser - more without access to > system catalog or without external hint, you cannot to decide if identifier > is schema.table or table.column. But the rules for parsing is exactly same. > > The function can be redesigned little bit: > > FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT > specific text) > > so it can parse function myschema.myfunc(xx int) > > level1: NULL > level2: myschema > level3: myfunc > specific: (xx int) > > Is it acceptable? Well, *I* think that would be useful. I'm not sure it belongs in core, but useful? Yeah, definitely. I would probably make it text[] rather than level1, level2, level3, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2015-09-08 14:06 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:
On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> The alghoritm for parsing identifiers is same - the differences are in a
> names of levels, and in ending symbols.
>
> I don't would to write totally generic parser - more without access to
> system catalog or without external hint, you cannot to decide if identifier
> is schema.table or table.column. But the rules for parsing is exactly same.
>
> The function can be redesigned little bit:
>
> FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
> specific text)
>
> so it can parse function myschema.myfunc(xx int)
>
> level1: NULL
> level2: myschema
> level3: myfunc
> specific: (xx int)
>
> Is it acceptable?
Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.
Returning a array is a good idea.
Pavel
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Sep 8, 2015 at 8:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2015-09-08 14:06 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:On Fri, Sep 4, 2015 at 12:24 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> The alghoritm for parsing identifiers is same - the differences are in a
> names of levels, and in ending symbols.
>
> I don't would to write totally generic parser - more without access to
> system catalog or without external hint, you cannot to decide if identifier
> is schema.table or table.column. But the rules for parsing is exactly same.
>
> The function can be redesigned little bit:
>
> FUNCTION parse_ident(OUT level1 text,OUT level2 text,OUT level3 text,OUT
> specific text)
>
> so it can parse function myschema.myfunc(xx int)
>
> level1: NULL
> level2: myschema
> level3: myfunc
> specific: (xx int)
>
> Is it acceptable?
Well, *I* think that would be useful. I'm not sure it belongs in
core, but useful? Yeah, definitely. I would probably make it text[]
rather than level1, level2, level3, though.Returning a array is a good idea.
+1
I would have immediate use for this. So often a function is written with a table name as a parameter and it's not immediately clear if the schema is to be parsed out of the string, prescribed, or a separate parameter...in which case the function signature now has a clumsy optional schema parameter somewhere. I've written this bit of code probably five times now, let's make it a solved problem.
text[] return seems most sensible. While I can see the use for a record output, it wouldn't be used as often.
text[] return seems most sensible. While I can see the use for a record output, it wouldn't be used as often.
2015-09-08 20:17 GMT+02:00 Corey Huinker <corey.huinker@gmail.com>:
I would have immediate use for this. So often a function is written with a table name as a parameter and it's not immediately clear if the schema is to be parsed out of the string, prescribed, or a separate parameter...in which case the function signature now has a clumsy optional schema parameter somewhere. I've written this bit of code probably five times now, let's make it a solved problem.
text[] return seems most sensible. While I can see the use for a record output, it wouldn't be used as often.
here is a patch
I cannot to use current SplitIdentifierString because it is designed for different purpose - and it cannot to separate non identifier part. But the code is simple - and will be cleaned.
postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)
postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
┌───────────────┬───────┐
│ parts │ other │
╞═══════════════╪═══════╡
│ {AHOJ,nazdar} │ [] │
└───────────────┴───────┘
(1 row)
Regards
Pavel
Вложения
Hi
I sending the path with little bit enhanced parser - it doesn't support utf8 alpha in identifiers yetВложения
next iteration - fixed bug in parsing UTF8 chars, enhanced error messages.
Regards
Pavel
Вложения
Pavel Stehule wrote: > I cannot to use current SplitIdentifierString because it is designed for > different purpose - and it cannot to separate non identifier part. But the > code is simple - and will be cleaned. > > postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text); > ┌───────────────┬───────┐ > │ parts │ other │ > ╞═══════════════╪═══════╡ > │ {AHOJ,nazdar} │ [] │ > └───────────────┴───────┘ > (1 row) Um. Now this is really getting into much of the same territory I got into with the objname/objargs arrays for pg_get_object_address. I think the "other" bit is a very poor solution to that. If you want to be able to parse names for all kinds of objects, you need a solution much more complex than this function. I think a clean solution returns three sets of things; one is the primary part of the name, which is an array of text; the second is the secondary name, which is another array of text; the third is an array of TypeName. For the name of a relation, only the first of these arrays is used. For the name of objects subsidiary to a relation, the first two are used (the first array is the name of the relation itself, and the second array is the name of the object; for instance a constraint name, or a trigger name). The array of type names is necessary because the parsing of TypeName is completely unlike parsing of plain names. You need [] decorator and typmod. If you consider objects such as casts, you need two TypeNames ("from" and "to"), hence this is an array and not a single one. As far as I recall there are other object types that also need more than one TypeName. For the name of a function, you need the first text array, and the array of TypeName which are the input argument types. If you don't want to have all this complexity, I think you need to forgo the idea of the "other" thingy that you propose above, and just concern yourself with the first bits. I don't think "AHOJ".NAZDAR[] is an identifier. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2015-09-09 21:55 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> I cannot to use current SplitIdentifierString because it is designed for
> different purpose - and it cannot to separate non identifier part. But the
> code is simple - and will be cleaned.
>
> postgres=# select * from parse_ident('"AHOJ".NAZDAR[]'::text);
> ┌───────────────┬───────┐
> │ parts │ other │
> ╞═══════════════╪═══════╡
> │ {AHOJ,nazdar} │ [] │
> └───────────────┴───────┘
> (1 row)
Um. Now this is really getting into much of the same territory I got
into with the objname/objargs arrays for pg_get_object_address. I think
the "other" bit is a very poor solution to that.
If you want to be able to parse names for all kinds of objects, you need
a solution much more complex than this function. I think a clean
solution returns three sets of things; one is the primary part of the
name, which is an array of text; the second is the secondary name, which
is another array of text; the third is an array of TypeName.
For the name of a relation, only the first of these arrays is used. For
the name of objects subsidiary to a relation, the first two are used
(the first array is the name of the relation itself, and the second
array is the name of the object; for instance a constraint name, or a
trigger name).
The array of type names is necessary because the parsing of TypeName is
completely unlike parsing of plain names. You need [] decorator and
typmod. If you consider objects such as casts, you need two TypeNames
("from" and "to"), hence this is an array and not a single one. As far
as I recall there are other object types that also need more than one
TypeName.
For the name of a function, you need the first text array, and the array
of TypeName which are the input argument types.
If you don't want to have all this complexity, I think you need to forgo
the idea of the "other" thingy that you propose above, and just concern
yourself with the first bits. I don't think "AHOJ".NAZDAR[] is an
identifier.
yes, usually I don't need a "other" part. And If I need it, then I can get it as difference against a original string. But sometimes you don't get a clean string - and you have to find a end of identifier. The SplitIdentifierString calculates only with separator char, and it cannot to find end of ident. So little bit modified API can look like
CREATE OR REPLACE FUNCTION parse_ident(str text, strict boolean DEFAULT true) RETURNS text[]
raise exception "syntax error" for '"AHOJ".NAZDAR[]' when "strict" is true
returns "AHOJ".nazdar for '"AHOJ".NAZDAR[]' when "strict" is false
Pavel
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/11/15 6:25 AM, Pavel Stehule wrote: > new update of parse_ident function patch How would you actually use this? I know several people have spoken up that they could use this, but could you provide a few actual practical examples?
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">2015-09-16 2:41 GMT+02:00 Peter Eisentraut <spandir="ltr"><<a href="mailto:peter_e@gmx.net" target="_blank">peter_e@gmx.net</a>></span>:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 9/11/15 6:25 AM,Pavel Stehule wrote:<br /> > new update of parse_ident function patch<br /><br /></span>How would you actually usethis?<br /><br /> I know several people have spoken up that they could use this, but could<br /> you provide a few actualpractical examples?<br /><br /></blockquote></div><br /></div><div class="gmail_extra">I see two basic use cases<br/><br /></div><div class="gmail_extra">1. processing user input with little bit more comfort - the user doesn't needto separate schema and table<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">CREATE OR REPLACEFUNCTION createtable(tablename text)<br /></div><div class="gmail_extra">RETURNS void AS $$<br /></div><div class="gmail_extra">DECLAREnames text[];<br />BEGIN<br /></div><div class="gmail_extra"> names := parse_ident(tablename);<br/></div><div class="gmail_extra"> IF array_length(names) > 2 || array_length(names) = 0 THEN<br/></div><div class="gmail_extra"> RAISE EXCEPTION 'wrong identifier';<br /></div><div class="gmail_extra"> ENDIF;<br /></div><div class="gmail_extra"> IF names[2] IS NOT NULL THEN<br /></div><div class="gmail_extra"> CREATESCHEMA IF NOT EXISTS names[2];<br /></div><div class="gmail_extra"> END IF;<br /></div><div class="gmail_extra"> CREATE TABLE tablename;<br /></div><div class="gmail_extra">END;<br /></div><div class="gmail_extra">$$LANGUAGE plpgsql;<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">2. parsingerror messages or some automatic variables<br /><br /></div><div class="gmail_extra">Regards<br /><br /></div><divclass="gmail_extra">Pavel<br /></div></div>
On 9/11/15 12:25 PM, Pavel Stehule wrote: > new update of parse_ident function patch Nice! I've certainly wanted something like this a number of times. Some comments about the v2 of the patch: - The patch doesn't apply anymore, so it should be rebased. - The docs don't even try and explain what the "strictmode" parameter does. - The comment before the parse_ident function is not up to date anymore, since "the rest" was removed from the interface. - I can't immediately grep for any uses of do { .. } while (true) from our code base. AFAICT the majority look like for (;;); I see no reason not to be consistent here. - What should happen if the input is a string like 'one.two.three.four.five.six'? Do we want to accept input like that? - I haven't reviewed the actual parsing code carefully,but didn't we already have a function which splits identifiers up? I of course can't find one with my grepping right now, so I might be wrong. .m
Hi
2015-11-17 1:49 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
fixed
fixed
fixed
I don't see the reason why not. It is pretty simple to count fields in result array and raise error later. The application has better information about expected and valid numbers. But any opinion in this question should be valid. I have not strong position here.
On 9/11/15 12:25 PM, Pavel Stehule wrote:new update of parse_ident function patch
Nice! I've certainly wanted something like this a number of times.
Some comments about the v2 of the patch:
- The patch doesn't apply anymore, so it should be rebased.
done
- The docs don't even try and explain what the "strictmode" parameter does.
fixed
- The comment before the parse_ident function is not up to date anymore, since "the rest" was removed from the interface.
fixed
- I can't immediately grep for any uses of do { .. } while (true) from our code base. AFAICT the majority look like for (;;); I see no reason not to be consistent here.
fixed
- What should happen if the input is a string like 'one.two.three.four.five.six'? Do we want to accept input like that?
I don't see the reason why not. It is pretty simple to count fields in result array and raise error later. The application has better information about expected and valid numbers. But any opinion in this question should be valid. I have not strong position here.
- I haven't reviewed the actual parsing code carefully, but didn't we already have a function which splits identifiers up? I of course can't find one with my grepping right now, so I might be wrong.
There is: SplitIdentifierString or textToQualifiedNameList in varlena.c. My first patch was based on these functions. But I cannot to use it.
1. major reason: The buildin parser is based on searching the dot "." and doesn't search any disallowed identifiers chars. So there is not possible to implement non strict mode - find last char of last identifier and ignore other.
2. minor reason: little bit more precious diagnostics - buildin routines returns only true (valid) and false (invalid).
Regards
Pavel
.m
Вложения
On 9/15/15 11:49 PM, Pavel Stehule wrote: > 1. processing user input with little bit more comfort - the user doesn't > need to separate schema and table This is especially useful if you're doing anything that needs to dynamically work with different objects. I'd say about 80% of the time I'm doing this ::regclass is good enough, but obviously the object has to exist then, and ::regclass doesn't separate schema from name. There's a number of other handy convenience functions/views you can create to interface with the catalog, none of which are rocket science. But you're pretty screwed if what you want isn't in the catalog yet. (On a side note, something my TODO is to restart pg_newsysviews as an extension, and then add a bunch of convenience functions on top of that... things like relation_info(regclass) RETURNS (everything in pg_class, plus other useful bits like nspname), and relation_schema(regclass) RETURNS regnamespace). FWIW, the other function I've wanted in the past that's difficult to implement externally is parsing the arguments of a function definition. ::regprocedure kinda works for this, but it blows up on certain things (defaults being one, iirc). I've specifically wanted that capability for a function I wrote that made it easy to specify *everything* about a function in a single call, including it's permissions and a comment on the function. That may sound trivial, but it's a PITA to cut and paste the whole argument list into multiple REVOKE/GRANT/COMMENT on statements. Even worse, not all the options of CREATE FUNCTION are supported in those other commands, so often you can't even just cut and paste. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
On Thu, Dec 3, 2015 at 5:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > There is: SplitIdentifierString or textToQualifiedNameList in varlena.c. My > first patch was based on these functions. But I cannot to use it. > > 1. major reason: The buildin parser is based on searching the dot "." and > doesn't search any disallowed identifiers chars. So there is not possible to > implement non strict mode - find last char of last identifier and ignore > other. > 2. minor reason: little bit more precious diagnostics - buildin routines > returns only true (valid) and false (invalid). I am moving that to next CF because there is a patch but no actual reviews, and a couple of hackers have showed interest in having that based on the latest updates on this thread. -- Michael
Hi Pavel, Sorry for the lack of review here. I didn't realize I was still the reviewer for this after it had been moved to another commitfest. That said, I think I've exhausted my usefulness here as a reviewer. Marking ready for committer. .m
On Sat, Jan 23, 2016 at 1:25 AM, Marko Tiikkaja <marko@joh.to> wrote: > Hi Pavel, > > Sorry for the lack of review here. I didn't realize I was still the > reviewer for this after it had been moved to another commitfest. > > That said, I think I've exhausted my usefulness here as a reviewer. Marking > ready for committer. + errmsg("identifier contains disallowed chars"), + errdetail("string \"%s\" is not valid identifier", + text_to_cstring(qualname)))); Perhaps, "identifier contains not allowed character" is better? -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Sat, Jan 23, 2016 at 1:25 AM, Marko Tiikkaja <marko@joh.to> wrote: > + errmsg("identifier contains disallowed chars"), > + errdetail("string \"%s\" is not valid identifier", > + text_to_cstring(qualname)))); > Perhaps, "identifier contains not allowed character" is better? "disallowed" reads better to me. I agree with expanding "chars" to "characters" though. Also, the errdetail is conveying no actual extra detail AFAICS. I'd go with something like errmsg("identifier contains disallowed characters: \"%s\"", text_to_cstring(qualname))); regards, tom lane The errdeta regards, tom lane > -- > Michael > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Hi
2016-01-23 16:36 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Jan 23, 2016 at 1:25 AM, Marko Tiikkaja <marko@joh.to> wrote:
> + errmsg("identifier contains disallowed chars"),
> + errdetail("string \"%s\" is not valid identifier",
> + text_to_cstring(qualname))));
> Perhaps, "identifier contains not allowed character" is better?
"disallowed" reads better to me. I agree with expanding "chars" to
"characters" though. Also, the errdetail is conveying no actual extra
detail AFAICS. I'd go with something like
errmsg("identifier contains disallowed characters: \"%s\"",
text_to_cstring(qualname)));
regards, tom lane
rebased, messages changes per Tom's proposal
Regards
Pavel
The errdeta
regards, tom lane
> --
> Michael
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
> rebased, messages changes per Tom's proposal Cool feature and sometimes I needed it a lot. But, seems, there are some bugs in error processing. 1 Following query is okay: # select * from parse_ident(E'"Some \r Schema".someTable'); parse_ident ------------------------------ {"Some \r Schema",sometable} but: % select * from parse_ident(E'"Some \r Schema".9someTable'); Schema".9someTable"tifier after "." symbol: ""Some Return carriage is not escaped in error message. Suppose, any other special charaters will not be escaped. 2 # select * from parse_ident('.someTable'); ERROR: missing identifier after "." symbol: ".someTable" Why AFTER instead of BEFORE? 2 Function succesfully truncates long indentifier but not in case of quoted identifier. select length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ; length | length --------+-------- 414 | 63 -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
2016-02-08 16:55 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:
rebased, messages changes per Tom's proposalCool feature and sometimes I needed it a lot.
But, seems, there are some bugs in error processing.
I am looking on it
Regards
Pavel
Pavel Stehule wrote: > I am looking on it Thanks, closing as returned-with-feedback. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2016-02-08 16:55 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:
rebased, messages changes per Tom's proposalCool feature and sometimes I needed it a lot.
But, seems, there are some bugs in error processing.
1
Following query is okay:
# select * from parse_ident(E'"Some \r Schema".someTable');
parse_ident
------------------------------
{"Some \r Schema",sometable}
but:
% select * from parse_ident(E'"Some \r Schema".9someTable');
Schema".9someTable"tifier after "." symbol: ""Some
Return carriage is not escaped in error message. Suppose, any other
special charaters will not be escaped.
2
# select * from parse_ident('.someTable');
ERROR: missing identifier after "." symbol: ".someTable"
Why AFTER instead of BEFORE?
fixed - now the function produce more adequate message - see regress tests
2
Function succesfully truncates long indentifier but not in case of quoted identifier.
select length(a[1]), length(a[2]) from parse_ident('"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx".yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy') as a ;
length | length
--------+--------
414 | 63
fixed - I used the function downcase_truncate_identifier, that does truncating. I agree - in this case default truncating isn't practical - and user can explicitly truncate (or implicitly by casting to "name")
New patch attached
Thank you for test
Regards
Pavel
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested NEEDS CATVERSION BUMP I editorialized the docs and some of the comments. In particular, I documented behavior of not truncating, and recommendedcasting to name[] if user cares about that. Added a unit test to verify that works. BTW, I saw mention in thethread about not truncated spaces, but the function *does* truncate them, unless they're inside quotes, where they'relegitimate. Also added test for invalid characters. I think "strict" would be more in line with other uses in code. There are currently no other occurrences of 'strictmode'in the code. There are loads of references to 'strict', but I didn't go through all of them to see if any wereused as externally visible function parameter names. qualname_str is used in exactly 1 place. Either it should be gotten rid of, or all the uses of text_to_cstring(qualname)should be changed to qualname_str. I think the code would have been clearer if instead of the big if (*nextp == '\"') it did the same "inquote" looping thatis done elsewhere, but I don't have a strong opinion on it. The new status of this patch is: Waiting on Author
On 2/10/16 12:26 PM, Jim Nasby wrote: > I editorialized the docs and some of the comments. In particular, I documented behavior of not truncating, and recommendedcasting to name[] if user cares about that. Added a unit test to verify that works. BTW, I saw mention in thethread about not truncated spaces, but the function*does* truncate them, unless they're inside quotes, where they'relegitimate. New patch for that. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
2016-02-10 19:26 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
NEEDS CATVERSION BUMP
I editorialized the docs and some of the comments. In particular, I documented behavior of not truncating, and recommended casting to name[] if user cares about that. Added a unit test to verify that works. BTW, I saw mention in the thread about not truncated spaces, but the function *does* truncate them, unless they're inside quotes, where they're legitimate.
ok
Also added test for invalid characters.
I think "strict" would be more in line with other uses in code. There are currently no other occurrences of 'strictmode' in the code. There are loads of references to 'strict', but I didn't go through all of them to see if any were used as externally visible function parameter names.
I am sorry, I don't understand to this point. You unlike the name of parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ?
qualname_str is used in exactly 1 place. Either it should be gotten rid of, or all the uses of text_to_cstring(qualname) should be changed to qualname_str.
fixed, qualname_str is used everywhere
I think the code would have been clearer if instead of the big if (*nextp == '\"') it did the same "inquote" looping that is done elsewhere, but I don't have a strong opinion on it.
The almost all code +/- is related to human readable error messages. We can move some code to separate static functions - read_quoted_ident, read_unquoted_ident, but there will be some magic about parameters, and the code will not be much better, than it is now.
The new status of this patch is: Waiting on Author
Thank you for your work on documentation.
I am sending updated version of this patch.
Regards
Pavel
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Hi Jim
2016-02-11 8:27 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>
ok
Also added test for invalid characters.
I think "strict" would be more in line with other uses in code. There are currently no other occurrences of 'strictmode' in the code. There are loads of references to 'strict', but I didn't go through all of them to see if any were used as externally visible function parameter names.I am sorry, I don't understand to this point. You unlike the name of parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ?
Please, can you explain this point?
Regards
Pavel
On 2/11/16 1:27 AM, Pavel Stehule wrote: > I editorialized the docs and some of the comments. In particular, I > documented behavior of not truncating, and recommended casting to > name[] if user cares about that. Added a unit test to verify that > works. BTW, I saw mention in the thread about not truncated spaces, > but the function *does* truncate them, unless they're inside quotes, > where they're legitimate. > > > ok I missed some of my edits. Updated patch with those in place attached. > Also added test for invalid characters. > > I think "strict" would be more in line with other uses in code. > There are currently no other occurrences of 'strictmode' in the > code. There are loads of references to 'strict', but I didn't go > through all of them to see if any were used as externally visible > function parameter names. > > > I am sorry, I don't understand to this point. You unlike the name of > parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ? I would just call it strict. There's precedent for that in the code. > The almost all code +/- is related to human readable error messages. We > can move some code to separate static functions - read_quoted_ident, > read_unquoted_ident, but there will be some magic about parameters, and > the code will not be much better, than it is now. What I'm saying is that most places that need to do de-quoting or similar just run a simple while loop and use an in_quote variable to track whether they're inside a quote or not. See backend/utils/adt/rowtypes.c line 199 for an example. As I said, I don't have a strong opinion on it, so if you prefer it this way that's fine with me. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
2016-02-17 1:38 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 2/11/16 1:27 AM, Pavel Stehule wrote:I editorialized the docs and some of the comments. In particular, I
documented behavior of not truncating, and recommended casting to
name[] if user cares about that. Added a unit test to verify that
works. BTW, I saw mention in the thread about not truncated spaces,
but the function *does* truncate them, unless they're inside quotes,
where they're legitimate.
ok
I missed some of my edits. Updated patch with those in place attached.Also added test for invalid characters.
I think "strict" would be more in line with other uses in code.
There are currently no other occurrences of 'strictmode' in the
code. There are loads of references to 'strict', but I didn't go
through all of them to see if any were used as externally visible
function parameter names.
I am sorry, I don't understand to this point. You unlike the name of
parameter "strictmode" ? Have you any proposal? Maybe "restrictive" ?
I would just call it strict. There's precedent for that in the code.
+1
fixed in attached patch
The almost all code +/- is related to human readable error messages. We
can move some code to separate static functions - read_quoted_ident,
read_unquoted_ident, but there will be some magic about parameters, and
the code will not be much better, than it is now.
What I'm saying is that most places that need to do de-quoting or similar just run a simple while loop and use an in_quote variable to track whether they're inside a quote or not. See backend/utils/adt/rowtypes.c line 199 for an example.
As I said, I don't have a strong opinion on it, so if you prefer it this way that's fine with me.
yes, I don't see string differences between for(;;) and break and while(var). I prefer current state.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Вложения
> I missed some of my edits. Updated patch with those in place attached. I'm sorry, but special chararacter still isn't escaped correctly in error messages: % select parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'); XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX""X Time: 0,510 ms -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Hi
2016-02-17 14:02 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:
I missed some of my edits. Updated patch with those in place attached.
I'm sorry, but special chararacter still isn't escaped correctly in error messages:
% select parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX""X
Time: 0,510 ms
:(, .. I'll fix it today or tomorrow, when I'll have free time
Pavel
Hi
2016-02-18 4:59 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
select parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
I am sending updated patch - I used json function for correct escaping - the escaping behave is same.
Regards
Pavel
Вложения
> select > parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'); > > > I am sending updated patch - I used json function for correct escaping - the > escaping behave is same. Hmm, it doesn't look so: % select parse_ident(E'_\005'); ERROR: identifier contains disallowed characters: "_\u0005" % select parse_ident(E'\005'); ERROR: missing identifier: "\u0005" but # select parse_ident(E'"\005"'); parse_ident ------------- {\x05} Error messages above point wrong character wrongly. One more inconsistence: # select parse_ident(E'"\005"') as "\005"; \005 -------- {\x05} Display outputs of actual identifier and parse_indent are differ. Actually, I can live with both but are any other opinions? Seems, at least difference of actual identifier and output of parse_indent should be pointed in docs. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
2016-03-10 15:34 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:
select
parse_ident(E'X\rXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
I am sending updated patch - I used json function for correct escaping - the
escaping behave is same.
Hmm, it doesn't look so:
% select parse_ident(E'_\005');
ERROR: identifier contains disallowed characters: "_\u0005"
% select parse_ident(E'\005');
ERROR: missing identifier: "\u0005"
but
# select parse_ident(E'"\005"');
parse_ident
-------------
{\x05}
Error messages above point wrong character wrongly.
One more inconsistence:
# select parse_ident(E'"\005"') as "\005";
\005
--------
{\x05}
Display outputs of actual identifier and parse_indent are differ.
Actually, I can live with both but are any other opinions? Seems, at least difference of actual identifier and output of parse_indent should be pointed in docs.
I afraid so I cannot to fix this inconsistency (if this is inconsistency - the binary values are same) - the parameter of function is raw string with processed escape codes, and I have not any information about original escape sequences. When you enter octet value, and I show it as hex value, then there should be difference. Buy I have not information about your input (octet or hex). I have the original string of SQL identifier inside parser, executor, but I have not original string of function parameter inside function (not without pretty complex and long code).
I am trying describe it in doc (I am sorry for my less level English) in new patch. Fixed duplicated oid too.
Regards
Pavel
Вложения
> I afraid so I cannot to fix this inconsistency (if this is inconsistency - the > binary values are same) - the parameter of function is raw string with processed > escape codes, and I have not any information about original escape sequences. > When you enter octet value, and I show it as hex value, then there should be > difference. Buy I have not information about your input (octet or hex). I have > the original string of SQL identifier inside parser, executor, but I have not > original string of function parameter inside function (not without pretty > complex and long code). Ok, agree > > I am trying describe it in doc (I am sorry for my less level English) in new > patch. Fixed duplicated oid too. Edited a bit + fix some typos and remove unneeded headers, patch attached Sorry, I can't find all corner-cases at once, but: SELECT parse_ident(E'"c".X XXXXXXXXXX'); ERROR: identifier contains disallowed characters: "\"c" Error message wrongly points to the reason of error. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Вложения
2016-03-14 17:39 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:
I afraid so I cannot to fix this inconsistency (if this is inconsistency - theOk, agree
binary values are same) - the parameter of function is raw string with processed
escape codes, and I have not any information about original escape sequences.
When you enter octet value, and I show it as hex value, then there should be
difference. Buy I have not information about your input (octet or hex). I have
the original string of SQL identifier inside parser, executor, but I have not
original string of function parameter inside function (not without pretty
complex and long code).Edited a bit + fix some typos and remove unneeded headers, patch attached
I am trying describe it in doc (I am sorry for my less level English) in new
patch. Fixed duplicated oid too.
Sorry, I can't find all corner-cases at once, but:
SELECT parse_ident(E'"c".X XXXXXXXXXX');
ERROR: identifier contains disallowed characters: "\"c"
I'll check it tomorrow
Thank you
Pavel
Error message wrongly points to the reason of error.
Hi
2016-03-14 17:39 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:
I afraid so I cannot to fix this inconsistency (if this is inconsistency - theOk, agree
binary values are same) - the parameter of function is raw string with processed
escape codes, and I have not any information about original escape sequences.
When you enter octet value, and I show it as hex value, then there should be
difference. Buy I have not information about your input (octet or hex). I have
the original string of SQL identifier inside parser, executor, but I have not
original string of function parameter inside function (not without pretty
complex and long code).Edited a bit + fix some typos and remove unneeded headers, patch attached
I am trying describe it in doc (I am sorry for my less level English) in new
patch. Fixed duplicated oid too.
Sorry, I can't find all corner-cases at once, but:
SELECT parse_ident(E'"c".X XXXXXXXXXX');
ERROR: identifier contains disallowed characters: "\"c"
Error message wrongly points to the reason of error.
I forgot my original plan - show full original string. Now, complete original parameter is used as part of message everywhere. It is more consistent.
I cannot to reuse escape_json - it escape double quotes, and then the paremeter in message looks strange.
I hope so the messages are ok now. Few more regress tests added.
Thank you for your patience.
Regards
Pavel
Вложения
> I hope so the messages are ok now. Few more regress tests added. Thank you, committed. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
2016-03-18 16:26 GMT+01:00 Teodor Sigaev <teodor@sigaev.ru>:
I hope so the messages are ok now. Few more regress tests added.
Thank you, committed.
Thank you very much
Pavel