Обсуждение: BUG #13960: plpython fails with certain function names

Поиск
Список
Период
Сортировка

BUG #13960: plpython fails with certain function names

От
Jim.Nasby@BlueTreble.com
Дата:
The following bug has been logged on the website:

Bug reference:      13960
Logged by:          Jim Nasby
Email address:      Jim.Nasby@BlueTreble.com
PostgreSQL version: 9.5.0
Operating system:   OS X
Description:

If a Postgres function contains characters that are illegal for python
identifiers, compilation fails. Error message is not very helpful either:

~@decina.local/21474# CREATE FUNCTION "test[]"() RETURNS text LANGUAGE
plpythonu AS $$return 'test'$$;
ERROR:  could not compile PL/Python function "test[]"
DETAIL:  SyntaxError: invalid syntax (<string>, line 1)
~@decina.local/21474# CREATE FUNCTION "test"() RETURNS text LANGUAGE
plpythonu AS $$return 'test'$$;
CREATE FUNCTION
~@decina.local/21474#

One possibility is to simply strip out invalid characters[1]. What's valid
expands in 3.5, but I don't think that really matters.

Another possibility would be to catch the python exception, but that's
probably more trouble than it's worth.

[1]
https://docs.python.org/2/reference/lexical_analysis.html#grammar-token-identifier
identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"

3.5:
https://docs.python.org/3.5/reference/lexical_analysis.html#grammar-token-identifier

Re: BUG #13960: plpython fails with certain function names

От
Tom Lane
Дата:
Jim.Nasby@BlueTreble.com writes:
> If a Postgres function contains characters that are illegal for python
> identifiers, compilation fails. Error message is not very helpful either:

Hm, how much do we really care?  The example seems kinda artificial.

> One possibility is to simply strip out invalid characters[1].

No, because then you would get collisions, ie function names that look
different to PG would look the same to python.  Bad news.

(Actually, don't we have that issue anyway because of schemas?  I wonder
why we are exposing the PG name of the function to python at all.)

            regards, tom lane

Re: BUG #13960: plpython fails with certain function names

От
Jim Nasby
Дата:
On 2/14/16 6:49 PM, Tom Lane wrote:
> Jim.Nasby@BlueTreble.com writes:
>> If a Postgres function contains characters that are illegal for python
>> identifiers, compilation fails. Error message is not very helpful either:
>
> Hm, how much do we really care?  The example seems kinda artificial.

I did run across it creating cast functions programmatically, roughly as

format( 'CREATE FUNCTION "ndarray_to_%1s"', input_type::regtype )

which blows up when you feed it something like float. The actual example
was doing something similar but for an array.

I agree it's not exactly a high priority thing to support, but the error
text is completely useless unless you happen to know that plpython is
creating an actual python function.

>> One possibility is to simply strip out invalid characters[1].
>
> No, because then you would get collisions, ie function names that look
> different to PG would look the same to python.  Bad news.

IIRC we append the regproc OID to ensure it's unique.

> (Actually, don't we have that issue anyway because of schemas?  I wonder
> why we are exposing the PG name of the function to python at all.)

Not sure. Maybe useful in a call stack inside python? I can't really
think of what else you'd use it for. (I assume you're suggesting just
call the function names something like plpython_<regproc::oid>.)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: BUG #13960: plpython fails with certain function names

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> On 2/14/16 6:49 PM, Tom Lane wrote:
>> No, because then you would get collisions, ie function names that look
>> different to PG would look the same to python.  Bad news.

> IIRC we append the regproc OID to ensure it's unique.

Ah, okay, problem solved.

>> (Actually, don't we have that issue anyway because of schemas?  I wonder
>> why we are exposing the PG name of the function to python at all.)

> Not sure. Maybe useful in a call stack inside python? I can't really
> think of what else you'd use it for. (I assume you're suggesting just
> call the function names something like plpython_<regproc::oid>.)

Yeah, that's what I was thinking about.  But yes, if we append the OID
anyway then we might as well just strip all non-alphanumeric chars
from the name.  Safe and you still get some debuggability.

            regards, tom lane

Re: BUG #13960: plpython fails with certain function names

От
Jim Nasby
Дата:
On 2/14/16 7:09 PM, Tom Lane wrote:
> Yeah, that's what I was thinking about.  But yes, if we append the OID
> anyway then we might as well just strip all non-alphanumeric chars
> from the name.  Safe and you still get some debuggability.

Attached. I opted not to modify the name in-place. If it's safe to
modify in place, might want to just replace invalid characters with '_'
instead of making a copy.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Вложения

Re: BUG #13960: plpython fails with certain function names

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> On 2/14/16 7:09 PM, Tom Lane wrote:
>> Yeah, that's what I was thinking about.  But yes, if we append the OID
>> anyway then we might as well just strip all non-alphanumeric chars
>> from the name.  Safe and you still get some debuggability.

> Attached. I opted not to modify the name in-place. If it's safe to
> modify in place, might want to just replace invalid characters with '_'
> instead of making a copy.

I like the idea of replacing invalid characters with '_'.  It's definitely
not safe to scribble on the pg_proc tuple, but we could get the same
result with a few wasted cycles by rescanning the procName string after
building it, as per attached.

            regards, tom lane

diff --git a/src/pl/plpython/expected/plpython_test.out b/src/pl/plpython/expected/plpython_test.out
index 7b76faf..f8270a7 100644
*** a/src/pl/plpython/expected/plpython_test.out
--- b/src/pl/plpython/expected/plpython_test.out
*************** select stupidn();
*** 16,23 ****
   zarkon
  (1 row)

! -- test multiple arguments
! CREATE FUNCTION argument_test_one(u users, a1 text, a2 text) RETURNS text
      AS
  'keys = list(u.keys())
  keys.sort()
--- 16,23 ----
   zarkon
  (1 row)

! -- test multiple arguments and odd characters in function name
! CREATE FUNCTION "Argument test #1"(u users, a1 text, a2 text) RETURNS text
      AS
  'keys = list(u.keys())
  keys.sort()
*************** for key in keys:
*** 27,34 ****
  words = a1 + " " + a2 + " => {" + ", ".join(out) + "}"
  return words'
      LANGUAGE plpythonu;
! select argument_test_one(users, fname, lname) from users where lname = 'doe' order by 1;
!                            argument_test_one
  -----------------------------------------------------------------------
   jane doe => {fname: jane, lname: doe, userid: 1, username: j_doe}
   john doe => {fname: john, lname: doe, userid: 2, username: johnd}
--- 27,34 ----
  words = a1 + " " + a2 + " => {" + ", ".join(out) + "}"
  return words'
      LANGUAGE plpythonu;
! select "Argument test #1"(users, fname, lname) from users where lname = 'doe' order by 1;
!                            Argument test #1
  -----------------------------------------------------------------------
   jane doe => {fname: jane, lname: doe, userid: 1, username: j_doe}
   john doe => {fname: john, lname: doe, userid: 2, username: johnd}
diff --git a/src/pl/plpython/plpy_procedure.c b/src/pl/plpython/plpy_procedure.c
index e1f5620..a0d0792 100644
*** a/src/pl/plpython/plpy_procedure.c
--- b/src/pl/plpython/plpy_procedure.c
*************** PLy_procedure_create(HeapTuple procTup,
*** 146,151 ****
--- 146,152 ----
      MemoryContext cxt;
      MemoryContext oldcxt;
      int            rv;
+     char       *ptr;

      procStruct = (Form_pg_proc) GETSTRUCT(procTup);
      rv = snprintf(procName, sizeof(procName),
*************** PLy_procedure_create(HeapTuple procTup,
*** 155,160 ****
--- 156,170 ----
      if (rv >= sizeof(procName) || rv < 0)
          elog(ERROR, "procedure name would overrun buffer");

+     /* Replace any not-legal-in-Python-names characters with '_' */
+     for (ptr = procName; *ptr; ptr++)
+     {
+         if (!((*ptr >= 'A' && *ptr <= 'Z') ||
+               (*ptr >= 'a' && *ptr <= 'z') ||
+               (*ptr >= '0' && *ptr <= '9')))
+             *ptr = '_';
+     }
+
      cxt = AllocSetContextCreate(TopMemoryContext,
                                  procName,
                                  ALLOCSET_DEFAULT_MINSIZE,
diff --git a/src/pl/plpython/sql/plpython_test.sql b/src/pl/plpython/sql/plpython_test.sql
index c8d5ef5..3a76104 100644
*** a/src/pl/plpython/sql/plpython_test.sql
--- b/src/pl/plpython/sql/plpython_test.sql
*************** CREATE FUNCTION stupidn() RETURNS text A
*** 11,18 ****

  select stupidn();

! -- test multiple arguments
! CREATE FUNCTION argument_test_one(u users, a1 text, a2 text) RETURNS text
      AS
  'keys = list(u.keys())
  keys.sort()
--- 11,18 ----

  select stupidn();

! -- test multiple arguments and odd characters in function name
! CREATE FUNCTION "Argument test #1"(u users, a1 text, a2 text) RETURNS text
      AS
  'keys = list(u.keys())
  keys.sort()
*************** words = a1 + " " + a2 + " => {" + ", ".j
*** 23,29 ****
  return words'
      LANGUAGE plpythonu;

! select argument_test_one(users, fname, lname) from users where lname = 'doe' order by 1;


  -- check module contents
--- 23,29 ----
  return words'
      LANGUAGE plpythonu;

! select "Argument test #1"(users, fname, lname) from users where lname = 'doe' order by 1;


  -- check module contents

Re: BUG #13960: plpython fails with certain function names

От
Jim Nasby
Дата:
On 2/16/16 7:49 PM, Tom Lane wrote:
> I like the idea of replacing invalid characters with '_'.  It's definitely
> not safe to scribble on the pg_proc tuple, but we could get the same
> result with a few wasted cycles by rescanning the procName string after
> building it, as per attached.

Heck, I didn't even think about that. Yeah, it's going to scan another
20 bytes or so, but this certainly isn't performance critical.

BTW, I didn't bother checking this with python 3.5, but I can't fathom
how that would matter here.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: BUG #13960: plpython fails with certain function names

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> BTW, I didn't bother checking this with python 3.5, but I can't fathom
> how that would matter here.

Me either, but the buildfarm will soon tell us if it does.

            regards, tom lane

Re: BUG #13960: plpython fails with certain function names

От
Jim Nasby
Дата:
On 2/17/16 8:36 PM, Pedro Gimeno wrote:
> Tom Lane wrote, On 2016-02-17 02:49:
>
>> +     /* Replace any not-legal-in-Python-names characters with '_' */
>> +     for (ptr = procName; *ptr; ptr++)
>> +     {
>> +         if (!((*ptr >= 'A' && *ptr <= 'Z') ||
>> +               (*ptr >= 'a' && *ptr <= 'z') ||
>> +               (*ptr >= '0' && *ptr <= '9')))
>> +             *ptr = '_';
>> +     }
>> +
>
> That may blow on names that start with a digit. Maybe
>
> +               (*ptr >= '0' && *ptr <= '9' && ptr != procName)))
>
> Making the testcase start with a digit sounds like a good idea.

Postgres prefaces the name of the python function with a fixed string,
so it will never start with a digit.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: BUG #13960: plpython fails with certain function names

От
Pedro Gimeno
Дата:
Tom Lane wrote, On 2016-02-17 02:49:

> +     /* Replace any not-legal-in-Python-names characters with '_' */
> +     for (ptr = procName; *ptr; ptr++)
> +     {
> +         if (!((*ptr >= 'A' && *ptr <= 'Z') ||
> +               (*ptr >= 'a' && *ptr <= 'z') ||
> +               (*ptr >= '0' && *ptr <= '9')))
> +             *ptr = '_';
> +     }
> +

That may blow on names that start with a digit. Maybe

+               (*ptr >= '0' && *ptr <= '9' && ptr != procName)))

Making the testcase start with a digit sounds like a good idea.