Re: pl/perl and utf-8 in sql_ascii databases

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: pl/perl and utf-8 in sql_ascii databases
Дата
Msg-id 20120703.175938.213139907.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: pl/perl and utf-8 in sql_ascii databases  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: pl/perl and utf-8 in sql_ascii databases  (Alex Hunsaker <badalex@gmail.com>)
Re: [SPAM] [MessageLimit][lowlimit] Re: pl/perl and utf-8 in sql_ascii databases  (Alvaro Herrera <alvherre@commandprompt.com>)
Список pgsql-hackers
Hello, Here is regression test runs on pg's also built with
cygwin-gcc and VC++.

The patches attached following,

- plperl_sql_ascii-4.patch         : fix for pl/perl utf8 vs sql_ascii
- plperl_sql_ascii_regress-1.patch : regression test for this patch.                                    I added some
testson encoding to this.
 

I will mark this patch as 'ready for committer' after this.

For the continuity of the behavior for sql_ascii and the chars
like \x80, It might be better if the main patch is back ported
into 9.1 and 9.2. New regression tests seems to have less
necessity to do since it has not been there from the first..


This regression test runs for all of Build with gcc3 / Linux(CentOS6.2-64) Built with Cygwin gcc3 / Windows7-64 Built
withVC++2008 / ActivePerl5.12 / Windows7-64
 

==========

I've been stuck in mud trying to plperl work on windows
environment. I saw many messages complaining that plperl wouldn't
be built to work. For the convenience of those and myself, I
describe the process of building postgresql with plperl on
Windows with cygwin and VC++ I've done below.

> Ok. Since there found to be only two patterns in the regression
> test. The fancy thing is no more needed. I will unfold them and
> make sure to work on mingw build environment.
> 
> And for one more environment, on the one with VC++.. I'll need a
> bit longer time to make out what vcregress.pl does.

I could understand what you meant after I managed to build plperl
to run properly. vcregress.pl reads $REGRESS in GNUmakefile so
variable substitution of make does not work on Windows'
regression. I resolved this problem by copying plperl_lc_*.out
files into plperl_lc.out before it runs pg_regress.

> - The main patch fixes the sql-ascii handling itself shoud ported
>   into 9.2 and 9.1. Someone shoud work for this. (me?)

done.

> - The remainder of the patch whic fixes the easy fixable leakes
>   of palloc'ed memory won't be ported into 9.1. This is only for
>   9.3dev.
> 
> - The patch for 9.3dev will be provided with the new regression
>   test. It will be easily ported into 9.1 and 9.2 and there seems
>   to be no problem technically, but a bit unsure from the other
>   points of view...

What should I do for this?

regards,

========
Addition - Building Windows binary for plperl

NOTE: This is ONE example I tried and turned out a success.

A. Cygwin
Versions: Windows 7 64bit          Cygwin 1.7.15          gcc 3.4.4 (cygwin-server is running)
1. Build perl aside system-installed one   perl-5.16.0$ export PATH=/usr/local/bin:/bin:/usr/bin:/usr/sbin
perl-5.16.0$./Configure --   perl-5.16.0$ ./Configure -d   perl-5.16.0$ make   perl-5.16.0$ make install
 
 - The first line needed to avoid the Makefile of perl stops by   parens in search path.
2. Build postgresql with --with-perl   pg93dev$ ./configure --with-perl   pg93dev$ make all
3. Run the regression test for plperl   pg93dev/src/pl/plperl$ make check   ....   pg93dev/src/pl/plperl$ make check
ENCODING=sql-ascii  ....
 

B. VC++   
Versions: Windows 7 64bit          Microsoft Visual C++ 2008 Express Edition          Active Perl v6.12.4 x86
1. Install Active Perl normally.  Assuming the install location   is "c:\Perl" and I did all operation on cmd.exe after
this.
2. Create config.pl in src/tools/msvc   pg> cd src\tools\msvc   msvc> copy config_default.pl config.pl   .... Edit as
follows
   -   perl=>undef,             # --with-perl   +   perl=>'c:\Perl',             # --with-perl
3. Build it
   msvc> C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\vcvars32.bat   msvc> build   msvc> install c:\pgsql
4. Run the regression tests
   4.1 Create test database and run postgres for UTF8 test
   A> set PATH=c:\pgsql\bin;c:\pgsql\lib;%PATH%   A> initdb -D <pgdata dir> --no-locale --encoding=utf8   A> postgres
-D<pgdata dir>
 
   ... on another cmd.exe
   B> set PATH=c:\pgsql\bin;c:\pgsql\lib;%PATH%   pg> cd src\tools\msvc   msvc> vcregress plcheck
   4.2 Run regression test for SQL-ASCII.   A> (delete <pgdata dir> and its contents.)   A> initdb -D <pgdata dir>
--no-locale--encoding=sql_ascii   A> postgres -D <pgdata dir>   ... same as 4.1 here after...
 


-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 7d0102b..4b4b680 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -67,8 +67,11 @@ static text *sv2text(SV *sv){    char       *str = sv2cstr(sv);
+    text       *text;
-    return cstring_to_text(str);
+    text = cstring_to_text(str);
+    pfree(str);
+    return text;}MODULE = PostgreSQL::InServer::Util PREFIX = util_
@@ -113,10 +116,12 @@ util_quote_literal(sv)    }    else {        text *arg = sv2text(sv);
-        text *ret = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
-        char *str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        text *quoted = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
+        char *ret;
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    }    OUTPUT:    RETVAL
@@ -132,10 +137,12 @@ util_quote_nullable(sv)    else    {        text *arg = sv2text(sv);
-        text *ret = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
-        char *str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        text *quoted = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
+        char *ret;
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    }    OUTPUT:    RETVAL
@@ -145,14 +152,15 @@ util_quote_ident(sv)    SV *sv    PREINIT:        text *arg;
-        text *ret;
-        char *str;
+        text *quoted;
+        char *ret;    CODE:        arg = sv2text(sv);
-        ret = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
-        str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        quoted = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    OUTPUT:    RETVAL
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1b6648b..ed99194 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -3,21 +3,29 @@/* * convert from utf8 to database encoding
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *
-utf_u2e(const char *utf8_str, size_t len)
+utf_u2e(char *utf8_str, size_t len){    int            enc = GetDatabaseEncoding();
-
-    char       *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str, len, PG_UTF8, enc);
+    char       *ret;    /*
-     * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
-     * will not do any conversion or verification. we need to do it manually
-     * instead.
+     * When we are in a PG_UTF8 or SQL_ASCII database
+     * pg_do_encoding_conversion() will not do any conversion (which is good)
+     * or verification (not so much), so we need to run the verification step
+     * separately.     */    if (enc == PG_UTF8 || enc == PG_SQL_ASCII)
-        pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
+    {
+        pg_verify_mbstr_len(enc, utf8_str, len, false);
+        ret = utf8_str;
+    }
+    else
+        ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str,
+                                                 len, PG_UTF8, enc);    if (ret == utf8_str)        ret =
pstrdup(ret);
@@ -27,11 +35,15 @@ utf_u2e(const char *utf8_str, size_t len)/* * convert from database encoding to utf8
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *utf_e2u(const char *str){
-    char       *ret = (char *) pg_do_encoding_conversion((unsigned char *) str, strlen(str), GetDatabaseEncoding(),
PG_UTF8);
+    char       *ret =
+        (char *) pg_do_encoding_conversion((unsigned char *) str, strlen(str),
+                                           GetDatabaseEncoding(), PG_UTF8);    if (ret == str)        ret =
pstrdup(ret);
@@ -41,6 +53,8 @@ utf_e2u(const char *str)/* * Convert an SV to a char * in the current database encoding
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *sv2cstr(SV *sv)
@@ -51,7 +65,9 @@ sv2cstr(SV *sv)    /*     * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
-     *
+     */
+
+    /*     * SvPVutf8() croaks nastily on certain things, like typeglobs and     * readonly objects such as $^V.
That'sa perl bug - it's not supposed to     * happen. To avoid crashing the backend, we make a copy of the sv before
 
@@ -63,18 +79,27 @@ sv2cstr(SV *sv)        (SvTYPE(sv) > SVt_PVLV && SvTYPE(sv) != SVt_PVFM))        sv = newSVsv(sv);
 else
 
-
+    {        /*         * increase the reference count so we can just SvREFCNT_dec() it when         * we are done
   */        SvREFCNT_inc_simple_void(sv);
 
+    }
-    val = SvPVutf8(sv, len);
+    /*
+     * Request the string from Perl, in UTF-8 encoding; but if we're in a
+     * SQL_ASCII database, just request the byte soup without trying to make it
+     * UTF8, because that might fail.
+     */
+    if (GetDatabaseEncoding() == PG_SQL_ASCII)
+        val = SvPV(sv, len);
+    else
+        val = SvPVutf8(sv, len);    /*
-     * we use perl's length in the event we had an embedded null byte to
-     * ensure we error out properly
+     * Now convert to database encoding.  We use perl's length in the event we
+     * had an embedded null byte to ensure we error out properly.     */    res = utf_u2e(val, len);
@@ -88,16 +113,20 @@ sv2cstr(SV *sv) * Create a new SV from a string assumed to be in the current database's *
encoding.*/
 
-static inline SV *cstr2sv(const char *str){    SV           *sv;
-    char       *utf8_str = utf_e2u(str);
+    char       *utf8_str;
+
+    /* no conversion when SQL_ASCII */
+    if (GetDatabaseEncoding() == PG_SQL_ASCII)
+        return newSVpv(str, 0);
+
+    utf8_str = utf_e2u(str);    sv = newSVpv(utf8_str, 0);    SvUTF8_on(sv);
-    pfree(utf8_str);    return sv;
diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index 188d7d2..533b41d 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -44,7 +44,12 @@ PERLCHUNKS = plc_perlboot.pl plc_trusted.plSHLIB_LINK = $(perl_embed_ldflags)REGRESS_OPTS =
--dbname=$(PL_TESTDB)--load-extension=plperl  --load-extension=plperlu
 
-REGRESS = plperl plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array
+ifeq ($(shell echo $(ENCODING) | tr "A-Z-" "a-z_"),sql_ascii)
+REGRESS_LC_OUT_ORG=expected/plperl_lc_sql_ascii.out
+else
+REGRESS_LC_OUT_ORG=expected/plperl_lc_utf8.out
+endif
+REGRESS = plperl plperl_lc plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array# if
Perlcan support two interpreters in one backend,# test plperl-and-plperlu casesifneq ($(PERL),)
 
@@ -98,6 +103,8 @@ uninstall-data:check: all submake
+    @echo Setting up per-locale regression
+    cp $(REGRESS_LC_OUT_ORG) expected/plperl_lc.out    $(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)installcheck:
submake
diff --git a/src/pl/plperl/expected/plperl.out b/src/pl/plperl/expected/plperl.out
index df54937..906dc15 100644
--- a/src/pl/plperl/expected/plperl.out
+++ b/src/pl/plperl/expected/plperl.out
@@ -650,16 +650,6 @@ CONTEXT:  PL/Perl anonymous code blockDO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort
@y;1; $do$ LANGUAGE plperl;ERROR:  Useless use of sort in scalar context at line 1.CONTEXT:  PL/Perl anonymous code
block
---
--- Make sure strings are validated
--- Should fail for all encodings, as nul bytes are never permitted.
---
-CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
-  return "abcd\0efg";
-$$ LANGUAGE plperl;
-SELECT perl_zerob();
-ERROR:  invalid byte sequence for encoding "UTF8": 0x00
-CONTEXT:  PL/Perl function "perl_zerob"-- make sure functions marked as VOID without an explicit return workCREATE OR
REPLACEFUNCTION myfuncs() RETURNS void AS $$   $_SHARED{myquote} = sub {
 
diff --git a/src/pl/plperl/expected/plperl_lc_sql_ascii.out b/src/pl/plperl/expected/plperl_lc_sql_ascii.out
new file mode 100644
index 0000000..c454c44
--- /dev/null
+++ b/src/pl/plperl/expected/plperl_lc_sql_ascii.out
@@ -0,0 +1,41 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
+CONTEXT:  PL/Perl function "perl_zerob"
+CREATE OR REPLACE FUNCTION perl_0x80_in(text) RETURNS BOOL AS $$
+  return ($_[0] eq "abc\x80de" ? "true" : "false");
+$$ LANGUAGE plperl;
+SELECT perl_0x80_in(E'abc\x80de');
+ perl_0x80_in 
+--------------
+ t
+(1 row)
+
+CREATE OR REPLACE FUNCTION perl_0x80_out() RETURNS TEXT AS $$
+  return "abc\x80de";
+$$ LANGUAGE plperl;
+SELECT perl_0x80_out() = E'abc\x80de';
+ ?column? 
+----------
+ t
+(1 row)
+
+CREATE OR REPLACE FUNCTION perl_utf_inout(text) RETURNS TEXT AS $$
+  $str = $_[0]; $code = "NotUTF8:"; $match = "ab\xe5\xb1\xb1cd";
+  if (utf8::is_utf8($str)) {
+    $code = "UTF8:"; utf8::decode($str); $match="ab\x{5c71}cd";
+  }
+  return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
+$$ LANGUAGE plperl;
+SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
+          encode          
+--------------------------
+ NotUTF8:ab\345\267\235cd
+(1 row)
+
diff --git a/src/pl/plperl/expected/plperl_lc_utf8.out b/src/pl/plperl/expected/plperl_lc_utf8.out
new file mode 100644
index 0000000..8557b46
--- /dev/null
+++ b/src/pl/plperl/expected/plperl_lc_utf8.out
@@ -0,0 +1,33 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+ERROR:  invalid byte sequence for encoding "UTF8": 0x00
+CONTEXT:  PL/Perl function "perl_zerob"
+CREATE OR REPLACE FUNCTION perl_0x80_in(text) RETURNS BOOL AS $$
+  return ($_[0] eq "abc\x80de" ? "true" : "false");
+$$ LANGUAGE plperl;
+SELECT perl_0x80_in(E'abc\x80de');
+ERROR:  invalid byte sequence for encoding "UTF8": 0x80
+CREATE OR REPLACE FUNCTION perl_0x80_out() RETURNS TEXT AS $$
+  return "abc\x80de";
+$$ LANGUAGE plperl;
+SELECT perl_0x80_out() = E'abc\x80de';
+ERROR:  invalid byte sequence for encoding "UTF8": 0x80
+CREATE OR REPLACE FUNCTION perl_utf_inout(text) RETURNS TEXT AS $$
+  $str = $_[0]; $code = "NotUTF8:"; $match = "ab\xe5\xb1\xb1cd";
+  if (utf8::is_utf8($str)) {
+    $code = "UTF8:"; utf8::decode($str); $match="ab\x{5c71}cd";
+  }
+  return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
+$$ LANGUAGE plperl;
+SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
+        encode         
+-----------------------
+ UTF8:ab\345\267\235cd
+(1 row)
+
diff --git a/src/pl/plperl/sql/plperl.sql b/src/pl/plperl/sql/plperl.sql
index 84af1fd..a5e3840 100644
--- a/src/pl/plperl/sql/plperl.sql
+++ b/src/pl/plperl/sql/plperl.sql
@@ -423,15 +423,6 @@ DO $do$ use strict; my $name = "foo"; my $ref = $$name; $do$ LANGUAGE plperl;-- yields "ERROR:
Uselessuse of sort in scalar context."DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE
plperl;
---
--- Make sure strings are validated
--- Should fail for all encodings, as nul bytes are never permitted.
---
-CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
-  return "abcd\0efg";
-$$ LANGUAGE plperl;
-SELECT perl_zerob();
--- make sure functions marked as VOID without an explicit return workCREATE OR REPLACE FUNCTION myfuncs() RETURNS void
AS$$   $_SHARED{myquote} = sub {
 
diff --git a/src/pl/plperl/sql/plperl_lc.sql b/src/pl/plperl/sql/plperl_lc.sql
new file mode 100644
index 0000000..fd75bc0
--- /dev/null
+++ b/src/pl/plperl/sql/plperl_lc.sql
@@ -0,0 +1,24 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+CREATE OR REPLACE FUNCTION perl_0x80_in(text) RETURNS BOOL AS $$
+  return ($_[0] eq "abc\x80de" ? "true" : "false");
+$$ LANGUAGE plperl;
+SELECT perl_0x80_in(E'abc\x80de');
+CREATE OR REPLACE FUNCTION perl_0x80_out() RETURNS TEXT AS $$
+  return "abc\x80de";
+$$ LANGUAGE plperl;
+SELECT perl_0x80_out() = E'abc\x80de';
+CREATE OR REPLACE FUNCTION perl_utf_inout(text) RETURNS TEXT AS $$
+  $str = $_[0]; $code = "NotUTF8:"; $match = "ab\xe5\xb1\xb1cd";
+  if (utf8::is_utf8($str)) {
+    $code = "UTF8:"; utf8::decode($str); $match="ab\x{5c71}cd";
+  }
+  return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
+$$ LANGUAGE plperl;
+SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index ef70350..58ca5b2 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -183,11 +183,19 @@ sub plcheck        }        print
"============================================================\n";       print "Checking $lang\n";
 
+        my @encoding =();
+        if ($lang eq 'plperl')
+        {
+            setupPlperlRegress("../../../$Config/psql/psql.exe");
+        }
+        my @args = (            "../../../$Config/pg_regress/pg_regress",
"--psqldir=../../../$Config/psql",
-            "--dbname=pl_regression",@lang_args,@tests
+            "--dbname=pl_regression",@encoding,@lang_args,@tests        );
+        print join('//', @args), '\n';
+        system(@args);        my $status = $? >> 8;        exit $status if $status;
@@ -197,6 +205,44 @@ sub plcheck    chdir "../../..";}
+sub setupPlperlRegress
+{
+    my $psql_cmd = shift;
+    my $encoding = `$psql_cmd -t -d postgres -c "select encoding from pg_database where datname='template0'"`;
+    my $status = $? >> 8;
+    my $expectfile = "expected/plperl_lc_utf8.out";
+    my @ret = ();
+    exit $status if $status;
+    
+    if ($encoding == 0)
+    {
+        $expectfile = "expected/plperl_lc_sql_ascii.out";
+        @ret = ("--encoding=SQL_ASCII");
+    }
+
+    print "Copy $expectfile as expected/plperl_lc.out\n";
+    simpleCopyFile($expectfile, 'expected/plperl_lc.out');
+
+    return @ret;
+}
+
+sub simpleCopyFile
+{
+    my ($from, $to) = @_;
+    my $inf;
+    my $outf;
+
+    # Copy expect file.
+    open($inf, "<$from") ||
+        die "Failed to open file \'$from\': $!";
+    open($outf, ">expected/plperl_lc.out") ||
+        die "Failed to open file \'$to\': $!";
+    print $outf $_ while(<$inf>);
+    close($inf);
+    close($outf);
+    
+}
+sub contribcheck{    chdir "../../../contrib";

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mark Kirkwood
Дата:
Сообщение: Re: xlog filename formatting functions in recovery
Следующее
От: Dimitri Fontaine
Дата:
Сообщение: Re: Event Triggers reduced, v1