Обсуждение: force_not_null option support for file_fdw

Поиск
Список
Период
Сортировка

force_not_null option support for file_fdw

От
Shigeru Hanada
Дата:
Hi,

I propose to support force_not_null option for file_fdw too.

In 9.1 development cycle, file_fdw had been implemented with exported
COPY FROM routines, but only force_not_null option has not been 
supported yet.

Originally, in COPY FROM, force_not_null is specified as a list of
column which is not matched against null-string.  Now per-column FDW
option is available, so I implemented force_not_null options as boolean
value for each column to avoid parsing column list in file_fdw.

True means that the column is not matched against null-string, and it's
equivalent to specify the column's name in force_not_null option of COPY
FROM.  Default value is false.

The patch includes changes for code, document and regression tests.  Any
comments/questions are welcome.

Regards,
--
Shigeru Hanada

Вложения

Re: force_not_null option support for file_fdw

От
Kohei KaiGai
Дата:
I tried to review this patch.

It seems to me its implementation is reasonable and enough simple.
All the works of this patch is pick-up "force_not_null" option from
pg_attribute.attfdwoptions and transform its data structure into suitable
form to the existing BeginCopyFrom().
So, I'd almost like to mark this patch "Ready for Committer".

Here are only two points I'd like to comment on.

+       tuple = SearchSysCache2(ATTNUM,
+                               RelationGetRelid(rel),
+                               Int16GetDatum(attnum));
+       if (!HeapTupleIsValid(tuple))
+           ereport(ERROR,
+                   (errcode(ERRCODE_UNDEFINED_OBJECT),
+                    errmsg("cache lookup failed for attribute %d of
relation %u",
+                           attnum, RelationGetRelid(rel))));

The tuple should be always found unless we have any bugs that makes
mismatch between pg_class.relnatts and actual number of attributes.
So, it is a case to use elog(), instead of ereport() with error code.


One other point is diffset of regression test, when I run `make check
-C contrib/file_fdw'.
Do we have something changed corresponding to COPY TO/FROM statement
since 8th-August to now?

*** /home/kaigai/repo/sepgsql/contrib/file_fdw/expected/file_fdw.out2011-09-04 20:36:23.670981921 +0200
--- /home/kaigai/repo/sepgsql/contrib/file_fdw/results/file_fdw.out2011-09-04 20:36:51.202989681 +0200
***************
*** 118,126 ****  word1 | word2 -------+-------  123   | 123  ABC   | ABC  NULL  |
-  abc   | abc (4 rows)
 -- basic query tests
--- 118,126 ----  word1 | word2 -------+-------  123   | 123
+  abc   | abc  ABC   | ABC  NULL  | (4 rows)
 -- basic query tests

======================================================================

Thanks,

2011年8月8日9:14 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Hi,
>
> I propose to support force_not_null option for file_fdw too.
>
> In 9.1 development cycle, file_fdw had been implemented with exported
> COPY FROM routines, but only force_not_null option has not been
> supported yet.
>
> Originally, in COPY FROM, force_not_null is specified as a list of
> column which is not matched against null-string.  Now per-column FDW
> option is available, so I implemented force_not_null options as boolean
> value for each column to avoid parsing column list in file_fdw.
>
> True means that the column is not matched against null-string, and it's
> equivalent to specify the column's name in force_not_null option of COPY
> FROM.  Default value is false.
>
> The patch includes changes for code, document and regression tests.  Any
> comments/questions are welcome.
>
> Regards,
> --
> Shigeru Hanada
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: force_not_null option support for file_fdw

От
Shigeru Hanada
Дата:
Thanks for the review.

(2011/09/05 3:55), Kohei KaiGai wrote:
> I tried to review this patch.
>
> It seems to me its implementation is reasonable and enough simple.
> All the works of this patch is pick-up "force_not_null" option from
> pg_attribute.attfdwoptions and transform its data structure into suitable
> form to the existing BeginCopyFrom().
> So, I'd almost like to mark this patch "Ready for Committer".
>
> Here are only two points I'd like to comment on.
>
> +       tuple = SearchSysCache2(ATTNUM,
> +                               RelationGetRelid(rel),
> +                               Int16GetDatum(attnum));
> +       if (!HeapTupleIsValid(tuple))
> +           ereport(ERROR,
> +                   (errcode(ERRCODE_UNDEFINED_OBJECT),
> +                    errmsg("cache lookup failed for attribute %d of
> relation %u",
> +                           attnum, RelationGetRelid(rel))));
>
> The tuple should be always found unless we have any bugs that makes
> mismatch between pg_class.relnatts and actual number of attributes.
> So, it is a case to use elog(), instead of ereport() with error code.

Oh, I've missed that other similar errors use elog()...
Fixed.

> One other point is diffset of regression test, when I run `make check
> -C contrib/file_fdw'.
> Do we have something changed corresponding to COPY TO/FROM statement
> since 8th-August to now?

I don't know about such change, and src/backend/command/copy.c has not
been touched since Feb 23.

> *** /home/kaigai/repo/sepgsql/contrib/file_fdw/expected/file_fdw.out
>   2011-09-04 20:36:23.670981921 +0200
> --- /home/kaigai/repo/sepgsql/contrib/file_fdw/results/file_fdw.out
>   2011-09-04 20:36:51.202989681 +0200
> ***************
> *** 118,126 ****
>     word1 | word2
>    -------+-------
>     123   | 123
>     ABC   | ABC
>     NULL  |
> -  abc   | abc
>    (4 rows)
>
>    -- basic query tests
> --- 118,126 ----
>     word1 | word2
>    -------+-------
>     123   | 123
> +  abc   | abc
>     ABC   | ABC
>     NULL  |
>    (4 rows)
>
>    -- basic query tests
>
> ======================================================================

In my usual environment that test passed, but finally I've reproduced
the failure with setting $LC_COLLATE to "es_ES.UTF-8".  Do you have set
any $LC_COLLATE in your test environment?

Regards,
--
Shigeru Hanada


Вложения

Re: force_not_null option support for file_fdw

От
Kohei Kaigai
Дата:
> In my usual environment that test passed, but finally I've reproduced the failure with setting
> $LC_COLLATE to "es_ES.UTF-8".  Do you have set any $LC_COLLATE in your test environment?
>
It is not set in my environment.

I checked the behavior of ORDER BY when we set a collation on the regular relation, not a foreign table.
Do we hit something other unexpected bug in collation here?

postgres=# CREATE TABLE t1 (word1 text);
CREATE TABLE
postgres=# INSERT INTO t1 VALUES ('ABC'),('abc'),('123'),('NULL');
INSERT 0 4
postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "ja_JP.utf8";
ALTER TABLE
postgres=# SELECT * FROM t1 ORDER BY word1;word1
-------123ABCNULLabc
(4 rows)

postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "en_US.utf8";
ALTER TABLE
postgres=# SELECT * FROM t1 ORDER BY word1;word1
-------123abcABCNULL
(4 rows)

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei <kohei.kaigai@emea.nec.com>


> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
> Shigeru Hanada
> Sent: 5. September 2011 06:56
> To: Kohei KaiGai
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] force_not_null option support for file_fdw
>
> Thanks for the review.
>
> (2011/09/05 3:55), Kohei KaiGai wrote:
> > I tried to review this patch.
> >
> > It seems to me its implementation is reasonable and enough simple.
> > All the works of this patch is pick-up "force_not_null" option from
> > pg_attribute.attfdwoptions and transform its data structure into
> > suitable form to the existing BeginCopyFrom().
> > So, I'd almost like to mark this patch "Ready for Committer".
> >
> > Here are only two points I'd like to comment on.
> >
> > +       tuple = SearchSysCache2(ATTNUM,
> > +                               RelationGetRelid(rel),
> > +                               Int16GetDatum(attnum));
> > +       if (!HeapTupleIsValid(tuple))
> > +           ereport(ERROR,
> > +                   (errcode(ERRCODE_UNDEFINED_OBJECT),
> > +                    errmsg("cache lookup failed for attribute %d of
> > relation %u",
> > +                           attnum, RelationGetRelid(rel))));
> >
> > The tuple should be always found unless we have any bugs that makes
> > mismatch between pg_class.relnatts and actual number of attributes.
> > So, it is a case to use elog(), instead of ereport() with error code.
>
> Oh, I've missed that other similar errors use elog()...
> Fixed.
>
> > One other point is diffset of regression test, when I run `make check
> > -C contrib/file_fdw'.
> > Do we have something changed corresponding to COPY TO/FROM statement
> > since 8th-August to now?
>
> I don't know about such change, and src/backend/command/copy.c has not been touched since Feb 23.
>
> > *** /home/kaigai/repo/sepgsql/contrib/file_fdw/expected/file_fdw.out
> >   2011-09-04 20:36:23.670981921 +0200
> > --- /home/kaigai/repo/sepgsql/contrib/file_fdw/results/file_fdw.out
> >   2011-09-04 20:36:51.202989681 +0200
> > ***************
> > *** 118,126 ****
> >     word1 | word2
> >    -------+-------
> >     123   | 123
> >     ABC   | ABC
> >     NULL  |
> > -  abc   | abc
> >    (4 rows)
> >
> >    -- basic query tests
> > --- 118,126 ----
> >     word1 | word2
> >    -------+-------
> >     123   | 123
> > +  abc   | abc
> >     ABC   | ABC
> >     NULL  |
> >    (4 rows)
> >
> >    -- basic query tests
> >
> > ======================================================================
>
> In my usual environment that test passed, but finally I've reproduced the failure with setting
> $LC_COLLATE to "es_ES.UTF-8".  Do you have set any $LC_COLLATE in your test environment?
>
> Regards,
> --
> Shigeru Hanada
>
>
>
>  Click
> https://www.mailcontrol.com/sr/yQEP2keV9uzTndxI!oX7UgZzT7dlvrTeW0pvcI7!FpP+qgioCQTZMxIe1v95Rjzlbr
> CRFdjEt0BTEf5tQBqpNg==  to report this email as spam.


Re: force_not_null option support for file_fdw

От
Shigeru Hanada
Дата:
(2011/09/05 22:05), Kohei Kaigai wrote:
>> In my usual environment that test passed, but finally I've reproduced the failure with setting
>> $LC_COLLATE to "es_ES.UTF-8".  Do you have set any $LC_COLLATE in your test environment?
>>
> It is not set in my environment.
>
> I checked the behavior of ORDER BY when we set a collation on the regular relation, not a foreign table.
> Do we hit something other unexpected bug in collation here?
>
> postgres=# CREATE TABLE t1 (word1 text);
> CREATE TABLE
> postgres=# INSERT INTO t1 VALUES ('ABC'),('abc'),('123'),('NULL');
> INSERT 0 4
> postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "ja_JP.utf8";
> ALTER TABLE
> postgres=# SELECT * FROM t1 ORDER BY word1;
>   word1
> -------
>   123
>   ABC
>   NULL
>   abc
> (4 rows)
>
> postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "en_US.utf8";
> ALTER TABLE
> postgres=# SELECT * FROM t1 ORDER BY word1;
>   word1
> -------
>   123
>   abc
>   ABC
>   NULL
> (4 rows)

Thanks for the checking.  FYI, I mainly use Fedora 15 box with Japanese
environment for my development.

ISTM that your results are reasonable for each collation setting.
Former ordering is same as C locale, and in latter case alphabetical
order has priority over case distinctions.  Do you mean that ordering
used in file_fdw is affected from something unexpected, without
collation or locale setting?

BTW, I found a thread which is related to this issue.
  http://archives.postgresql.org/pgsql-hackers/2011-09/msg00130.php

I changed the test data so that it uses only upper case alphabets,
because case distinction is not important for that test.  I also removed
digits to avoid test failure in some locales which sort alphabets before
digits.

Regards,
--
Shigeru Hanada

Вложения

Re: force_not_null option support for file_fdw

От
Kohei Kaigai
Дата:
Hi Hanada-san.

> ISTM that your results are reasonable for each collation setting.
> Former ordering is same as C locale, and in latter case alphabetical order has priority over case
> distinctions.  Do you mean that ordering used in file_fdw is affected from something unexpected, without
> collation or locale setting?
>
> BTW, I found a thread which is related to this issue.
>   http://archives.postgresql.org/pgsql-hackers/2011-09/msg00130.php
>
> I changed the test data so that it uses only upper case alphabets, because case distinction is not
> important for that test.  I also removed digits to avoid test failure in some locales which sort
> alphabets before digits.
>
OK, Thanks for this clarification. This change of regression test case seems to me reasonable
to avoid unnecessary false-positive.

I found one other point to be fixed:
On get_force_not_null(), it makes a list of attribute names with force_not_null option.

+       foreach (cell, options)
+       {
+           DefElem    *def = (DefElem *) lfirst(cell);
+           const char *value = defGetString(def);
+
+           if (strcmp(def->defname, "force_not_null") == 0 &&
+               strcmp(value, "true") == 0)
+           {
+               columns = lappend(columns, makeString(NameStr(attr->attname)));
+               elog(DEBUG1, "%s: force_not_null", NameStr(attr->attname));
+           }
+
+       }

makeString() does not copy the supplied string itself, so it is not preferable to reference
NameStr(attr->attname) across ReleaseSysCache().
I'd like to suggest to supply a copied attname using pstrdup for makeString

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei <kohei.kaigai@emea.nec.com>


> -----Original Message-----
> From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com]
> Sent: 8. September 2011 06:19
> To: Kohei Kaigai
> Cc: Kohei KaiGai; PostgreSQL-development
> Subject: Re: [HACKERS] force_not_null option support for file_fdw
>
> (2011/09/05 22:05), Kohei Kaigai wrote:
> >> In my usual environment that test passed, but finally I've reproduced
> >> the failure with setting $LC_COLLATE to "es_ES.UTF-8".  Do you have set any $LC_COLLATE in your
> test environment?
> >>
> > It is not set in my environment.
> >
> > I checked the behavior of ORDER BY when we set a collation on the regular relation, not a foreign
> table.
> > Do we hit something other unexpected bug in collation here?
> >
> > postgres=# CREATE TABLE t1 (word1 text); CREATE TABLE postgres=#
> > INSERT INTO t1 VALUES ('ABC'),('abc'),('123'),('NULL'); INSERT 0 4
> > postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "ja_JP.utf8";
> > ALTER TABLE postgres=# SELECT * FROM t1 ORDER BY word1;
> >   word1
> > -------
> >   123
> >   ABC
> >   NULL
> >   abc
> > (4 rows)
> >
> > postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "en_US.utf8";
> > ALTER TABLE postgres=# SELECT * FROM t1 ORDER BY word1;
> >   word1
> > -------
> >   123
> >   abc
> >   ABC
> >   NULL
> > (4 rows)
>
> Thanks for the checking.  FYI, I mainly use Fedora 15 box with Japanese environment for my development.
>
> ISTM that your results are reasonable for each collation setting.
> Former ordering is same as C locale, and in latter case alphabetical order has priority over case
> distinctions.  Do you mean that ordering used in file_fdw is affected from something unexpected, without
> collation or locale setting?
>
> BTW, I found a thread which is related to this issue.
>   http://archives.postgresql.org/pgsql-hackers/2011-09/msg00130.php
>
> I changed the test data so that it uses only upper case alphabets, because case distinction is not
> important for that test.  I also removed digits to avoid test failure in some locales which sort
> alphabets before digits.
>
> Regards,
> --
> Shigeru Hanada
>
>
>  Click
> https://www.mailcontrol.com/sr/fB6Wmr8zmMzTndxI!oX7Uo9cpkuWnNqkEgc!P6cHvBhGb4EkL1te5Ky38yYzoE4Mra
> 3ljAyIpUlPbv5+FCDqIw==  to report this email as spam.


Re: force_not_null option support for file_fdw

От
Shigeru Hanada
Дата:
Thanks for the review, Kaigai-san.

(2011/09/09 0:47), Kohei Kaigai wrote:
> I found one other point to be fixed:
> On get_force_not_null(), it makes a list of attribute names with force_not_null option.
>
> +       foreach (cell, options)
> +       {
> +           DefElem    *def = (DefElem *) lfirst(cell);
> +           const char *value = defGetString(def);
> +
> +           if (strcmp(def->defname, "force_not_null") == 0&&
> +               strcmp(value, "true") == 0)
> +           {
> +               columns = lappend(columns, makeString(NameStr(attr->attname)));
> +               elog(DEBUG1, "%s: force_not_null", NameStr(attr->attname));
> +           }
> +
> +       }
>
> makeString() does not copy the supplied string itself, so it is not preferable to reference
> NameStr(attr->attname) across ReleaseSysCache().
> I'd like to suggest to supply a copied attname using pstrdup for makeString

Oops, fixed.
[ I should check some of my projects for this issue... ]

Attached patch also includes some cosmetic changes such as removing
useless blank lines.

Regards,
--
Shigeru Hanada

Вложения

Re: force_not_null option support for file_fdw

От
Kohei Kaigai
Дата:
I marked this patch as "Ready for Committer", and hope this patch being committed.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei <kohei.kaigai@emea.nec.com>


> -----Original Message-----
> From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com]
> Sent: 9. September 2011 06:03
> To: Kohei Kaigai
> Cc: Kohei KaiGai; PostgreSQL-development
> Subject: Re: [HACKERS] force_not_null option support for file_fdw
>
> Thanks for the review, Kaigai-san.
>
> (2011/09/09 0:47), Kohei Kaigai wrote:
> > I found one other point to be fixed:
> > On get_force_not_null(), it makes a list of attribute names with force_not_null option.
> >
> > +       foreach (cell, options)
> > +       {
> > +           DefElem    *def = (DefElem *) lfirst(cell);
> > +           const char *value = defGetString(def);
> > +
> > +           if (strcmp(def->defname, "force_not_null") == 0&&
> > +               strcmp(value, "true") == 0)
> > +           {
> > +               columns = lappend(columns, makeString(NameStr(attr->attname)));
> > +               elog(DEBUG1, "%s: force_not_null", NameStr(attr->attname));
> > +           }
> > +
> > +       }
> >
> > makeString() does not copy the supplied string itself, so it is not
> > preferable to reference
> > NameStr(attr->attname) across ReleaseSysCache().
> > I'd like to suggest to supply a copied attname using pstrdup for
> > makeString
>
> Oops, fixed.
> [ I should check some of my projects for this issue... ]
>
> Attached patch also includes some cosmetic changes such as removing useless blank lines.
>
> Regards,
> --
> Shigeru Hanada
>
>
>  Click
> https://www.mailcontrol.com/sr/GQT1p8GGIBPTndxI!oX7UiqFKmKbqX6Rgam71Xs0JKL1UdBaye8DbxIe1v95RjzltL
> 2w8BfevsSBchKRB0KEKw==  to report this email as spam.


Re: force_not_null option support for file_fdw

От
Tom Lane
Дата:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> (2011/09/09 0:47), Kohei Kaigai wrote:
>> makeString() does not copy the supplied string itself, so it is not preferable to reference
>> NameStr(attr->attname) across ReleaseSysCache().

> Oops, fixed.
> [ I should check some of my projects for this issue... ]

I've committed this with some mostly-cosmetic revisions, notably

* use defGetBoolean, since this ought to be a plain boolean option
rather than having its own private idea of which spellings are accepted.

* get rid of the ORDER BY altogether in the regression test case ---
it seems a lot safer to assume that COPY will read the data in the
presented order than that text will be sorted in any particular way.
        regards, tom lane