Hi,
On Thu, Jul 18, 2024 at 6:38 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2024-07-05 00:27, torikoshia wrote:
> > Hi,
> >
> > With the current file_fdw, if even one line of data conversion fails,
> > the contents of the file cannot be referenced at all:
> >
> > =# \! cat data/test.data
> > 1,a
> > 2,b
> > a,c
> > =# create foreign table f_fdw_test_1 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv');
> > CREATE FOREIGN TABLE
> >
> > =# table f_fdw_test_1;
> > ERROR: invalid input syntax for type integer: "a"
> > CONTEXT: COPY f_fdw_test, line 3, column i: "a"
> >
> > Since we'll support ON_ERROR option which tolerates data conversion
> > errors in COPY FROM and LOG_VERBOSITY option at v17[1], how about
> > supporting them on file_fdw?
+1
> >
> > This idea comes from Fujii-san[2], and I think it'd be useful when
> > reading a bit dirty data.
> >
> > Attached PoC patch works like below:
> >
> > =# create foreign table f_fdw_test_2 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv', on_error 'ignore');
> > CREATE FOREIGN TABLE
> >
> > =# table f_fdw_test_2;
> > NOTICE: 1 row was skipped due to data type incompatibility
> > i | t
> > ---+---
> > 1 | a
> > 2 | b
> > (2 rows)
I'm slightly concerned that users might not want to see the NOTICE
message for every scan. Unlike COPY FROM, scanning a file via file_fdw
could be frequent. An alternative idea of place to write the
information of the number of malformed rows would be the EXPLAIN
command as follow:
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on public.test (cost=0.00..1.10 rows=1 width=12)
Output: a, b, c
Foreign File: test.csv
Foreign File Size: 12 b
Skipped Rows: 10
> >
> >
> > =# create foreign table f_fdw_test_3 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv', on_error 'ignore',
> > log_verbosity 'verbose');
> > CREATE FOREIGN TABLE
> >
> > =# table f_fdw_test_3 ;
> > NOTICE: skipping row due to data type incompatibility at line 3 for
> > column i: "a"
> > NOTICE: 1 row was skipped due to data type incompatibility
> > i | t
> > ---+---
> > 1 | a
> > 2 | b
> > (2 rows)
IIUC we have to execute ALTER FOREIGN TABLE to change the
log_verbosity value and which requires to be the owner. Which seems
not to be user-friendly.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com