Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

Поиск
Список
Период
Сортировка
От Alena Rybakina
Тема Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Дата
Msg-id c7c80c6d-4c02-4629-8a27-5c635d27ad10@yandex.ru
обсуждение исходный текст
Ответ на Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (jian he <jian.universality@gmail.com>)
Ответы Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)  (jian he <jian.universality@gmail.com>)
Список pgsql-hackers

Hi!

Thank you for your contribution to this thread.

On 04.12.2023 05:23, jian he wrote:
hi.
here is my implementation based on previous discussions

add a new COPY FROM flag save_error.
save_error only works with non-BINARY flags.
save_error is easier for me to implement, if using "save error" I
worry, 2 words, gram.y will not work.
save_error also works other flag like {csv mode, force_null, force_not_null}

overall logic is:
if  save_error is specified then   if error_holding table not exists then create one   if error_holding table exists set error_firsttime to false.
if  save_error is not specified then work as master branch.

if errors happen then insert error info to error_holding table.
if errors do not exist and error_firsttime is true then drop the table.
if errors do not exist and error_firsttime is false then raise a
notice: All the past error holding saved at %s.%s

error holding table:
schema will be the same as COPY destination table.
the table name will be: COPY destination name concatenate with "_error".

error_holding table definition:
CREATE TABLE err_nsp.error_rel (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);

the following field is not implemented.
FIELDS  text[], separated, de-escaped string fields (the data that was
or would be fed to input functions)

because imagine following case:
create type test as (a int, b text);
create table copy_comp (c1 int, c2 test default '(11,test)', c3 date);
copy copy_comp from stdin with (default '\D');
1 \D '2022-07-04'
\.
table copy_comp;

I feel it's hard from textual '\D'  to get text[] `(11,test)` via SPI.
--------------------------------------
demo:

create table copy_default_error_save (
id integer,
text_value text not null default 'test',
ts_value timestamp without time zone not null default '2022-07-05'
);
copy copy_default_error_save from stdin with (save_error, default '\D');
k value '2022-07-04'
z \D '2022-07-03ASKL'
s \D \D
\.

NOTICE:  3 rows were skipped because of error. skipped row saved to
table public.copy_default_error_save_error
select  * from copy_default_error_save_error; lineno |               line               |  field   |      source  |                         err_message                         |
err_detail | errorcode
--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------      1 | k       value   '2022-07-04'     | id       | k  | invalid input syntax for type integer: "k"                  |      | 22P02      2 | z       \D      '2022-07-03ASKL' | id       | z  | invalid input syntax for type integer: "z"                  |      | 22P02      2 | z       \D      '2022-07-03ASKL' | ts_value |
'2022-07-03ASKL' | invalid input syntax for type timestamp:
"'2022-07-03ASKL'" |            | 22007      3 | s       \D      \D               | id       | s  | invalid input syntax for type integer: "s"                  |      | 22P02
(4 rows)

The doc is not so good.

COPY FROM (save_error),  it will not be as fast as COPY FROM (save_error false).
With save_error, we can only use InputFunctionCallSafe, which I
believe is not as fast as InputFunctionCall.
If any conversion error happens, we need to call the SPI interface,
that would add more overhead. also we can only insert error cases row
by row. (maybe we can insert to error_save values(error1), (error2).
(I will try later)...

The main code is about constructing SPI query, and test and test output.
I reviewed it and have a few questions.

1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.

2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files, it is also worth considering.

3. I found spelling:

/* no err_nsp.error_rel table then crete one. for holding error. */

4. Maybe rewrite this comment these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary, no error will lead to the deletion of the err_sp.error_rel table.

5. Is this part of the comment needed? I think it duplicates the information below when we form the query.

 * . column list(order by attnum, begin from ctid) =
 *    {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
 * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}

I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: [PoC] Federated Authn/z with OAUTHBEARER
Следующее
От: Nikita Malakhov
Дата:
Сообщение: Re: Avoid detoast overhead when possible