Обсуждение: [Proposal] Adding TRIM_SPACE option to COPY
Hi,
I'd like to propose adding a new option, TRIM_SPACE, to the COPY command.
Other data warehouse systems such as Snowflake provide similar functionality
(TRIM_SPACE) to improve robustness when loading CSV data. PostgreSQL does not currently have
such an option, although it would be consistent with other user-friendly
features already present in COPY (e.g., FORCE_NULL, FORCE_NOT_NULL, ON_ERROR).
Proposed feature
----------------
Add a boolean option:
TRIM_SPACE = true | false
(default: false)
When enabled, COPY FROM with FORMAT text or csv will trim leading and trailing
ASCII whitespace from each column value before NULL processing and type
conversion. This is applied only to FORMAT text and csv.(not binary)
Example usage:
COPY mytable
FROM '/tmp/data.csv'
WITH (FORMAT csv, HEADER true, TRIM_SPACE true);
This would transform(trim leading and trailing ASCII whitespace):
" AAA ", " BBB", "CCC "
into:
'AAA', 'BBB', 'CCC'
----------------
Thanks in advance for your comments.
Best regards,
Tatsuya Kawata
I'd like to propose adding a new option, TRIM_SPACE, to the COPY command.
Other data warehouse systems such as Snowflake provide similar functionality
(TRIM_SPACE) to improve robustness when loading CSV data. PostgreSQL does not currently have
such an option, although it would be consistent with other user-friendly
features already present in COPY (e.g., FORCE_NULL, FORCE_NOT_NULL, ON_ERROR).
Proposed feature
----------------
Add a boolean option:
TRIM_SPACE = true | false
(default: false)
When enabled, COPY FROM with FORMAT text or csv will trim leading and trailing
ASCII whitespace from each column value before NULL processing and type
conversion. This is applied only to FORMAT text and csv.(not binary)
Example usage:
COPY mytable
FROM '/tmp/data.csv'
WITH (FORMAT csv, HEADER true, TRIM_SPACE true);
This would transform(trim leading and trailing ASCII whitespace):
" AAA ", " BBB", "CCC "
into:
'AAA', 'BBB', 'CCC'
----------------
Thanks in advance for your comments.
Best regards,
Tatsuya Kawata
Вложения
HI,
This change seems very useful, I participated in a related project before. After reviewing the patch,
I have a suggestion, would it be better to restrict the change to only "copy from"?
daidewei1970@163.com
From: 河田達也Date: 2025-11-25 00:17To: pgsql-hackersSubject: [Proposal] Adding TRIM_SPACE option to COPYHi,
I'd like to propose adding a new option, TRIM_SPACE, to the COPY command.
Other data warehouse systems such as Snowflake provide similar functionality
(TRIM_SPACE) to improve robustness when loading CSV data. PostgreSQL does not currently have
such an option, although it would be consistent with other user-friendly
features already present in COPY (e.g., FORCE_NULL, FORCE_NOT_NULL, ON_ERROR).
Proposed feature
----------------
Add a boolean option:
TRIM_SPACE = true | false
(default: false)
When enabled, COPY FROM with FORMAT text or csv will trim leading and trailing
ASCII whitespace from each column value before NULL processing and type
conversion. This is applied only to FORMAT text and csv.(not binary)
Example usage:
COPY mytable
FROM '/tmp/data.csv'
WITH (FORMAT csv, HEADER true, TRIM_SPACE true);
This would transform(trim leading and trailing ASCII whitespace):
" AAA ", " BBB", "CCC "
into:
'AAA', 'BBB', 'CCC'
----------------
Thanks in advance for your comments.
Best regards,
Tatsuya Kawata
On Tue, Nov 25, 2025 at 1:18 AM 河田達也 <kawatatatsuya0913@gmail.com> wrote:
>
> Hi,
>
> I'd like to propose adding a new option, TRIM_SPACE, to the COPY command.
>
> Other data warehouse systems such as Snowflake provide similar functionality
> (TRIM_SPACE) to improve robustness when loading CSV data. PostgreSQL does not currently have
> such an option, although it would be consistent with other user-friendly
> features already present in COPY (e.g., FORCE_NULL, FORCE_NOT_NULL, ON_ERROR).
>
> Proposed feature
> ----------------
> Add a boolean option:
>
> TRIM_SPACE = true | false
> (default: false)
>
> When enabled, COPY FROM with FORMAT text or csv will trim leading and trailing
> ASCII whitespace from each column value before NULL processing and type
> conversion. This is applied only to FORMAT text and csv.(not binary)
>
> Example usage:
>
> COPY mytable
> FROM '/tmp/data.csv'
> WITH (FORMAT csv, HEADER true, TRIM_SPACE true);
>
> This would transform(trim leading and trailing ASCII whitespace):
> " AAA ", " BBB", "CCC "
>
> into:
> 'AAA', 'BBB', 'CCC'
> ----------------
>
> Thanks in advance for your comments.
I like this idea in general.
Since the docs already notes that leading or trailing spaces in CSV values
can cause errors during loading, at [1], we'd likely need to update
that section as well if we proceed with this feature.
One question: should TRIM_SPACE remove only the literal space character (' '),
or should it also trim other whitespace characters (e.g., tab, newline,
those recognized by isspace())? I'm not sure how other databases define
this behavior, so it would be good to clarify the intended scope.
Regards,
[1] https://www.postgresql.org/docs/devel/sql-copy.html
--
Fujii Masao
Fujii Masao <masao.fujii@gmail.com> writes:
> On Tue, Nov 25, 2025 at 1:18 AM 河田達也 <kawatatatsuya0913@gmail.com> wrote:
>> I'd like to propose adding a new option, TRIM_SPACE, to the COPY command.
> I like this idea in general.
I'm kind of down on it, because it's inevitably going to add
processing overhead to every COPY operation whether the feature
is used or not. I don't find it likely to be sufficiently
useful to justify that universal cost.
COPY is not a general-purpose filter or ETL tool, and we try
to make it one at our peril.
regards, tom lane
Hi all,
Thank you very much for the helpful feedback so far.
I would like to take a moment to discuss and confirm the overall direction
before preparing the next revision (v2).
My goal is to ensure we are aligned on the intended behavior, scope, and
design considerations for the proposed TRIM_SPACE option.
To summarize the discussion and the points I plan to address in the next action:
Thank you very much for the helpful feedback so far.
I would like to take a moment to discuss and confirm the overall direction
before preparing the next revision (v2).
My goal is to ensure we are aligned on the intended behavior, scope, and
design considerations for the proposed TRIM_SPACE option.
To summarize the discussion and the points I plan to address in the next action:
> I have a suggestion, would it be better to restrict the change to only "copy from"?
1. Scope (COPY FROM only)
I agree that the option should apply only to COPY FROM.
I will adjust the patch accordingly.
1. Scope (COPY FROM only)
I agree that the option should apply only to COPY FROM.
I will adjust the patch accordingly.
> Since the docs already notes that leading or trailing spaces in CSV values
> can cause errors during loading, at [1], we'd likely need to update
> that section as well if we proceed with this feature.
2. Documentation
I will update the CSV section mentioned earlier so that it reflects
the behavior when TRIM_SPACE is enabled.
> can cause errors during loading, at [1], we'd likely need to update
> that section as well if we proceed with this feature.
2. Documentation
I will update the CSV section mentioned earlier so that it reflects
the behavior when TRIM_SPACE is enabled.
> One question: should TRIM_SPACE remove only the literal space character (' '),
> or should it also trim other whitespace characters (e.g., tab, newline,
> those recognized by isspace())?
> or should it also trim other whitespace characters (e.g., tab, newline,
> those recognized by isspace())?
3. Characters to trim
I plan to trim only the ASCII space character (' ') to keep the behavior simple and avoid ambiguity.
Support for additional whitespace characters could be considered later if there is consensus.
> I'm kind of down on it, because it's inevitably going to add
> processing overhead to every COPY operation whether the feature
> is used or not. I don't find it likely to be sufficiently
> useful to justify that universal cost.
4. Performance / overhead concerns
Thank you for raising this point.
I fully agree that the feature must not introduce overhead when TRIM_SPACE is disabled.
The trimming logic will be executed only when the option is explicitly enabled,
so there will be just a single additional conditional check.
While this does technically add some processing overhead, it is expected to be slight in practice for typical CSV loads and unlikely to be a concern.
> COPY is not a general-purpose filter or ETL tool, and we try
> to make it one at our peril.
My intention is not to expand COPY into an ETL tool, but rather to
provide a small convenience option similar to FORCE_NULL or ON_ERROR,
to help users avoid common issues caused by unintended leading or
trailing spaces in CSV/text files.
I look forward to your thoughts and any additional feedback before preparing the next patch.
Thank you again for all the valuable comments.
Best regards,
Tatsuya Kawata
I plan to trim only the ASCII space character (' ') to keep the behavior simple and avoid ambiguity.
Support for additional whitespace characters could be considered later if there is consensus.
> I'm kind of down on it, because it's inevitably going to add
> processing overhead to every COPY operation whether the feature
> is used or not. I don't find it likely to be sufficiently
> useful to justify that universal cost.
4. Performance / overhead concerns
Thank you for raising this point.
I fully agree that the feature must not introduce overhead when TRIM_SPACE is disabled.
The trimming logic will be executed only when the option is explicitly enabled,
so there will be just a single additional conditional check.
While this does technically add some processing overhead, it is expected to be slight in practice for typical CSV loads and unlikely to be a concern.
> COPY is not a general-purpose filter or ETL tool, and we try
> to make it one at our peril.
My intention is not to expand COPY into an ETL tool, but rather to
provide a small convenience option similar to FORCE_NULL or ON_ERROR,
to help users avoid common issues caused by unintended leading or
trailing spaces in CSV/text files.
I look forward to your thoughts and any additional feedback before preparing the next patch.
Thank you again for all the valuable comments.
Best regards,
Tatsuya Kawata