Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
Дата
Msg-id 45428207-d977-4c15-94b3-d85dd2911e10@www.fastmail.com
обсуждение исходный текст
Ответ на Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'  ("David G. Johnston" <david.g.johnston@gmail.com>)
Ответы Re: Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, May 5, 2021, at 19:58, David G. Johnston wrote:
On Wed, May 5, 2021 at 10:34 AM Isaac Morland <isaac.morland@gmail.com> wrote:
On Wed, 5 May 2021 at 13:23, Chapman Flack <chap@anastigmatix.net> wrote:
On 05/05/21 13:02, David G. Johnston wrote:
> Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
> what it says and does exactly what you desire?

What would it mean? That you get one column, multiple rows of text
corresponding to "lines" delimited by something, or that you get one
column, one row of text for the entire content of the file?

It means no column delimiter. In other words, there is no character which marks the end of a data value, so the entire line is a single data value.


This.  When dealing with COPY it's expected that each line becomes its own row.  On the server you can do pg_read_file() if you need the entire file to be considered a single value.  psql (\I and variables) is a bit more hackey, but I'd rather see that improved directly anyway if the goal is to try and make getting the "whole document" easier - copy isn't the right API for that IMO.

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

SELECT COUNT(*) FROM regexp_split_to_table(repeat(E'\n',1000000000),E'\n');
ERROR:  invalid memory alloc request size 4000000004
Time: 4151.374 ms (00:04.151)

/Joel

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: RFC: Detailed reorder buffer stats dumps
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: MaxOffsetNumber for Table AMs