Re: Extending copy_expert

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: Extending copy_expert
Дата
Msg-id 543BD762.4070009@aklaver.com
обсуждение исходный текст
Ответ на Extending copy_expert  (Andrea Riciputi <andrea.riciputi@gmail.com>)
Ответы Re: Extending copy_expert
Список psycopg
On 10/12/2014 02:28 PM, Andrea Riciputi wrote:
> Hi all,
> a couple of weeks ago at work we had to produce a quite big CSV data file which should be used as input by another
pieceof software. 
>
> Since the file must be produced on a daily basis, is big (let say half a TB), and it contains data stored in our PG
database,letting PG produce the file itself seemed the right approach. Thanks to psycopg the whole operation is
performedin C, resulting fast enough for our purpose. 
>
> However the target software for which the file is produced, is, let say, “legacy” software and can only accept CRLF
asEOL character. However by calling COPY TO STDOUT from psycopg ends up in a CSV file with LF as EOL forcing us to pass
thefile a second time to convert EOL, which is inconvenient. Plus, doing it in Python, make it a little bit to slow. 
>
> My first attempt was to ask the pgsql-hackers ML for extending the COPY TO syntax to allow a “FORCE_EOL” parameter,
butthey kindly rejected my proposal. They also suggested to me to use the result of PQgetCopyData() and convert there
theLF character with whatever is suitable for me. 
>
> So I studied the psycopg codebase and spotted out where and how to change it to allow such an use case. My intent was
toadd a new keyword argument to the copy_expert() method, let me call it “eol” with a default of “\n”. If the user
decidesto override it using a different EOL (i.e. “\r\n” or “\r”) every EOL returned by PQgetCopyData() in
_pq_copy_out_v3()can be converted. 
>
> However I’m a little bit concerned with this solution, and before going on with a pull request, I’d like to have your
feedbackhere. My main concern is that extending the copy_expert() method in psycopg leaves the user completely alone
aboutusing this new keyword argument in the right way. 
>
> We can easily allow only CR, LF, and CRLF as the values for that argument, but what if the user uses the “eol” kwarg
andfor example issues a “COPY TO … AS BINARY” query? In that case the resulting output file can end up being corrupted
withoutthe user can even notice that. Of course psycopg can parse the “COPY TO” query (by means of PG’s
ProcessCopyOptions())and check if the “eol” kwarg is consistent with the issued query. But, frankly this is seems to
becomea little bit too complex  to me. 
>
> So I’m asking to you, what’s your take on this, what do you think about that? Do you see any better way to get it
done?Anyone here also involved in pgsql-hackers ML can support my idea to extend the COPY TO syntax directly in PG? 
>
> Thanks for you help, and apologies for the long email.

Alright to follow up on my previous post about open. In Python 2 newline
is available in the io module, so a simple example:

f = io.open('io_newline.csv', 'w',  newline='\r\n')

cur = con.cursor()

cur.copy_expert("COPY cell_per TO STDOUT WITH CSV HEADER", f)

f.close()

aklaver@panda:~/software_projects> file io_newline.csv
io_newline.csv: ASCII text, with CRLF line terminators

> a.
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


В списке psycopg по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: Extending copy_expert
Следующее
От: Andrea Riciputi
Дата:
Сообщение: Re: Extending copy_expert