Обсуждение: Extending COPY TO
Hi all, it’s my first time here, so please let me know if I’m doing something wrong. I’m a developer, heavy PG user, but I’ve neverhacked it before. Last week at work we had to produce a quite big CSV data file which should be used as input by anotherpiece of software. Since the file must be produced on a daily basis, is big, and it contains data stored in our PG database, letting PG producethe file itself seemed the right approach. Unfortunately the target software is, let say, “legacy” software and canonly accept CRLF as EOL character. Since our PG is installed on a Linux server COPY TO results in a CSV file with LF asEOL, forcing us to pass the file a second time to convert EOL, which is inconvenient. My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keep itsimple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviour whenno EOL option is given. I was also wondering if an EOL option could be useful also for the text output format or not. I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by myself,submit the patch here and wait for your review. However before starting this in my spare time I wanted to know ifyou, as the PG hackers community, would be against a similar proposal for any reason, and if so why. Thanks in advance, Andrea
On 09/23/2014 09:49 AM, Andrea Riciputi wrote: > My idea was to extend the COPY TO command to accept an EOL option as > it already does with the DELIMITER option. To keep it simple we can > limit the EOL choice to CR, LF or CRLF to avoid unusual output, and > also keep the current behaviour when no EOL option is given. I was > also wondering if an EOL option could be useful also for the text > output format or not. I don't think we want to go down that path. There are plenty of options in COPY already, and the more you add, the more complicated it gets. And we're never going to be able to satisfy everyone's needs. I'd suggest doing: COPY table TO PROGRAM 'unix2dos > /tmp/file' - Heikki
Andrea, * Andrea Riciputi (andrea.riciputi@gmail.com) wrote: > My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keepit simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviourwhen no EOL option is given. I was also wondering if an EOL option could be useful also for the text output formator not. Have you considered using COPY TO's 'PROGRAM' option to simply pipe the output through unix2dos..? > I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by myself,submit the patch here and wait for your review. However before starting this in my spare time I wanted to know ifyou, as the PG hackers community, would be against a similar proposal for any reason, and if so why. I'm not particularly against it, though if it can be solved with the existing 'PROGRAM' capability then it may not make sense to complicate the COPY code further. Thanks! Stephen
Hi all, thanks for all your answers, I see your point. And I also understand the argument according to which there always be someother use case to satisfy. However your suggestion to use COPY TO sql TO PROGRAM doesn’t seem to me to fit well the usecase I have in mind. Imagine you access PG from an application written in the language X using a driver library, both your application and yourPG instance run on two different hosts. Now using COPY TO sql PROGRAM the output file ends up on the PG host filesystem,while using COPY TO sql STDOUT and passing a file descriptor to PG via the driver library the output file endsup on the application hosts, which is much more convenient from the application point of view. Sure you can always fix this setting up some kind of shared filesystem, but this is just the first of the issues I couldthink of. What about the potential I/O errors that could happen while opening/writing the output file? I should replicatethem back from the PG host to the application layer, and this is something I’m pretty sure no one wants to go down. So adding such a feature to PG itself seems to me still the best trade off between complexity and convenience. However, ifyou are strongly against it, or see a better way to get around this problem, please let me know. As I wrote before, despitebeing an heavy PG user, it’s my first time on the hackers ML and I don’t want to seem disrespectful of the community. Thanks, Andrea On 23 Sep 2014, at 08:56, Stephen Frost <sfrost@snowman.net> wrote: > Andrea, > > * Andrea Riciputi (andrea.riciputi@gmail.com) wrote: >> My idea was to extend the COPY TO command to accept an EOL option as it already does with the DELIMITER option. To keepit simple we can limit the EOL choice to CR, LF or CRLF to avoid unusual output, and also keep the current behaviourwhen no EOL option is given. I was also wondering if an EOL option could be useful also for the text output formator not. > > Have you considered using COPY TO's 'PROGRAM' option to simply pipe the > output through unix2dos..? > >> I spent the weekend reading the COPY command source code and its grammar definition and I think I can patch it by myself,submit the patch here and wait for your review. However before starting this in my spare time I wanted to know ifyou, as the PG hackers community, would be against a similar proposal for any reason, and if so why. > > I'm not particularly against it, though if it can be solved with the > existing 'PROGRAM' capability then it may not make sense to complicate > the COPY code further. > > Thanks! > > Stephen
On 09/24/2014 09:23 AM, Andrea Riciputi wrote: > Imagine you access PG from an application written in the language X > using a driver library, both your application and your PG instance > run on two different hosts. In that scenario, you'll be using the PQgetCopyData function to get the data. PQgetCopyData returns one row at a time; the application can trivially change the line-ending to whatever it wants, when writing the output to a file or wherever it goes. > As I wrote before, despite being an heavy PG user, it’s my first time > on the hackers ML and I don’t want to seem disrespectful of the > community. No worries; thanks for effort, even if this idea doesn't pan out. - Heikki