A while back, there was a push to make COPY gzip-aware. That didn't happen, but COPY FROM PROGRAM did, and it scratches the same itch.
I have a similar need, but with file_fdw foreign tables. I have .csv.gz files downloaded to the server, but those CSVs have 100+ columns in them, and in this case I only really care about a half dozen of those columns. I'd like to avoid:
- the overhead of writing the uncompressed file to disk and then immediately re-reading it
- writing unwanted columns to a temp/work table via COPY, and then immediately re-reading them
- the Multicorn FDW, because it ends up making a Python string out of every data cell
- a CSV-parsing tool like csvtool or mlr, because they output another CSV which must be reparsed from scratch
Since file_fdw leverages COPY, it seemed like it would be easy to add the FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql IRC, only to discover that Adam Gomaa (akgomaa@gmail.com) had already written such a thing, but hadn't submitted it. Attached is a small rework of his patch, along with documentation.
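For illustration, usage would look roughly like the sketch below. It assumes the new option is named program (mirroring COPY FROM PROGRAM) and is given in place of filename; the server name, table name, columns, and file path are all made up for the example, and the attached patch is the authority on the actual syntax.

```sql
-- Sketch only: assumes a file_fdw option named "program" that takes a
-- shell command, used instead of "filename". All names are hypothetical.
CREATE EXTENSION IF NOT EXISTS file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- The column list must match the CSV layout emitted by the command.
CREATE FOREIGN TABLE daily_export (
    id          integer,
    created_at  timestamptz,
    amount      numeric
)
SERVER csv_files
OPTIONS (
    program 'gunzip -c /path/to/daily_export.csv.gz',  -- hypothetical path
    format  'csv',
    header  'true'
);

-- The command's standard output is parsed as if it were the file contents,
-- so the uncompressed CSV is never written to disk.
SELECT id, amount FROM daily_export WHERE amount > 100;
```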
NOTE: The regression test includes Unix commands in the program option. I figured that wouldn't work on win32 systems, so I checked to see how the existing regression tests exercise COPY FROM PROGRAM, and I couldn't find any that do. So I guess the test exists as a proof of concept that will get excised before final commit.