Discussion: How to import regularly some CSV files with varying names?


How to import regularly some CSV files with varying names?

From: Andreas
Date:
Hi,

I need to import some log-files of an application running on a different
host.
This app can't talk to the db; it only creates a daily dump in a remote
directory that is mountable via samba by a Linux host that runs the
db-server.

The import would be easy if the files had a constant name, but the app
creates csv files with names like "ExportYYYYMMDD".

I could have cron use "find" to search for all the files in the
mounted directory.
But how can I pipe a SQL script into the db-server that takes the
filenames from "find" as a parameter?


Because of the somewhat limited intelligence of the application that
creates the logs, I have to read the contents of the log into a temporary
table and insert from there only those lines that aren't already in the
actual log-table within the db.
I've got it all covered except the filenames, which change from day to day. :(

I could copy each of those files into a temp directory and import from
there, so that the sql script wouldn't have to deal with the date within the
file-name. But I'd rather store the names in a table, so that the
script could skip all those files that already got imported previously.

So how would I get the filenames into the sql-script?




Re: How to import regularly some CSV files with varying names?

From: Adrian Klaver
Date:
On 09/23/2013 05:19 PM, Andreas wrote:
> Hi,
>
> I need to import some log-files of an application [...]
> The import would be easy if the files had a constant name but the app
> creates csv files with names like "ExportYYYYMMDD".
>
> So how would I get the filenames into the sql-script?

Do man on find and look for -exec.
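A minimal sketch of that approach, with hypothetical names (the mount point /mnt/applogs and the per-file script import_one.sh are assumptions, not from the thread):

```shell
# Run a command for every export file below a directory.
# $1 = directory to scan (e.g. the samba mount), $2 = command to run
# per file; find substitutes each matched path for the {} placeholder.
import_all() {
    find "$1" -name 'Export*' -type f -exec "$2" {} \;
}
# Usage on the db host, with hypothetical names:
# import_all /mnt/applogs ./import_one.sh
```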



--
Adrian Klaver
adrian.klaver@gmail.com


Re: How to import regularly some CSV files with varying names?

From: Andreas
Date:
On 24.09.2013 02:25, Adrian Klaver wrote:
> On 09/23/2013 05:19 PM, Andreas wrote:
>>
>> I need to import some log-files of an application [...]
>> The import would be easy if the files had a constant name but the app
>> creates csv files with names like "ExportYYYYMMDD".
>>
>>
>> So how would I get the filenames into the sql-script?
>
> Do man on find and look for -exec.
>

I could find the files and exec a shell script, but how can I have a SQL
script take the found filenames as a parameter?

The SQL script needs to create a temp table,
then COPY the file with the filename it got as a parameter into the temp table,
then insert from there into the log-table.

How would I get the filenames into the SQL script?
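(The three steps above can be sketched as a small shell helper that emits the SQL for one file; the table names import_tmp and app_log are hypothetical stand-ins, and the duplicate filtering uses EXCEPT as one possible approach:)

```shell
# Emit the SQL for one import run; $1 is the CSV file.
# import_tmp and app_log are made-up table names.
build_import_sql() {
    cat <<EOF
CREATE TEMP TABLE import_tmp (LIKE app_log);
\\copy import_tmp FROM '$1' WITH CSV
INSERT INTO app_log SELECT * FROM import_tmp EXCEPT SELECT * FROM app_log;
EOF
}
# Pipe the result into the server (needs a running database):
# build_import_sql /mnt/applogs/Export20130924 | psql -d mydb
```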


Re: How to import regularly some CSV files with varying names?

From: Adrian Klaver
Date:
On 09/23/2013 05:47 PM, Andreas wrote:
> On 24.09.2013 02:25, Adrian Klaver wrote:
>> On 09/23/2013 05:19 PM, Andreas wrote:
>>>
>>> I need to import some log-files of an application [...]
>>> The import would be easy if the files had a constant name but the app
>>> creates csv files with names like "ExportYYYYMMDD".
>>>
>>>
>>> So how would I get the filenames into the sql-script?
>>
>> Do man on find and look for -exec.
>>
>
> I could find the files and exec a shell script but how can I have a SQL
> script take the found filenames as parameter?
>
> The SQL script needs to create a temp table
> then COPY the file with the filename it got as parameter into the temp
> table
> then insert from there into the log-table
>
> How would I get the filenames into the SQL script?

Just a thought:

1) Create a function that encapsulates the above logic where the
argument is the file name.

2) Create a shell script. Use substitution to build a string from the
filename passed to the shell script, and in the script do

psql -d db_name -U user_name -c 'select the function(filename)'

2a) Use an alternate language to do 2).
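A sketch of the shell side of 2); import_log is the hypothetical function from 1), and the quoting step is an assumption about how one might keep the filename safe inside the SQL literal:

```shell
# Build the SQL call for one file. import_log is a made-up function
# name; single quotes in the filename are doubled so the SQL string
# literal stays valid even for odd names.
import_call() {
    printf "SELECT import_log('%s')" "$(printf %s "$1" | sed "s/'/''/g")"
}
# Per-file invocation (needs a running database):
# psql -d db_name -U user_name -c "$(import_call "$1")"
```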
--
Adrian Klaver
adrian.klaver@gmail.com


Re: How to import regularly some CSV files with varying names?

From: Steve Crawford
Date:
On 09/23/2013 05:47 PM, Andreas wrote:
> On 24.09.2013 02:25, Adrian Klaver wrote:
>> On 09/23/2013 05:19 PM, Andreas wrote:
>>>
>>> I need to import some log-files of an application [...]
>>> The import would be easy if the files had a constant name but the app
>>> creates csv files with names like "ExportYYYYMMDD".
>>>
>>>
>>> So how would I get the filenames into the sql-script?
>>
>> Do man on find and look for -exec.
>>
>
> I could find the files and exec a shell script but how can I have a
> SQL script take the found filenames as parameter?
>
> The SQL script needs to create a temp table
> then COPY the file with the filename it got as parameter into the temp
> table
> then insert from there into the log-table
>
> How would I get the filenames into the SQL script?
>
>

Assuming your main script - the one that mounts the directory and finds
the file name - is in bash you can easily put a small script into a
heredoc block with variable substitution:

some script stuff that mounts remote directory and sets variable logfilename
...
psql -your -connection -parameters <<EOS
some preliminary setup statements
\copy .... from $logfilename ...
some processing statements
EOS

The disadvantage of this approach is that it is difficult-to-impossible
to detect and handle statement-level errors. But for short scripts like
simple imports this may not be an issue or may be easily solved by
wrapping things in a begin;...commit; block.
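Fleshed out under assumptions from the thread (mount point /mnt/applogs, table app_log, database mydb, plus a hypothetical imported_files table with a UNIQUE filename column, since the original poster wanted to skip files already loaded):

```shell
# Import every export file found on the mount, one transaction per file.
# All identifiers here (/mnt/applogs, mydb, app_log, imported_files)
# are assumptions. A duplicate filename fails the first INSERT, and
# ON_ERROR_STOP + BEGIN/COMMIT roll the whole file back.
import_new_files() {
    for logfilename in "$1"/Export*; do
        [ -e "$logfilename" ] || continue
        psql -d mydb --set ON_ERROR_STOP=1 <<EOS
BEGIN;
INSERT INTO imported_files (filename) VALUES ('$logfilename');
CREATE TEMP TABLE import_tmp (LIKE app_log);
\\copy import_tmp FROM '$logfilename' WITH CSV
INSERT INTO app_log SELECT * FROM import_tmp EXCEPT SELECT * FROM app_log;
COMMIT;
EOS
    done
}
# import_new_files /mnt/applogs    # run on the db host after mounting
```

Putting the imported_files INSERT first means an already-seen filename aborts the transaction before any data is copied, which is what makes re-runs safe.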

Cheers,
Steve



Re: How to import regularly some CSV files with varying names?

From: Rémi Cura
Date:
To be very straightforward: your bash script will dynamically create the SQL query in a string, then send it to the database using psql.
You can also use pipes.

For example:
$4 -c "COPY $1 FROM '/tmp/pipe_ply_binaire_vers_ply_ascii_"$1"' WITH CSV DELIMITER AS ' ';";
where $4 is the psql command used to connect to the db, $1 the number in the name of the file we are working on, etc.

Cheers,
Rémi-C


2013/9/24 Steve Crawford <scrawford@pinpointresearch.com>
> Assuming your main script - the one that mounts the directory and finds
> the file name - is in bash you can easily put a small script into a
> heredoc block with variable substitution [...]

Re: How to import regularly some CSV files with varying names?

From: Rowan Collins
Date:
On 24/09/2013 18:18, Rémi Cura wrote:
> To be very straightforward :
> your bash script will dynamically create the sql query in a string,
> then send it to database using psql.
> You can also use pipes.
>
> For example :
> $4 -c "COPY $1 FROM '/tmp/pipe_ply_binaire_vers_ply_ascii_"$1"' WITH
> CSV DELIMITER AS ' ';";
>
> where $4 is the psql command to connect to db, $1 the number in the
> name of the file we are working in, etc

One problem with this is that it requires your shell script to have
superuser access to your database, since it is directly running a COPY
command. This may or may not matter, depending on setup - for instance,
you might trust local Unix sockets, meaning someone would need access to
the box first. It's probably best to never have a non-interactive
process able to connect with elevated privileges though.

--
Rowan Collins
[IMSoP]


Re: How to import regularly some CSV files with varying names?

From: Rowan Collins
Date:
On 24/09/2013 02:14, Adrian Klaver wrote:
> Just a thought:
>
> 1) Create a function that encapsulates the above logic where the argument is the file name.
>
> 2) Create a shell script. Use substitution to take the filename passed to the shell script to build a string and in the script do
>
> psql -d db_name -U user_name -c 'select the function(filename)'
>
> 2a) Use alternate language to do 2).

This has the advantage that you can enforce security inside the function. Note that for very good reasons, Postgres requires server-side COPY commands to be run under a super-user account - but this is exactly where a SECURITY DEFINER function comes in handy. Basically, your psql command would run as a non-privileged user, but the function would run as the superuser that created/owned it. The important part is that you are asking for the command to run, not demanding it - the script can and should say no sometimes! (I have a half-written blog-post on this subject kicking around...)
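A sketch of that pattern; every name here (import_log, app_log, the importer role, the path whitelist) is a hypothetical example of the kind of gatekeeping the function can do, not a prescribed policy:

```shell
# DDL for a gatekeeping import function: created and owned by a
# superuser, callable by an unprivileged role. The regex check is one
# example of the function "saying no" to unexpected filenames.
ddl=$(cat <<'EOF'
CREATE FUNCTION import_log(fname text) RETURNS void AS $fn$
BEGIN
    -- The function decides whether to run, not the caller:
    IF fname !~ '^/mnt/applogs/Export[0-9]{8}$' THEN
        RAISE EXCEPTION 'refusing to import %', fname;
    END IF;
    CREATE TEMP TABLE import_tmp (LIKE app_log) ON COMMIT DROP;
    EXECUTE format('COPY import_tmp FROM %L WITH CSV', fname);
    INSERT INTO app_log SELECT * FROM import_tmp EXCEPT SELECT * FROM app_log;
END;
$fn$ LANGUAGE plpgsql SECURITY DEFINER;
REVOKE ALL ON FUNCTION import_log(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION import_log(text) TO importer;
EOF
)
# psql -d mydb -c "$ddl"   # run once, as the superuser that owns app_log
```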

> The import would be easy if the files had a constant name but the app creates csv files with names like "ExportYYYYMMDD".

If that's the current year, month and day, it should be easy enough to generate the correct filename rather than having to search the file system. Or perhaps iterate backwards from today to find the latest (or give up once you reach too old a date).
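For instance, assuming the file really is named after its date (GNU date's -d option is an assumption here, as is the one-week cutoff):

```shell
# Today's expected export name, e.g. Export20130924:
fname="Export$(date +%Y%m%d)"

# Or walk backwards up to a week to find the newest file that actually
# exists (GNU date assumed; $1 is the directory to look in):
latest_export() {
    for i in 0 1 2 3 4 5 6; do
        f="$1/Export$(date -d "-$i days" +%Y%m%d)"
        if [ -e "$f" ]; then echo "$f"; return 0; fi
    done
    return 1   # nothing within a week: give up
}
```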

On the other hand...

> This app can't talk to the db but only creates daily a dump in a remote directory that is mountable via samba

Is there a reason for that access constraint? As Steve mentioned, psql has a \copy command which works like the COPY SQL statement, but reads the file from the client and sends it over the connection, eliminating the need for a superuser account at the database end. If database security is a concern, you could have a login with extremely limited privileges, and whitelist it explicitly in pg_hba.conf for this purpose.
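For example, the client-side import and a matching pg_hba.conf entry might look like this (database, role, and address are all hypothetical):

```
# run as an unprivileged role; the file is read by psql, not the server:
#   psql -d mydb -U importer -c "\copy app_log FROM 'Export20130924' WITH CSV"

# pg_hba.conf: allow only that role, only from the import host:
# TYPE  DATABASE  USER      ADDRESS        METHOD
host    mydb      importer  192.0.2.10/32  md5
```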

Regards,
--
Rowan Collins
[IMSoP]