Обсуждение: Importing binary data

Поиск
Список
Период
Сортировка

Importing binary data

От
Chris Ruprecht
Дата:
Hey guys,

I was given a database back of a non-PostgreSQL database. That database contains records where some binary file (looks
likeemail attachments) was imported into several chunks of X characters in length and then stored into multiple
records.A messy way of storing BLOB data. The database encoding is LATIN1, ISO8859-1. 
There chunks are actually 50 fields of 60 bytes each per row. If the original file is larger than that, more than one
rowis used. 

I can export the data out of that database into flat files just fine, but then I try to import the data to Postgres,
I'mgetting errors like this: 

ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
CONTEXT:  COPY attachments, line 14: "58025     1       cl\Cert.r       10
M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1%  M1T52'$585$523B!7..." 

I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the data type 'bytea', no luck. I'd love to have
a"NO-CONVERSION" option on the copy command that just takes what ever bytes come along and doesn't try to interpret
them.

Any ideas of what I can do to import this stuff?


best regards,
chris
--
chris ruprecht
database grunt and bit pusher extraordinaíre



Re: Importing binary data

От
David G Johnston
Дата:
Chris Ruprecht-3 wrote
> Hey guys,
>
> I was given a database back of a non-PostgreSQL database. That database
> contains records where some binary file (looks like email attachments) was
> imported into several chunks of X characters in length and then stored
> into multiple records. A messy way of storing BLOB data. The database
> encoding is LATIN1, ISO8859-1.
> There chunks are actually 50 fields of 60 bytes each per row. If the
> original file is larger than that, more than one row is used.
>
> I can export the data out of that database into flat files just fine, but
> then I try to import the data to Postgres, I'm getting errors like this:
>
> ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
> CONTEXT:  COPY attachments, line 14: "58025     1       cl\Cert.r       10
> M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1%
> M1T52'$585$523B!7..."
>
> I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the
> data type 'bytea', no luck. I'd love to have a "NO-CONVERSION" option on
> the copy command that just takes what ever bytes come along and doesn't
> try to interpret them.
>
> Any ideas of what I can do to import this stuff?

PostgreSQL does not like NUL (0x00) in the data that it is importing - I
don't believe it matters what encoding you are using.  That said it would
help to clarify exactly how you are running the copy command - specifically
client or server.

I haven't ever personally encountered this situation but any chance you can
base64 encode on output and then import that way into the new database on a
staging table then decode and manipulate the staged data before storing it
into the permanent table?

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Importing-binary-data-tp5824488p5824490.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Importing binary data

От
Chris Ruprecht
Дата:
I'm using

psql mydb
# copy attachments FROM '/opt/tmp/clientdata/sql-data-p/attachments.dat' with NULL as 'NULL';

to load the database back in. The psql command runs on the same box as the DB server.

If I had my way, I'd store the binary somewhere and keep a reference to it in the database. Having binary data (images,
PDFs,etc.) in the database never made much sense to me. But this is a client's DB. They used to have an application
writtenin something called "Progress 4GL" and now ported all that to use PostgreSQL with what ever front end (I have no
idea).I believe they are using the new app just like the old app, so unfortunately, I can't just go and change the
underlyingdatabase structure. 

Hope this helps to clarify the issue.

> On Oct 27, 2014, at 15:57 , David G Johnston <david.g.johnston@gmail.com> wrote:
>
> PostgreSQL does not like NUL (0x00) in the data that it is importing - I
> don't believe it matters what encoding you are using.  That said it would
> help to clarify exactly how you are running the copy command - specifically
> client or server.
>
> I haven't ever personally encountered this situation but any chance you can
> base64 encode on output and then import that way into the new database on a
> staging table then decode and manipulate the staged data before storing it
> into the permanent table?
>
> David J.
>
>
>
>
> --
> View this message in context: http://postgresql.1045698.n5.nabble.com/Importing-binary-data-tp5824488p5824490.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



Re: Importing binary data

От
Tom Lane
Дата:
Chris Ruprecht <chris@cdrbill.com> writes:
> I can export the data out of that database into flat files just fine, but then I try to import the data to Postgres,
I'mgetting errors like this: 

> ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
> CONTEXT:  COPY attachments, line 14: "58025     1       cl\Cert.r       10
M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1%  M1T52'$585$523B!7..." 

> I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the data type 'bytea', no luck. I'd love to
havea "NO-CONVERSION" option on the copy command that just takes what ever bytes come along and doesn't try to
interpretthem. 

That's hardly possible, considering you're expecting COPY to recognize
field and record boundaries.  What you probably need to do here is declare
the column as bytea and then write some sort of preprocessing script that
converts the binary data into hex-encoded form (basically \x followed by
hex digits, if memory serves, but check the description of bytea in the
PG manual).

            regards, tom lane


Re: Importing binary data

От
Chris Ruprecht
Дата:
Thanks guys. I know what I will spend the night, doing ;).


> On Oct 27, 2014, at 16:33 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Chris Ruprecht <chris@cdrbill.com> writes:
>> I can export the data out of that database into flat files just fine, but then I try to import the data to Postgres,
I'mgetting errors like this: 
>
>> ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
>> CONTEXT:  COPY attachments, line 14: "58025     1       cl\Cert.r       10
M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1%  M1T52'$585$523B!7..." 
>
>> I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the data type 'bytea', no luck. I'd love to
havea "NO-CONVERSION" option on the copy command that just takes what ever bytes come along and doesn't try to
interpretthem. 
>
> That's hardly possible, considering you're expecting COPY to recognize
> field and record boundaries.  What you probably need to do here is declare
> the column as bytea and then write some sort of preprocessing script that
> converts the binary data into hex-encoded form (basically \x followed by
> hex digits, if memory serves, but check the description of bytea in the
> PG manual).
>
>             regards, tom lane