Обсуждение: Importing directly from BCP files

Поиск
Список
Период
Сортировка

Importing directly from BCP files

От
Tim Uckun
Дата:
Does anybody have any documentation on what a BCP file from SQL server looks like?  I have a directory full of BCP files and I would like to import them into PG.  

Has anybody done anything like this before?

Thanks

Re: Importing directly from BCP files

От
David Rowley
Дата:
On 16 November 2015 at 15:31, Tim Uckun <timuckun@gmail.com> wrote:
Does anybody have any documentation on what a BCP file from SQL server looks like? 

That appears to depend on the -f option


There's more details on the format files here https://msdn.microsoft.com/en-nz/library/ms191516.aspx

Have you tried opening a file in a text editor to see what they look like?
 
I have a directory full of BCP files and I would like to import them into PG.  


The default format seems to be separating values with a tab character. 

The bulk import command in PostgreSQL is COPY, so you'll likely want to look at http://www.postgresql.org/docs/9.4/static/sql-copy.html 
Let's assume you have *.bcp files in the default format with values separated with tabs, you could open up psql and use a command something along the lines of:

copy <table name> from '<filename>' delimiter ' '; -- <- use a tab character between the quotes.

Of course, you'll need to create the table first with CREATE TABLE.
 
Has anybody done anything like this before?


I'd imagine that it's fairly common.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Importing directly from BCP files

От
S McGraw
Дата:
On 11/15/2015 07:31 PM, Tim Uckun wrote:
> Does anybody have any documentation on what a BCP file from SQL
> server looks like?  I have a directory full of BCP files and I would
> like to import them into PG.
>
> Has anybody done anything like this before?

I moved a database from MS Sql Server 2000 to Postgresql a few years
ago via BCP files.  I used a Python script to do some fixup on the
BCP files to make them importable as CSV files into Postgresql.  I
don't know if quirks I ran into are still an issue with newer versions
of Sql Server but for what it's worth, here are the comments from that
script:

# This script fixes several problems in the csv data files exported
# from Sql Server by bcp from the SW database.  Specifically:
#
# * Empty strings seem to be exported as a 0x00 byte.  We change
#   those to a pair of double quotes ("") that Postgresql's csv
#   import machinery will treat as an empty string.
# * Some text fields contain an ellipsis character (a single
#   character form of "..." encoded as "\201c".)  We change this
#   to utf-8.
# * Some text lines that had embedded newlines are reproduced
#   literals in the bcp output: the \n starts a new line.
#   We since all our data starts with an id number followed by
#   a tab separator, we assume lines that don't start this way
#   are part of the previous line.


Re: Importing directly from BCP files

От
Tim Uckun
Дата:

That appears to depend on the -f option


I have the format files so that's good.

 

Have you tried opening a file in a text editor to see what they look like?
 

Yes. It looks like a binary file with lots of non printable characters. I tried running it through iconv with common encodings (windows-1524, utf-16, utf-32 and few others) but I can't seem to get a clean UTF-8 or ASCII conversion for some reason.


The bulk import command in PostgreSQL is COPY, so you'll likely want to look at http://www.postgresql.org/docs/9.4/static/sql-copy.html 


I need to get them into a shape where copy command can process them first I think.  


 
Has anybody done anything like this before?


I'd imagine that it's fairly common.


I googled for quite a while and couldn't find anything.  Unfortunately the -f "format" option makes it hard to search for "BCP file format" or something similar. 


Re: Importing directly from BCP files

От
Tim Uckun
Дата:

I moved a database from MS Sql Server 2000 to Postgresql a few years
ago via BCP files.  I used a Python script to do some fixup on the
BCP files to make them importable as CSV files into Postgresql.  I
don't know if quirks I ran into are still an issue with newer versions
of Sql Server but for what it's worth, here are the comments from that
script:




Do you rember what the encoding of the files was by any chance?

Re: Importing directly from BCP files

От
S McGraw
Дата:
On 11/15/2015 11:52 PM, Tim Uckun wrote:
> >  I moved a database from MS Sql Server 2000 to Postgresql a few years
> >  ago via BCP files.  I used a Python script to do some fixup on the
> >  BCP files to make them importable as CSV files into Postgresql.  I
> >  don't know if quirks I ran into are still an issue with newer versions
> >  of Sql Server but for what it's worth, here are the comments from that
> >  script:
>
> Do you rember what the encoding of the files was by any chance?

I don't but the machine they were exported from would have been using
either the regular English/US locale or CP-932 (Japanese).

Looking at the extract script, it seems the bcp command used to extract
them used -c option, eg:

  bcp sw.dbo.swkwt out data\swkwt.tmp -c -U sa -P ****** -S NARA\SQLEXPRESS




Re: Importing directly from BCP files

От
Tim Uckun
Дата:
I don't see any documentation anywhere about the BCP format and as I said googling for "BCP format" gives a zillion links about the format files.   Every project on github just calls out the BCP command but I am on linux and that doesn't help me at all.

Bummer.  This is going to be a huge pain to try and import these files.


Re: Importing directly from BCP files

От
Adrian Klaver
Дата:
On 11/16/2015 11:13 AM, Tim Uckun wrote:
> I don't see any documentation anywhere about the BCP format and as I
> said googling for "BCP format" gives a zillion links about the format
> files.   Every project on github just calls out the BCP command but I am
> on linux and that doesn't help me at all.
>
> Bummer.  This is going to be a huge pain to try and import these files.



On openSuSE 13.2

sudo zypper install freetds-tools

Ubuntu 14.04

sudo apt-get install freetds-bin

aklaver@killi:~> freebcp -h
usage:  freebcp [[database_name.]owner.]table_name {in | out} datafile
         [-m maxerrors] [-f formatfile] [-e errfile]
         [-F firstrow] [-L lastrow] [-b batchsize]
         [-n] [-c] [-t field_terminator] [-r row_terminator]
         [-U username] [-P password] [-I interfaces_file] [-S server]
         [-v] [-d] [-h "hint [,...]" [-O "set connection_option on|off,
...]"
         [-A packet size] [-T text or image size] [-E]
         [-i input_file] [-o output_file]

example: freebcp testdb.dbo.inserttest in inserttest.txt -S mssql -U
guest -P password -c

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Importing directly from BCP files

От
Alvaro Herrera
Дата:
Tim Uckun wrote:

> > The bulk import command in PostgreSQL is COPY, so you'll likely want to
> > look at http://www.postgresql.org/docs/9.4/static/sql-copy.html
>
> I need to get them into a shape where copy command can process them first I
> think.

Maybe transform the format file into something for pgloader?

http://pgloader.io/howto/fixed.html

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Importing directly from BCP files

От
Tim Uckun
Дата:

On openSuSE 13.2

sudo zypper install freetds-tools

Ubuntu 14.04

sudo apt-get install freetds-bin

aklaver@killi:~> freebcp -h
usage:  freebcp [[database_name.]owner.]table_name {in | out} datafile
        [-m maxerrors] [-f formatfile] [-e errfile]
        [-F firstrow] [-L lastrow] [-b batchsize]
        [-n] [-c] [-t field_terminator] [-r row_terminator]
        [-U username] [-P password] [-I interfaces_file] [-S server]
        [-v] [-d] [-h "hint [,...]" [-O "set connection_option on|off, ...]"
        [-A packet size] [-T text or image size] [-E]
        [-i input_file] [-o output_file]

example: freebcp testdb.dbo.inserttest in inserttest.txt -S mssql -U guest -P password -c


From what I can make out this tool reads the BCP files and puts them into an SQL server.  I need to either put them into a postgres server or to output as plain text CSV files.

Re: Importing directly from BCP files

От
Kevin Grittner
Дата:
On Monday, November 16, 2015 1:15 PM, Tim Uckun
<timuckun@gmail.com> wrote:

> I don't see any documentation anywhere about the BCP format and
> as I said googling for "BCP format" gives a zillion links about
> the format files.

That's because there is no one, single "BCP format" -- the format
of a BCP file is determined by a BCP format file, which can be
created dynamically when you use BCP to generate output.  There is
no way to do more than guess at how to read a BCP file without its
associated format file.

Now, the human brain is amazingly good at spotting patterns; you
might be able to infer the format of each of your BCP files by
looking at it, but the only *sure* way of interpreting it correctly
is to read the format file.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Importing directly from BCP files

От
Adrian Klaver
Дата:
On 11/16/2015 12:15 PM, Tim Uckun wrote:
>
>     On openSuSE 13.2
>
>     sudo zypper install freetds-tools
>
>     Ubuntu 14.04
>
>     sudo apt-get install freetds-bin
>
>     aklaver@killi:~> freebcp -h
>     usage:  freebcp [[database_name.]owner.]table_name {in | out} datafile
>              [-m maxerrors] [-f formatfile] [-e errfile]
>              [-F firstrow] [-L lastrow] [-b batchsize]
>              [-n] [-c] [-t field_terminator] [-r row_terminator]
>              [-U username] [-P password] [-I interfaces_file] [-S server]
>              [-v] [-d] [-h "hint [,...]" [-O "set connection_option
>     on|off, ...]"
>              [-A packet size] [-T text or image size] [-E]
>              [-i input_file] [-o output_file]
>
>     example: freebcp testdb.dbo.inserttest in inserttest.txt -S mssql -U
>     guest -P password -c
>
>
>
>  From what I can make out this tool reads the BCP files and puts them
> into an SQL server.  I need to either put them into a postgres server or
> to output as plain text CSV files.
>

Hmm, I read through the man page to fast. I thought it was possible to
do like pg_restore and 'restore' the binary file to a text file. That
was wrong thinking on my part.

 From this thread:


https://social.msdn.microsoft.com/Forums/en-US/6525fece-cfc9-4920-8ec3-ff83899130fa/write-from-postgres-database-dump-files-in-bcp-native-data-format

the binary form of the BCP file output is undocumented. So if all the
BCP files you have are the binary(native) version you are up this creek
without a paddle.

So, moving to another creek. It depends on the amount of data you are
working with, but it might be worth it to spin up a VM in the cloud on
AWS, Azure, etc that has SQL Server on it and import the BCP files
there. You could then export the data using the character format instead
of the native format:

https://msdn.microsoft.com/en-us/library/ms190919.aspx

This should result in a CSV file that is(or can be made) suitable for
COPY into Postgres.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Importing directly from BCP files

От
Tim Uckun
Дата:
the binary form of the BCP file output is undocumented. So if all the BCP files you have are the binary(native) version you are up this creek without a paddle.


Ugh.  Yes it looks like that's the creek I am on.   Thanks Microsoft!
 
So, moving to another creek. It depends on the amount of data you are working with, but it might be worth it to spin up a VM in the cloud on AWS, Azure, etc that has SQL Server on it and import the BCP files there. You could then export the data using the character format instead of the native format:

Man that really sucks but if it's the only way then I guess it's the only way.
 

Re: Importing directly from BCP files

От
Adrian Klaver
Дата:
On 11/16/2015 04:25 PM, Tim Uckun wrote:
>     the binary form of the BCP file output is undocumented. So if all
>     the BCP files you have are the binary(native) version you are up
>     this creek without a paddle.
>
>
> Ugh.  Yes it looks like that's the creek I am on.   Thanks Microsoft!
>
>     So, moving to another creek. It depends on the amount of data you
>     are working with, but it might be worth it to spin up a VM in the
>     cloud on AWS, Azure, etc that has SQL Server on it and import the
>     BCP files there. You could then export the data using the character
>     format instead of the native format:
>
>
> Man that really sucks but if it's the only way then I guess it's the
> only way.

Something else that came to mind, where did you get the BCP files from
and is possible the source could create the text based versions instead
of the binary?

--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Importing directly from BCP files

От
Tim Uckun
Дата:
I am going to ask them that but there are tons of legacy files which might need to be dealt with again in the future so I was hoping to use them directly.


Re: Importing directly from BCP files

От
Adrian Klaver
Дата:
On 11/16/2015 05:47 PM, Tim Uckun wrote:
> I am going to ask them that but there are tons of legacy files which
> might need to be dealt with again in the future so I was hoping to use
> them directly.

Seems it comes down to who is going to have to do the dealing, you or
the folks supplying the files. Might be a good time to work out a
division of labor:)

>
>


--
Adrian Klaver
adrian.klaver@aklaver.com