Re: [GENERAL] SQL Server to PostgreSQL HOWTO

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: [GENERAL] SQL Server to PostgreSQL HOWTO
Дата
Msg-id 200109072139.f87LdO829533@candle.pha.pa.us
обсуждение исходный текст
Список pgsql-docs
Justin, can you add this to techdocs?


> Someone asked me to do this, and I have a draft now.   While it is clearly lacking in some areas, it might be of help
generally.  
>
> Ian
>
> Ian A. Harding
> Programmer/Analyst II
> Tacoma-Pierce County Health Department
> (253) 798-3549
> mailto: ianh@tpchd.org
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
The Microsoft SQL Server to PostgreSQL Migration HOWTO
Ian A. Harding <ianh@tpchd.org>
v1.00, 23 July 2000

How to move a database from a popular proprietary database to the world's most powerful open source database.
______________________________________________________________________

Table of Contents


1. Disclaimer

2. Introduction

3. Considerations

4. Tables

5. Data

6. Views

7. Summary


 ______________________________________________________________________

1.Disclaimer

The following document is offered in good faith as comprising only safe programming and procedures. No responsibility
isaccepted by the author for any loss or damage caused in any way to any person or equipment, as a direct or indirect
consequenceof following these instructions. 


2.Introduction

The most recent version of this document can always be found at http://www.tpchd.org/????.html

Microsoft SQL Server is very popular relational database management systems (RDBMS) with highly restrictive licensing
andhigh cost of ownership if the database is of significant size, or is used by a significant number of clients.  It
does,however, provide a very user-friendly interface, is easy to learn and use, and has low cost entry level
configurations. This has resulted in a very large installed user base. 

PostgreSQL now challenges MS SQL Server in basic feature set, reliability and performance, has a much less restrictive
license,and is open source.  As a matter of course, users are migrating to PostgreSQL from MS SQL Server as the cost of
ownershipbecomes an issue, and as their knowledge of relational database systems increases. 

This HOW-TO is intended for the MS SQL Server user who is now ready to migrate databases to PostgreSQL.

3.Considerations

RDBMS features are implemented differently and to different degrees by programmers.  Some applications rely heavily on
so-calledmiddleware, or on the client application to handle business logic.  Others attempt to put as much logic as
possiblein the database.  Your migration will be far more difficult if your application is in the latter group.  While
itis a sound design choice to put logic in the database server, it will require programming in a vendor specific
StructuredQuery Language (SQL) extension such as Microsoft's Transact SQL (T-SQL).  This is also the case with
PostgreSQL. There is no easy way to migrate stored procedures, triggers, or rules.  On the bright side, PostgreSQL
providesseveral language options, all of which are more graceful than T-SQL.  

RDBMS all provide built-in functions.  However, like procedural extensions to SQL, they are not portable.  Fortunately,
thereis some overlap, and the simple syntax makes migration relatively easy. 

Finally, the programmer's choice of SQL syntax can affect this process.  Most RDBMS are approaching the evolving SQL
standards. That is, they are leaning away from vendor specific syntax such as the '*=' syntax for a left outer join.
Thissyntax is still supported in MS SQL Server as of version 7.0, but was never supported in PostgreSQL. 

This process will require either a mind-numbing amount of hand editing of script and data files, or use of a scripting
languageto programmatically modify these files, followed by a somewhat less enormous amount of editing.  I am not smart
enoughto identify every possible option for the migration, or to accomodate them in a script.  I have done this
migrationon a relatively complex database application in a reasonable amount of time.  This, rather than a technically
flawlessscript, should be your goal. 

I use Tool Command Language (TCL) for almost everything, so I use it here.You can use whatever language you like.

4.  Tables

Dump the table defininitions with the MS SQL Server scripting tool.  From the Enterprise Manager, right click on your
databaseand select 'All Tasks', then 'Generate SQL Scripts' from the context menu.  Uncheck 'Script All Objects', and
select'All Tables'.  On the 'Formatting' tab, de-select 'Generate DROP...'.  On the 'Options' tab, select 'Script
indexes'and Script PRIMARY KEYS...'.  Select the 'MS-DOS' file format, and make sure 'Create one file' is checked.
ClickOK, give it a name, and put it somewhere you can find it. 

A brief look at this file will show you what we are up against.  MS uses square brackets around all identifiers, to
protectyou from poor design choices such as using reserved keywords so crazy things like: 

CREATE TABLE [dbo].[Select] ([Union] [int])

are possible.  PostgreSQL uses double quotes instead.  MS uses the object owner qualification for all objects, 'dbo' in
thiscase.  PostgreSQL has no such qualifications in object names. 

Another thing to note is that MS SQL identifiers are case preserved, but in practice most installations are installed
ascase insensitive.PostgreSQL is case agnostic in the case of SQL keywords and unquoted identifiers, forcing all
queriesto lower case.  It is not the same as being case insensitive, in that you can create tables using the double
quoteprotection mentioned above, such that they can only be accessed using the same double quoting method.I find it is
bestto abandon case in object identifiers when migrating to PostgreSQL.  Also, it is safest to avoid any identifiers
thatrequire quoting to avoid problems down the road. 

It is worth noting that for data comparisons, PostgreSQL is case sensitive and there is no option to change this
behaviour.Youwill have to force data to upper or lower on both sides of text comparisons if case is not important to
theoperation and there is a chance of it being different.  This conversion might be a good time to force data used in
joinsand comparisons to all upper or lower case.  You will also need to look at the application for code that does
comparisonsof user-entered information taking advantage of MS SQL Server's typical case insensitivity. 

Another issue that is not immediately evident is that MS SQL Server supports ALTER TABLE statements that contain comma
separatedlists of alterations.  PostgreSQL currently does not.  This is important since the MS SQL Server scripting
programcreates all constraints in ALTER TABLE statements.  If any tables have more than one constraint, you will have
tosurgically separate them into their own ALTER TABLE statements, or move them in the CREATE TABLE statement.   

Indexes are a bright spot, mostly.  The CLUSTER keyword in PostgreSQL is not the same as the CLUSTERED keyword in a MS
SQLServer index creation.  PostgreSQL will allow you to 'cluster' a table, that is, rearranging the tuples in a table
inorder for that field.  This sounds good, except that the cluster is not maintained for updates and inserts, and the
factthat it will break all your other indexes whenever you generate the clustering. 

Having said all that, here is a partial list of things to correct:

  1.  Force to lower case.

  2.  Remove all square brackets.

  3.  Remove all object owner prefixes (i.e. "dbo.")

  4.  Remove all reference to filegroup (i.e. "ON PRIMARY")

  5.  Remove all non-supported optional keywords (i.e. "WITH NOCHECK", "CLUSTERED")

  6.  Update all non-supported data types (i.e. "DATETIME" becomes "TIMESTAMP") Also, this is a good time to get away
fromMONEY.It is supported in PostgreSQL, but is on its way out.Use NUMERIC(19,4). 

7.  Replace the T-SQL batch terminator "GO" with the PostgreSQL batch terminator ";"

Put this file somewhere safe, and now let's get the data.

5.Data

Data is data.  It is brought over in text form and cast into it's proper form by the database according to the
datatypesyou used in creating your tables.  If you have binary data, I am the wrong guy to ask. 

There are a couple gotchas here too, of course.  Since we use the COPY command, and it interprets a newline as the end
ofa tuple, you need to clean out all those newlines lurking in your text fields in MS SQL Server.  This is easy enough
todo.  Also, the data dump from MS SQL Server will use the standard cr/lf line terminator, which needs to be changed to
lfor it will cause havoc in comparisons of strings, among other problems.  I did this the easy way, downloading the
dumpsto my machine running my favorite Unix-like operating system via ftp, which does this translation for you.   

The first step in dumping the data out of MS SQL Server is to type all the names of your fields into a text file on the
Win32machine.  You can cheat and issue: 

"select name from sysobjects where type = 'U'"

in Query Analyzer (ISQL-W) to get the list, then save the results to a file.  Then, write a handy little script to call
bcp,the Bulk Copy Program.  Mine looks like this: 

set file [open "C:\\inetpub\\ftproot\\tablelist.txt" r]
while {![eof $file]} {
    set table [gets $file]
    exec bcp <database>..$table out $table -c -k -S192.168.100.1 -Usa -Ppassword -r ~
}
close $file

This will dump all the listed tables into files of the same name in the current directory.  The -c flag means to use
plaincharacter format.  The -k flag tells bcp to "keep nulls".  This is important later when we import the data.  The
-ris the "row terminator".  To make cleaning up the carriage returns easier, I use this to signal the end of a row.  I
putthis script in the C:\InetPub\ftproot directory, so I can go right to the next step. 

>From the Unix-like machine, start ftp and get the file listing you created earlier.  Put it in a work directory.
Changeto the new work directory and get the files: 

ftp> lcd /home/homer/workdir
Local directory now /home/homer/workdir
ftp> fget tablelist.txt

This should download all of the data files to the work directory, magically converting line terminators to Unix
compatibleformat.  If you can't use FTP, there are other ways to get files from here to there.  Just be advised that
youmay need a little sed script to fix the cr/lf problem. 

Now, let's fix the embedded line feed issue.

#!/usr/pkg/bin/tclsh
set file [open tblnames r]
set flist [read -nonewline $file]
close $file
set flist [split $flist \n]
foreach f $flist {
    set file [open $f r]
    set data [read -nonewline $file]
    close $file
    regsub -all {\000} $data {} data
    regsub -all {\n} $data \\\n data
    regsub -all {~} $data \n data
    set file [open $f w]
    puts -nonewline $file $data
    close $file
}

The regsub lines are where the work gets done.  They replace all nulls (\000) with an empty string, then all linefeeds
witha literal "\n" which will tell COPY what to do when we import the file, then my line terminators get replaced with
alinefeed, which is what COPY is expecting.  There are cleaner and easier ways to do this, but you get the point. 

Now, go back to the sql file you edited to create your database objects.  I assume it is on the Unix-like box at this
point. It should have a series of CREATE TABLE statements, followed by ALTER TABLE and CREATE INDEX, etc statements.
Whatwe need to do now is tell it we want to load data after the tables are created, but before anything else. 

For each CREATE TABLE statement, follow it with a COPY statment.  Something like

COPY tablename FROM '/home/homer/workdir/tablename' with null as '';

Once you have this done, execute it against your PostgreSQL database, something like

$ psql newdb < modifiedscript.sql &> outfile

should work.The output file is good to have for looking for problems.  It gets messy so

$ cat outfile | grep ERROR

can give you an idea how things went.  I guarantee you have some troubleshooting to do.

6.  Views

Views are pretty easy, as long as you didn't use too many functions in them.  A favorite of mine is isnull().  Like
mostfunctions, it has a PostgreSQL counterpart, coalesce().  A surprising number of functions will work just fine.  For
example,round() is exactly the same.  datepart() becomes date_part(), but the arguments are the same, althought
PostgreSQLmay be more particular about format strings.  For example, SQL Server will accept datepart(yyyy, mydatefield)
aswell as datepart(year, mydatefield)  .  PostgreSQL wants to see date_part('year', mydatefield) (note single quotes). 

Generating sql for views is pretty much the same as for tables.  From the Enterprise Manager, right click on your
databaseand select 'All Tasks', then 'Generate SQL Scripts' from the context menu.  Uncheck 'Script All Objects', and
select'All Views'.  On the 'Formatting' tab, de-select 'Generate DROP...'.  On the 'Options' tab, Select the 'MS-DOS'
fileformat, and make sure 'Create one file' is checked.  Click OK, give it a name, and put it somewhere you can find
it.  

Run this file through the same script you created to clean the sql for your tables, and see if it will work on
PostgreSQL. If not, you will have to do some fixing of functions. 

7.  Summary

Converting a database from MS SQL Server is not always easy.  It is, however, always worth it.  You will find
PostgreSQLto be an extremely powerful and flexible product, with the best tech support in the world, the actual
developersand users of the product.  If you spent days trying to get xp_sendmail to work on SQL Server version 7.0, or
wonderedwhat was in those enormous "Service Packs" then you will appreciate this. 

В списке pgsql-docs по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [GENERAL] Some changes to CVSup and AnonCVS access ...
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: PostgreSQL/JDBC Test Suite Howto