Notes on converting from MySQL 5.0.x to PostgreSQL 8.1.4

Поиск
Список
Период
Сортировка
On Converting from MySQL 5.0.x to PostgreSQL 8.1.x
--------------------------------------------------

I am in the process of converting a couple of major sites from MySQL 5.0.22
to PostgreSQL 8.1.4, and I thought I would share some of my observations on
this process and the two database systems in general.  I feel I am in a good
position to do this, as I am fairly familiar in-depth with MySQL's modern
features, am relatively technology-agnostic, and having done this since
the MySQL project has matured substantially over the last couple of years, I
feel I can give insight into the features of both systems from a current
perspective.

I will attempt to do this from an objective standpoint, as I still see
benefits and drawbacks to both systems (no software will ever meet the needs
of every situation).  If you are looking for a document that mocks one system
or another, this is not it; there are countless results for "X sucks" on
$your_favorite_search_engine.

My reasons for making this switch are primarily due to having a bit of spare
time, wanting to expand my horizons and familiarize myself with another
well-respected open source project, some of the nice in-built procedural
language features of PostgreSQL, and basically wanting to form my own
opinion of the features of both systems.  That being said, I have really
been impressed so far with the features I am discovering, and becoming
happier each day.

It is by no means comprehensive; I have just highlighted some of the
more notable and obvious differences as I discovered them.  Most of the
information in here is pretty basic for those of you very familiar with
both systems.  I also only highlight the differences in the Unix/POSIX
versions of these programs.  (I don't run Windows as a server, and I don't
care to.  My technology agnosticism stops at inadequate systems.)

All comments are current as of the time of writing (Spring/Summer 2006).
Error corrections are very welcome.  Flames can go to /dev/null.  Don't care.
Oh, and since I wrote this in vim, it is best read with a monospace font. :^)

Major differences I have noted:
-------------------------------

MySQL 5.0.x:
* Multiple storage engines with different features.
* Supports multi-insert syntax (INSERT INTO foo VALUES (1,2), (3,4) ...)
* A few more access controls on features built-in to the GRANT tables.  Many
  of these are still present, but implemented in other ways in PostgreSQL.
* Single AUTO_INCREMENT column allowed per table.
* Easy, built-in and extensive replication support.
* Single datastore location per server.
* ALL Stored Procedures are kept in the mysql system database.

PostgreSQL 8.1.x:
* Embedded procedures in multiple native languages (stored procedures and
  functions can be written in native Perl, Python, Tcl, SQL, PL/PgSQL)
* Extensive and versatile procedural language functionality.
* User-definable data types and operators.
* Multiple sequence generators allowed per table.
* Replication support still rudimentary.
* Stored procedures are kept (somewhat more logically, imho) in the
  corresponding databases.
* Multiple datastore locations possible using tablespaces concept.
  (For the record, MySQL will have tablespaces when 5.1.x is stabilized.)
* Most system variables, "built-in" types and features configurable as they
  are just kept in a system catalog.
* Allows deletions and subselects to specify the same table (e.g. DELETE
  FROM foo WHERE id IN (SELECT id FROM foo WHERE...) ).  MySQL does not
  allow this as of 5.0.22.
* Copious documentation on the database internals, for extending the
  database itself.

Pointers, tips, quick facts and gotchas for other people converting:
--------------------------------------------------------------------

* Don't bother using an old version, just go for 8.1.4 (or whatever is new
  at the time of your conversion.  This should be common sense.)
* Since Pg uses a full transactional storage engine, the speed is roughly
  comparable to InnoDB, rather than the stock MyISAM format.
* PostgreSQL's TCP port is 5432 by default.
* The main server process on PostgreSQL is 'postmaster'.
* 'postmaster' can be controlled via the 'pg_ctl' command.
* The administrative user is called 'postgres' by default.
* Like MySQL, Pg uses the system user as default, if no username is
  specified when connecting.
* The command-line client is called 'psql'.
* PostgreSQL by default comes configured to disallow network connections.
  To enable these, you must follow these steps:
    1.  Edit $DATADIR/pg_hba.conf and add access permissions.
    2.  Edit $DATADIR/postgresql.conf and uncomment the listen_addresses
        line, setting it to something reasonable.
    3.  Restart postmaster.
* PostgreSQL relies extensively upon quick aliases for common features within
  the CLI shell.  MySQL offers many similar features, but they aren't used
  as much from what I have observed.
* MySQL combines the concepts of 'database' and 'schema' into one.  PostgreSQL
  differentiates the two.  While the hierarchy in MySQL is
  database.table.field, PostgreSQL is roughly: database.schema.table.field.
  A schema is a 'logically grouped set of tables but still kept within a
  particular database.'  This could allow separate applications to be built
  that still rely upon the same database, but can be kept somewhat logically
  separated.  The default schema in each database is called 'public', and is
  the one referred to if no others are specified.  This can be modified with
  'SET search_path TO ...'.
* Pg uses a 'template1' pseudo-database that can be tailored to provide
  default objects for new database creation, if you should desire.  It
  obviously also offers a 'template0' database that is read-only and
  offers a barebones database, more equivalent to the empty db created with
  mysql's CREATE DATABASE statement.
* Pg's ROLEs can specify a single user or a group, and be nested to contain
  multiple users.
* Pg's default character set (in 8.1.4) is UTF8.
* Pg uses the 'serial' column type instead of AUTO_INCREMENT.  This allows
  more than one independent sequence to be specified per table (though the
  utility of this may be of dubious value).  These are closer to Oracle's
  concept of sequence generators, and they can be manipulated with the
  currval(), nextval(), setval(), and lastval() functions.
* Pg requires its tables and databases be 'vacuumed' regularly to remove
  completed transaction snapshots and optimize the tables on disk.  It is
  necessary because the way that PostgreSQL implements true MVCC is by
  writing all temporary transactions to disk and setting a visibility
  flag for the record.  Vacuuming can be performed automatically, and in
  a deferred manner by using vacuum_cost settings to limit it to low-load
  periods or based upon numerous other criteria.  See the manual for more
  information.
* Kept internally in Pg, there is a concept called the OID, which is a
  continuously incremented number used to assign unique IDs to system
  objects.  This allows the database to store and refer uniquely to user
  operators, new databases, basically anything that the system needs to
  refer to in the 'data directory', regardless of user-defined names.
* Most administrative procedures will refuse to run as root, and require you
  to su to the 'postgres' system user to perform the action.
* PgAdminIII gives you a great overview of the hierarchy of system objects
  throughout the server.  Even though you may administrate your server
  primarily via the CLI, as I do, it is still valuable during the learning
  process to use this tool to browse around the various objects, to learn
  the system setup and hierarchy visually.
* While MySQL supports transactions with the InnoDB databases, many MySQL
  users generally do not use them extensively enough.  With Pg, due to the
  behaviour of the server in attempting to ensure data integrity in a
  variety of situations (client disconnection, network trouble, server
  crashes, etc.), it is highly advisable to become familiar and utilize
  transactions a lot more, to ensure your DATA is left in a consistent state
  before and after every change you wish to make.
* There is a conversion utility called 'mysql2pgsql' that will convert
  dump files from the mysqldump format, to a format that psql can
  understand.  It is available at:
  - http://gborg.postgresql.org/project/mysql2psql/projdisplay.php
* To turn on query time output, similar to the mysql CLI, use the '\timing'
  command from psql.  (Note that the time is displayed in milliseconds,
  whereas in the mysql client it is displayed in seconds.)

Common equivalents:
-------------------

MySQL                           PostgreSQL
-----                           -----------
mysql database                  system tables
mysql                           psql
mysqld                          postmaster
mysqladmin                      initdb/dropdb/createuser/dropuser/
                                  createlang/droplang/vacuumdb
mysqldump                       pg_dump/pg_dumpall/pg_restore
-nothing-                       ecpg
SHOW DATABASES;                 \l
SHOW GRANTS;                    \du
SHOW TABLES;                    \dt
DESC tblname;                   \d foo
USE dbname;                     \c dbname
ALTER TABLE foo AUTO_INCREMENT = n;     SELECT setval('seq_name',n);
SHOW PROCESSLIST;               SELECT * FROM pg_stat_activity;
OPTIMIZE TABLE ...              VACUUM ...

Final thoughts:
---------------
Overall, I have been happy and very impressed with the features offered by
PostgreSQL 8.1.4, and believe that I will be using it for the majority of my
future projects.  There are still some niches where I see utility for the
pluggable storage engines and raw speed of MySQL.  I will be keeping my eye
on that project closely, also, as I want to test out the new Scheduled
Events feature from the 5.1 series once it is stabilized.

I have also been using Apress' "Beginning Databases with PostgreSQL, 2nd
Edition" (2005) as my learning material.  I briefly examined Sams'
"PostgreSQL - The Comprehensive Guide, 2nd Ed." (2005) and Pearson's
"PostgreSQL: Introduction and Concepts" (2001), but found the Apress book
the best of the 3 that I had access to.  YMMV.  More info in the book
reviews linked below.

Further information:
--------------------
PostgreSQL manual: http://www.postgresql.org/docs/manuals/
PgAdmin3: http://www.pgadmin.org/
PostgreSQL book reviews:
  http://techdocs.postgresql.org/techdocs/bookreviews.php
Community Support Channel: irc.freenode.net in #postgresql
Varlena Consulting's General Bits archives:
  http://www.varlena.com/GeneralBits/

---------

Thank you, and I hope that these notes prove helpful to others!

Jason McManus

В списке pgsql-general по дате отправления:

Предыдущее
От: lbolognini@gmail.com
Дата:
Сообщение: Re: pg_restore: [archiver] could not open input file
Следующее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: pg_restore: [archiver] could not open input file