Обсуждение: Trying to load MySQL data

Поиск
Список
Период
Сортировка

Trying to load MySQL data

От
"garrettmoore@gmail.com"
Дата:
Hello,

I am working on a project where we are converting from MySQL to
Postgres. I figured the easiest way would be to export the MySQL data
as CSV.

I'm having a problem importing some of the data. What I have done is
exported the MySQL data and then modified it so that all single quotes
(a ' quote) are doubled, and null values are replaced with an empty
value in the CSV.

Our data, for example, looks like this:

2628,'Poster,5,'255.255.18.138',,'[quote=gz]I''m curious, why not allow
users, then to freely edit topics?[/quote]
We could either go with "it''s a minor annoyance that is one measure
that helps keep DB load lower," or "we do it just to piss you off."
You choose.',0,1124498148,,,376,0,0,,

This row causes an error. The error is that:

The value "why not allow users" is not valid for column 'x'.

Column x is the first column after the long section of text, with a
value of '0'.

It appears that the quote in "I'm", which has been doubled quoted to
'', is not being properly skipped over, and COPY thinks that the next
comma (after "curious") is a new column, and tries to start inserting
data there. At least that's what I have come up with.

Why is this happening? I've used this method before and I didn't have
any trouble, when loading a bunch of Wikipedia test data (which has all
manner of quotes, commas, and apostrophes in it).

This is line 39150 in the file; all previous lines import fine, but
this kills the COPY process and all of the previous inserts are rolled
back. I need to get this data loaded intact.

My copy command is
COPY posts FROM '/tmp/posts.txt' CSV QUOTE $$'$$;


Re: Trying to load MySQL data

От
Scott Marlowe
Дата:
On Tue, 2007-01-09 at 19:54, garrettmoore@gmail.com wrote:
> Hello,
>
> I am working on a project where we are converting from MySQL to
> Postgres. I figured the easiest way would be to export the MySQL data
> as CSV.
>
> I'm having a problem importing some of the data. What I have done is
> exported the MySQL data and then modified it so that all single quotes
> (a ' quote) are doubled, and null values are replaced with an empty
> value in the CSV.
>
> Our data, for example, looks like this:
>
> 2628,'Poster,5,'255.255.18.138',,'[quote=gz]I''m curious, why not allow
> users, then to freely edit topics?[/quote]
> We could either go with "it''s a minor annoyance that is one measure
> that helps keep DB load lower," or "we do it just to piss you off."
> You choose.',0,1124498148,,,376,0,0,,

If that is indeed the line, then this part:

2628,'Poster,5,'255.255.18.138',

is misformed.

I'm assuming you really need:

2628,'Poster',5,'255.255.18.138',

Re: Trying to load MySQL data

От
brian
Дата:
garrettmoore@gmail.com wrote:
> Hello,
>
> I am working on a project where we are converting from MySQL to
> Postgres. I figured the easiest way would be to export the MySQL data
> as CSV.
>
> I'm having a problem importing some of the data. What I have done is
> exported the MySQL data and then modified it so that all single quotes
> (a ' quote) are doubled, and null values are replaced with an empty
> value in the CSV.
>
> Our data, for example, looks like this:
>
> 2628,'Poster,5,'255.255.18.138',,'[quote=gz]I''m curious, why not allow
> users, then to freely edit topics?[/quote]
> We could either go with "it''s a minor annoyance that is one measure
> that helps keep DB load lower," or "we do it just to piss you off."
> You choose.',0,1124498148,,,376,0,0,,
>
> This row causes an error. The error is that:
>
> The value "why not allow users" is not valid for column 'x'.
>
> Column x is the first column after the long section of text, with a
> value of '0'.
>
> It appears that the quote in "I'm", which has been doubled quoted to
> '', is not being properly skipped over, and COPY thinks that the next
> comma (after "curious") is a new column, and tries to start inserting
> data there. At least that's what I have come up with.
>
> Why is this happening? I've used this method before and I didn't have
> any trouble, when loading a bunch of Wikipedia test data (which has all
> manner of quotes, commas, and apostrophes in it).
>
> This is line 39150 in the file; all previous lines import fine, but
> this kills the COPY process and all of the previous inserts are rolled
> back. I need to get this data loaded intact.
>
> My copy command is
> COPY posts FROM '/tmp/posts.txt' CSV QUOTE $$'$$;
>

It appears that you're missing a quote after the word 'Poster':

 > 2628,'Poster,5,'255.255.18.138',

brian

Re: Trying to load MySQL data

От
Walter Vaughan
Дата:
garrettmoore@gmail.com wrote:

> Hello,
>
> I am working on a project where we are converting from MySQL to
> Postgres. I figured the easiest way would be to export the MySQL data
> as CSV.

I dunno, but unless you don't really care about your data, I'd use something
that you have no chance of in your data. Things like | and ~ are pretty standard
field separators.

So you'd use things like FIELDS TERMINATED BY '|' in your MySQL export
and WITH DELIMITER '|' NULL '' in your postgresql import

Also remember that mySQL nulls are nothing like postgresql nulls.

--
Walter

Re: Trying to load MySQL data

От
Dimitri Fontaine
Дата:
Hi,

Le mercredi 10 janvier 2007 02:54, garrettmoore@gmail.com a écrit :
> I am working on a project where we are converting from MySQL to
> Postgres. I figured the easiest way would be to export the MySQL data
> as CSV.

You could also give pgloader a try.
It uses COPY but allows you to load good data even in the presence of errors,
and have a reject file containing erroneous data lines, to replay insertion
later.
It even allows you to reorder data for matching your columns definition, but
as of now suffer from a psycopg2 limitation : you have to provide all table
columns into your data file.

  http://pgfoundry.org/projects/pgloader/
  http://debian.dalibo.org/unstable/
  http://debian.dalibo.org/unstable/pgloader_2.0.2.tar.gz

Hope this helps,
--
Dimitri Fontaine
http://www.dalibo.com/

Вложения

Re: Trying to load MySQL data

От
"Merlin Moncure"
Дата:
On 1/10/07, Dimitri Fontaine <dim@dalibo.com> wrote:
> Hi,
>
> Le mercredi 10 janvier 2007 02:54, garrettmoore@gmail.com a écrit:
> > I am working on a project where we are converting from MySQL to
> > Postgres. I figured the easiest way would be to export the MySQL data
> > as CSV.

If you are using pg 8.2+, I've had good luck with the following:

1. create pgsql schema by hand or using some method, so they match mysql
2. mysqldump --compatible=postgresql [yadda] | grep ^INSERT | psql [yadda]

this will work for most data types. as of 8.2, postgresql supports
multiple record inserts, which while not as fast as copy, is pretty
close.  if mysqldump is dumping single line inserts, change it to
multiple with -e switch iiirc.

merlin

Re: Trying to load MySQL data

От
Bruce Momjian
Дата:
Merlin Moncure wrote:
> On 1/10/07, Dimitri Fontaine <dim@dalibo.com> wrote:
> > Hi,
> >
> > Le mercredi 10 janvier 2007 02:54, garrettmoore@gmail.com a ?crit:
> > > I am working on a project where we are converting from MySQL to
> > > Postgres. I figured the easiest way would be to export the MySQL data
> > > as CSV.
>
> If you are using pg 8.2+, I've had good luck with the following:
>
> 1. create pgsql schema by hand or using some method, so they match mysql
> 2. mysqldump --compatible=postgresql [yadda] | grep ^INSERT | psql [yadda]

Wow, mysqldump has a postgresql compatibility mode?  Intersting.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Trying to load MySQL data

От
"Merlin Moncure"
Дата:
On 1/11/07, Bruce Momjian <bruce@momjian.us> wrote:
> Merlin Moncure wrote:
> > On 1/10/07, Dimitri Fontaine <dim@dalibo.com> wrote:
> > > Hi,
> > >
> > > Le mercredi 10 janvier 2007 02:54, garrettmoore@gmail.com a ?crit:
> > > > I am working on a project where we are converting from MySQL to
> > > > Postgres. I figured the easiest way would be to export the MySQL data
> > > > as CSV.
> >
> > If you are using pg 8.2+, I've had good luck with the following:
> >
> > 1. create pgsql schema by hand or using some method, so they match mysql
> > 2. mysqldump --compatible=postgresql [yadda] | grep ^INSERT | psql [yadda]
>
> Wow, mysqldump has a postgresql compatibility mode?  Intersting.

It does (had it for years), but it doesn't do very much...fixes the
quotes and a couple of other  things.  In particular I know of no easy
ways to convert the table schemas without use of external tools.

merlin

Re: Trying to load MySQL data

От
"garrettmoore@gmail.com"
Дата:
The missing quote after Poster is a mistake I made when sanitzing the
data for posting here. That error is NOT present in the actual data.
There is a quote where needed in the data. So, with that in mind, why
am I still getting the error?

Also, there is no symbol we can expect to not be in the data. This data
is from several sources, including a message board, and there could be
tildes, pipes, or any other symbol in discussion fields.

Also since it's CSV we just have null represented by lack of any value
between two commas, so: a,b,,d represents a row with values a, b, NULL,
d. This works fine in general.

Walter Vaughan wrote:
> garrettmoore@gmail.com wrote:
>
> > Hello,
> >
> > I am working on a project where we are converting from MySQL to
> > Postgres. I figured the easiest way would be to export the MySQL data
> > as CSV.
>
> I dunno, but unless you don't really care about your data, I'd use something
> that you have no chance of in your data. Things like | and ~ are pretty standard
> field separators.
>
> So you'd use things like FIELDS TERMINATED BY '|' in your MySQL export
> and WITH DELIMITER '|' NULL '' in your postgresql import
>
> Also remember that mySQL nulls are nothing like postgresql nulls.
>
> --
> Walter
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/


Re: Trying to load MySQL data

От
Scott Marlowe
Дата:
On Wed, 2007-01-10 at 16:51, garrettmoore@gmail.com wrote:
> The missing quote after Poster is a mistake I made when sanitzing the
> data for posting here. That error is NOT present in the actual data.
> There is a quote where needed in the data. So, with that in mind, why
> am I still getting the error?
>
> Also, there is no symbol we can expect to not be in the data. This data
> is from several sources, including a message board, and there could be
> tildes, pipes, or any other symbol in discussion fields.
>
> Also since it's CSV we just have null represented by lack of any value
> between two commas, so: a,b,,d represents a row with values a, b, NULL,
> d. This works fine in general.

Can you make a sanitized test case, complete unto itself, and post that?

The data to look for are generally \ and '