Thread: Re: [HACKERS] OSS database needed for testing

Re: [HACKERS] OSS database needed for testing

From: "Merlin Moncure"
Date:
Josh Berkus wrote:
> Cool.   I'll tackle this in a week or two.  Right now, I'm being paid to
> convert a client's data and that'll keep me busy through the weekend ...

I would suggest downloading the data now.  I can help get you started
with the CREATE TABLE statements and the import scripts.  There are not
many ways to load the data in a reasonable timeframe: the SPI functions
or the COPY command are a good place to start.  Do not bother running
everything through INSERT statements: take my word for it, it just
won't work at this scale.  Of course, if you use COPY, you have to
pre-format the data.  Be aware that you will have many gigabytes (more
than 20) of data before you are done.
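
Something along these lines is what I mean by pre-formatting (Python
here purely for illustration -- the comma-delimited layout, file names,
and table name are all made up, just to show the shape of it):

import csv
import sys

# Convert a comma-delimited source file to COPY's default text format:
# tab-separated columns, \N for NULL, one record per line.
def convert(src, out):
    for row in csv.reader(src):
        cols = []
        for v in row:
            if v == '':
                cols.append(r'\N')  # empty field -> SQL NULL
            else:
                # Escape the characters COPY treats specially.
                v = v.replace('\\', '\\\\')
                v = v.replace('\t', '\\t')
                v = v.replace('\n', '\\n').replace('\r', '\\r')
                cols.append(v)
        out.write('\t'.join(cols) + '\n')

if __name__ == '__main__':
    with open(sys.argv[1], newline='') as f:
        convert(f, sys.stdout)

You can then pipe the output straight into the backend, e.g.
python convert.py lm_raw.csv | psql ossdb -c '\copy lm_license from stdin'
(database, table, and file names are placeholders).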

Whatever you decide to do, document the process: the difficulty of
getting large amounts of data into PostgreSQL quickly and easily has
been a historical complaint of mine.  With MySQL it was a snap to get
the data in, but *that* database never felt like it could actually
handle this much data.

I can also get you started with some example queries that should be
quite a challenge to make run quickly.  After that, it's your
ballgame.

Merlin


Re: OSS database needed for testing

From: Josh Berkus
Date:
Merlin,

> I would suggest downloading the data now.  I can help get you started

OK, downloading now.

> with the CREATE TABLE statements and the import scripts.  There are not
> many ways to load the data in a reasonable timeframe: the SPI functions
> or the COPY command are a good place to start.  Do not bother running
> everything through INSERT statements: take my word for it, it just
> won't work at this scale.  Of course, if you use COPY, you have to
> pre-format the data.  Be aware that you will have many gigabytes (more
> than 20) of data before you are done.

From my perspective, the easiest and fastest way to do this is to create
the table definitions in PostgreSQL and then use Perl to convert the data
into a format COPY will recognize.  If you can do the CREATE TABLE
statements for the LM* data, I can do the Perl scripts.
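
For instance, here is the loading side sketched in Python with psycopg2
and invented column names, just to show the division of labor -- the
real DDL would follow whatever data dictionary ships with the dump:

import psycopg2

# Hypothetical definition for one LM* table; the columns are made up.
DDL = """
CREATE TABLE lm_license (
    license_id  integer,
    callsign    text,
    grant_date  date
)
"""

def load(conn, path):
    cur = conn.cursor()
    cur.execute(DDL)
    # Feed the pre-formatted tab-delimited file straight to COPY;
    # this is the fast path, versus row-by-row INSERTs.
    with open(path) as f:
        cur.copy_expert("COPY lm_license FROM STDIN", f)
    conn.commit()

if __name__ == '__main__':
    load(psycopg2.connect(dbname='ossdb'), 'lm_license.copy')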

Given that the *total* data is 20 GB, we'll want to use a subset of it.  Per
your suggestion, I am downloading the LM* tables.  I may truncate them
further if the resulting database is too large.  If some of the other tables
are reference lists or child tables, please tell me and I will download them
as well.
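
If I do end up truncating, the simplest approach is probably to sample
the pre-formatted files before loading, something like this (the
sampling factor is arbitrary; this assumes one record per line, which
the pre-formatting step guarantees):

import sys

# Keep every Nth record of a pre-formatted COPY file to shrink the set.
N = 10
for i, line in enumerate(sys.stdin):
    if i % N == 0:
        sys.stdout.write(line)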

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco