Thread: Atul.Goel@globaldatapoint.com

Atul.Goel@globaldatapoint.com

From
Craig Ringer
Date: Tue, 2010-07-13 20:33 +0800
> We are a data-based company and are migrating from Oracle to
> PostgreSQL, for which I am doing a POC. We have a business
> requirement to send the data in XML files to our clients. The XML
> files are around 700MB in size and growing.
>
> I have been able to generate sample XML files and store them (still to test with full-size files).
>
> However I have the following problems
>
> 1) It took 35 minutes to store 1GB of data in a TEXT column:
>
>    CREATE TABLE test_text_nearly_1_GB AS
>      SELECT repeat('1', 1024*1024*1023)::text AS nearly_1_GB;

That's a fairly inefficient way to generate the data to insert. You'll
probably find that much of the time was spent in repeat() not anywhere
interesting to your query.
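
One way to separate those two costs is to time repeat() on its own
first (a rough sketch; \timing is a psql meta-command):

    \timing on
    -- generation alone
    SELECT length(repeat('1', 1024*1024*1023));
    -- generation plus storage (the original statement)
    CREATE TABLE test_text_nearly_1_GB AS
      SELECT repeat('1', 1024*1024*1023)::text AS nearly_1_GB;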

I'd be more interested in how long COPY ... FROM took, personally.
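
Something along these lines would give that number (a sketch; the path
is hypothetical and has to be readable by the server backend):

    \timing on
    CREATE TABLE test_text_copy (doc text);
    -- load the ~1GB file back in as a single row
    COPY test_text_copy FROM '/tmp/test_text_nearly_1_GB.xml';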

> 2) It took 30 minutes to write the same back out to the OS using COPY:
>
>    COPY test_text_nearly_1_GB TO '$HOME/test_text_nearly_1_GB.xml';

That's a write rate of 34MB/min, or half a meg a second. Not pretty.

Where's the load during the COPY? Mostly CPU? Or mostly disk I/O?

Are you writing the output to the same disk the database is on?  (Not
that it should make this much difference).

> 3) Find a solution for when the file size becomes > 1GB

That's going to be interesting.

Do you really need to store whole XML documents this size in the
database, rather than broken up into structures that can be worked with
usefully in the database? If so, PostgreSQL might not be your best choice.
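
If you do break it up, the general shape might be something like this
(a sketch only: table, column, and element names are made up, and
xmlagg still materialises the whole document at export time):

    -- one row per client record instead of one 700MB document
    CREATE TABLE client_records (
        id      bigserial PRIMARY KEY,
        payload xml
    );

    -- assemble the full document only when exporting it
    COPY (
        SELECT xmlelement(name records, xmlagg(payload))::text
        FROM (SELECT payload FROM client_records ORDER BY id) s
    ) TO '/tmp/client_export.xml';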

--
Craig Ringer

Re: Atul.Goel@globaldatapoint.com

From
"Joshua D. Drake"
Date:
On Tue, 2010-07-13 at 20:33 +0800, Craig Ringer wrote:

> That's a write rate of 34MB/min, or half a meg a second. Not pretty.
>
> Where's the load during the COPY? Mostly CPU? Or mostly disk I/O?
>
> Are you writing the output to the same disk the database is on?  (Not
> that it should make this much difference).
>
> > 3) Find a solution for when the file size becomes > 1GB
>
> That's going to be interesting.

> Do you really need to store whole XML documents this size in the
> database, rather than broken up into structures that can be worked with
> usefully in the database? If so, PostgreSQL might not be your best choice.
>

This solution seems wrong as a whole. A file that size has no
business inside PostgreSQL. If he "really" needs a DB API to it, have
it point to the filesystem where the file lives and write a PL/Perl,
PL/Java, PL/Python, PL/PHP, or PL/Ruby function to slurp the XML from
the filesystem and hand it off.
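
A minimal sketch of that idea in PL/Python (the function name and path
are hypothetical; plpythonu must be installed, and the file has to live
where the server backend can read it):

    CREATE OR REPLACE FUNCTION slurp_xml(path text) RETURNS text AS $$
        # read the whole file from the server's filesystem and return it
        f = open(path)
        try:
            return f.read()
        finally:
            f.close()
    $$ LANGUAGE plpythonu;

    -- usage: SELECT slurp_xml('/srv/exports/clients.xml');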

JD

> --
> Craig Ringer
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
