Re: reducing IO and memory usage: sending the content of a table to multiple files

From: Ivan Sergio Borgonovo
Subject: Re: reducing IO and memory usage: sending the content of a table to multiple files
Date:
Msg-id: 20090402214833.19d0f446@dawn.webthatworks.it
In reply to: Re: reducing IO and memory usage: sending the content of a table to multiple files  (Sam Mason <sam@samason.me.uk>)
Responses: Re: reducing IO and memory usage: sending the content of a table to multiple files  (Sam Mason <sam@samason.me.uk>)
List: pgsql-general
On Thu, 2 Apr 2009 17:27:55 +0100
Sam Mason <sam@samason.me.uk> wrote:

> On Thu, Apr 02, 2009 at 11:20:02AM +0200, Ivan Sergio Borgonovo
> wrote:
> > This is the work-flow I've in mind:
> >
> > 1a) take out *all* data from a table in chunks (M record for each
> > file, one big file?) (\copy??, from inside a scripting language?)
>
> What about using cursors here?

The only way I've seen to use cursors with PHP involves defining a
function... It looks a bit messy for a 10-line script to have to
define a function just as a shell around a SQL statement.
I'd even write it in Python if the project hadn't started to look
like a small Frankenstein... and sooner or later I bet I'll have to
include some PHP files to recycle some functions.
I didn't find any elegant example of cursor use in PHP... OK, PHP is
not the most elegant language around... but still, is there any good
example someone could point me at?

If you could point me to some clean way to use cursors in PHP I'd be
happy to learn.
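
Something along these lines is what I'd picture, just as a rough,
untested sketch (the connection string, the table name and the
1000-row chunk size are made up):

<?php
// Rough, untested sketch: connection string, table name and chunk
// size are placeholders.
$conn = pg_connect("dbname=mydb user=me");

// A cursor only lives inside a transaction.
pg_query($conn, "BEGIN");
pg_query($conn, "DECLARE big_cur NO SCROLL CURSOR FOR select a, b, c from verylargetable");

while (true) {
    // Pull the table in chunks instead of buffering it all on the client.
    $res = pg_query($conn, "FETCH 1000 FROM big_cur");
    if (pg_num_rows($res) == 0) {
        break;
    }
    while ($row = pg_fetch_array($res, null, PGSQL_ASSOC)) {
        // filter/write $row to the output file(s) here
    }
    pg_free_result($res);
}

pg_query($conn, "CLOSE big_cur");
pg_query($conn, "COMMIT");
pg_close($conn);
?>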

I was thinking about using another embedded language that better
suits data processing (an untrusted one?) so I could write output
to files directly from within a PostgreSQL function...

> > 2a) process each file with awk to produce N files very similar
> > each other (substantially turn them into very simple xml)
> > 3a) gzip them

> GZIP uses significant CPU time; there are various lighter weight
> schemes available that may be better depending on where this data
> is going.

That's a requirement.

> > 2b) use any scripting language to process and gzip them avoiding
> > a bit of disk IO

> What disk IO are you trying to save and why?

Because this is going to be the largest write operation the whole
system will have to handle during the day.
I'm not interested in fast complicated queries, planning,
transactions, caching... I just need to take a whole table, pass it
through a filter and output several filtered "versions" of the same
table.
So I think the largest cost of the operation will be IO.
\copy should be optimised for "raw" data output, but maybe all its
advantages get lost once I have to use pipes and add complexity to
the filtering.
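
For reference, what I had in mind with \copy plus pipes is roughly
this, driven from PHP just to keep everything in one script (untested;
database name, awk filter and output path are made up, and psql is
assumed to be on the PATH and able to authenticate via .pgpass or
similar):

<?php
// Untested sketch: database name, awk filter and output path are
// placeholders; psql must be on the PATH and able to authenticate.
$cmd = 'psql -d mydb -c "\copy verylargetable (a, b, c) to stdout"'
     . ' | awk -f filter.awk'
     . ' | gzip > /tmp/verylargetable.xml.gz';
system($cmd, $ret);
if ($ret != 0) {
    echo "export pipeline failed with status $ret\n";
}
?>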

> > Does PostgreSQL offer me any contrib, module, technique... to
> > save some IO (and maybe disk space for temporary results?).
> >
> > Are there any memory usage implication if I'm doing a:
> > pg_query("select a,b,c from verylargetable; --no where clause");
> > vs.
> > the \copy equivalent
> > any way to avoid them?
>
> As far as I understand it will get all the data from the database
> into memory first and then your code gets a chance.  For large
> datasets this obviously doesn't work well.  CURSORs are your friend
> here.

I was reading through the PHP documentation trying to understand
how buffers and memory usage work, so I had a look at the MySQL
documentation too...
MySQL has mysql_unbuffered_query.
So I was wondering how memory is managed on the server and on the
clients.

What's going to happen when I do a plain
$result = pg_query("select * from t1;");
while ($row = pg_fetch_array($result)) {
}
vs.
using cursors
vs.
asynchronous queries (they just look like non-blocking queries with
no bearing on memory usage)?

Where are the buffers etc...
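
Just to make the asynchronous case concrete, this is what I mean
(untested sketch, made-up connection string): pg_send_query() returns
right away, but as far as I can tell pg_get_result() still hands back
the whole result set buffered on the client, so it doesn't look like
the equivalent of mysql_unbuffered_query.

<?php
// Untested sketch with a made-up connection string: the query is sent
// asynchronously, but the full result still ends up buffered client side.
$conn = pg_connect("dbname=mydb user=me");
pg_send_query($conn, "select a, b, c from verylargetable");
// ...free to do other work here while the server runs the query...
$result = pg_get_result($conn); // blocks until the whole result has arrived
while ($row = pg_fetch_array($result)) {
    // process $row
}
pg_close($conn);
?>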

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

