Обсуждение: Disabling WAL for bulk data loads

Поиск
Список
Период
Сортировка

Disabling WAL for bulk data loads

От
"Kevin Keith"
Дата:
I am coming from an Oracle background - which in the case of bulk data loads
there were several options I had where I could disable writing to the redo
log to speed up the bulk data load (i.e. direct load, set the user session
in no archive logging, set the affected tables to have no logging).

I know the COPY command is one option - however it appears the data would
need to be in formatted file in order to use it correct? I want to avoid
writing a new file out for the COPY command and loading that.

What other options does Postgres 7.4 provide which would allow data loads to
bypass writing to the WAL? I don't need to have this enabled - because in
the event of a database crash, I would simply reload the data from the
source files as recovery.

Thanks,

Kevin

_________________________________________________________________
Don�t just search. Find. Check out the new MSN Search!
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


Re: Disabling WAL for bulk data loads

От
"Jim C. Nasby"
Дата:
On Fri, Sep 09, 2005 at 11:02:33AM -0500, Kevin Keith wrote:
> I am coming from an Oracle background - which in the case of bulk data
> loads there were several options I had where I could disable writing to the
> redo log to speed up the bulk data load (i.e. direct load, set the user
> session in no archive logging, set the affected tables to have no logging).
>
> I know the COPY command is one option - however it appears the data would
> need to be in formatted file in order to use it correct? I want to avoid
> writing a new file out for the COPY command and loading that.

Well, COPY will give you the best performance. It's also fairly flexable
when it comes to formatting. It shouldn't be difficult to get Oracle's
export tool to generate data that COPY can load.

> What other options does Postgres 7.4 provide which would allow data loads
> to bypass writing to the WAL? I don't need to have this enabled - because
> in the event of a database crash, I would simply reload the data from the
> source files as recovery.

At this point, your only option to change fsync to off in
postgresql.conf and reload the postmaster (kill -HUP). There are plans
to improve support for non-logged tables, but I don't think any of it is
in 8.1.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Disabling WAL for bulk data loads

От
Chris Browne
Дата:
kevinrkeith@hotmail.com ("Kevin Keith") writes:
> I am coming from an Oracle background - which in the case of bulk data
> loads there were several options I had where I could disable writing
> to the redo log to speed up the bulk data load (i.e. direct load, set
> the user session in no archive logging, set the affected tables to
> have no logging).
>
> I know the COPY command is one option - however it appears the data
> would need to be in formatted file in order to use it correct? I want
> to avoid writing a new file out for the COPY command and loading that.

The options get a bit better in 8.1, as the COPY command supports some
more format options.

But you'll DEFINITELY get a big boost if you can format things so COPY
can work for you.

> What other options does Postgres 7.4 provide which would allow data
> loads to bypass writing to the WAL? I don't need to have this enabled
> - because in the event of a database crash, I would simply reload the
> data from the source files as recovery.

Arguably you could turn "fsync = off" in postgresql.conf; there is
regrettably the increased risk that a crash would trash the ENTIRE
database, not just the recently-loaded data.

It's somewhat regrettable that you can't turn off logging just for a
single transaction's context; that would probably be the ideal for
this...
--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/spreadsheets.html
"If roach hotels worked on pointy haired people, Microsoft would die."
-- Pete Koren

Re: Disabling WAL for bulk data loads

От
tgoodair@ca.afilias.info (Tim Goodaire)
Дата:
On Fri, Sep 09, 2005 at 01:14:28PM -0400, Chris Browne wrote:
> kevinrkeith@hotmail.com ("Kevin Keith") writes:
> > I am coming from an Oracle background - which in the case of bulk data
> > loads there were several options I had where I could disable writing
> > to the redo log to speed up the bulk data load (i.e. direct load, set
> > the user session in no archive logging, set the affected tables to
> > have no logging).
> >
> > I know the COPY command is one option - however it appears the data
> > would need to be in formatted file in order to use it correct? I want
> > to avoid writing a new file out for the COPY command and loading that.
>
> The options get a bit better in 8.1, as the COPY command supports some
> more format options.
>
> But you'll DEFINITELY get a big boost if you can format things so COPY
> can work for you.
>
> > What other options does Postgres 7.4 provide which would allow data
> > loads to bypass writing to the WAL? I don't need to have this enabled
> > - because in the event of a database crash, I would simply reload the
> > data from the source files as recovery.
>
> Arguably you could turn "fsync = off" in postgresql.conf; there is
> regrettably the increased risk that a crash would trash the ENTIRE
> database, not just the recently-loaded data.

Actually, it may trash all of the databases on that cluster!

>
> It's somewhat regrettable that you can't turn off logging just for a
> single transaction's context; that would probably be the ideal for
> this...
> --
> output = ("cbbrowne" "@" "cbbrowne.com")
> http://cbbrowne.com/info/spreadsheets.html
> "If roach hotels worked on pointy haired people, Microsoft would die."
> -- Pete Koren
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

--
Tim Goodaire    416-673-4126    tgoodair@ca.afilias.info
Database Administrator, Afilias Canada Corp.

Вложения

Re: Disabling WAL for bulk data loads

От
"Greg Sabino Mullane"
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Well, COPY will give you the best performance. It's also fairly flexable
> when it comes to formatting. It shouldn't be difficult to get Oracle's
> export tool to generate data that COPY can load.

With Oracle, everything is difficult. Their export utility does not have
a way to dump things as a series of INSERT commands, nor as tab-delimited
data. There are other tools/scripts around that can do this, but as a
general rule, the word "difficult" is one of the more mild adjectives
usually found in the same sentence as "Oracle utility". :)

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200509101614
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFDIz7IvJuQZxSWSsgRAthrAJ9YaHzRCcaixLJ+N6WyWVYmoxXpMgCcCz6T
iVfCGIk7hASpQMawDEWzzaY=
=jFH3
-----END PGP SIGNATURE-----



Re: Disabling WAL for bulk data loads

От
"Jim C. Nasby"
Дата:
On Sat, Sep 10, 2005 at 08:15:50PM -0000, Greg Sabino Mullane wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> > Well, COPY will give you the best performance. It's also fairly flexable
> > when it comes to formatting. It shouldn't be difficult to get Oracle's
> > export tool to generate data that COPY can load.
>
> With Oracle, everything is difficult. Their export utility does not have
> a way to dump things as a series of INSERT commands, nor as tab-delimited
> data. There are other tools/scripts around that can do this, but as a
> general rule, the word "difficult" is one of the more mild adjectives
> usually found in the same sentence as "Oracle utility". :)

Can't it export CSV? Copy can deal with CSV.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Disabling WAL for bulk data loads

От
"Greg Sabino Mullane"
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Jim C. Nasby asked:
> Can't it export CSV? Copy can deal with CSV.

Nope. I was not exaggerating the difficulty of their utilities earlier.
Oracle's export program writes to a binary format that is readable
only by Oracle's import program. And don't get me started on "sqlplus",
their command-line equivalent to psql. Quite possibly the worst
command-line database interface ever. :)

On the plus side, there are third-party tools that can do a CSV or other
type of dump from an Oracle database.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200509111014
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFDJDxRvJuQZxSWSsgRAsbPAJ96Y3mKIa8NEaEMJyMxSuTgyfNRQgCgjD0v
H9qeHqpDibCXPUEhTx0W3W0=
=7FCI
-----END PGP SIGNATURE-----



Re: Disabling WAL for bulk data loads

От
William Yu
Дата:
Kevin Keith wrote:
> I am coming from an Oracle background - which in the case of bulk data
> loads there were several options I had where I could disable writing to
> the redo log to speed up the bulk data load (i.e. direct load, set the
> user session in no archive logging, set the affected tables to have no
> logging).

If you have enough memory, you can always make a ramdisk to store the
WAL temporarily while you bulk load data. I also temporarily alter the
config to set checkpoint_segments = 1 to use as few pg_xlog files as
possible.

Re: Disabling WAL for bulk data loads

От
Tom Lane
Дата:
William Yu <wyu@talisys.com> writes:
> If you have enough memory, you can always make a ramdisk to store the
> WAL temporarily while you bulk load data. I also temporarily alter the
> config to set checkpoint_segments = 1 to use as few pg_xlog files as
> possible.

Really?  That seems pretty counterproductive to me.  I've never seen
a case where reducing checkpoint_segments increased performance.
Checkpoints are expensive, and usually you want 'em far apart.

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org


Re: Disabling WAL for bulk data loads

От
William Yu
Дата:
Tom Lane wrote:
> William Yu <wyu@talisys.com> writes:
>
>>If you have enough memory, you can always make a ramdisk to store the
>>WAL temporarily while you bulk load data. I also temporarily alter the
>>config to set checkpoint_segments = 1 to use as few pg_xlog files as
>>possible.
>
>
> Really?  That seems pretty counterproductive to me.  I've never seen
> a case where reducing checkpoint_segments increased performance.
> Checkpoints are expensive, and usually you want 'em far apart.

Just poor grammar on my part. I meant decreasing checkpoint_segments
when using a ramdisk for WAL in order to keep Postgres from croaking on
lack of disk space for pg_xlog.


Re: Disabling WAL for bulk data loads

От
Tom Lane
Дата:
William Yu <wyu@talisys.com> writes:
> If you have enough memory, you can always make a ramdisk to store the
> WAL temporarily while you bulk load data. I also temporarily alter the
> config to set checkpoint_segments = 1 to use as few pg_xlog files as
> possible.

Really?  That seems pretty counterproductive to me.  I've never seen
a case where reducing checkpoint_segments increased performance.
Checkpoints are expensive, and usually you want 'em far apart.

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org