Thread: Disabling WAL for bulk data loads
I am coming from an Oracle background, where for bulk data loads there were
several options for disabling writes to the redo log to speed up the load
(direct-path load, setting the session to no archive logging, or marking the
affected tables NOLOGGING).

I know the COPY command is one option; however, it appears the data would need
to be in a formatted file in order to use it, correct? I want to avoid writing
a new file out for the COPY command and then loading that.

What other options does Postgres 7.4 provide that would allow data loads to
bypass writing to the WAL? I don't need this to be crash-safe, because in the
event of a database crash I would simply reload the data from the source files
as recovery.

Thanks,

Kevin
On Fri, Sep 09, 2005 at 11:02:33AM -0500, Kevin Keith wrote:
> I am coming from an Oracle background - which in the case of bulk data
> loads there were several options I had where I could disable writing to the
> redo log to speed up the bulk data load (i.e. direct load, set the user
> session in no archive logging, set the affected tables to have no logging).
>
> I know the COPY command is one option - however it appears the data would
> need to be in formatted file in order to use it correct? I want to avoid
> writing a new file out for the COPY command and loading that.

Well, COPY will give you the best performance. It's also fairly flexible
when it comes to formatting. It shouldn't be difficult to get Oracle's
export tool to generate data that COPY can load.

> What other options does Postgres 7.4 provide which would allow data loads
> to bypass writing to the WAL? I don't need to have this enabled - because
> in the event of a database crash, I would simply reload the data from the
> source files as recovery.

At this point, your only option is to change fsync to off in postgresql.conf
and reload the postmaster (kill -HUP). There are plans to improve support for
non-logged tables, but I don't think any of it is in 8.1.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software        http://pervasive.com        work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf             cell: 512-569-9461
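[Editor's sketch] Getting "data that COPY can load" mostly means producing
COPY's default tab-delimited text format, which works on 7.4 as well as later
releases. The helper below (function and sample data are invented for
illustration) escapes arbitrary rows into that format:

```python
# Hypothetical sketch: turn rows parsed from any source (e.g. an Oracle
# export) into PostgreSQL COPY's default tab-delimited text format.

def copy_escape(value):
    """Escape one field for COPY's text format; None becomes \\N."""
    if value is None:
        return r"\N"
    return (str(value)
            .replace("\\", "\\\\")   # backslash first, before adding new ones
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r"))

def to_copy_lines(rows):
    """Yield one COPY-ready line per row."""
    for row in rows:
        yield "\t".join(copy_escape(f) for f in row)

rows = [(1, "O'Brien", None), (2, "tab\there", "ok")]
lines = list(to_copy_lines(rows))
# Feed these lines to `COPY mytable FROM stdin;` via psql or a driver.
```

The output can be piped straight into psql, avoiding the intermediate file
Kevin wanted to skip.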
kevinrkeith@hotmail.com ("Kevin Keith") writes:
> I am coming from an Oracle background - which in the case of bulk data
> loads there were several options I had where I could disable writing
> to the redo log to speed up the bulk data load (i.e. direct load, set
> the user session in no archive logging, set the affected tables to
> have no logging).
>
> I know the COPY command is one option - however it appears the data
> would need to be in formatted file in order to use it correct? I want
> to avoid writing a new file out for the COPY command and loading that.

The options get a bit better in 8.1, as the COPY command supports some
more format options.

But you'll DEFINITELY get a big boost if you can format things so COPY
can work for you.

> What other options does Postgres 7.4 provide which would allow data
> loads to bypass writing to the WAL? I don't need to have this enabled
> - because in the event of a database crash, I would simply reload the
> data from the source files as recovery.

Arguably you could set "fsync = off" in postgresql.conf; there is
regrettably the increased risk that a crash would trash the ENTIRE
database, not just the recently-loaded data.

It's somewhat regrettable that you can't turn off logging just for a
single transaction's context; that would probably be the ideal for this...
--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/spreadsheets.html
"If roach hotels worked on pointy haired people, Microsoft would die."
-- Pete Koren
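[Editor's sketch] The fsync approach discussed above, as a minimal
postgresql.conf fragment (parameter name as in the 7.4/8.0 era; the
whole-cluster risk Chris mentions applies):

```
# postgresql.conf -- trade crash safety for bulk-load speed.
# WARNING: if the server crashes while fsync is off, the entire cluster
# (every database in it) may be left unrecoverable.
fsync = off

# Signal the running postmaster to re-read the file:
#   kill -HUP <postmaster-pid>      (or: pg_ctl reload -D $PGDATA)
```

Turn fsync back on and reload again as soon as the load finishes.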
On Fri, Sep 09, 2005 at 01:14:28PM -0400, Chris Browne wrote:
> kevinrkeith@hotmail.com ("Kevin Keith") writes:
> > I am coming from an Oracle background - which in the case of bulk data
> > loads there were several options I had where I could disable writing
> > to the redo log to speed up the bulk data load (i.e. direct load, set
> > the user session in no archive logging, set the affected tables to
> > have no logging).
> >
> > I know the COPY command is one option - however it appears the data
> > would need to be in formatted file in order to use it correct? I want
> > to avoid writing a new file out for the COPY command and loading that.
>
> The options get a bit better in 8.1, as the COPY command supports some
> more format options.
>
> But you'll DEFINITELY get a big boost if you can format things so COPY
> can work for you.
>
> Arguably you could turn "fsync = off" in postgresql.conf; there is
> regrettably the increased risk that a crash would trash the ENTIRE
> database, not just the recently-loaded data.

Actually, it may trash all of the databases on that cluster!

> It's somewhat regrettable that you can't turn off logging just for a
> single transaction's context; that would probably be the ideal for
> this...

--
Tim Goodaire    416-673-4126    tgoodair@ca.afilias.info
Database Administrator, Afilias Canada Corp.
> Well, COPY will give you the best performance. It's also fairly flexible
> when it comes to formatting. It shouldn't be difficult to get Oracle's
> export tool to generate data that COPY can load.

With Oracle, everything is difficult. Their export utility does not have
a way to dump things as a series of INSERT commands, nor as tab-delimited
data. There are other tools/scripts around that can do this, but as a
general rule, the word "difficult" is one of the more mild adjectives
usually found in the same sentence as "Oracle utility". :)

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8
On Sat, Sep 10, 2005 at 08:15:50PM -0000, Greg Sabino Mullane wrote:
> With Oracle, everything is difficult. Their export utility does not have
> a way to dump things as a series of INSERT commands, nor as tab-delimited
> data. There are other tools/scripts around that can do this, but as a
> general rule, the word "difficult" is one of the more mild adjectives
> usually found in the same sentence as "Oracle utility". :)

Can't it export CSV? COPY can deal with CSV.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Jim C. Nasby asked:
> Can't it export CSV? Copy can deal with CSV.

Nope. I was not exaggerating the difficulty of their utilities earlier.
Oracle's export program writes to a binary format that is readable only
by Oracle's import program. And don't get me started on "sqlplus", their
command-line equivalent to psql. Quite possibly the worst command-line
database interface ever. :)

On the plus side, there are third-party tools that can do a CSV or other
type of dump from an Oracle database.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8
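[Editor's sketch] Such a third-party dump script often boils down to a Python
DB-API connection, a SELECT, and the csv module. Here sqlite3 stands in for an
Oracle driver such as cx_Oracle (the cursor loop is identical); the table and
data are invented:

```python
import csv
import io
import sqlite3  # stand-in DB-API driver; a real dump would use cx_Oracle

# Throwaway table so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, "Scott"), (2, "O'Brien")])

# Dump any SELECT as CSV; the same loop works against an Oracle cursor.
buf = io.StringIO()
cur = conn.execute("SELECT id, name FROM emp ORDER BY id")
csv.writer(buf).writerows(cur)
csv_text = buf.getvalue()

# The result can then be loaded with COPY ... WITH CSV (PostgreSQL 8.0+),
# or converted to tab-delimited text for 7.4's COPY.
```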
Kevin Keith wrote:
> I am coming from an Oracle background - which in the case of bulk data
> loads there were several options I had where I could disable writing to
> the redo log to speed up the bulk data load (i.e. direct load, set the
> user session in no archive logging, set the affected tables to have no
> logging).

If you have enough memory, you can always make a ramdisk to store the WAL
temporarily while you bulk load data. I also temporarily alter the config
to set checkpoint_segments = 1 to use as few pg_xlog files as possible.
William Yu <wyu@talisys.com> writes:
> If you have enough memory, you can always make a ramdisk to store the
> WAL temporarily while you bulk load data. I also temporarily alter the
> config to set checkpoint_segments = 1 to use as few pg_xlog files as
> possible.

Really? That seems pretty counterproductive to me. I've never seen a case
where reducing checkpoint_segments increased performance. Checkpoints are
expensive, and usually you want 'em far apart.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
Tom Lane wrote:
> William Yu <wyu@talisys.com> writes:
>> If you have enough memory, you can always make a ramdisk to store the
>> WAL temporarily while you bulk load data. I also temporarily alter the
>> config to set checkpoint_segments = 1 to use as few pg_xlog files as
>> possible.
>
> Really? That seems pretty counterproductive to me. I've never seen
> a case where reducing checkpoint_segments increased performance.
> Checkpoints are expensive, and usually you want 'em far apart.

Just poor grammar on my part. I meant decreasing checkpoint_segments when
using a ramdisk for WAL, in order to keep Postgres from croaking on lack
of disk space for pg_xlog.
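[Editor's sketch] On Linux, William's ramdisk trick might look like the
following (paths and sizes are invented; run as root). Note that a crash or
reboot loses pg_xlog entirely, so this is only sane when, as Kevin says, the
data can simply be reloaded from the source files:

```
# Put pg_xlog on a tmpfs ramdisk for the duration of the bulk load:
#   mount -t tmpfs -o size=128M tmpfs /mnt/walram
#   pg_ctl stop -D $PGDATA
#   mv $PGDATA/pg_xlog /mnt/walram/pg_xlog
#   ln -s /mnt/walram/pg_xlog $PGDATA/pg_xlog
#   pg_ctl start -D $PGDATA

# postgresql.conf -- keep the WAL footprint small so the ramdisk never fills:
checkpoint_segments = 1
```

Afterwards, stop the server, move pg_xlog back to real disk, and restore your
normal checkpoint_segments setting.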