Discussion: skip WAL on COPY patch
The attached patch adds an option to the COPY command to skip writing
WAL when the following conditions are all met:

1) The table is empty (zero size on disk)
2) The COPY command can obtain an access exclusive lock on the table
   without blocking.
3) The WAL isn't needed for replication

For example:

COPY a FROM '/tmp/a.txt' (SKIP_WAL);

A non-default option to the COPY command is required because the copy will block out any concurrent access to the table, which would be undesirable in some cases and is different from the current behaviour.

This can safely be done because, if the transaction does not commit, the empty version of the data files is still available. The COPY command already skips WAL if the table was created in the current transaction.

There was a discussion on something similar before [1], but I didn't see any discussion of having it only obtain the lock if it can do so without waiting (nor could I find in the archives what happened to that patch).

I'm not attached to SKIP_WAL vs LOCK as the option.

[1] http://archives.postgresql.org/pgsql-patches/2005-12/msg00206.php

Steve
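The three conditions above can be read as a single predicate. The following is a minimal Python sketch of that reading, not code from the patch: `TableState`, its fields, and `can_skip_wal` are hypothetical names invented for illustration, and the sketch says nothing about what the patch does when a condition fails.

```python
from dataclasses import dataclass


@dataclass
class TableState:
    """Hypothetical snapshot of the facts the three conditions depend on."""
    size_on_disk: int                  # bytes in the table's data files
    lock_available: bool               # access exclusive lock obtainable without waiting
    wal_needed_for_replication: bool   # something downstream consumes the WAL


def can_skip_wal(table: TableState, skip_wal_requested: bool) -> bool:
    """Return True only if the option was given and all three conditions hold."""
    if not skip_wal_requested:
        # The option is non-default, so nothing changes unless it is requested.
        return False
    return (
        table.size_on_disk == 0                   # 1) table is empty on disk
        and table.lock_available                  # 2) lock obtained without blocking
        and not table.wal_needed_for_replication  # 3) WAL not needed for replication
    )
```

For instance, an empty table whose lock is free and whose WAL is not replicated qualifies, while the same table with one 8192-byte page on disk does not.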
Steve Singer <ssinger@ca.afilias.info> writes:
> The attached patch adds an option to the COPY command to skip writing
> WAL when the following conditions are all met:

> 1) The table is empty (zero size on disk)
> 2) The copy command can obtain an access exclusive lock on the table
> without blocking.
> 3) The WAL isn't needed for replication

Exposing this as a user-visible option seems a seriously bad idea.
We'd have to support that forever. ISTM it ought to be possible to
avoid the exclusive lock ... maybe not with this particular
implementation, but somehow.

regards, tom lane
On Tue, Aug 23, 2011 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Steve Singer <ssinger@ca.afilias.info> writes:
>> The attached patch adds an option to the COPY command to skip writing
>> WAL when the following conditions are all met:
>
>> 1) The table is empty (zero size on disk)
>> 2) The copy command can obtain an access exclusive lock on the table
>> without blocking.
>> 3) The WAL isn't needed for replication
>
> Exposing this as a user-visible option seems a seriously bad idea.
> We'd have to support that forever. ISTM it ought to be possible to
> avoid the exclusive lock ... maybe not with this particular
> implementation, but somehow.

Also, if it only works when the table is zero size on disk, you might as well just let people truncate their already-empty tables when they want this optimization.

What I think would be really interesting is a way to make this work when the table *isn't* empty. In other words, have a COPY option that (1) takes an exclusive lock on the table, (2) writes the data being inserted into new pages beyond the old EOF, and (3) arranges for crash recovery or transaction abort to truncate the table back to its previous length. Then you could do fast bulk loads even into a table that's already populated, so long as you don't mind that the table will be exclusive-locked and freespace within existing heap pages won't be reused.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
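To make steps (2) and (3) of the proposal above concrete, here is a toy Python model. It is only a sketch under simplifying assumptions: the heap is a plain list of pages, `ToyHeap` and `bulk_load` are hypothetical names, and locking, crash recovery, and indexes are not modelled at all.

```python
class ToyHeap:
    """Toy stand-in for a table's heap file: a list of pages."""

    def __init__(self, pages):
        self.pages = list(pages)

    def bulk_load(self, new_pages, commit=True):
        """Append pages beyond the old EOF; undo by truncation if not committed.

        Returns the old EOF (page count before the load).
        """
        old_eof = len(self.pages)      # remember the previous length
        self.pages.extend(new_pages)   # step (2): write only past the old EOF
        if not commit:
            # Step (3): abort is handled by cutting the file back to old_eof.
            # Existing pages were never touched, so nothing else needs undoing.
            del self.pages[old_eof:]
        return old_eof
```

Because existing pages are never modified, truncating to the remembered length is the whole undo path in this model. That is also why free space inside existing pages goes unused, as the message notes.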
Robert Haas <robertmhaas@gmail.com> writes:
> What I think would be really interesting is a way to make this work
> when the table *isn't* empty. In other words, have a COPY option that
> (1) takes an exclusive lock on the table, (2) writes the data being
> inserted into new pages beyond the old EOF, and (3) arranges for crash
> recovery or transaction abort to truncate the table back to its
> previous length. Then you could do fast bulk loads even into a table
> that's already populated, so long as you don't mind that the table
> will be exclusive-locked and freespace within existing heap pages won't
> be reused.

What are you going to do with the table's indexes?

regards, tom lane
Excerpts from Robert Haas's message of Tue Aug 23 17:08:50 -0300 2011:
> What I think would be really interesting is a way to make this work
> when the table *isn't* empty. In other words, have a COPY option that
> (1) takes an exclusive lock on the table, (2) writes the data being
> inserted into new pages beyond the old EOF, and (3) arranges for crash
> recovery or transaction abort to truncate the table back to its
> previous length. Then you could do fast bulk loads even into a table
> that's already populated, so long as you don't mind that the table
> will be exclusive-locked and freespace within existing heap pages won't
> be reused.

It seems to me this would be relatively simple if we allowed segments that are not a full GB in length. That way, COPY could write into a whole segment and "attach" it to the table at commit time (say, by renaming).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 11-08-23 04:17 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What I think would be really interesting is a way to make this work
>> when the table *isn't* empty. In other words, have a COPY option that
>> (1) takes an exclusive lock on the table, (2) writes the data being
>> inserted into new pages beyond the old EOF, and (3) arranges for crash
>> recovery or transaction abort to truncate the table back to its
>> previous length. Then you could do fast bulk loads even into a table
>> that's already populated, so long as you don't mind that the table
>> will be exclusive-locked and freespace within existing heap pages won't
>> be reused.
>
> What are you going to do with the table's indexes?
>
> regards, tom lane
>

What about not updating the indexes during the copy operation, then doing an automatic rebuild of the indexes after the copy (but during the same transaction)? If you're only adding a few rows to a large table this wouldn't be what you want, but if you're only adding a few rows then a small amount of WAL isn't a big concern either.
On Tue, Aug 23, 2011 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What I think would be really interesting is a way to make this work
>> when the table *isn't* empty. In other words, have a COPY option that
>> (1) takes an exclusive lock on the table, (2) writes the data being
>> inserted into new pages beyond the old EOF, and (3) arranges for crash
>> recovery or transaction abort to truncate the table back to its
>> previous length. Then you could do fast bulk loads even into a table
>> that's already populated, so long as you don't mind that the table
>> will be exclusive-locked and freespace within existing heap pages won't
>> be reused.
>
> What are you going to do with the table's indexes?

Oh, hmm. That's awkward.

I suppose you could come up with some solution that involved saving preimages of each already-existing index page that was modified until commit. If you crash before commit, you truncate away all the added pages and roll back to the preimages of any modified pages. That's pretty complex, though, and I'm not sure that it would be enough of a win to justify the effort. It also sounds suspiciously like a poor man's implementation of a rollback segment; and if we ever decide we want to have an option for rollback segments, we probably want more than a poor man's version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of Tue Aug 23 17:43:13 -0300 2011:
> On Tue, Aug 23, 2011 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> What I think would be really interesting is a way to make this work
> >> when the table *isn't* empty. In other words, have a COPY option that
> >> (1) takes an exclusive lock on the table, (2) writes the data being
> >> inserted into new pages beyond the old EOF, and (3) arranges for crash
> >> recovery or transaction abort to truncate the table back to its
> >> previous length. Then you could do fast bulk loads even into a table
> >> that's already populated, so long as you don't mind that the table
> >> will be exclusive-locked and freespace within existing heap pages won't
> >> be reused.
> >
> > What are you going to do with the table's indexes?
>
> Oh, hmm. That's awkward.

If you see what I proposed, it's simple: you can scan the new segment(s) and index the tuples found there (maybe in bulk, which would be even faster).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
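The "scan the new segment(s) and index the tuples found there" idea can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: pages are lists of keys, the index is a dict mapping a key to (page, offset) locations, and `index_new_pages` is a hypothetical helper, not a PostgreSQL function.

```python
def index_new_pages(pages, old_eof, index):
    """Add index entries only for tuples stored at or beyond old_eof.

    pages   -- list of pages, each page a list of keys (toy heap)
    old_eof -- page count before the bulk load
    index   -- dict mapping key -> list of (page_no, offset) locations
    """
    for page_no in range(old_eof, len(pages)):
        for offset, key in enumerate(pages[page_no]):
            # Pages before old_eof are never visited; their entries already exist.
            index.setdefault(key, []).append((page_no, offset))
    return index
```

The cost of this scan grows with the amount of newly loaded data rather than with the size of the existing table. It does not address how the index changes themselves would be undone after a crash, which is the concern raised earlier in the thread.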
On Tue, 2011-08-23 at 15:05 -0400, Tom Lane wrote:
> Steve Singer <ssinger@ca.afilias.info> writes:
> > The attached patch adds an option to the COPY command to skip writing
> > WAL when the following conditions are all met:
>
> > 1) The table is empty (zero size on disk)
> > 2) The copy command can obtain an access exclusive lock on the table
> > without blocking.
> > 3) The WAL isn't needed for replication
>
> Exposing this as a user-visible option seems a seriously bad idea.

In that particular way, I agree. But it might be useful if there were a more general declarative option like "BULKLOAD". We might then use that information for a number of optimizations that make sense for large loads.

Regards,
Jeff Davis
On Tue, Aug 23, 2011 at 4:51 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of Tue Aug 23 17:43:13 -0300 2011:
>> On Tue, Aug 23, 2011 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Robert Haas <robertmhaas@gmail.com> writes:
>> >> What I think would be really interesting is a way to make this work
>> >> when the table *isn't* empty. In other words, have a COPY option that
>> >> (1) takes an exclusive lock on the table, (2) writes the data being
>> >> inserted into new pages beyond the old EOF, and (3) arranges for crash
>> >> recovery or transaction abort to truncate the table back to its
>> >> previous length. Then you could do fast bulk loads even into a table
>> >> that's already populated, so long as you don't mind that the table
>> >> will be exclusive-locked and freespace within existing heap pages won't
>> >> be reused.
>> >
>> > What are you going to do with the table's indexes?
>>
>> Oh, hmm. That's awkward.
>
> If you see what I proposed, it's simple: you can scan the new segment(s)
> and index the tuples found there (maybe in bulk which would be even
> faster).

You can do that much even if you just append to the file - you don't need variable-length segments to make that part work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company