Обсуждение: skip WAL on COPY patch

Поиск
Список
Период
Сортировка

skip WAL on COPY patch

От
Steve Singer
Дата:
The attached patch adds an option to the COPY command to skip writing
WAL when the following conditions are all met:

1) The table is empty (zero size on disk)
2) The copy command can obtain an access exclusive lock on the table
with out blocking.
3) The WAL isn't needed for replication

For example

COPY a FROM '/tmp/a.txt' (SKIP_WAL);

A non-default option to the copy command is required because the copy
will block out any concurrent access to the table which would be
undesirable in some cases and is different from the current behaviour.

This can safely be done because if the transaction does not commit the
empty version of the data files are still available.  The COPY command
already skips WAL if the table was created in the current transaction.


There was a discussion on something similar before[1] but I didn't see
any discussion of having it only obtain the lock if it can do so without
waiting (nor could I find in the archives what happened to that patch).
  I'm not attached to the SKIP_WAL vs LOCK as the option


1- see http://archives.postgresql.org/pgsql-patches/2005-12/msg00206.php

Steve

Вложения

Re: skip WAL on COPY patch

От
Tom Lane
Дата:
Steve Singer <ssinger@ca.afilias.info> writes:
> The attached patch adds an option to the COPY command to skip writing 
> WAL when the following conditions are all met:

> 1) The table is empty (zero size on disk)
> 2) The copy command can obtain an access exclusive lock on the table 
> with out blocking.
> 3) The WAL isn't needed for replication

Exposing this as a user-visible option seems a seriously bad idea.
We'd have to support that forever.  ISTM it ought to be possible to
avoid the exclusive lock ... maybe not with this particular
implementation, but somehow.
        regards, tom lane


Re: skip WAL on COPY patch

От
Robert Haas
Дата:
On Tue, Aug 23, 2011 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Steve Singer <ssinger@ca.afilias.info> writes:
>> The attached patch adds an option to the COPY command to skip writing
>> WAL when the following conditions are all met:
>
>> 1) The table is empty (zero size on disk)
>> 2) The copy command can obtain an access exclusive lock on the table
>> with out blocking.
>> 3) The WAL isn't needed for replication
>
> Exposing this as a user-visible option seems a seriously bad idea.
> We'd have to support that forever.  ISTM it ought to be possible to
> avoid the exclusive lock ... maybe not with this particular
> implementation, but somehow.

Also, if it only works when the table is zero size on disk, you might
as well just let people truncate their already-empty tables when they
want this optimization.

What I think would be really interesting is a way to make this work
when the table *isn't* empty.  In other words, have a COPY option that
(1) takes an exclusive lock on the table, (2) writes the data being
inserted into new pages beyond the old EOF, and (3) arranges for crash
recovery or transaction abort to truncate the table back to its
previous length.  Then you could do fast bulk loads even into a table
that's already populated, so long as you don't mind that the table
will be excusive-locked and freespace within existing heap pages won't
be reused.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: skip WAL on COPY patch

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> What I think would be really interesting is a way to make this work
> when the table *isn't* empty.  In other words, have a COPY option that
> (1) takes an exclusive lock on the table, (2) writes the data being
> inserted into new pages beyond the old EOF, and (3) arranges for crash
> recovery or transaction abort to truncate the table back to its
> previous length.  Then you could do fast bulk loads even into a table
> that's already populated, so long as you don't mind that the table
> will be excusive-locked and freespace within existing heap pages won't
> be reused.

What are you going to do with the table's indexes?
        regards, tom lane


Re: skip WAL on COPY patch

От
Alvaro Herrera
Дата:
Excerpts from Robert Haas's message of mar ago 23 17:08:50 -0300 2011:

> What I think would be really interesting is a way to make this work
> when the table *isn't* empty.  In other words, have a COPY option that
> (1) takes an exclusive lock on the table, (2) writes the data being
> inserted into new pages beyond the old EOF, and (3) arranges for crash
> recovery or transaction abort to truncate the table back to its
> previous length.  Then you could do fast bulk loads even into a table
> that's already populated, so long as you don't mind that the table
> will be excusive-locked and freespace within existing heap pages won't
> be reused.

It seems to me this would be relatively simple if we allowed segments
that are not a full GB in length.  That way, COPY could write into a
whole segment and "attach" it to the table at commit time (say, by
renaming).

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: skip WAL on COPY patch

От
Steve Singer
Дата:
On 11-08-23 04:17 PM, Tom Lane wrote:
> Robert Haas<robertmhaas@gmail.com>  writes:
>> What I think would be really interesting is a way to make this work
>> when the table *isn't* empty.  In other words, have a COPY option that
>> (1) takes an exclusive lock on the table, (2) writes the data being
>> inserted into new pages beyond the old EOF, and (3) arranges for crash
>> recovery or transaction abort to truncate the table back to its
>> previous length.  Then you could do fast bulk loads even into a table
>> that's already populated, so long as you don't mind that the table
>> will be excusive-locked and freespace within existing heap pages won't
>> be reused.
>
> What are you going to do with the table's indexes?
>
>             regards, tom lane
>

What about not updating the indexes during the copy operation then to an 
automatic rebuild of the indexes after the copy (but during the same 
transaction).  If your only adding a few rows to a large table this 
wouldn't be what you want, but if your only adding a few rows then a 
small amount of WAL isn't a big concern either.




Re: skip WAL on COPY patch

От
Robert Haas
Дата:
On Tue, Aug 23, 2011 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What I think would be really interesting is a way to make this work
>> when the table *isn't* empty.  In other words, have a COPY option that
>> (1) takes an exclusive lock on the table, (2) writes the data being
>> inserted into new pages beyond the old EOF, and (3) arranges for crash
>> recovery or transaction abort to truncate the table back to its
>> previous length.  Then you could do fast bulk loads even into a table
>> that's already populated, so long as you don't mind that the table
>> will be excusive-locked and freespace within existing heap pages won't
>> be reused.
>
> What are you going to do with the table's indexes?

Oh, hmm.  That's awkward.

I suppose you could come up with some solution that involved saving
preimages of each already-existing index page that was modified until
commit.  If you crash before commit, you truncate away all the added
pages and roll back to the preimages of any modified pages.  That's
pretty complex, though, and I'm not sure that it would be enough of a
win to justify the effort.

It also sounds suspiciously like a poor-man's implementation of a
rollback segment; and if we ever decide we want to have an option for
rollback segments, we probably want more than a poor man's version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: skip WAL on COPY patch

От
Alvaro Herrera
Дата:
Excerpts from Robert Haas's message of mar ago 23 17:43:13 -0300 2011:
> On Tue, Aug 23, 2011 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> What I think would be really interesting is a way to make this work
> >> when the table *isn't* empty.  In other words, have a COPY option that
> >> (1) takes an exclusive lock on the table, (2) writes the data being
> >> inserted into new pages beyond the old EOF, and (3) arranges for crash
> >> recovery or transaction abort to truncate the table back to its
> >> previous length.  Then you could do fast bulk loads even into a table
> >> that's already populated, so long as you don't mind that the table
> >> will be excusive-locked and freespace within existing heap pages won't
> >> be reused.
> >
> > What are you going to do with the table's indexes?
> 
> Oh, hmm.  That's awkward.

If you see what I proposed, it's simple: you can scan the new segment(s)
and index the tuples found there (maybe in bulk which would be even
faster).

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: skip WAL on COPY patch

От
Jeff Davis
Дата:
On Tue, 2011-08-23 at 15:05 -0400, Tom Lane wrote:
> Steve Singer <ssinger@ca.afilias.info> writes:
> > The attached patch adds an option to the COPY command to skip writing 
> > WAL when the following conditions are all met:
> 
> > 1) The table is empty (zero size on disk)
> > 2) The copy command can obtain an access exclusive lock on the table 
> > with out blocking.
> > 3) The WAL isn't needed for replication
> 
> Exposing this as a user-visible option seems a seriously bad idea.

In that particular way, I agree. But it might be useful if there were a
more general declarative option like "BULKLOAD". We might then use that
information for a number of optimizations that make sense for large
loads.

Regards,Jeff Davis



Re: skip WAL on COPY patch

От
Robert Haas
Дата:
On Tue, Aug 23, 2011 at 4:51 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of mar ago 23 17:43:13 -0300 2011:
>> On Tue, Aug 23, 2011 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Robert Haas <robertmhaas@gmail.com> writes:
>> >> What I think would be really interesting is a way to make this work
>> >> when the table *isn't* empty.  In other words, have a COPY option that
>> >> (1) takes an exclusive lock on the table, (2) writes the data being
>> >> inserted into new pages beyond the old EOF, and (3) arranges for crash
>> >> recovery or transaction abort to truncate the table back to its
>> >> previous length.  Then you could do fast bulk loads even into a table
>> >> that's already populated, so long as you don't mind that the table
>> >> will be excusive-locked and freespace within existing heap pages won't
>> >> be reused.
>> >
>> > What are you going to do with the table's indexes?
>>
>> Oh, hmm.  That's awkward.
>
> If you see what I proposed, it's simple: you can scan the new segment(s)
> and index the tuples found there (maybe in bulk which would be even
> faster).

You can do that much even if you just append to the file - you don't
need variable-length segments to make that part work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company