Обсуждение: Detach/attach table and index data files from one cluster to another
<div dir="ltr"><p class="" style="style">Hello,<p class="">The current process of transferring data files from one clusterto another by using pg_dump and pg_restore is time consuming.<p class="">The proposed tool tries to make migrationfaster for tables and indices only by copying their binary data files. This is like pg_upgrade but used for migrationof table and indices<p class=""><p class=""><b><span style="font-size:10pt"> </span></b>The discussion here @ <a href="http://www.postgresql.org/message-id/CAA-aLv5cQf09zvFRCB1XXUQLSp-ouX0S_Hq6ryScd6CtamipFQ@mail.gmail.com">http://www.postgresql.org/message-id/CAA-aLv5cQf09zvFRCB1XXUQLSp-ouX0S_Hq6ryScd6CtamipFQ@mail.gmail.com</a><p class=""><pclass="">speaks of possibility detaching/attaching databases as an alternative to dump/restore. But the processof freezing XID’s and zeroing out LSN’s make the solution equally time consuming if not more.<p class=""> But if weconsider just tables and indexes to be detached/reattached, would this be a viable alternative to dump and restore of tables?<pclass=""><p class=""> The same discussion indicates it could be done but is more complicated as one has to dealwith system catalogs of the newly mounted table and map old OID’s to new ones. This is required to ensure consistencyin roles, and objects owned by those roles.<p class=""><p class=""> We would also need to ensure LSN values ofthe reattached pages are less than the current WAL endpoint in receiver.<p class=""><p class=""><span style="font-size:10pt;font-family:Arial,sans-serif;color:black"> </span><span style="color:black;font-family:Arial,sans-serif;font-size:10pt">Arethere any more issues we need to be aware of?</span><pclass=""><span style="color:black;font-family:Arial,sans-serif;font-size:10pt">regards</span><p class="" style="style"><spanstyle="font-size:10pt;font-family:Arial,sans-serif;color:black">Sameer</span></div>
Sameer Thakur <samthakur74@gmail.com> writes: > The proposed tool tries to make migration faster for tables and indices > only by copying their binary data files. There's 0 chance of making that work, because the two databases wouldn't have the same notions of committed XIDs. You apparently don't understand what you read in the other discussion --- the steps you are objecting to are not optional, whether copying a whole tablespace or only one table. regards, tom lane
On 04/12/2013 10:15 AM, Tom Lane wrote: > Sameer Thakur <samthakur74@gmail.com> writes: >> The proposed tool tries to make migration faster for tables and indices >> only by copying their binary data files. > There's 0 chance of making that work, because the two databases wouldn't > have the same notions of committed XIDs. Yeah. Trying to think way outside the box, could we invent some sort of fixup mechanism that could be applied to adopted files? Of course, that could slow things down so much that it wouldn't be worth it, but it might be a nice research project. cheers andrew
Andrew Dunstan escribió: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >Sameer Thakur <samthakur74@gmail.com> writes: > >>The proposed tool tries to make migration faster for tables and indices > >>only by copying their binary data files. > >There's 0 chance of making that work, because the two databases wouldn't > >have the same notions of committed XIDs. > > Yeah. Trying to think way outside the box, could we invent some sort > of fixup mechanism that could be applied to adopted files? Of > course, that could slow things down so much that it wouldn't be > worth it, but it might be a nice research project. I think the fixup procedure involves freezing Xids (prior to the transporting), which the OP said he didn't want to do. If you don't freeze beforehand, there's not enough info in the new cluster to know which tuples are dead/alive. Another option would be to have a "private" copy of pg_clog/pg_subtrans for the transported table(s), but that seems very difficult to arrange. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andrew Dunstan <andrew@dunslane.net> writes: > On 04/12/2013 10:15 AM, Tom Lane wrote: >> There's 0 chance of making that work, because the two databases wouldn't >> have the same notions of committed XIDs. > Yeah. Trying to think way outside the box, could we invent some sort of > fixup mechanism that could be applied to adopted files? Well, it wouldn't be that hard to replace XIDs with FrozenXID or InvalidXID as appropriate, if you had access to the source database's clog while you did the copying. It just wouldn't be very fast. I suppose it would still be faster than a COPY transfer, but I'm not sure it'd be enough faster to justify the work and the additional portability hits you'd be taking. regards, tom lane
On 2013-04-12 12:14:24 -0400, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. I think if one goes over the heap and hint bits everything (so the item pointers don't have to be immediately rewritten), freeze everything and such it should be doable at about disk speed unless you have a really fast disk subsystem. But it still is fairly complicated and I doubt its really necessary. > I suppose it would still be faster than a COPY transfer, but I'm not > sure it'd be enough faster to justify the work and the additional > portability hits you'd be taking. Using binary copy might already give quite a speedup, Sameer, did you try that? Also, do you really need parts of a cluster or would a base backup of the whole cluster do the trick? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
<div dir="ltr"><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">On Fri, Apr 12, 2013 at 9:44 PM, Tom Lane<span dir="ltr"><<a href="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">AndrewDunstan <<a href="mailto:andrew@dunslane.net">andrew@dunslane.net</a>> writes:<br /> > On 04/12/201310:15 AM, Tom Lane wrote:<br /></div><div class="im">>> There's 0 chance of making that work, because thetwo databases wouldn't<br /> >> have the same notions of committed XIDs.<br /><br /> > Yeah. Trying to thinkway outside the box, could we invent some sort of<br /> > fixup mechanism that could be applied to adopted files?<br/><br /></div>Well, it wouldn't be that hard to replace XIDs with FrozenXID or<br /> InvalidXID as appropriate,if you had access to the source database's<br /> clog while you did the copying. It just wouldn't be very fast.<br/><div class="HOEnZb"><div class="h5"><br /></div></div></blockquote></div><div class="gmail_extra"><br /></div>Wouldit be possible to fix the XIDs *after* copying the data files, potentially on a different server so as to avoidany additional overhead on the main server ? I guess so, though we will probably need some mechanism to lock out accessto the table (which seems easy), flush all its data pages to the disk and some way to reliably flush all clog pagesas well so that they can be copied along with the data files. The page LSNs seem to be easy to handle and can be easilyzeroed out outside the server.<br /><br clear="all" /><div style="style">I wonder though if this all look like a materialfor something like pg_reorg(pack) though some kind of support from the core may be required.</div><div style="style"><br/></div><div style="style">Thanks,</div><div style="style"> Pavan</div><div style="style"><br /></div>--<br />Pavan Deolasee<br /><a href="http://www.linkedin.com/in/pavandeolasee" target="_blank">http://www.linkedin.com/in/pavandeolasee</a></div></div>
On Fri, Apr 12, 2013 at 10:22:38PM +0530, Pavan Deolasee wrote: > > > > On Fri, Apr 12, 2013 at 9:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andrew Dunstan <andrew@dunslane.net> writes: > > On 04/12/2013 10:15 AM, Tom Lane wrote: > >> There's 0 chance of making that work, because the two databases wouldn't > >> have the same notions of committed XIDs. > > > Yeah. Trying to think way outside the box, could we invent some sort of > > fixup mechanism that could be applied to adopted files? > > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. > > > > Would it be possible to fix the XIDs *after* copying the data files, > potentially on a different server so as to avoid any additional overhead on the > main server ? I guess so, though we will probably need some mechanism to lock > out access to the table (which seems easy), flush all its data pages to the > disk and some way to reliably flush all clog pages as well so that they can be > copied along with the data files. The page LSNs seem to be easy to handle and > can be easily zeroed out outside the server. > > I wonder though if this all look like a material for something like pg_reorg > (pack) though some kind of support from the core may be required. Uh, now that you mention it, pg_upgrade in non-link mode does something similer, in that it copies the data files and clog. You could use pg_upgrade in non-link mode, run VACUUM FREEZE on the upgraded cluster, and then copy the data files. The only problem is that pg_upgrade can't upgrade tablespaces with the same system catalog version because the tablespace directory names would conflict. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Well, it wouldn't be that hard to replace XIDs with FrozenXID or > InvalidXID as appropriate, if you had access to the source database's > clog while you did the copying. It just wouldn't be very fast. If you're doing that in a streaming method, it strikes me that it'd be plenty fast. > I suppose it would still be faster than a COPY transfer, but I'm not > sure it'd be enough faster to justify the work and the additional > portability hits you'd be taking. The big win here over a binary COPY is pulling through the indexes as-is as well- without having to rebuild them. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I suppose it would still be faster than a COPY transfer, but I'm not >> sure it'd be enough faster to justify the work and the additional >> portability hits you'd be taking. > The big win here over a binary COPY is pulling through the indexes as-is > as well- without having to rebuild them. Meh. That raises the ante another substantial multiple with respect to the amount of portability risk (eg, you're now absolutely dependent on locale sort orders to be identical in both databases). And I think you'd have to freeze all updates to the table while you were copying the table+indexes, if you wanted them to be consistent. I can't imagine that we'd accept a patch that says to the recipient database, "here are some large binary blobs, please believe that they represent a valid table and associated indexes. Oh, and don't you dare try to actually check them, because that would be slow." Some other interesting things to think about here would be toast-table OIDs embedded in toast pointers, data type OIDs embedded in arrays (and maybe records too, I forget), enum value OIDs, btree vacuum cycle IDs, GiST NSNs ... not sure what else, but I bet that's not a complete list. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > The big win here over a binary COPY is pulling through the indexes as-is > > as well- without having to rebuild them. [... lots of reasons this is hard ...] I agree that it's quite a bit more difficult, to the point that logical replication which can be selective (eg: give me only table X + indexes) might end up being the only answer, but otherwise this approach will likely only be a modest improvement over binary COPY FREEZE- and there only because we essentially end up skipping the type validation (which we could just provide as an option, similar to COPY FREEZE...). Thanks, Stephen
On Fri, Apr 12, 2013 at 9:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-04-12 12:14:24 -0400, Tom Lane wrote:>I think if one goes over the heap and hint bits everything (so the item
> Andrew Dunstan <andrew@dunslane.net> writes:
> > On 04/12/2013 10:15 AM, Tom Lane wrote:
> >> There's 0 chance of making that work, because the two databases wouldn't
> >> have the same notions of committed XIDs.
>
> > Yeah. Trying to think way outside the box, could we invent some sort of
> > fixup mechanism that could be applied to adopted files?
>
> Well, it wouldn't be that hard to replace XIDs with FrozenXID or
> InvalidXID as appropriate, if you had access to the source database's
> clog while you did the copying. It just wouldn't be very fast.
>pointers don't have to be immediately rewritten), freeze everything and
>such it should be doable at about disk speed unless you have a really
>fast disk subsystem.
>But it still is fairly complicated and I doubt its really necessary.>Using binary copy might already give quite a speedup, Sameer, did you
>> I suppose it would still be faster than a COPY transfer, but I'm not
> >sure it'd be enough faster to justify the work and the additional
> >portability hits you'd be taking.
try that?
No we have not so far, was soliciting feedback first from the hackers and possibly implement as a contrib module. Also i did misread the earlier post on the subject.
>Also, do you really need parts of a cluster or would a base backup of
>the whole cluster do the trick?
We were looking at parts of cluster as an faster alternative to pg_dump and restore
--
Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services