Modifying COPY TO
От | Dave Held |
---|---|
Тема | Modifying COPY TO |
Дата | |
Msg-id | 49E94D0CFCD4DB43AFBA928DDD20C8F902618457@asg002.asg.local обсуждение исходный текст |
Ответы |
Re: Modifying COPY TO
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: Modifying COPY TO (Stephen Frost <sfrost@snowman.net>) Re: Modifying COPY TO ("Jim C. Nasby" <decibel@decibel.org>) |
Список | pgsql-hackers |
<p><font face="Arial" size="2">I am interested in hacking COPY TO such that one can specify that</font><br /><font face="Arial"size="2">rows are copied in a certain index order. I got as far as </font><br /><font face="Arial" size="2">src/backend/commands/copy.c:CopyTo(),and it looks like I would need</font><br /><font face="Arial" size="2">to modifythe call to heap_beginscan() so that it uses a key. However,</font><br /><font face="Arial" size="2">I couldn't figureout how to provide one, or if I'm even looking at the</font><br /><font face="Arial" size="2">right area. Ideally,this behavior would be specified with a flag,</font><br /><font face="Arial" size="2">perhaps: "WITH INDEX <index_name>"or "WITH PRIMARY KEY"</font><br /><font face="Arial" size="2">or something similar.</font><p><font face="Arial"size="2">The motivation for this change is as follows. I have a fairly large</font><br /><font face="Arial"size="2">database (10 million+ records) that mirrors the data in a proprietary</font><br /><font face="Arial"size="2">system. The only access to that data is through exported flat files.</font><br /><font face="Arial"size="2">Currently, those flat files are copied directly into a staging area in the </font><br /><font face="Arial"size="2">db via a COPY FROM, the actual tables are truncated, and the</font><br /><font face="Arial" size="2">stagingdata is inserted into the live tables. Since the data is read-only,</font><br /><font face="Arial" size="2">itdoesn't matter that it is recreated every day. However, as you</font><br /><font face="Arial" size="2">can imagine,the import process takes quite a while (several hours).</font><br /><font face="Arial" size="2">Also, rebuildingthe db from scratch every day loses any statistical</font><br /><font face="Arial" size="2">information gatheredfrom the execution of queries during the day.</font><p><font face="Arial" size="2">A possibility that I would liketo pursue is to keep the staging data</font><br /><font face="Arial" size="2">from the previous day, do a COPY TO, importthe new data into</font><br /><font face="Arial" size="2">another staging table with a COPY FROM, then export the fresh</font><br/><font face="Arial" size="2">data with another COPY TO. Then, I can write a fast C/C++</font><br /><fontface="Arial" size="2">program to do a line-by-line comparison of each record, isolating</font><br /><font face="Arial"size="2">the ones that have changed from the previous day. I can then</font><br /><font face="Arial" size="2">emitthose records in a change file that should be relatively small</font><br /><font face="Arial" size="2">and easyto update. Of course, this scheme can only work if</font><br /><font face="Arial" size="2">COPY TO emits the recordsin a reliable order.</font><p><font face="Arial" size="2">Any assistance on this project would be greatly appreciated. The</font><br /><font face="Arial" size="2">best I can see, I'm stuck on line 1053 from copy.c:</font><p><fontface="Arial" size="2"> scandesc = heap_beginscan(rel, mySnapshot, 0, NULL);</font><p><font face="Arial"size="2">I suspect that I want it to look like this:</font><p><font face="Arial" size="2"> scandesc = heap_beginscan(rel,mySnapshot, 1, key);</font><p><font face="Arial" size="2">where 'key' is an appropriately constructedScanKey. It looks</font><br /><font face="Arial" size="2">like I want to call ScanKeyEntryInitialize(), but I'mnot sure what</font><br /><font face="Arial" size="2">parameters I need to pass to it to get an index or the primary</font><br/><font face="Arial" size="2">key. I mostly need help building the ScanKey object. I think I </font><br/><font face="Arial" size="2">can figure out how to hack the custom option, etc. I should </font><br /><font face="Arial"size="2">mention that I am using the 7.4.7 codebase on Linux 2.4.</font><p><font face="Courier New" size="2">__</font><br/><font face="Courier New" size="2">David B. Held</font><br /><font face="Courier New" size="2">SoftwareEngineer/Array Services Group</font><br /><font face="Courier New" size="2">200 14th Ave. East, Sartell,MN 56377</font><br /><font face="Courier New" size="2">320.534.3637 320.253.7800 800.752.8129</font><br />
В списке pgsql-hackers по дате отправления: