Обсуждение: Restore v. Running COPY/INDEX seperatly
Hi, So, I built my tables which contains a TSearch2 field by 1. Create table without indexes 2. COPY data into table 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector; 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage); 5. Index all the fields including the TSearch2 field The process takes several days. In contrast, if I backup the table and restore it to a new table it takes a fraction of the time as running the above operation manually. I am building my indexes at the end but I think the step 4 may be causing uneeded overhead. Can I somehow just copy data into the idxFTI field during the copy process? Is there anything else I can do to get my loading process to perform similar to backup/restore? Does pg_dump also dump the indexes? That would explain why it is so much faster... Benjamin
On Sun, 26 Aug 2007, Benjamin Arai wrote: > Hi, > > So, I built my tables which contains a TSearch2 field by > > 1. Create table without indexes > 2. COPY data into table > 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector; > 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage); vacuum here > 5. Index all the fields including the TSearch2 field > > The process takes several days. > > In contrast, if I backup the table and restore it to a new table it takes a > fraction of the time as running the above operation manually. I am building > my indexes at the end but I think the step 4 may be causing uneeded overhead. > Can I somehow just copy data into the idxFTI field during the copy process? > Is there anything else I can do to get my loading process to perform similar > to backup/restore? > > Does pg_dump also dump the indexes? That would explain why it is so much > faster... > > Benjamin > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
"Oleg Bartunov" <oleg@sai.msu.su> writes: > On Sun, 26 Aug 2007, Benjamin Arai wrote: > >> Hi, >> >> So, I built my tables which contains a TSearch2 field by >> >> 1. Create table without indexes >> 2. COPY data into table >> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector; >> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage); > > vacuum here Or you could do something tricky and do the update like this which would avoid the need to vacuum: ALTER TABLE tblMessages ALTER COLUMN idxFTI TYPE tsvector USING to_tsvector('default, strMesage); This only works because ALTER TABLE rewrites the table from scratch any time it does an operation like this. Don't try this if there are transactions working against the table at the same time (such as a pg_dump!). Or you could set up a trigger to generate the tsvector when you first load the data instead of adding it later. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: >> On Sun, 26 Aug 2007, Benjamin Arai wrote: >>> So, I built my tables which contains a TSearch2 field by >>> 1. Create table without indexes >>> 2. COPY data into table >>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector; >>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage); > Or you could set up a trigger to generate the tsvector when you first > load the data instead of adding it later. You're going to want such a trigger anyway, so installing it before the COPY step seems like the Obviously Right Thing. Any other approach implies rewriting the entire table after you've loaded it, with no compensating advantage that I can see. regards, tom lane
Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > >> On Sun, 26 Aug 2007, Benjamin Arai wrote: > >>> So, I built my tables which contains a TSearch2 field by > >>> 1. Create table without indexes > >>> 2. COPY data into table > >>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector; > >>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage); > > > Or you could set up a trigger to generate the tsvector when you first > > load the data instead of adding it later. > > You're going to want such a trigger anyway, so installing it before the > COPY step seems like the Obviously Right Thing. Any other approach > implies rewriting the entire table after you've loaded it, with no > compensating advantage that I can see. Isn't the main speed advantage of the dump the fact that the to_tsvector() results already come in the COPY data? The dump already comes with the idxFTI column contents, instead of having to generate it from scratch. That would depend on how expensive that function is, of course. -- Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J "PHP is what I call the "Dumb Monkey" language. [A]ny dumb monkey can code something in PHP. Python takes actual thought to produce something useful." (J. Drake)
Why is a trigger faster than doing a ALTER after table is created? I thought a trigger would be slower because it would be invoked every iteration (a new row is inserted) during the COPY process. Benjamin On Aug 26, 2007, at 8:43 PM, Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: >>> On Sun, 26 Aug 2007, Benjamin Arai wrote: >>>> So, I built my tables which contains a TSearch2 field by >>>> 1. Create table without indexes >>>> 2. COPY data into table >>>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector; >>>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', >>>> strMessage); > >> Or you could set up a trigger to generate the tsvector when you first >> load the data instead of adding it later. > > You're going to want such a trigger anyway, so installing it before > the > COPY step seems like the Obviously Right Thing. Any other approach > implies rewriting the entire table after you've loaded it, with no > compensating advantage that I can see. > > regards, tom lane >
Benjamin Arai <me@benjaminarai.com> writes: > Why is a trigger faster than doing a ALTER after table is created? I > thought a trigger would be slower because it would be invoked every > iteration (a new row is inserted) during the COPY process. Yeah, you'd have the trigger overhead, but the above argument ignores the costs of the full-table UPDATE --- not to mention the VACUUM you'll need after the UPDATE to clean up the dead rows. regards, tom lane
In what order should I : - COPY data - Create indexes - Create Trigger - Vaccum ? Currently I am: 1. Create table 2 . Create trigger for updates 3. Create indexes including gin 4. Vaccum Benjamin On Aug 27, 2007, at 7:59 AM, Tom Lane wrote: > Benjamin Arai <me@benjaminarai.com> writes: >> Why is a trigger faster than doing a ALTER after table is created? I >> thought a trigger would be slower because it would be invoked every >> iteration (a new row is inserted) during the COPY process. > > Yeah, you'd have the trigger overhead, but the above argument ignores > the costs of the full-table UPDATE --- not to mention the VACUUM > you'll need after the UPDATE to clean up the dead rows. > > regards, tom lane >