Обсуждение: Restore v. Running COPY/INDEX seperatly

Поиск
Список
Период
Сортировка

Restore v. Running COPY/INDEX seperatly

От
Benjamin Arai
Дата:
Hi,

So, I built my tables which contains a TSearch2 field by

1. Create table without indexes
2. COPY data into table
3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
5. Index all the fields including the TSearch2 field

The process takes several days.

In contrast, if I backup the table and restore it to a new table it
takes a fraction of the time as running the above operation
manually.  I am building my indexes at the end but I think the step 4
may be causing uneeded overhead.  Can I somehow just copy data into
the idxFTI field during the copy process?  Is there anything else I
can do to get my loading process to perform similar to backup/restore?

Does pg_dump also dump the indexes?  That would explain why it is so
much faster...

Benjamin

Re: Restore v. Running COPY/INDEX seperatly

От
Oleg Bartunov
Дата:
On Sun, 26 Aug 2007, Benjamin Arai wrote:

> Hi,
>
> So, I built my tables which contains a TSearch2 field by
>
> 1. Create table without indexes
> 2. COPY data into table
> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);

vacuum here

> 5. Index all the fields including the TSearch2 field
>
> The process takes several days.
>
> In contrast, if I backup the table and restore it to a new table it takes a
> fraction of the time as running the above operation manually.  I am building
> my indexes at the end but I think the step 4 may be causing uneeded overhead.
> Can I somehow just copy data into the idxFTI field during the copy process?
> Is there anything else I can do to get my loading process to perform similar
> to backup/restore?
>
> Does pg_dump also dump the indexes?  That would explain why it is so much
> faster...
>
> Benjamin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>             http://www.postgresql.org/docs/faq

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Restore v. Running COPY/INDEX seperatly

От
Gregory Stark
Дата:
"Oleg Bartunov" <oleg@sai.msu.su> writes:

> On Sun, 26 Aug 2007, Benjamin Arai wrote:
>
>> Hi,
>>
>> So, I built my tables which contains a TSearch2 field by
>>
>> 1. Create table without indexes
>> 2. COPY data into table
>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
>
> vacuum here

Or you could do something tricky and do the update like this which would avoid
the need to vacuum:

ALTER TABLE tblMessages ALTER COLUMN idxFTI TYPE tsvector USING to_tsvector('default, strMesage);

This only works because ALTER TABLE rewrites the table from scratch any time
it does an operation like this. Don't try this if there are transactions
working against the table at the same time (such as a pg_dump!).

Or you could set up a trigger to generate the tsvector when you first load the
data instead of adding it later.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Restore v. Running COPY/INDEX seperatly

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
>> On Sun, 26 Aug 2007, Benjamin Arai wrote:
>>> So, I built my tables which contains a TSearch2 field by
>>> 1. Create table without indexes
>>> 2. COPY data into table
>>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
>>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);

> Or you could set up a trigger to generate the tsvector when you first
> load the data instead of adding it later.

You're going to want such a trigger anyway, so installing it before the
COPY step seems like the Obviously Right Thing.  Any other approach
implies rewriting the entire table after you've loaded it, with no
compensating advantage that I can see.

            regards, tom lane

Re: Restore v. Running COPY/INDEX seperatly

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> >> On Sun, 26 Aug 2007, Benjamin Arai wrote:
> >>> So, I built my tables which contains a TSearch2 field by
> >>> 1. Create table without indexes
> >>> 2. COPY data into table
> >>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
> >>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
>
> > Or you could set up a trigger to generate the tsvector when you first
> > load the data instead of adding it later.
>
> You're going to want such a trigger anyway, so installing it before the
> COPY step seems like the Obviously Right Thing.  Any other approach
> implies rewriting the entire table after you've loaded it, with no
> compensating advantage that I can see.

Isn't the main speed advantage of the dump the fact that the
to_tsvector() results already come in the COPY data?  The dump already
comes with the idxFTI column contents, instead of having to generate it
from scratch.  That would depend on how expensive that function is, of
course.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/DXLWNGRJD34J
"PHP is what I call the "Dumb Monkey" language. [A]ny dumb monkey can code
something in PHP. Python takes actual thought to produce something useful."
                                                               (J. Drake)

Re: Restore v. Running COPY/INDEX seperatly

От
Benjamin Arai
Дата:
Why is a trigger faster than doing a ALTER after table is created?  I
thought a trigger would be slower because it would be invoked every
iteration (a new row is inserted) during the COPY process.

Benjamin

On Aug 26, 2007, at 8:43 PM, Tom Lane wrote:

> Gregory Stark <stark@enterprisedb.com> writes:
>>> On Sun, 26 Aug 2007, Benjamin Arai wrote:
>>>> So, I built my tables which contains a TSearch2 field by
>>>> 1. Create table without indexes
>>>> 2. COPY data into table
>>>> 3. ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
>>>> 4. UPDATE tblMessages SET idxFTI=to_tsvector('default',
>>>> strMessage);
>
>> Or you could set up a trigger to generate the tsvector when you first
>> load the data instead of adding it later.
>
> You're going to want such a trigger anyway, so installing it before
> the
> COPY step seems like the Obviously Right Thing.  Any other approach
> implies rewriting the entire table after you've loaded it, with no
> compensating advantage that I can see.
>
>             regards, tom lane
>


Re: Restore v. Running COPY/INDEX seperatly

От
Tom Lane
Дата:
Benjamin Arai <me@benjaminarai.com> writes:
> Why is a trigger faster than doing a ALTER after table is created?  I
> thought a trigger would be slower because it would be invoked every
> iteration (a new row is inserted) during the COPY process.

Yeah, you'd have the trigger overhead, but the above argument ignores
the costs of the full-table UPDATE --- not to mention the VACUUM
you'll need after the UPDATE to clean up the dead rows.

            regards, tom lane

Re: Restore v. Running COPY/INDEX seperatly

От
Benjamin Arai
Дата:
In what order should I :

- COPY data
- Create indexes
- Create Trigger
- Vaccum

?

Currently I am:

1. Create table
2 . Create trigger for updates
3. Create indexes including gin
4. Vaccum

Benjamin

On Aug 27, 2007, at 7:59 AM, Tom Lane wrote:

> Benjamin Arai <me@benjaminarai.com> writes:
>> Why is a trigger faster than doing a ALTER after table is created?  I
>> thought a trigger would be slower because it would be invoked every
>> iteration (a new row is inserted) during the COPY process.
>
> Yeah, you'd have the trigger overhead, but the above argument ignores
> the costs of the full-table UPDATE --- not to mention the VACUUM
> you'll need after the UPDATE to clean up the dead rows.
>
>             regards, tom lane
>