Обсуждение: Improving backend launch time by preloading relcache

Поиск
Список
Период
Сортировка

Improving backend launch time by preloading relcache

От
Tom Lane
Дата:
I spent the weekend fooling around trying to reduce the time needed to
start a fresh backend.  Profiling seemed to indicate that much of the
time was going into loading entries into the relcache: relcache entry
setup normally requires fetching rows from several different system
catalogs.  The obvious way to fix that is to preload entries somehow.
It turns out we already have a mechanism for this (the pg_internal.init
file), but it was only being used to preload entries for a few critical
system indexes --- "critical" meaning "relcache/catcache initialization
becomes an infinite recursion otherwise".  I rearranged things so that
pg_internal.init could cache entries for both plain relations and
indexes, and then set it up to cache all the system catalogs and indexes
that are referenced by catalog caches.  (This is a somewhat arbitrary
choice, but was easy to implement.)

As near as I can tell, this reduces the user-space CPU time involved in
a backend launch by about a factor of 5; and there's also a very
significant reduction in traffic to shared memory, which should reduce
contention problems when multiple backends are involved.  It's difficult
to measure this stuff, however ... profiling is of limited reliability
when you can only get a few clock samples per process launch.

I'm planning to commit these changes when 7.3 opens, unless I hear
objections.  A possible objection is that caching more system catalog
descriptors makes it more difficult to support user alterations to the
system catalogs; but we don't support those anyway, and I haven't heard
of anyone working to remove the other obstacles to it.  (Note that this
wouldn't completely prevent such things; it would just be necessary to
figure out when to delete the pg_internal.init cache file when making
schema changes.)
        regards, tom lane


Re: Improving backend launch time by preloading relcache

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> I spent the weekend fooling around trying to reduce the time needed to
> start a fresh backend.  Profiling seemed to indicate that much of the
> time was going into loading entries into the relcache: relcache entry
> setup normally requires fetching rows from several different system
> catalogs.  The obvious way to fix that is to preload entries somehow.
> It turns out we already have a mechanism for this (the pg_internal.init
> file), but it was only being used to preload entries for a few critical
> system indexes --- "critical" meaning "relcache/catcache initialization
> becomes an infinite recursion otherwise".  I rearranged things so that
> pg_internal.init could cache entries for both plain relations and
> indexes, and then set it up to cache all the system catalogs and indexes
> that are referenced by catalog caches.  (This is a somewhat arbitrary
> choice, but was easy to implement.)

While examining this issue I found the following change
about REINDEX.
Subject: [COMMITTERS] pgsql/src/backend catalog/index.c commands/ind
...   Date: Mon, 19 Nov 2001 21:46:13 -0500 (EST)   From: tgl@postgresql.org     To: pgsql-committers@postgresql.org

CVSROOT:        /cvsroot
Module name:    pgsql
Changes by:     tgl@postgresql.org      01/11/19 21:46:13

Modified files:       src/backend/catalog: index.c        src/backend/commands: indexcmds.c        src/backend/tcop:
utility.c
 

Log message:       Some minor tweaks of REINDEX processing: grab exclusivelock a little earlier, make error checks more
uniform.

The change on tcop/utility.c seems to inhibit the execution
of REINDEX of system indexes under postmaster which I allowed
except some system indexes in 7.1.
Please put it back in 7.2.1.

Inhibited relations are the indexes of the followings.
[Shared relations]
pg_database, pg_shadow, pg_group
[Nailed relations]
pg_class, pg_type, pg_attribute, pg_proc

There are some trial stuff to handle nailed relations
(mostly #ifdef'd ENABLE_REINDEX_NAILED_RELATIONS).
Especially setNewRelfilenode() unlinks the pg_internal.init
file in case the relation is nailed. However I don't rely
on the mechanism so much that I can't feel like removing the
#ifdef's.

regards,
Hiroshi Inoue


Re: Improving backend launch time by preloading relcache

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> The change on tcop/utility.c seems to inhibit the execution
> of REINDEX of system indexes under postmaster which I allowed
> except some system indexes in 7.1.

That strikes me as a fairly dangerous idea.  Do you really
believe it's safe?  Also, why would it be safe to allow reindex
at the table level and not at the index level, which is what
the code did before I touched it?

> Especially setNewRelfilenode() unlinks the pg_internal.init
> file in case the relation is nailed.

Probably with this change I'm planning, it'll be necessary to unlink
pg_internal.init for any system relation, not only nailed ones.
Thanks for pointing that out.
        regards, tom lane


Re: Improving backend launch time by preloading relcache

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > The change on tcop/utility.c seems to inhibit the execution
> > of REINDEX of system indexes under postmaster which I allowed
> > except some system indexes in 7.1.
> 
> That strikes me as a fairly dangerous idea.  Do you really
> believe it's safe?  Also, why would it be safe to allow reindex
> at the table level and not at the index level, which is what
> the code did before I touched it?

REINDEX uses the relfilenode mechanism since 7.1 which
lets the replacement of index files be under transactional
control. I think it's safe enough. One thing I had to worry
about REINDEX on system indexes is how to tell that the
target index mustn't be used during the REINDEX operation.
Turning off the relhasindex column in pg_class tells
PG system that the indexes are unavailable now. It was 
implemented by me before 7.0. I didn't provided the
way to inactivate indexes individually however.

> 
> > Especially setNewRelfilenode() unlinks the pg_internal.init
> > file in case the relation is nailed.
> 
> Probably with this change I'm planning, it'll be necessary to unlink
> pg_internal.init for any system relation, not only nailed ones.
> Thanks for pointing that out.

What I meant was to confirm if it's really reliable.
Currently e.g. the failure of rename of temporary
init file to pg_internal.init isn't fatal but it
may be fatal if we include many relcache info in
the pg_internal.init file.

regards,
Hiroshi Inoue


Re: Improving backend launch time by preloading relcache

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> REINDEX uses the relfilenode mechanism since 7.1 which
> lets the replacement of index files be under transactional
> control. I think it's safe enough.

Okay, in that case tcop/utility is being too picky about all three
cases, no?

> What I meant was to confirm if it's really reliable.
> Currently e.g. the failure of rename of temporary
> init file to pg_internal.init isn't fatal but it
> may be fatal if we include many relcache info in
> the pg_internal.init file.

Certainly not --- it must always be possible for a freshly started
backend to build the pg_internal.init file from scratch.  The reason
for unlinking pg_internal.init after changing a catalog schema tuple
is that future backends won't know you changed it unless
pg_internal.init is rebuilt.

Hmm ... what that says is that unlinking pg_internal.init in
setRelfilenode is the wrong place.  The right place is *after*
committing your transaction and *before* sending shared cache inval
messages.  You can't unlink before you commit, or someone may rebuild
using the old information.  (A backend that's already logged into the
PROC array when you send SI inval will find out about the changes via SI
inval.  One that is not yet logged in must be prevented from reading the
now-obsolete pg_internal.init file.  The startup sequence logs into PROC
before trying to read pg_internal.init, so that part is done in the
right order.)  So we need a flag that will cause the unlink to happen at
the right time in post-commit cleanup.

VACUUM's got this same timing bug, although its change is only one of
updating relpages/reltuples which is not so critical...
        regards, tom lane


Re: Improving backend launch time by preloading relcache

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > REINDEX uses the relfilenode mechanism since 7.1 which
> > lets the replacement of index files be under transactional
> > control. I think it's safe enough.
> 
> Okay, in that case tcop/utility is being too picky about all three
> cases, no?

Probably we don't have to keep the relhasindex info in the
db any longer and we had better keep some info about REINDEX
in memory local to the backend. In fact in the current
implementation the relhasindex is local to the backend
and any backend couldn't see the column committed to
the status off. 

> Hmm ... what that says is that unlinking pg_internal.init in
> setRelfilenode is the wrong place.

Possibly. I couldn't find the appropriate place(way) then
and so #ifdef's are still there.

regards,
Hiroshi Inoue


Re: Improving backend launch time by preloading relcache

От
Oleg Bartunov
Дата:
On Mon, 28 Jan 2002, Tom Lane wrote:

> I spent the weekend fooling around trying to reduce the time needed to
> start a fresh backend.  Profiling seemed to indicate that much of the
> time was going into loading entries into the relcache: relcache entry
> setup normally requires fetching rows from several different system
> catalogs.  The obvious way to fix that is to preload entries somehow.
> It turns out we already have a mechanism for this (the pg_internal.init
> file), but it was only being used to preload entries for a few critical
> system indexes --- "critical" meaning "relcache/catcache initialization
> becomes an infinite recursion otherwise".  I rearranged things so that
> pg_internal.init could cache entries for both plain relations and
> indexes, and then set it up to cache all the system catalogs and indexes
> that are referenced by catalog caches.  (This is a somewhat arbitrary
> choice, but was easy to implement.)
>
> As near as I can tell, this reduces the user-space CPU time involved in
> a backend launch by about a factor of 5; and there's also a very
> significant reduction in traffic to shared memory, which should reduce

Tom, what's about  absolute timings ? It's quite interesting, because
many people have to keep persistent connections to backend and if
statup time would be small ( as in MySQL case ), it'd be possible just
not waste a system resources ( in some situations ).

> contention problems when multiple backends are involved.  It's difficult
> to measure this stuff, however ... profiling is of limited reliability
> when you can only get a few clock samples per process launch.
>
> I'm planning to commit these changes when 7.3 opens, unless I hear
> objections.  A possible objection is that caching more system catalog
> descriptors makes it more difficult to support user alterations to the
> system catalogs; but we don't support those anyway, and I haven't heard
> of anyone working to remove the other obstacles to it.  (Note that this
> wouldn't completely prevent such things; it would just be necessary to
> figure out when to delete the pg_internal.init cache file when making
> schema changes.)
>

>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Improving backend launch time by preloading relcache

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Probably we don't have to keep the relhasindex info in the
> db any longer and we had better keep some info about REINDEX
> in memory local to the backend.

I never did much care for the "change relhasindex" hack.  Why isn't
IsIgnoringSystemIndexes a sufficient solution?  I don't really care
if REINDEX is a little bit slower than it might be, so just turning
off use of *all* system indexes seems like an adequate answer.

>> Hmm ... what that says is that unlinking pg_internal.init in
>> setRelfilenode is the wrong place.

> Possibly. I couldn't find the appropriate place(way) then
> and so #ifdef's are still there.

Okay.  I'll work on that when I commit the patches I have.
        regards, tom lane


Re: Improving backend launch time by preloading relcache

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Probably we don't have to keep the relhasindex info in the
> > db any longer and we had better keep some info about REINDEX
> > in memory local to the backend.
> 
> I never did much care for the "change relhasindex" hack.  Why isn't
> IsIgnoringSystemIndexes a sufficient solution?  I don't really care
> if REINDEX is a little bit slower than it might be, so just turning
> off use of *all* system indexes seems like an adequate answer.

It may be a reasonable solution.
I thought of another idea while reading the thread [HACKERS]
sequence indexes. Currently REINDEX recreates indexes from
the heap relations because the indexes may be corrupted.
However we can recreate indexes from existent ones if
they are sane. It would be a lot faster than the current
way for large tables. 

Comments ?

regards,
Hiroshi Inoue


Re: Improving backend launch time by preloading relcache

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> I thought of another idea while reading the thread [HACKERS]
> sequence indexes. Currently REINDEX recreates indexes from
> the heap relations because the indexes may be corrupted.
> However we can recreate indexes from existent ones if
> they are sane. It would be a lot faster than the current
> way for large tables. 

Hmm ... you are thinking about the case where REINDEX is being used
not to recover from corruption, but just to shrink indexes that have
accumulated too much free space.  Okay, that's a reasonable case to
try to optimize, though I'd like to think the problem will go away
in a release or two when we implement VACUUM-time index shrinking.

However, I'm not sure about the "lot faster" part.  The only win
I can see is that when rebuilding a btree index, you could skip
the sort step by reading the old index in index order.  This'd
require hacking things deep in the guts of the btree index method,
not at the level of the present REINDEX code.  And AFAICS it doesn't
translate at all to the other index types.

Not sure it's worth the trouble.  I'd rather see us expend the same
effort on shrinking indexes on-the-fly in VACUUM.
        regards, tom lane


Re: Improving backend launch time by preloading relcache

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Hmm ... you are thinking about the case where REINDEX is being used
> not to recover from corruption, but just to shrink indexes that have
> accumulated too much free space.

Yes.

> Okay, that's a reasonable case to
> try to optimize, though I'd like to think the problem will go away
> in a release or two when we implement VACUUM-time index shrinking.
> 
> However, I'm not sure about the "lot faster" part.  The only win
> I can see is that when rebuilding a btree index, you could skip
> the sort step by reading the old index in index order.

Don't we have to scan the (possibly larger) heap table ?

regards,
Hiroshi Inoue


Re: Improving backend launch time by preloading relcache

От
"Zeugswetter Andreas SB SD"
Дата:
> > Okay, that's a reasonable case to
> > try to optimize, though I'd like to think the problem will go away
> > in a release or two when we implement VACUUM-time index shrinking.
> > 
> > However, I'm not sure about the "lot faster" part.  The only win
> > I can see is that when rebuilding a btree index, you could skip
> > the sort step by reading the old index in index order.
> 
> Don't we have to scan the (possibly larger) heap table ?

Yes, but that is done with a seq scan, but the index has to be read in 
random order, since the index pages are not physically sorted on disk
from lowest to highest value. Of course you can spare the sort, 
but overall ...

Imho spending effort on VACUUM is more fruitful, and has the potential to 
allow much more concurrency than REINDEX, no ?

Andreas


Re: Improving backend launch time by preloading relcache

От
Hiroshi Inoue
Дата:
Zeugswetter Andreas SB SD wrote:
> 
> > > Okay, that's a reasonable case to
> > > try to optimize, though I'd like to think the problem will go away
> > > in a release or two when we implement VACUUM-time index shrinking.
> > >
> > > However, I'm not sure about the "lot faster" part.  The only win
> > > I can see is that when rebuilding a btree index, you could skip
> > > the sort step by reading the old index in index order.
> >
> > Don't we have to scan the (possibly larger) heap table ?
> 
> Yes, but that is done with a seq scan, but the index has to be read in
> random order, since the index pages are not physically sorted on disk
> from lowest to highest value. Of course you can spare the sort,
> but overall ...

Reading a index file is not faster than reading the heap file ?
Does sorting finish in a moment ?
If so we have to use sequential scan much more often.

Anyway there seems no point on changing REINDEX.
The only thing I have to do is to remove the needless
check in tcop/utility.c as soon as 7.2 is released.

regards,
Hiroshi Inoue


Re: Improving backend launch time by preloading relcache

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Anyway there seems no point on changing REINDEX.
> The only thing I have to do is to remove the needless
> check in tcop/utility.c as soon as 7.2 is released.

I don't believe it's needless, and I suggest you not remove it,
until we do something about making the pg_internal unlink happen
at the right time.  With the unlink where it is, I think it's quite
unsafe to reindex system indexes in a live database.
        regards, tom lane


Re: Improving backend launch time by preloading relcache

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Anyway there seems no point on changing REINDEX.
> > The only thing I have to do is to remove the needless
> > check in tcop/utility.c as soon as 7.2 is released.
> 
> I don't believe it's needless, and I suggest you not remove it,
> until we do something about making the pg_internal unlink happen
> at the right time.  With the unlink where it is, I think it's quite
> unsafe to reindex system indexes in a live database.

Currently there are just a few relations info kept in
pg_internal.init and they are all nailed. I'm not
allowing REINDEX for nailed relations though there's
a #ifdef'd trial implementation. I'm intending the
change for 7.2.1 not 7.3. If it isn't allowed in 7.2.x
I would strongly object to the 7.2 release itself.

regards,
Hiroshi Inoue


Re: Improving backend launch time by preloading relcache

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Currently there are just a few relations info kept in
> pg_internal.init and they are all nailed. I'm not
> allowing REINDEX for nailed relations 

Oh, okay.

> a #ifdef'd trial implementation. I'm intending the
> change for 7.2.1 not 7.3. If it isn't allowed in 7.2.x
> I would strongly object to the 7.2 release itself.

If you think it should be changed then change it now.  I see no
reason to wait.
        regards, tom lane