Обсуждение: Multi-Versions and Vacuum

Поиск
Список
Период
Сортировка

Multi-Versions and Vacuum

От
Anthony Berglas
Дата:
Thanks to all the people that responed.  Summaries and notes:-

VACUUM

The consensus seems to be that old copies are not removed until a Vacuum is
performed.  When a Vacuum is performed then only versions older than the
start of the oldest transaction are removed.

(Compare with Oracle -- a certain amount of space is allocated for multi
versions ("Rollback Segments").  Old versions are moved there, I think.
When the space is exhausted then then old versions are dropped
automatically.  Occasionally this can cause transactions to be aborted
because the old versions they need are no longer available.)

What actually is the difference between Vaccum and Vacuum Full?  I assume
that the former must release some disk space, otherwise it would be useless?

STORAGE

It appears that the versions are stored in the same blocks as the normal
data.  Also, seems that the whole row is copied (correct?).  This will
reduce the efficiency of the cache to some extent by having blocks
containing non-pertinent data.

LOCKING

The docs mentioned phantoms in the Serializable section, implying that they
were not trapped by Read Committed.  But they are, good.

My note about Oracle is that in its Read Committed mode transactions are
safer without being Serializable.  It is half way between what Postgres call
Read Committed and Serializable.  And if I think that in that regard Oracle
got it right.  Safer transactions without Serializable rollback problems
with no real downside.  Particularly for reports.

USING THE LOGS FOR MULTI VERSIONS

If anyone know why this is a bad idea I'd be interested.  Seems to be a win
win to me.  No need to vacuum, no additional overhead during update (you
need the logs anyway), fast queries, less disk space.  (This is not what
Oracle does.)

TIME TRAVEL

Sounds fun.  But hard to get right.

EXISTING DOCS

When this thread settles I'll add a note to the interactive docs.

(Thanks Neil for taking the trouble to provide doc fragments.  But if you
read them again you will note that they are not at all clear on the actual
question that I asked.  Which is why I asked them.)

Re: Multi-Versions and Vacuum

От
nconway@klamath.dyndns.org (Neil Conway)
Дата:
On Thu, Jul 18, 2002 at 09:23:48PM -0700, Anthony Berglas wrote:
> (Compare with Oracle -- a certain amount of space is allocated for multi
> versions ("Rollback Segments").  Old versions are moved there, I think.
> When the space is exhausted then then old versions are dropped
> automatically.  Occasionally this can cause transactions to be aborted
> because the old versions they need are no longer available.)

IMHO, that is much worse than the scheme we have now -- particularly
since some kind of auto-vacuuming system is on the horizon.

> What actually is the difference between Vaccum and Vacuum Full?  I assume
> that the former must release some disk space, otherwise it would be useless?

http://developer.postgresql.org/docs/postgres/sql-vacuum.html , in
particular:

 VACUUM FULL does more extensive processing, including moving of tuples
 across blocks to try to compact the table to the minimum number of disk
 blocks. This form is much slower and requires an exclusive lock on each
 table while it is being processed.

There is more information on VACUUM FULL here:
http://developer.postgresql.org/docs/postgres/routine-vacuuming.html

> EXISTING DOCS
>
> When this thread settles I'll add a note to the interactive docs.

Might be a better idea to send a patch for the SGML to pgsql-patches. Or
if you'd rather not do that, just send any additions you'd like to me
and I'll make the necessary SGML changes and send that to -patches.

> (Thanks Neil for taking the trouble to provide doc fragments.  But if you
> read them again you will note that they are not at all clear on the actual
> question that I asked.  Which is why I asked them.)

On the contrary, 2 of your 3 questions were clearly answered in the
docs, as is the VACUUM FULL question you just asked. If you think the
docs are unclear or incomplete, can you be more specific?

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: Multi-Versions and Vacuum

От
Hiroshi Inoue
Дата:
Anthony Berglas wrote:
>
> Thanks to all the people that responed.  Summaries and notes:-
>
> LOCKING
>
> The docs mentioned phantoms in the Serializable section, implying
> that they were not trapped by Read Committed.  But they are, good.
>
> My note about Oracle is that in its Read Committed mode transactions
> are safer without being Serializable.

What's safer ?

regards,
Hiroshi Inoue
    http://w2422.nsk.ne.jp/~inoue/

Re: Multi-Versions and Vacuum

От
grant
Дата:
If Oracle uses multiple version concurrency, then why does it appear to do
row locking (I use Oracle at work, but I do not admin it)

The only downside I have seen to MVCS is the extra disk space, but with
the performance improvements in a highly concurrent use environment, it is
well worth it.

Q:  Does VACUUM FULL remove the rows that are too many for the FSM?  Or,
are they just stuck?  I know that regular VACUUM will not allow reuse of
overflowed rows.  Also, how does one calculate the propper size for the
FSM?

______________________________________________________________________________

                          Your mouse has moved.
       You must restart Windows for your changes to take effect.

#!/usr/bin/perl
print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);



Re: Multi-Versions and Vacuum

От
Bruce Momjian
Дата:
grant wrote:
> If Oracle uses multiple version concurrency, then why does it appear to do
> row locking (I use Oracle at work, but I do not admin it)
>
> The only downside I have seen to MVCS is the extra disk space, but with
> the performance improvements in a highly concurrent use environment, it is
> well worth it.
>
> Q:  Does VACUUM FULL remove the rows that are too many for the FSM?  Or,
> are they just stuck?  I know that regular VACUUM will not allow reuse of
> overflowed rows.  Also, how does one calculate the propper size for the
> FSM?

VACUUM full removes all unused rows from disk _including_ ones that
aren't in the free space map.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Multi-Versions and Vacuum

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> grant wrote:
>> Q:  Does VACUUM FULL remove the rows that are too many for the FSM?  Or,
>> are they just stuck?  I know that regular VACUUM will not allow reuse of
>> overflowed rows.

You seem to know things that aren't true.  What is an "overflowed row"
anyway?

> VACUUM full removes all unused rows from disk _including_ ones that
> aren't in the free space map.

The FSM isn't interested in individual rows in the first place; it just
records the amount of free space per *page*.

After VACUUM does its thing (and yes, it removes everything it legally
can) it reports the amount of free space it found on each page to the
FSM, and the FSM remembers as much of that info as it has room for ---
preferring pages with more free space, if it has to be selective.
Subsequently, when any backend needs to add a tuple to a relation, it
first looks in the FSM to see if there is an extant page with enough
room.  If there's no suitable entry in the FSM, then try the last extant
page; if no luck there, add a new page.

            regards, tom lane

Re: Multi-Versions and Vacuum

От
Andrew Sullivan
Дата:
On Fri, Jul 19, 2002 at 12:35:23PM -0400, Tom Lane wrote:

> After VACUUM does its thing (and yes, it removes everything it legally
> can) it reports the amount of free space it found on each page to the
> FSM, and the FSM remembers as much of that info as it has room for ---
> preferring pages with more free space, if it has to be selective.

I think this is where the confusion lies.  The ever-growing tables
that people are experiencing is somehow related to the new
non-blocking vacuum and the value of the free space map settings.
Previous posts on the matter have left some (well, me, at least) with
the idea that the non-blocking vacuum doesn't get everything, just in
case the free space map values are too small.

But if I understand now what you're saying, it's this: the vacuum
_does_ get everything, but if the free space map settings are too
small, the free space map doesn't know about that free space;
therefore, the space never gets re-used for future data, because as
far as the postmaster is concerened, the space isn't available.  So
the tables keep growing because they have to append their data at the
end of the file (because the free space map hasn't reported any free
space).  Run vacuum a few times like this, and there is a substantial
amount of unused space in the tables, where the free space map
doesn't have an entry for it.  Is that right?

If so, it should be possible to calculate what one needs for the
max_fsm_pages setting by calculating how much data is replaced or
deleted in any given period, vacuum-to-vacuum, I guess.  What one
would need to do is calculate how much space a row is taking up on
disk, and then set the max_fsm_pages to comprise (almost?) enough
space to keep a map big enough to include all the rows likely to be
recovered by vacuum.  Or am I completely mad?  (I know, I know, "Yes"
is a possible answer there.)

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Free space mapping (was Re: Multi-Versions and Vacuum)

От
Tom Lane
Дата:
Andrew Sullivan <andrew@libertyrms.info> writes:
> But if I understand now what you're saying, it's this: the vacuum
> _does_ get everything, but if the free space map settings are too
> small, the free space map doesn't know about that free space;
> therefore, the space never gets re-used for future data, because as
> far as the postmaster is concerened, the space isn't available.  So
> the tables keep growing because they have to append their data at the
> end of the file (because the free space map hasn't reported any free
> space).  Run vacuum a few times like this, and there is a substantial
> amount of unused space in the tables, where the free space map
> doesn't have an entry for it.  Is that right?

Well, the design theory was that once you free enough space on a
particular page, it *will* get into the FSM and then will be filled up.
The fact that small amounts of free space don't get re-used quickly
doesn't bother me a whole lot; the table size should still stabilize
at some proportion of free to used space that depends on your vacuum
frequency.

However, if the FSM is vastly smaller than the number of
actively-changing pages in your database, then that argument breaks down
--- in particular, if the FSM can't even keep track of all the
completely-empty pages in your database then you are going to suffer
progressive leakage.  I'm not sure where the problem gets bad enough to
worry about, and have not had time to try to do a probabilistic
analysis.

If we could determine where trouble sets in, perhaps we could detect the
situation and print warnings during VACUUM?  Another possibility is to
auto-size FSM during postmaster startup, perhaps as a fraction of the
total DB size measured by "du".  (But that probably doesn't work if
you've been playing symlink games.)  In any case we need more knowledge
about appropriate FSM sizes than we have at the moment.

The whole FSM concept is new in 7.2 and I'm sure it still needs
refinement.  Ideas welcome.

            regards, tom lane

Re: Free space mapping (was Re: Multi-Versions and Vacuum)

От
grant
Дата:
So, then, even if there is an overflow of pages with free space, stepping
up the vacuum schedule may help this eventually catch up.

As for an automated way to set this up, if the FSM size cannot be adjusted
without a restart of the postmaster, then on startup, you could take a
number of pages currently with free space or a percentage of the number of
pages, which ever is greater.  The percentage could have a default, and
also be configurable.


Re: Free space mapping (was Re: Multi-Versions and Vacuum)

От
Jan Wieck
Дата:
grant wrote:
>
> So, then, even if there is an overflow of pages with free space, stepping
> up the vacuum schedule may help this eventually catch up.

Depends.

If you only UPDATE that table, there will never be less freespace again,
unless you VACUUM FULL. In average every UPDATE frees as much space as
it reuses. If your data grows, via INSERTS, then yes, it should catch up
someday if you VACUUM often enough.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: Free space mapping (was Re: Multi-Versions and Vacuum)

От
Andrew Sullivan
Дата:
On Fri, Jul 19, 2002 at 02:11:09PM -0400, Tom Lane wrote:

> However, if the FSM is vastly smaller than the number of
> actively-changing pages in your database, then that argument breaks down
> --- in particular, if the FSM can't even keep track of all the
> completely-empty pages in your database then you are going to suffer
> progressive leakage.

That's consistent with the reports people have made, because it was
always in the case of databases with extremely high turnover that it
turned up.  So at the very least, people who anticipate high turnover
on their systems should set the free space map higher (and maybe
vacuum more often -- AFAIK, though, no-one has yet documented the
performance penalty of non-blocking vacuum.  I guess it should be no
worse than any other client, but I'm not sure, and haven't tested
yet).

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Re: Multi-Versions and Vacuum

От
Bruce Momjian
Дата:
Added to TODO:

    * Allow free space map to be auto-sized or warn when it is too small

---------------------------------------------------------------------------

Andrew Sullivan wrote:
> On Fri, Jul 19, 2002 at 12:35:23PM -0400, Tom Lane wrote:
>
> > After VACUUM does its thing (and yes, it removes everything it legally
> > can) it reports the amount of free space it found on each page to the
> > FSM, and the FSM remembers as much of that info as it has room for ---
> > preferring pages with more free space, if it has to be selective.
>
> I think this is where the confusion lies.  The ever-growing tables
> that people are experiencing is somehow related to the new
> non-blocking vacuum and the value of the free space map settings.
> Previous posts on the matter have left some (well, me, at least) with
> the idea that the non-blocking vacuum doesn't get everything, just in
> case the free space map values are too small.
>
> But if I understand now what you're saying, it's this: the vacuum
> _does_ get everything, but if the free space map settings are too
> small, the free space map doesn't know about that free space;
> therefore, the space never gets re-used for future data, because as
> far as the postmaster is concerened, the space isn't available.  So
> the tables keep growing because they have to append their data at the
> end of the file (because the free space map hasn't reported any free
> space).  Run vacuum a few times like this, and there is a substantial
> amount of unused space in the tables, where the free space map
> doesn't have an entry for it.  Is that right?
>
> If so, it should be possible to calculate what one needs for the
> max_fsm_pages setting by calculating how much data is replaced or
> deleted in any given period, vacuum-to-vacuum, I guess.  What one
> would need to do is calculate how much space a row is taking up on
> disk, and then set the max_fsm_pages to comprise (almost?) enough
> space to keep a map big enough to include all the rows likely to be
> recovered by vacuum.  Or am I completely mad?  (I know, I know, "Yes"
> is a possible answer there.)
>
> A
>
> --
> ----
> Andrew Sullivan                               87 Mowat Avenue
> Liberty RMS                           Toronto, Ontario Canada
> <andrew@libertyrms.info>                              M6K 3E3
>                                          +1 416 646 3304 x110
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Free space mapping (was Re: Multi-Versions and Vacuum)

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Andrew Sullivan <andrew@libertyrms.info> writes:
> > But if I understand now what you're saying, it's this: the vacuum
> > _does_ get everything, but if the free space map settings are too
> > small, the free space map doesn't know about that free space;
> > therefore, the space never gets re-used for future data, because as
> > far as the postmaster is concerened, the space isn't available.  So
> > the tables keep growing because they have to append their data at the
> > end of the file (because the free space map hasn't reported any free
> > space).  Run vacuum a few times like this, and there is a substantial
> > amount of unused space in the tables, where the free space map
> > doesn't have an entry for it.  Is that right?
>
> Well, the design theory was that once you free enough space on a
> particular page, it *will* get into the FSM and then will be filled up.
> The fact that small amounts of free space don't get re-used quickly
> doesn't bother me a whole lot; the table size should still stabilize
> at some proportion of free to used space that depends on your vacuum
> frequency.
>
> However, if the FSM is vastly smaller than the number of
> actively-changing pages in your database, then that argument breaks down
> --- in particular, if the FSM can't even keep track of all the
> completely-empty pages in your database then you are going to suffer
> progressive leakage.  I'm not sure where the problem gets bad enough to
> worry about, and have not had time to try to do a probabilistic
> analysis.
>
> If we could determine where trouble sets in, perhaps we could detect the
> situation and print warnings during VACUUM?  Another possibility is to
> auto-size FSM during postmaster startup, perhaps as a fraction of the
> total DB size measured by "du".  (But that probably doesn't work if
> you've been playing symlink games.)  In any case we need more knowledge
> about appropriate FSM sizes than we have at the moment.
>
> The whole FSM concept is new in 7.2 and I'm sure it still needs
> refinement.  Ideas welcome.

Not sure if I have said this before, but I think the auto-sizing needs
to get some feedback from when empty pages overflow the FSM _and_ when
later there is demand for free pages and none exist in the FSM.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073