Re: Frequent Update Project: Design Overview of HOT Updates

From: Simon Riggs
Subject: Re: Frequent Update Project: Design Overview of HOT Updates
Msg-id: 1163191197.3634.918.camel@silverbirch.site
In reply to: Re: Frequent Update Project: Design Overview of HOT Updates  ("Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at>)
List: pgsql-hackers
On Fri, 2006-11-10 at 16:46 +0100, Zeugswetter Andreas ADI SD wrote:

> > I'm not sure this really solves that problem because there 
> > are still DELETEs to consider but it does remove one factor 
> > that exacerbates it unnecessarily.
> 
> Yea, so you still need to vacuum the large table regularly.

HOT covers the use-case of heavy updating, which in many common cases
occurs on tables with few inserts/deletes. HOT would significantly
reduce the need to vacuum since deletes and wraparound issues would be
the only remaining reasons to do this.

[I have some ideas for how to optimize tables with heavy INSERT/DELETE
activity, but that case is much less prevalent than heavy UPDATEs.]

> > I think the vision is that the overflow table would never be 
> > very large because it can be vacuumed very aggressively. It 
> > has only tuples that are busy and will need vacuuming as soon 
> > as a transaction ends. Unlike the main table which is mostly 
> > tuples that don't need vacuuming. 
> 
> Ok, but you have to provide an extra vacuum that does only that then
> (and it randomly touches heap pages, and only does partial work there).

Sure, HOT needs a specially optimised VACUUM.

> > So a heap that's double the size necessary takes twice as
> > long as necessary to scan. The fact that the overflow tables
> > are taking up space isn't interesting if they don't have to 
> > be scanned.
> 
> The overflow does have to be read for each seq scan. And it was stated
> that it would be accessed with random access (follow tuple chain).
> But maybe we can read the overflow the same as if it were an
> additional segment file?

Not without taking a write-avoiding lock on the table, unfortunately.

> > Hitting the overflow tables should be quite rare, it only 
> > comes into play when looking at concurrently updated tuples. 
> > It certainly happens but most tuples in the table will be 
> > committed and not being concurrently updated by anyone else.
> 
> The first update moves the row to overflow, only the 2nd next might be
> able to pull it back.
> So on average you would have at least 66% of all updated rows after last
> vacuum in the overflow.
> 
> The problem with needing very frequent vacuums is that you might not be
> able to do any work because of long transactions.
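
[The 66% figure quoted above can be reproduced with a little arithmetic
under one reading of the claim. The sketch below is only an
illustration, not part of the HOT design: it assumes that a row's
current version sits in the overflow after its 1st and 2nd update since
the last vacuum, returns to the heap on the 3rd, and then repeats. The
names in_overflow and overflow_fraction, and the cycle length of 3, are
hypothetical.]

```python
from fractions import Fraction


def in_overflow(updates_since_vacuum: int, cycle: int = 3) -> bool:
    """Assumed toy model: the row is pulled back to the heap only on
    every `cycle`-th update; after any other update it is in overflow."""
    return updates_since_vacuum % cycle != 0


def overflow_fraction(max_updates: int, cycle: int = 3) -> Fraction:
    """Fraction of updated rows in overflow, assuming update counts
    1..max_updates are equally likely across the updated rows."""
    hits = sum(in_overflow(n, cycle) for n in range(1, max_updates + 1))
    return Fraction(hits, max_updates)


if __name__ == "__main__":
    print(overflow_fraction(3))  # 2/3
```

[Under these assumptions two of every three updated rows are in the
overflow, which matches the quoted "66%". The quoted "at least" follows
from "might be able to pull it back": if the pull-back fails, the share
can only go up.]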

HOT doesn't need more frequent VACUUMs; it is simply more efficient and
so can allow them when they are needed to avoid I/O. Space usage in the
overflow relation is at its worst in the case of an enormous table with
low-volume random updates, but note that it is *never* worse than
current space usage. In the best case, which is actually fairly common
in practice, a small number of rows of a large table are being updated
by a steady stream of concurrent updates. There we find the overflow
relation needs only a few hundred tuples, so regular vacuuming will be
both easy and effective.

As an aside, note that HOT works best in real-world situations, not
benchmarks such as TPC where the I/Os are deliberately randomised to
test the scalability of the RDBMS. But even then, HOT works better.

The long-running transaction issue remains unsolved in this proposal,
but I have some ideas for later.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



