Re: Turning off HOT/Cleanup sometimes

From: Bruce Momjian
Subject: Re: Turning off HOT/Cleanup sometimes
Date:
Msg-id: 20150421150409.GC10101@momjian.us
In reply to: Re: Turning off HOT/Cleanup sometimes  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses: Re: Turning off HOT/Cleanup sometimes  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
           Re: Turning off HOT/Cleanup sometimes  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Mon, Apr 20, 2015 at 07:13:38PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > 
> > > This seems simple to implement: keep two counters, where the second one
> > > is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
> > > reset the first counter so that further 5 pages will get HOT pruned.  5%
> > > seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
> > > essentially +infinity.)
> > 
> > This would tend to dirty non-sequential heap pages --- it seems best to
> > just clean as many as we are supposed to, then skip the rest, so we can
> > write sequential dirty pages to storage.
> 
> Keep in mind there's a disconnect between dirtying a page and writing it
> to storage.  A page could remain dirty for a long time in the buffer
> cache.  This writing of sequential pages would occur at checkpoint time
> only, which seems the wrong thing to optimize.  If some other process
> needs to evict pages to make room to read some other page in, surely
> it's going to try one page at a time, not write "many sequential dirty
> pages."

Yes, it might be too much optimization to try to get the checkpoint to
flush all those pages sequentially, but I was thinking of our current
behavior where, after an update of all rows, we effectively write out
the entire table because we have dirtied every page.  I guess with later
prune-based writes we aren't really writing all the pages anyway, since
the pages with prunable content are scattered more or less at random.  I
guess I was just wondering what value there is in your write-then-skip
idea, vs. just writing the first X% of pages we find?  Your idea
certainly spreads out the pruning, and doesn't require knowing the size
of the table, though I thought that information was easily determined.
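
To make sure we are picturing the same thing, here is a rough standalone
sketch of the two policies as I understand them.  This is plain C for
illustration only, not anything from an actual patch, and the batch sizes
and the 5% figure are made up:

/*
 * Rough sketch only (plain C, not PostgreSQL code) of the two policies;
 * the batch sizes and the 5% figure are purely illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

#define PRUNE_BATCH 5           /* pages pruned before we start skipping */
#define SKIP_BATCH  95          /* pages skipped before pruning resumes */

/*
 * Your write-then-skip idea: prune PRUNE_BATCH pages, skip SKIP_BATCH,
 * then reset, so the pruned (and dirtied) pages are spread over the scan.
 */
static bool
prune_write_then_skip(long page_in_scan)
{
    return page_in_scan % (PRUNE_BATCH + SKIP_BATCH) < PRUNE_BATCH;
}

/*
 * The alternative: prune only the first X% of the table, which needs the
 * table size up front and concentrates the dirtied pages at the start.
 */
static bool
prune_first_percent(long page_in_scan, long total_pages, int percent)
{
    return page_in_scan < total_pages * percent / 100;
}

int
main(void)
{
    long    total_pages = 1000;
    long    pruned_a = 0;
    long    pruned_b = 0;

    for (long page = 0; page < total_pages; page++)
    {
        if (prune_write_then_skip(page))
            pruned_a++;
        if (prune_first_percent(page, total_pages, 5))
            pruned_b++;
    }

    printf("write-then-skip: pruned %ld pages, spread over the table\n",
           pruned_a);
    printf("first-5%%:        pruned %ld pages, all at the start\n",
           pruned_b);
    return 0;
}

Over a full sequential scan both end up pruning the same number of pages;
the difference is only where in the table the dirtied pages land.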

One thing to consider is how we handle pruning for index scans that hit
multiple heap pages.  Do we still write X% of the pages in the table, or
X% of the heap pages we actually access via the SELECT?  With the
write-then-skip approach, we would prune X% of the pages we access, while
with the first-X% approach, we would probably prune all of them, as we
would not be accessing most of the table.  I don't think we can do the
first X% of pages with the percentage based on the number of pages
accessed, as we have no way to know how many heap pages we will access
from the index.  (We would know for bitmap scans, but that complexity
doesn't seem worth it.)  That would argue, for consistency between
sequential and index-based heap access, that your approach is best.
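
Here is another rough sketch, again illustrative C rather than real
PostgreSQL code, of why the access-based counting falls out of your
approach for free: the gate only counts heap pages the scan has visited,
so it never needs to know the table size:

/*
 * Rough sketch only (not PostgreSQL code): the two-counter gate kept
 * per scan and keyed on how many heap pages the scan has visited, so
 * it never needs the table size.  Names and numbers are made up.
 */
#include <stdbool.h>
#include <stdio.h>

#define PRUNE_PER_CYCLE 5       /* pages pruned at the start of each cycle */
#define SKIP_PER_CYCLE  95      /* pages then skipped before resetting */

typedef struct PruneBudget
{
    long    pages_pruned;       /* pruned so far in the current cycle */
    long    pages_skipped;      /* skipped so far in the current cycle */
} PruneBudget;

/*
 * Called for every heap page the scan visits, whether it came from a
 * sequential scan or a random index fetch; returns true if this page
 * should be pruned.
 */
static bool
prune_this_page(PruneBudget *budget)
{
    if (budget->pages_pruned < PRUNE_PER_CYCLE)
    {
        budget->pages_pruned++;
        return true;
    }
    if (++budget->pages_skipped >= SKIP_PER_CYCLE)
    {
        /* start a new cycle so the next pages get pruned again */
        budget->pages_pruned = 0;
        budget->pages_skipped = 0;
    }
    return false;
}

int
main(void)
{
    PruneBudget budget = {0, 0};
    long        pruned = 0;

    /* pretend an index scan touches 300 heap pages, in whatever order */
    for (long i = 0; i < 300; i++)
        if (prune_this_page(&budget))
            pruned++;

    printf("visited 300 heap pages, pruned %ld of them\n", pruned);
    return 0;
}

A first-X% cutoff could not even be computed in that setting, since we
don't know up front how many heap pages the scan will touch.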

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


