Re: [Patch] Optimize dropping of relation buffers using dlist

From Andres Freund
Subject Re: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id 20200731202332.feicx3miocjfanka@alap3.anarazel.de
In response to Re: [Patch] Optimize dropping of relation buffers using dlist  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Re: [Patch] Optimize dropping of relation buffers using dlist  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,

On 2020-07-31 15:50:04 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Indeed. The buffer mapping hashtable already is visible as a major
> > bottleneck in a number of workloads. Even in readonly pgbench if s_b is
> > large enough (so the hashtable is larger than the cache). Not to speak
> > of things like a cached sequential scan with a cheap qual and wide rows.
> 
> To be fair, the added overhead is in buffer allocation not buffer lookup.
> So it shouldn't add cost to fully-cached cases.  As Tomas noted upthread,
> the potential trouble spot is where the working set is bigger than shared
> buffers but still fits in RAM (so there's no actual I/O needed, but we do
> still have to shuffle buffers a lot).

Oh, right, not sure what I was thinking.


> > Wonder if the temporary fix is just to do explicit hashtable probes for
> > all pages iff the size of the relation is < s_b / 500 or so. That'll
> > address the case where small tables are frequently dropped - and
> > dropping large relations is more expensive from the OS and data loading
> > perspective, so it's not gonna happen as often.
> 
> Oooh, interesting idea.  We'd need a reliable idea of how long the
> relation had been (preferably without adding an lseek call), but maybe
> that's do-able.

IIRC we already do smgrnblocks nearby, when doing the truncation (to
figure out which segments we need to remove). Perhaps we can arrange to
combine the two? The layering probably makes that somewhat ugly :(

We could also just use pg_class.relpages. It'll probably mostly be
accurate enough?

Or we could just cache the result of the last smgrnblocks call...
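
A minimal sketch of the size check discussed above, assuming the relation's block count is available from some cheap source (a cached smgrnblocks result or pg_class.relpages). All names here (choose_drop_strategy, DropStrategy, PROBE_THRESHOLD_DIVISOR) are hypothetical and not part of PostgreSQL; the divisor 500 is just the "s_b / 500 or so" figure from the quoted text, not a tuned value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not PostgreSQL's actual API. */
typedef uint32_t BlockCount;

/* The "s_b / 500 or so" figure from the discussion; an untuned assumption. */
#define PROBE_THRESHOLD_DIVISOR 500

typedef enum
{
    DROP_BY_HASH_PROBE,     /* look up each block in the buffer mapping table */
    DROP_BY_FULL_SCAN       /* walk all of shared buffers, as is done today */
} DropStrategy;

/*
 * Pick a strategy for dropping a relation's buffers.
 *
 * rel_nblocks    - relation size in blocks (cached value or relpages estimate)
 * nblocks_known  - false if no cheap size estimate is available
 * shared_buffers - size of shared buffers in blocks
 */
static DropStrategy
choose_drop_strategy(BlockCount rel_nblocks, bool nblocks_known,
                     BlockCount shared_buffers)
{
    /* Without a size estimate, fall back to the existing full scan. */
    if (!nblocks_known)
        return DROP_BY_FULL_SCAN;

    /* Small relation: a handful of hash probes beats scanning everything. */
    if (rel_nblocks < shared_buffers / PROBE_THRESHOLD_DIVISOR)
        return DROP_BY_HASH_PROBE;

    return DROP_BY_FULL_SCAN;
}
```

With shared buffers of 131072 blocks (1GB at 8kB pages), the threshold works out to 262 blocks, so only relations of roughly 2MB or less would take the probe path in this sketch.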


One of the cases where this type of strategy is most interesting to me is
the partial truncations that autovacuum does... There we even know the
range of tables ahead of time.
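
For the partial truncation case, a rough sketch under the assumption that the "range" is the tail of blocks being cut off, i.e. blocks new_nblocks through old_nblocks - 1. The function names are hypothetical, and the s_b / 500 cutoff is again only the ballpark figure from upthread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; not PostgreSQL's actual API. */
typedef uint32_t BlockNumber;

/*
 * Number of blocks removed when truncating a relation from old_nblocks
 * down to new_nblocks. Returns 0 if the relation is not shrinking.
 */
static BlockNumber
truncation_probe_count(BlockNumber old_nblocks, BlockNumber new_nblocks)
{
    return (new_nblocks < old_nblocks) ? old_nblocks - new_nblocks : 0;
}

/*
 * Decide whether probing the buffer mapping table once per truncated block
 * is likely cheaper than scanning all of shared buffers. Only the truncated
 * tail counts here, not the size of the whole relation.
 */
static bool
truncation_should_probe(BlockNumber old_nblocks, BlockNumber new_nblocks,
                        BlockNumber shared_buffers)
{
    return truncation_probe_count(old_nblocks, new_nblocks)
        < shared_buffers / 500;
}
```

The point of the sketch is that a large relation can still qualify for the probe path when only a small tail is being removed.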

Greetings,

Andres Freund


