Re: SLRUs in the main buffer pool, redux

From: Heikki Linnakangas
Subject: Re: SLRUs in the main buffer pool, redux
Msg-id: 128709bc-992c-b57a-7174-098433b7faa4@iki.fi
In reply to: Re: SLRUs in the main buffer pool, redux  (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers

On 25/07/2022 09:54, Heikki Linnakangas wrote:
> I'll write a separate post with my thoughts on the high-level design of
> this, ...

This patch represents each SLRU as a relation. The CLOG is one relation,
pg_subtrans is another relation, and so forth. The SLRU relations use a
different SMGR implementation, which is implemented in slru.c.

As you know, I'd like to make the SMGR implementation replaceable by 
extensions. We need that for Neon, and I'd imagine it to be useful for 
many other things, too, like compression, encryption, or restoring data 
from a backup on-demand. I'd like all file operations to go through the 
smgr API as much as possible, so that an extension can intercept SLRU 
file operations too. If we introduce another internal SMGR 
implementation, then an extension would need to replace both 
implementations separately. I'd prefer to use the current md.c 
implementation for SLRUs too, instead.

Thus I propose:

Let's represent each SLRU *segment* as a separate relation, giving each 
SLRU segment a separate relNumber. Then we can use md.c for SLRUs, too. 
Dropping an SLRU segment can be done by calling smgrunlink(). You won't 
need to deal with missing segments in md.c, because each individual SLRU 
file is a complete file, with no holes. Dropping buffers for one SLRU 
segment can be done with DropRelationBuffers(), instead of introducing 
the new DiscardBuffer() function. You can let md.c handle the caching of
the file descriptors, so you won't need to reimplement that with
'slru_file_segment'.

SLRUs won't need the segmentation into 1 GB segments that md.c does, 
because each SLRU file is just 256 kB in size. That's OK. (BTW, I 
propose that we bump the SLRU segment size up to a whopping 1 MB or even 
more, while we're at it. But one step at a time.)
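
To make the sizes above concrete, here is a small sketch of the arithmetic.
It assumes a default build (BLCKSZ of 8192 bytes and 32 pages per SLRU
segment); the helper function names are made up for illustration and are
not taken from slru.c or from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Default build values; both are compile-time settings in PostgreSQL. */
#define BLCKSZ 8192                 /* bytes per page */
#define SLRU_PAGES_PER_SEGMENT 32   /* pages per SLRU segment file today */

/* Size in bytes of one segment file holding pages_per_segment pages. */
static int64_t
slru_segment_bytes(int pages_per_segment)
{
    return (int64_t) pages_per_segment * BLCKSZ;
}

/* Segment number that holds a given SLRU page (hypothetical helper). */
static int64_t
slru_page_to_segno(int64_t pageno, int pages_per_segment)
{
    assert(pageno >= 0 && pages_per_segment > 0);
    return pageno / pages_per_segment;
}

/* Block number of the page inside its segment file (hypothetical helper). */
static int
slru_page_to_block(int64_t pageno, int pages_per_segment)
{
    assert(pageno >= 0 && pages_per_segment > 0);
    return (int) (pageno % pages_per_segment);
}
```

With 32 pages per segment a file is 32 * 8192 = 262144 bytes, i.e. 256 kB.
A 1 MB segment would hold 128 pages, still far below md.c's 1 GB
segmentation limit, so each SLRU file would stay a single md.c segment.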

SLRUs also won't need the concept of relation forks. That's fine; we can
just use MAIN_FORKNUM. Related to that, I'm somewhat bothered by the way
that SMgrRelation currently bundles all the relation forks together. A
comment in smgr.h says:

> smgr.c maintains a table of SMgrRelation objects, which are essentially
> cached file handles.

But when we introduced relation forks, that got a bit muddled. Each 
SMgrRelation object is now a file handle for a bunch of related relation 
forks, and each fork is a separate file that can be created and 
truncated separately.

That means that an SMGR implementation, like md.c, needs to track the 
file handles for each fork. I think things would be more clear if we 
unbundled the forks at the SMGR level, so that we would have a separate 
SMgrRelation struct for each fork. And let's rename it to SMgrFile to 
make the role more clear. I think that would reduce the confusion when 
we start using it for SLRUs; an SLRU is not a relation, after all. md.c 
would still segment each logical file into 1 GB segments, but it would 
not need to deal with forks.
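
As a rough illustration of the unbundling idea (not the code in the
attached patch), the toy table below keeps one handle per (relNumber, fork)
pair instead of one handle per relation. All type and function names here
are simplified, hypothetical stand-ins; the fixed-size array stands in for
whatever hash table the real smgr.c would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's fork numbers. */
typedef enum
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} ForkNumber;

/* Hypothetical per-file handle: exactly one fork of one relation. */
typedef struct SMgrFileSketch
{
    uint32_t    relNumber;
    ForkNumber  forknum;
    int         in_use;
} SMgrFileSketch;

#define MAX_OPEN_FILES 16
static SMgrFileSketch file_table[MAX_OPEN_FILES];

/*
 * Return the handle for (relNumber, forknum), creating it on first use.
 * Returns NULL if the toy table is full.
 */
static SMgrFileSketch *
smgr_file_open(uint32_t relNumber, ForkNumber forknum)
{
    SMgrFileSketch *free_slot = NULL;

    for (int i = 0; i < MAX_OPEN_FILES; i++)
    {
        SMgrFileSketch *f = &file_table[i];

        if (f->in_use && f->relNumber == relNumber && f->forknum == forknum)
            return f;           /* already open: reuse the handle */
        if (!f->in_use && free_slot == NULL)
            free_slot = f;      /* remember the first empty slot */
    }
    if (free_slot == NULL)
        return NULL;

    free_slot->relNumber = relNumber;
    free_slot->forknum = forknum;
    free_slot->in_use = 1;
    return free_slot;
}
```

In this model an SLRU segment would simply be a handle whose fork is always
MAIN_FORKNUM, while a heap relation would hold up to one handle per fork.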

Attached is a draft patch to refactor it that way, and a refactored 
version of your SLRU patch over that.

The relation cache now needs to hold a separate reference to the
SMgrFile of each fork of a relation. And smgr cache invalidation still
works at relation granularity. Doing it per SMgrFile would be cleaner
in smgr.c, but in practice all the forks of a relation are unlinked and
truncated together, so sending a separate invalidation event for each
SMgrFile would increase the cache invalidation traffic.

In passing, I moved the DropRelationBuffers() calls from smgr.c to
the callers. smgr.c doesn't otherwise make any effort to keep the buffer
manager in sync with the state on disk; that responsibility normally
lies with the code that *uses* the smgr functions, so I think that's more
logical.

The first patch currently causes the '018_wal_optimize.pl' test to fail. 
I guess I messed up something in the relation truncation code, but I 
haven't investigated it yet. I wanted to post this to get comments on 
the design, before spending more time on that.

What do you think?

- Heikki