Reducing the size of BufferTag & remodeling forks

From	Andres Freund
Subject	Reducing the size of BufferTag & remodeling forks
Date	2 July 2015, 16:36:30
Msg-id	20150702133619.GB16267@alap3.anarazel.de
Responses	Re: Reducing the size of BufferTag & remodeling forks (Tom Lane <tgl@sss.pgh.pa.us>); Re: Reducing the size of BufferTag & remodeling forks (Alvaro Herrera <alvherre@2ndquadrant.com>); Re: Reducing the size of BufferTag & remodeling forks (Simon Riggs <simon@2ndQuadrant.com>)
List	pgsql-hackers

Hi,

I've complained a number of times that our BufferTag is ridiculously
large:
    typedef struct buftag
    {
        RelFileNode rnode;      /* physical relation identifier */
        ForkNumber  forkNum;
        BlockNumber blockNum;   /* blknum relative to begin of reln */
    } BufferTag;

    typedef struct RelFileNode
    {
        Oid spcNode;            /* tablespace */
        Oid dbNode;             /* database */
        Oid relNode;            /* relation */
    } RelFileNode;

That amounts to 20 bytes. That's problematic because we frequently have
to compare or hash the entire buffer tag. Comparing 20 bytes is rather
branch intensive, and shows up noticeably in profiles. It's also a
stumbling block on the way to a smarter buffer mapping data structure,
because it makes e.g. trees rather deep.
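To make the 20-byte figure and the comparison cost concrete, here is a minimal sketch. The typedefs for Oid, BlockNumber and ForkNumber are stand-ins chosen for illustration (32-bit values; the real ForkNumber is an enum), and buffer_tags_equal is a hypothetical helper, not a function from the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in typedefs (assumptions for this sketch): all 32-bit values. */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef int32_t  ForkNumber;    /* the real type is an enum; int32_t stands in */

typedef struct RelFileNode
{
    Oid spcNode;                /* tablespace */
    Oid dbNode;                 /* database */
    Oid relNode;                /* relation */
} RelFileNode;

typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;

/* Five 4-byte fields and no padding: 20 bytes in total. */
static_assert(sizeof(BufferTag) == 20, "stand-in BufferTag should be 20 bytes");

/*
 * Hypothetical helper: field-by-field equality. Up to five comparisons,
 * each of which may compile to its own branch.
 */
static bool
buffer_tags_equal(const BufferTag *a, const BufferTag *b)
{
    return a->rnode.spcNode == b->rnode.spcNode &&
           a->rnode.dbNode == b->rnode.dbNode &&
           a->rnode.relNode == b->rnode.relNode &&
           a->forkNum == b->forkNum &&
           a->blockNum == b->blockNum;
}
```

Since 20 is not a multiple of 8, the tag also cannot be compared as a whole number of 64-bit words without padding.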

The buffer tag is currently used in two situations:

1) Dealing with the buffer mapping, we need to identify the underlying
   file uniquely and we need the block number (8 bytes).

2) When writing out a block we need, in addition to 1), information
   about where to store the file. That requires the tablespace and
   database.
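As an illustration of why the write-out path needs more than the filenode, the sketch below builds a main-fork file path from the three RelFileNode fields. It is a simplification under stated assumptions: sketch_relpath and the SKETCH_* constants are hypothetical names, only the built-in pg_default and pg_global tablespaces are handled, and fork suffixes and user-defined tablespaces are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* OIDs of the two built-in tablespaces. */
#define SKETCH_DEFAULT_TABLESPACE 1663u     /* pg_default */
#define SKETCH_GLOBAL_TABLESPACE  1664u     /* pg_global */

/*
 * Hypothetical helper: write the main-fork path, relative to the data
 * directory, into buf. Returns the snprintf() result, or -1 for
 * tablespaces this sketch does not cover.
 */
static int
sketch_relpath(char *buf, size_t buflen,
               uint32_t spcNode, uint32_t dbNode, uint32_t relNode)
{
    assert(buf != NULL && buflen > 0);

    /* Shared relations live directly under global/, with no database part. */
    if (spcNode == SKETCH_GLOBAL_TABLESPACE)
        return snprintf(buf, buflen, "global/%u", (unsigned) relNode);

    /* Default tablespace: one directory per database under base/. */
    if (spcNode == SKETCH_DEFAULT_TABLESPACE)
        return snprintf(buf, buflen, "base/%u/%u",
                        (unsigned) dbNode, (unsigned) relNode);

    return -1;      /* user-defined tablespaces: layout omitted here */
}
```

The same relNode yields different paths for different dbNode or spcNode values, which is why the filenode alone is enough for 1) only once it is unique on its own.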

You may know that a filenode (RelFileNode->relNode) is currently *not*
unique across databases and tablespaces.

Additionally you might have noticed that the above description also
disregards relation forks.

I think we should work towards 1) being sufficient for its purpose. My
suggestion to get there is twofold:

1) Introduce a shared pg_relfilenode table. Every table, even
   shared/nailed ones, gets an entry therein. It's there to make it
   possible to uniquely allocate relfilenodes across databases &
   tablespaces.

2) Replace relation forks, with the exception of the init fork which is
   special anyway, with separate relfilenodes, stored in separate
   columns in pg_class.

This scheme has a number of advantages: We don't need to look at the
filesystem anymore to find out whether a relfilenode exists. The buffer
tags are 8 bytes. The number of stats doesn't scale O(#forks *
#relations) anymore, allowing us to add additional forks more easily.
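A minimal sketch of what such an 8-byte tag could look like, assuming it consists of a globally unique 4-byte relfilenode plus the 4-byte block number. CompactBufferTag, compact_tag_key and compact_tags_equal are hypothetical names for illustration only; nothing like them exists in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compact tag: not an existing PostgreSQL type. */
typedef struct CompactBufferTag
{
    uint32_t relfilenode;   /* assumed unique across databases & tablespaces */
    uint32_t blockNum;      /* block number within that file */
} CompactBufferTag;

static_assert(sizeof(CompactBufferTag) == 8, "compact tag should be 8 bytes");

/*
 * Pack the tag into one 64-bit key. Blocks of the same file get adjacent
 * keys, which suits ordered structures such as trees.
 */
static uint64_t
compact_tag_key(CompactBufferTag tag)
{
    return ((uint64_t) tag.relfilenode << 32) | tag.blockNum;
}

/* Equality becomes a single 64-bit comparison. */
static bool
compact_tags_equal(CompactBufferTag a, CompactBufferTag b)
{
    return compact_tag_key(a) == compact_tag_key(b);
}
```

With a key like this, hashing or comparing a tag works on one machine word instead of five separate fields.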

I think something akin to init forks is going to survive because they've
to be copied without access to the catalogs - but that's fine, they just
aren't allowed to go through shared buffers. Afaics that's not a
problem.

Obviously this is a rather high-level description, but right now this
sounds doable to me.

Thoughts?

- Andres
