Re: generalized conveyor belt storage

From: Peter Geoghegan
Subject: Re: generalized conveyor belt storage
Msg-id: CAH2-Wz=4h8L=+h_FOrRSvBQNP6ifNiwCxCtXbMT5kMyTm9gQNQ@mail.gmail.com
In reply to: generalized conveyor belt storage (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: generalized conveyor belt storage (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
On Tue, Dec 14, 2021 at 3:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I got interested in this problem again because of the
> idea discussed in
> https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com
> of having a "dead TID" relation fork in which to accumulate TIDs that
> have been marked as dead in the table but not yet removed from the
> indexes, so as to permit a looser coupling between table vacuum and
> index vacuum. That's yet another case where you accumulate new data
> and then at a certain point the oldest data can be thrown away because
> its intended purpose has been served.

Thanks for working on this! It seems very strategically important to me.

> So here's a patch. Basically, it lets you initialize a relation fork
> as a "conveyor belt," and then you can add pages of basically
> arbitrary data to the conveyor belt and then throw away old ones and,
> modulo bugs, it will take care of recycling space for you. There's a
> fairly detailed README in the patch if you want a more detailed
> description of how the whole thing works.

How did you test this? I ask because it would be nice if there were a
convenient way to try this out, as somebody with a general interest.
Even just the minimal test module that you used for development work
would be useful.
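
To make that concrete: the behavior I'd want even a throwaway test to
exercise is just the accumulate/truncate/recycle cycle you describe. A
standalone toy along these lines is roughly what I have in mind. To be
clear, this is not your patch's API; every name and constant below is
invented, and it only simulates the semantics as I understand them:

/*
 * Toy simulation of conveyor belt semantics: pages get ever-increasing
 * logical page numbers, the oldest pages can be logically truncated
 * away, and the physical slots they occupied get recycled for new
 * pages.  Invented names throughout; no PostgreSQL code involved.
 */
#include <stdio.h>
#include <stdlib.h>

#define CB_NSLOTS   8       /* physical slots available for recycling */
#define CB_PAGESZ   16      /* toy "page" payload size */

typedef struct ToyConveyorBelt
{
    unsigned long oldest_logical;   /* oldest logical page still stored */
    unsigned long next_logical;     /* next logical page number to assign */
    char          slots[CB_NSLOTS][CB_PAGESZ];
} ToyConveyorBelt;

/* Append a payload; returns the logical page number it was assigned. */
static unsigned long
cb_append(ToyConveyorBelt *cb, const char *payload)
{
    if (cb->next_logical - cb->oldest_logical >= CB_NSLOTS)
    {
        fprintf(stderr, "belt full: truncate old pages first\n");
        exit(1);
    }
    snprintf(cb->slots[cb->next_logical % CB_NSLOTS], CB_PAGESZ,
             "%s", payload);
    return cb->next_logical++;
}

/* Throw away every page whose logical page number is < keep_from. */
static void
cb_truncate(ToyConveyorBelt *cb, unsigned long keep_from)
{
    if (keep_from > cb->oldest_logical && keep_from <= cb->next_logical)
        cb->oldest_logical = keep_from;     /* slots become reusable */
}

/* Read a page back, or NULL if it has already been truncated away. */
static const char *
cb_read(ToyConveyorBelt *cb, unsigned long pageno)
{
    if (pageno < cb->oldest_logical || pageno >= cb->next_logical)
        return NULL;
    return cb->slots[pageno % CB_NSLOTS];
}

int
main(void)
{
    ToyConveyorBelt cb = {0};
    char buf[CB_PAGESZ];

    /* Fill the belt; discard the oldest half whenever it fills up. */
    for (int i = 0; i < 20; i++)
    {
        if (cb.next_logical - cb.oldest_logical == CB_NSLOTS)
            cb_truncate(&cb, cb.oldest_logical + CB_NSLOTS / 2);
        snprintf(buf, sizeof(buf), "payload %d", i);
        cb_append(&cb, buf);
    }
    printf("pages %lu..%lu still readable; page 0 is %s\n",
           cb.oldest_logical, cb.next_logical - 1,
           cb_read(&cb, 0) ? "present" : "gone");
    return 0;
}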

> When I was chatting with Andres about this, he jumped to the question
> of whether this could be used to replace SLRUs. To be honest, it's not
> really designed for applications that are quite that intense.

I personally think that SLRUs (and the related problem of hint bits)
are best addressed by tracking which transactions have modified which
heap blocks (perhaps only approximately), and then eagerly cleaning up
aborted transaction IDs, using a specialized version of VACUUM that
does something like heap pruning for aborted xacts. It just seems
weird that we keep clog around in order to avoid having to run VACUUM
too frequently, when VACUUM (among other things) already cleans up
after aborted transactions. Having a more efficient data structure for
commit status information doesn't seem all that promising, because the
real problem is our insistence on remembering which XIDs aborted
almost indefinitely. There is no fundamental reason why it has to work
that way. I don't just mean that it could in principle be changed
(that's almost always true); I mean that it seems like an accident of
history that could easily have gone another way: the ancestral design
of clog comes from a world without MVCC. It seems like a totally
vestigial thing to me, which I wouldn't say about many other things
(just clog and freezing).
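
To sketch what I mean by "tracking which transactions have modified
which heap blocks" plus eager cleanup (this is only a standalone toy
with every name invented; it illustrates the bookkeeping, not a real
data structure or on-disk format):

/*
 * Toy sketch: remember (possibly lossily) which blocks each
 * transaction dirtied, and when a transaction aborts, immediately
 * prune its tuples from just those blocks, so that nothing like clog
 * ever needs to remember the aborted XID afterwards.
 */
#include <stdio.h>
#include <stdbool.h>

#define NBLOCKS         4
#define TUPS_PER_BLOCK  4
#define MAX_XID         64

typedef struct ToyTuple
{
    unsigned xmin;      /* inserting transaction; 0 means unused slot */
    bool     pruned;    /* true once eagerly removed */
} ToyTuple;

static ToyTuple toy_heap[NBLOCKS][TUPS_PER_BLOCK];

/* Coarse map: did this xid touch this block?  Could be approximate. */
static bool xid_touched_block[MAX_XID][NBLOCKS];

static void
toy_insert(unsigned xid, int block, int off)
{
    toy_heap[block][off].xmin = xid;
    toy_heap[block][off].pruned = false;
    xid_touched_block[xid][block] = true;
}

/*
 * "Abort prune": visit only the blocks this xid touched and remove its
 * tuples right away, instead of leaving them for a later VACUUM and
 * remembering the aborted status almost indefinitely.
 */
static void
toy_abort_prune(unsigned xid)
{
    for (int blk = 0; blk < NBLOCKS; blk++)
    {
        if (!xid_touched_block[xid][blk])
            continue;
        for (int off = 0; off < TUPS_PER_BLOCK; off++)
        {
            if (toy_heap[blk][off].xmin == xid)
                toy_heap[blk][off].pruned = true;
        }
        xid_touched_block[xid][blk] = false;
    }
    /* Nothing needs to remember that this xid aborted anymore. */
}

int
main(void)
{
    /* xid 10 inserts into blocks 0 and 2; xid 11 inserts into block 1 */
    toy_insert(10, 0, 0);
    toy_insert(10, 2, 1);
    toy_insert(11, 1, 0);

    toy_abort_prune(10);        /* xid 10 rolls back */

    for (int blk = 0; blk < NBLOCKS; blk++)
        for (int off = 0; off < TUPS_PER_BLOCK; off++)
            if (toy_heap[blk][off].xmin != 0)
                printf("block %d off %d: xmin %u %s\n",
                       blk, off, toy_heap[blk][off].xmin,
                       toy_heap[blk][off].pruned ? "(pruned)" : "(live)");
    return 0;
}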

Mind you, something like the conveyor belt does seem like it would
help with this other kind of VACUUM. We probably don't want to do
anything like index vacuuming for these aborted transactions (maybe we
actually want to do retail deletion, which is much more likely to work
out there). But putting the TIDs into a store used for dead TIDs could
make sense, especially in extreme cases.
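
For the aborted-xact case, I imagine the choice could end up looking
something like this (again only a toy, with an invented threshold and
invented names, just to pin down which option applies when):

/*
 * Toy sketch of that trade-off: for a small aborted transaction,
 * delete the matching index entries retail right away; for an extreme
 * case, append the TIDs to a shared dead-TID store (which is where a
 * conveyor belt could come in) and leave them for a batch pass.
 */
#include <stdio.h>

#define RETAIL_DELETE_LIMIT 16  /* invented cutoff, purely illustrative */

typedef struct ToyTid
{
    unsigned block;
    unsigned offset;
} ToyTid;

static void
cleanup_aborted_index_entries(const ToyTid *tids, int ntids)
{
    if (ntids <= RETAIL_DELETE_LIMIT)
    {
        /* Small abort: retail-delete each index entry individually. */
        for (int i = 0; i < ntids; i++)
            printf("retail-delete index entries for (%u,%u)\n",
                   tids[i].block, tids[i].offset);
    }
    else
    {
        /* Extreme case: defer to the shared dead-TID store instead. */
        printf("append %d TIDs to the dead-TID store for a later pass\n",
               ntids);
    }
}

int
main(void)
{
    ToyTid small_abort[] = {{0, 1}, {2, 5}, {7, 3}};

    cleanup_aborted_index_entries(small_abort, 3);
    return 0;
}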

-- 
Peter Geoghegan


