Re: POC: Cleaning up orphaned files using undo logs

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: POC: Cleaning up orphaned files using undo logs
Дата
Msg-id CAFiTN-s2fFcegTBbUGABREZtG=cR0rAAX1diTbNXF0dSXttDSQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC: Cleaning up orphaned files using undo logs  (Thomas Munro <thomas.munro@gmail.com>)
Ответы Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Tue, Mar 12, 2019 at 6:51 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sun, Feb 3, 2019 at 11:09 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2018-12-03 18:43:04 +1300, Thomas Munro wrote:
> > > Sorry for my silence... I got stuck on a design problem with the lower
> > > level undo log management code that I'm now close to having figured
> > > out.  I'll have a new patch soon.
>
> Hello all,
>
> Here's a new WIP version of this patch set.  It builds on a fairly
> deep stack of patches being developed by several people.  As mentioned
> before, it's a useful crash-test dummy for a whole stack of technology
> we're working on, but it's also aiming to solve a real problem.
>
> It currently fails in one regression test for a well understood
> reason, fix on the way (see end), and there are some other stability
> problems being worked on.
>
> Here's a quick tour of the observable behaviour, having installed the
> pg_buffercache and test_undorecord extensions:
>
> ==================
>
> postgres=# begin;
> BEGIN
> postgres=# create table foo ();
> CREATE TABLE
>
> Check if our transaction has generated undo data:
>
> postgres=# select logno, discard, insert, xid, pid from pg_stat_undo_logs ;
>  logno |     discard      |      insert      | xid |  pid
> -------+------------------+------------------+-----+-------
>      0 | 0000000000002CD9 | 0000000000002D1A | 476 | 39169
> (1 row)
>
> Here, we see that undo log number 0 has some undo data because discard
> < insert.  We can find out what it says:
>
> postgres=# call dump_undo_records(0);
> NOTICE:  0000000000002CD9: Storage: CREATE dbid=12916, tsid=1663,
> relfile=16386; xid=476, next xact=0
> CALL
>
> The undo record shown there lives in shared buffers, and we can see
> that it's in there with pg_buffercache (the new column smgrid 1 means
> undo data; 0 is regular relation data):
>
> postgres=# select bufferid, smgrid, relfilenode, relblocknumber,
> isdirty, usagecount from pg_buffercache where smgrid = 1;
>  bufferid | smgrid | relfilenode | relblocknumber | isdirty | usagecount
> ----------+--------+-------------+----------------+---------+------------
>         3 |      1 |           0 |              1 | t       |          5
> (1 row)
>
> Even though that's just a dirty page in shared buffers, if we crash
> now and recover, it'll be recreated by a new WAL record that was
> flushed *before* creating the relation file.  We can see that with
> pg_waldump:
>
> rmgr: Storage ... PRECREATE base/12916/16384, blkref #0: smgr 1 rel
> 1663/0/0 blk 1 FPW
> rmgr: Storage ... CREATE base/12916/16384
>
> The PRECREATE record dirtied block 1 of undo log 0.  In this case it
> happened to include a FPW of the undo log page too, following the
> usual rules.  FPWs are rare for undo pages because of the
> REGBUF_WILL_INIT optimisation that applies to the zeroed out pages
> (which is most undo pages, due to the append-mostly access pattern).
>
> Finally, we if commit we see the undo data is discarded by a
> background worker, and if we roll back explicitly or crash and run
> recovery, the file is unlinked.  Here's an example of the crash case:
>
> postgres=# begin;
> BEGIN
> postgres=# create table foo ();
> CREATE TABLE
> postgres=# select relfilenode from pg_class where relname = 'foo';
>  relfilenode
> -------------
>        16395
> (1 row)
>
> postgres=# select pg_backend_pid();
>  pg_backend_pid
> ----------------
>           39169
> (1 row)
>
> $ kill -9 39169
>
> ... server restarts, recovers ...
>
> $ ls pgdata/base/12916/16395
> pgdata/base/12916/16395
>
> It's still there, though it's been truncated by an undo worker (see
> end of email).  And finally, after the next checkpoint:
>
> $ ls pgdata/base/12916/16395
> ls: pgdata/base/12916/16395: No such file or directory
>
> That's the end of the quick tour.
>
> Most of these patches should probably be discussed in other threads,
> but I'm posting a snapshot of the full stack here anyway.  Here's a
> patch-by-patch summary:
>
> === 0001 "Refactor the fsync mechanism to support future SMGR
> implementations." ===
>
> The 0001 patch has its own CF thread
> https://commitfest.postgresql.org/22/1829/ and is from Shawn Debnath
> (based on earlier work by me), but I'm including a copy here for
> convenience/cfbot.
>
> === 0002 "Add SmgrId to smgropen() and BufferTag." ===
>
> This is new, and is based on the discussion from another recent
> thread[1] about how we should identify buffers belonging to different
> storage managers.  In earlier versions of the patch-set I had used a
> special reserved DB OID for undo data.  Tom Lane didn't like that idea
> much, and Anton Shyrabokau (via Shawn Debnath) suggested making
> ForkNumber narrower so we can add a new field to BufferTag, and Andres
> Freund +1'd my proposal to add the extra value as a parameter to
> smgropen().  So, here is a patch that tries those ideas.
>
> Another way to do this would be to widen RelFileNode instead, to avoid
> having to pass around the SMGR ID separately in various places.
> Looking at the number of places that have to chance, you can probably
> see why we wanted to use a magic DB OID instead, and I'm not entirely
> convinced that it wasn't better that way, or that I've found all the
> places that need to carry an smgrid alongside a RelFileNode.
>
> Archeological note: smgropen() was like that ~15 years ago before
> commit 87bd9563, but buffer tags didn't include the SMGR ID.
>
> I decided to call md.c's ID "SMGR_RELATION", describing what it really
> holds -- regular relations -- rather than perpetuating the doubly
> anachronistic "magnetic disk" name.
>
> While here, I resurrected the ancient notion of a per-SMGR 'open'
> routine, so that a small amount of md.c-specific stuff could be kicked
> out of smgr.c and future implementations can do their own thing here
> too.
>
> While doing that work I realised that at least pg_rewind needs to
> learn about how different storage managers map blocks to files, so
> that's a new TODO item requiring more thought.  I wonder what other
> places know how to map { RelFileNode, ForkNumber, BlockNumber } to a
> path + offset, and I wonder what to think about the fact that some of
> them may be non-backend code...
>
> === 0003 "Add undo log manager." ===
>
> This and the next couple of patches live in CF thread
> https://commitfest.postgresql.org/22/1649/ but here's a much newer
> snapshot that hasn't been posted there yet.
>
> Manages a set of undo logs in shared memory, manages undo segment
> files, tracks discard, insert, end pointers visible in
> pg_stat_undo_logs.  With this patch you can allocate and discard space
> in undo logs using the UndoRecPtr type to refer to addresses, but
> there is no access to the data yet.  Improvements since the last
> version are not requiring DSM segments, proper FPW support and reduced
> WAL traffic.  Previously there were extra per-xact and per-checkpoint
> records requiring retry-loops in code that inserted undo data.
>
> === 0004 "Provide access to undo log data via the buffer manager." ===
>
> Provide SMGR_UNDO.  While the 0003 patch deals with allocating and
> discarding undo address space and makes sure that backing files exist,
> this patch lets you read and write buffered data in them.
>
> === 0005 "Allow WAL record data on first modification after a checkpoint." ===
>
> Provide a way for data to be attached to a WAL-registered block that
> is only included if this turns out to be the first WAL record that
> touches the block after a checkpoint.  This is a bit like FPW images,
> except that it's arbitrary extra data and happens even if FPW is off.
> This is used to capture a copy of the (tiny) undo log meta-data
> (primary the insertion pointer) to fix a consistency problem when
> recovering from an online checkpoint.
>
> === 0006 + 0007 "Provide interfaces to store and fetch undo records." ===
>
> This is a snapshot of work by my colleagues Dilip, Rafia and others
> based on earlier prototyping by Robert.  While the earlier patches
> give you buffered binary undo data, this patch introduces the concept
> of high level undo records that can be inserted, and read back given
> an UndoRecPtr.  This is a version presented on another thread already;
> here it's lightly changed due to rebasing by me.
>
> Undo-aware modules should design a set of undo record types, and
> insert exactly the same ones at do and undo time.
>
> The 0007 patch is fixups from me to bring that code into line with
> changes to the lower level patches.  Future versions will be squashed
> and tidied up; still working on that.

Currently, undo branch[1] contain an older version of the (undo
interface + some fixup).  Now, I have merged the latest changes from
the zheap branch[2] to the undo branch[1]
which can be applied on top of the undo storage commit[3].  For
merging those changes, I need to add some changes to the undo log
storage patch as well for handling the multi log transaction.  So I
have attached two patches, 1) improvement to undo log storage 2)
complete undo interface patch which include 0006+0007 from undo
branch[1] + new changes on the zheap branch.

[1] https://github.com/EnterpriseDB/zheap/tree/undo
[2] https://github.com/EnterpriseDB/zheap
[3] https://github.com/EnterpriseDB/zheap/tree/undo
(b397d96176879ed5b09cf7322b8d6f2edd8043a5)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: clean up docs for v12