Re: Undo logs

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Undo logs
Дата
Msg-id CAA4eK1L3JDwDobi8yBeD5vt76V+9W0nfc4-kf2ViiOJF4KGe6Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Undo logs  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: Undo logs  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Sun, Sep 2, 2018 at 12:19 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>
> On Fri, Aug 31, 2018 at 10:24 PM Dilip Kumar
> <dilip.kumar@enterprisedb.com> wrote:
> > On Fri, Aug 31, 2018 at 3:08 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > As Thomas has already mentioned upthread that we are working on an
> > > undo-log based storage and he has posted the patch sets for the lowest
> > > layer called undo-log-storage.
> > >
> > > This is the next layer which sits on top of the undo log storage,
> > > which will provide an interface for prepare, insert, or fetch the undo
> > > records. This layer will use undo-log-storage to reserve the space for
> > > the undo records and buffer management routine to write and read the
> > > undo records.
>
> I have also pushed a new WIP version of the lower level undo log
> storage layer patch set to a public branch[1].  I'll leave the earlier
> branch[2] there because the record-level patch posted by Dilip depends
> on it for now.
>

I have started reading the patch and have a few assorted comments
which are mentioned below.  I have been involved in the high-level
design of this module and I have also shared given some suggestions
during development, but this is mainly Thomas's work with some help
from Dilip.  It would be good if other members of the community also
review the design or participate in the discussion.

Comments
------------------
undo/README
-----------------------
1.
+The undo log subsystem provides a way to store data that is needed for
+a limited time.  Undo data is generated
whenever zheap relations are
+modified, but it is only useful until (1) the generating transaction
+is committed or
rolled back and (2) there is no snapshot that might
+need it for MVCC purposes.

I think the snapshots need it for MVCC purpose and we need it till the
transaction is committed and all-visible.

2.
+* the tablespace that holds its segment files
+* the persistence level (permanent, unlogged, temporary)
+* the
"discard" pointer; data before this point has been discarded
+* the "insert" pointer: new data will be written here
+*
the "end" pointer: a new undo segment file will be needed at this point
+
+The three pointers discard, insert and end
move strictly forwards
+until the whole undo log has been exhausted.  At all times discard <=
+insert <= end.  When
discard == insert, the undo log is empty
+(everything that has ever been inserted has since been discarded).
+The
insert pointer advances when regular backends allocate new space,
+and the discard pointer usually advances when an
undo worker process
+determines that no session could need the data either for rollback or
+for finding old versions of
tuples to satisfy a snapshot.  In some
+special cases including single-user mode and temporary undo logs the
+discard
pointer might also be advanced synchronously by a foreground
+session.

Here, the use of insert and discard pointer are explained nicely.  Can
you elaborate the usage of end pointer as well?

3.
+UndoLogControl objects corresponding to the current set of active undo
+logs are held in a fixed-sized pool in shared
memory.  The size of
+the array is a multiple of max_connections, and limits the total size of
+transactions.

Here, it is mentioned the array is multiple of max_connections, but
the code uses MaxBackends.  Can you sync them?

4.
+The meta-data for all undo logs is written to disk at every
+checkpoint.  It is stored in files under
PGDATA/pg_undo/, using the
+checkpoint's redo point (a WAL LSN) as its filename.  At startup time,
+the redo point's
file can be used to restore all undo logs' meta-data
+as of the moment of the redo point into shared memory.  Changes
to the
+discard pointer and end pointer are WAL-logged by undolog.c and will
+bring the in-memory meta-data up to date
in the event of recovery
+after a crash.  Changes to insert pointers are included in other WAL
+records (see below).

I see one inconvenience for using checkpoint's redo point for meta
file name which is what if someone uses pg_resetxlog to truncate the
redo?  Is there any reason we can't keep a different name for the meta
file?

5.
+stabilize on one undo log per active writing backend (or more if
+different tablespaces are persistence levels are
used).

/tablespaces are persistence levels/tablespaces and persistence levels

I think due to the above design, we can now reach the maximum number
of undo logs quickly as the patch now uses fixed shared memory to
represent them.  I am not sure if there is an easy way to avoid that.
Can we try to expose guc for a maximum number of undo slots such that
instead of MaxBackends * 4, it could be MaxBackends * <new_guc>?

undo-log-manager patch
------------------------------------
6.
@@ -127,6 +128,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
  size = add_size(size,
ProcGlobalShmemSize());
  size = add_size(size, XLOGShmemSize());
  size = add_size(size,
CLOGShmemSize());
+ size = add_size(size, UndoLogShmemSize());
  size = add_size(size,
CommitTsShmemSize());
  size = add_size(size, SUBTRANSShmemSize());
  size = add_size(size,
TwoPhaseShmemSize());
@@ -219,6 +221,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
  */

XLOGShmemInit();
  CLOGShmemInit();
+ UndoLogShmemInit();

It seems to me that we will always allocate shared memory for undo
logs irrespective of whether someone wants to use them or not.  Am I
right?  If so, isn't it better if we find some way that this memory is
allocated only when someone has a need for it?

7.
+/*
+ * How many undo logs can be active at a time?  This creates a theoretical
+ * maximum transaction size, but it we
set it to a factor the maximum number
+ * of backends it will be a very high limit.  Alternative designs involving
+ *
demand paging or dynamic shared memory could remove this limit but
+ * introduce other problems.
+ */
+static inline
size_t
+UndoLogNumSlots(void)
+{+ return MaxBackends * 4;
+}

Seems like typos in above comment
/but it we/but if we
/factor the maximum number  -- the sentence is not completely clear.

8.
+ * Extra shared memory will be managed using DSM segments.
+ */
+Size
+UndoLogShmemSize(void)

You told in the email that the patch doesn't use DSM segments anymore,
but the comment seems to indicate otherwise.

9.
/*
+ * TODO: Should this function be usable in a critical section?
+
 * Woudl it make sense to detect that we are in a critical

Typo
/Woudl/Would

10.
+static void
+undolog_xlog_attach(XLogReaderState *record)
+{
+ xl_undolog_attach *xlrec = (xl_undolog_attach *)
XLogRecGetData(record);
+ UndoLogControl *log;
+
+ undolog_xid_map_add(xlrec->xid, xlrec->logno);
+
+ /*
+
 * Whatever follows is the first record for this transaction.  Zheap will
+ * use this to add
UREC_INFO_TRANSACTION.
+ */
+ log = get_undo_log(xlrec->logno, false);
+ /* TODO */

There are a lot of TODO's in the code, among them, above is not at all clear.

11.
+ UndoRecPtr oldest_data;
+
+} UndoLogControl;

Extra space after the last member looks odd.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Refactoring the checkpointer's fsync request queue
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: pgbench - add pseudo-random permutation function