Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

From: Steve Singer
Subject: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
Date:
Msg-id: BLU0-SMTP67D92025ED330B4714F8DDCE00@phx.gbl
In reply to: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes  (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On 12-06-21 04:37 AM, Andres Freund wrote:
> Hi Steve,
> Thanks!
>

>> Attached is a detailed review of the patch.
> Very good analysis, thanks!
> Another reasons why we cannot easily do 1) is that subtransactions aren't
> discernible from top-level transactions before the top-level commit happens,
> we can only properly merge in the right order (by "sorting" via lsn) once we
> have seen the commit record which includes a list of all committed
> subtransactions.
>

Based on that and your comments further down in your reply (and that no 
one spoke up and disagreed with you), it sounds like doing (1) isn't 
going to be practical.

> I also don't think 1) would be particularly welcome by people trying to
> replicate into foreign systems.
>

They could still sort the changes into transaction groups themselves 
before applying them to the foreign system.
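
To make that concrete: even if option (1) streamed changes interleaved, a 
consumer could reassemble them on its own side. A toy sketch (all names 
invented, fixed-size buckets just to keep it short):

#include <stdio.h>
#include <stdlib.h>

typedef struct Change
{
	unsigned	xid;			/* transaction the change belongs to */
	int			is_commit;		/* nonzero for a commit record */
	const char *payload;		/* decoded change, e.g. an SQL statement */
} Change;

typedef struct TxnBucket
{
	unsigned	xid;
	const char *changes[64];	/* fixed size to keep the sketch short */
	int			nchanges;
	struct TxnBucket *next;
} TxnBucket;

static TxnBucket *buckets;

static TxnBucket *
get_bucket(unsigned xid)
{
	TxnBucket  *b;

	for (b = buckets; b != NULL; b = b->next)
		if (b->xid == xid)
			return b;
	b = calloc(1, sizeof(TxnBucket));
	b->xid = xid;
	b->next = buckets;
	buckets = b;
	return b;
}

/* feed one change from the interleaved stream */
static void
receive_change(const Change *c)
{
	TxnBucket  *b = get_bucket(c->xid);
	int			i;

	if (!c->is_commit)
	{
		b->changes[b->nchanges++] = c->payload;
		return;
	}
	/* commit seen: hand the whole group to the foreign system in order */
	for (i = 0; i < b->nchanges; i++)
		printf("xid %u: %s\n", b->xid, b->changes[i]);
	b->nchanges = 0;
}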


> I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it
> has been reached for one transaction all existing changes are spilled to
> disk. New changes again can be kept in memory till it's reached again.
>
>
Do you want max_changes_in_memory_per_txn, or do you want to put a limit 
on the total amount of memory that the cache is able to use? How are you 
going to tell a DBA to tune max_changes_in_memory_per_txn? They know how 
much memory their system has and how much of it they can devote to the 
apply cache versus other things. Guidance that requires them to estimate 
how many open transactions they might have at a point in time, and how 
many WAL change records each transaction generates, seems like a step 
backwards from the progress we've been making in getting PostgreSQL to 
be easier to tune. The maximum number of transactions that could be 
open at a time is governed by max_connections on the master at the 
time the WAL was generated, so I don't even see how the machine 
processing the WAL records could autotune/guess that.
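
For comparison, a minimal sketch of the whole-cache-budget alternative; 
apply_cache_work_mem, ApplyCacheSpillTxn, and the struct fields are all 
invented here, not part of the patch:

#include <stddef.h>

typedef struct ApplyCacheTXN
{
	size_t		size;			/* bytes of changes held in memory */
	struct ApplyCacheTXN *next;
} ApplyCacheTXN;

typedef struct ApplyCache
{
	size_t		total_size;		/* sum of all transactions' sizes */
	ApplyCacheTXN *txns;
} ApplyCache;

/* invented GUC-style knob: total budget in kilobytes */
static long apply_cache_work_mem = 65536;

/* stub: real code would write the txn's changes to a spill file */
static void
ApplyCacheSpillTxn(ApplyCache *cache, ApplyCacheTXN *txn)
{
	(void) cache;
	(void) txn;
}

/* called after queuing each change */
static void
ApplyCacheCheckMemory(ApplyCache *cache)
{
	while (cache->total_size > (size_t) apply_cache_work_mem * 1024)
	{
		ApplyCacheTXN *largest = NULL;
		ApplyCacheTXN *txn;

		/* spill the biggest transaction first: it frees the most memory */
		for (txn = cache->txns; txn != NULL; txn = txn->next)
			if (largest == NULL || txn->size > largest->size)
				largest = txn;

		if (largest == NULL || largest->size == 0)
			break;				/* nothing left to spill */

		ApplyCacheSpillTxn(cache, largest);
		cache->total_size -= largest->size;
		largest->size = 0;
	}
}

The attraction is that a DBA only has to pick one number, the total 
memory the cache may use, which matches how work_mem-style settings are 
already tuned.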



> We need to support serializing the cache for crash recovery + shutdown of the
> receiving side as well. Depending on how we do the wal decoding we will need
> it more frequently...
>

Have you described your thoughts on crash recovery on another thread?

I am thinking that this module would have to serialize some state 
every time it calls cache->commit() to ensure that consumers don't get 
invoked twice on the same transaction.
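
For concreteness, a minimal sketch of the kind of state that could be 
persisted at each cache->commit(); the file name and the bare-uint64 
stand-in for XLogRecPtr are my invention:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the real type */

static const char *state_path = "applycache.state";

/* durably remember the commit we just handed to the consumer */
static void
persist_confirmed_commit(XLogRecPtr commit_lsn)
{
	FILE	   *f = fopen(state_path, "wb");

	if (f == NULL)
		return;					/* error handling omitted throughout */
	fwrite(&commit_lsn, sizeof(commit_lsn), 1, f);
	fflush(f);
	/* real code would fsync(fileno(f)) before trusting this */
	fclose(f);
}

/* on restart: commits at or below this LSN were already delivered */
static XLogRecPtr
read_confirmed_commit(void)
{
	XLogRecPtr	lsn = 0;
	FILE	   *f = fopen(state_path, "rb");

	if (f != NULL)
	{
		if (fread(&lsn, sizeof(lsn), 1, f) != 1)
			lsn = 0;
		fclose(f);
	}
	return lsn;
}

Even with that, the state file and the target database still commit 
independently, which is exactly the consistency gap discussed next.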

If the apply module is making changes to the same backend that the apply 
cache serializes to, then both the state for the apply cache and the 
changes that committed transactions make will be persisted (or 
not persisted) together.   But what if I am replicating from x86 to 
x86_64 via an apply module that does textout conversions?

x86              Proxy                                x86_64
----WAL------>   apply cache
                     |   (proxy catalog)
                     |
                 apply module
                     textout ------------------------> SQL statements


How do we ensure that the commits are all visible (or not visible) in 
the catalog of the proxy instance used for decoding WAL, in the 
destination database, and in the state + spill files of the apply cache, 
and that all three stay consistent in the event of a crash of either the 
proxy or the target?
I don't think you can (unless we consider two-phase commit, and I'd 
rather we didn't).  Can we come up with a way of avoiding the need for 
them to be consistent with each other?

I think apply modules will need to be able to be passed the same 
transaction twice (once before the crash and again after) and have a 
way of deciding whether that transaction has a) been applied to the 
translation/proxy catalog and b) been applied on the replica instance.   
How is the walreceiver going to decide which WAL segments it needs to 
re-process after a crash?  I would want to see more of these details 
worked out before we finalize the interface between the apply cache and 
the apply modules and how the serialization works.
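
One possible shape for (b), sketched with libpq; the single-row 
replication_progress table is invented and error handling is omitted. 
Because the progress update commits atomically with the data changes on 
the target, a redelivered transaction is detected and skipped rather 
than applied twice:

#include <stdint.h>
#include <stdio.h>
#include <libpq-fe.h>

static int
apply_transaction(PGconn *conn, uint64_t commit_lsn,
				  const char **stmts, int nstmts)
{
	char		buf[128];
	PGresult   *res;
	int			i;

	PQclear(PQexec(conn, "BEGIN"));

	/* already applied before a crash?  then this is a redelivery */
	snprintf(buf, sizeof(buf),
			 "SELECT 1 FROM replication_progress WHERE last_lsn >= %llu",
			 (unsigned long long) commit_lsn);
	res = PQexec(conn, buf);
	if (PQntuples(res) > 0)
	{
		PQclear(res);
		PQclear(PQexec(conn, "ROLLBACK"));
		return 0;				/* duplicate delivery: nothing to do */
	}
	PQclear(res);

	/* apply the decoded changes */
	for (i = 0; i < nstmts; i++)
		PQclear(PQexec(conn, stmts[i]));

	/* advance progress in the same target-side transaction */
	snprintf(buf, sizeof(buf),
			 "UPDATE replication_progress SET last_lsn = %llu",
			 (unsigned long long) commit_lsn);
	PQclear(PQexec(conn, buf));

	PQclear(PQexec(conn, "COMMIT"));
	return 1;
}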


Code Review
=========

applycache.h
-----------------------
+typedef struct ApplyCacheTupleBuf
+{
+    /* position in preallocated list */
+    ilist_s_node node;
+
+    HeapTupleData tuple;
+    HeapTupleHeaderData header;
+    char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8K (BLCKSZ) big no matter how big 
the data in the transaction is? Wouldn't workloads with inserts of lots 
of small rows in a transaction eat up lots of memory that is allocated 
but storing nothing?  The only alternative I can think of is dynamically 
allocating these, and I don't know what the cost/benefit of that overhead 
will be versus spilling to disk sooner.
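
The dynamic alternative might look like the sketch below, assuming a 
flexible array member and an invented ApplyCacheCopyTuple() helper; note 
it drops the preallocated-freelist node, since variable-size buffers 
can't be recycled from a fixed-size pool, which is presumably why the 
patch reserves MaxHeapTupleSize up front:

#include "postgres.h"
#include "access/htup.h"

typedef struct ApplyCacheTupleBufVar
{
	HeapTupleData tuple;
	char		data[FLEXIBLE_ARRAY_MEMBER];
} ApplyCacheTupleBufVar;

/* invented helper: copy a tuple into a right-sized buffer */
static ApplyCacheTupleBufVar *
ApplyCacheCopyTuple(HeapTuple src)
{
	ApplyCacheTupleBufVar *buf;

	buf = palloc(offsetof(ApplyCacheTupleBufVar, data) + src->t_len);
	buf->tuple = *src;
	buf->tuple.t_data = (HeapTupleHeader) buf->data;
	memcpy(buf->data, src->t_data, src->t_len);
	return buf;
}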

+* FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:

ApplyCacheReserveChangeStruct(..)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)

as ideas?

+/*
+ * Return an unused ApplyCacheChange struct
+ */
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);

ApplyCacheReleaseChange(...) ?  I keep thinking of 'Return' as us 
returning the data somewhere, not the memory.


applycache.c:
-------------------

I've taken a quick look through this file and I don't see any issues 
other than the many FIXMEs and other issues you've identified already, 
which I don't expect you to address in this CF.

> Andres
>


