Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
From | Bossart, Nathan
Subject | Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Date |
Msg-id | CB7412F4-4E03-4E79-A29F-E91C9CA55FEE@amazon.com
In-reply-to | Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>)
Responses | Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes (Konstantin Knizhnik <knizhnik@garret.ru>)
List | pgsql-hackers
I noticed this thread and thought I'd share my experiences building something similar for Multi-AZ DB clusters [0]. It's not a strict RPO mechanism, but it does throttle backends in an effort to keep the replay lag below a configured maximum. I can share the code if there is interest. I wrote it as a new extension, and except for one piece that I'll go into later, I was able to avoid changes to core PostgreSQL code.

The extension manages a background worker that periodically assesses the state of the designated standbys and updates an atomic in shared memory that indicates how long to delay. A transaction callback checks this value and sleeps as necessary. Delay can be injected for write-enabled transactions on the primary, read-only transactions on the standbys, or both. The extension is heavily configurable so that it can meet the needs of a variety of workloads.

One interesting challenge I encountered was accurately determining the amount of replay lag. The problem was twofold. First, if there is no activity on the primary, there will be nothing to replay on the standbys, so the replay lag will appear to grow unbounded. To work around this, the extension's background worker periodically creates an empty COMMIT record.

Second, if a standby reconnects after a long time, the replay lag won't be accurate for some time. Instead, the replay lag will slowly increase until it reaches the correct value. Since the delay calculation looks at the trend of the replay lag, this apparent unbounded growth causes it to inject far more delay than is necessary. My guess is that this is related to 9ea3c64, and maybe it is worth rethinking that logic. For now, the extension just periodically reports the value of GetLatestXTime() from the standbys to the primary to get an accurate reading. This is done via a new replication callback mechanism (which requires core PostgreSQL changes). I can share this patch along with the extension, as I bet there are other applications for it.
I should also note that the extension only considers "active" standbys and primaries. That is, ones with an active WAL sender or WAL receiver. This avoids the need to guess what should be done during a network partition, but it also means that we must gracefully handle standbys reconnecting with massive amounts of lag. The extension is designed to slowly ramp up the amount of injected delay until the standby's apply lag is trending down at a sufficient rate.

I see that an approach was suggested upthread for throttling based on WAL distance instead of per-transaction. While the transaction approach works decently well for certain workloads (e.g., many small transactions like those from pgbench), it might require further tuning for very large transactions or workloads with a variety of transaction sizes. For that reason, I would definitely support building a way to throttle based on WAL generation. It might be a good idea to avoid throttling critical activity such as anti-wraparound vacuuming, too.

Nathan

[0] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/multi-az-db-clusters-concepts.html