Re: Report reorder buffer size
От | Ashutosh Bapat |
---|---|
Тема | Re: Report reorder buffer size |
Дата | |
Msg-id | CAExHW5stjbY_jqcP=PAj6w93dX_zTMtjrtm2S7bZvcjQkL81dw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Report reorder buffer size (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Ответы |
Re: Report reorder buffer size
|
Список | pgsql-hackers |
Hi Bertrand, Thanks for your response. I am glad that you have found the proposal to be generally useful. On Thu, Aug 14, 2025 at 3:50 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > Using these trendlines, a user may decide to keep > > logical_decoding_work_mem to minimum and thus let all the large > > transactions spill to disk or be streamed. > > Right. Also, the issue with spill files is that when they are read back from disk > then the reorder buffer can consume a lot of memory. That's why I think tracking > the xid, sub xid and LSNs could be useful too. That could help diagnose why > it was using that memory. When the replication is stuck or is slow or WAL builds up a detailed look at the reorder buffer's contents would be useful. It will need to be done outside of pg_stat_replication or pg_stat_replication_slots. I think it will be useful to provide a function which will trigger a given WAL sender to dump the information about the contents of the reorder buffer to the server error log or send them to the requesting client. > > I think pg_stat_replication is a good place to record "real time" activities. > Ok. Thanks for confirming. > Maybe we could log 2 metrics: the reorder buffer size before creating the spill > files and the reorder buffer size while reading back the spill files? That would > help make the distinction when logical_decoding_work_mem is involved (i.e before > creating the spill files) and when it is not (reading them back). > > pg_stat_replication_slots could be used to add some counters too: like the > number of times logical_decoding_work_mem has been reached while decoding from > that slot. I think pg_stat_replication_slots::spill_count and pg_stat_replication_slots::stream_count give that value. > Also maybe record the max value the reorder buffer reached and > the associated xid, number of sub xids, and LSN? That could be useful to diagnose > high memory consumption and/or OOM issues due to logical decoding. A function which dumps the information about reorder buffers can be used along with the trendline for this purpose. For example, if a user notices WAL accumulation and an upward trend in the metric exposed by my patch, they can trigger a dump of reorder buffer contents at that time and diagnose the problem and even fix it proactively. > > Also, should we log some of information to the server log? > Information in server error logs is hard to access and analyse, especially in cloud environment. Having a SQL construct is better. > > But before I work on those, I would like to know others' opinions, > > usefulness and acceptance. > > Thanks for working on it. I like the idea and also think that is an area where > more details/metrics should be provided. Thanks. -- Best Wishes, Ashutosh Bapat
В списке pgsql-hackers по дате отправления: