Обсуждение: Add last_commit_lsn to pg_stat_database

Поиск
Список
Период
Сортировка

Add last_commit_lsn to pg_stat_database

От
James Coleman
Дата:
I've previously noted in "Add last commit LSN to
pg_last_committed_xact()" [1] that it's not possible to monitor how
many bytes of WAL behind a logical replication slot is (computing such
is obviously trivial for physical slots) because the slot doesn't need
to replicate beyond the last commit. In some cases it's possible for
the current WAL location to be far beyond the last commit. A few
examples:

- An idle server with checkout_timeout at a lower value (so lots of
empty WAL advance).
- Very large transactions: particularly insidious because committing a
1 GB transaction after a small transaction may show almost zero time
lag even though quite a bit of data needs to be processed and sent
over the wire (so time to replay is significantly different from
current lag).
- A cluster with multiple databases complicates matters further,
because while physical replication is cluster-wide, the LSNs that
matter for logical replication are database specific.

Since we don't expose the most recent commit's LSN there's no way to
say "the WAL is currently 1250, the last commit happened at 1000, the
slot has flushed up to 800, therefore there are at most 200 bytes
replication needs to read through to catch up.

In the aforementioned thread [1] I'd proposed a patch that added a SQL
function pg_last_commit_lsn() to expose the most recent commit's LSN.
Robert Haas didn't think the initial version's modifications to
commit_ts made sense, and a subsequent approach adding the value to
PGPROC didn't have strong objections, from what I can see, but it also
didn't generate any enthusiasm.

As I was thinking about how to improve things, I realized that this
information (since it's for monitoring anyway) fits more naturally
into the stats system. I'd originally thought of exposing it in
pg_stat_wal, but that's per-cluster rather than per-database (indeed,
this is a flaw I hadn't considered in the original patch), so I think
pg_stat_database is the correct location.

I've attached a patch to track the latest commit's LSN in pg_stat_database.

Regards,
James Coleman

1: https://www.postgresql.org/message-id/flat/CAAaqYe9QBiAu+j8rBun_JKBRe-3HeKLUhfVVsYfsxQG0VqLXsA@mail.gmail.com

Вложения

Re: Add last_commit_lsn to pg_stat_database

От
Aleksander Alekseev
Дата:
Hi,

> [...]
> As I was thinking about how to improve things, I realized that this
> information (since it's for monitoring anyway) fits more naturally
> into the stats system. I'd originally thought of exposing it in
> pg_stat_wal, but that's per-cluster rather than per-database (indeed,
> this is a flaw I hadn't considered in the original patch), so I think
> pg_stat_database is the correct location.
>
> I've attached a patch to track the latest commit's LSN in pg_stat_database.

Thanks for the patch. It was marked as "Needs Review" so I decided to
take a brief look.

All in all the code seems to be fine but I have a couple of nitpicks:

- If you are modifying pg_stat_database the corresponding changes to
the documentation are needed.
- pg_stat_get_db_last_commit_lsn() has the same description as
pg_stat_get_db_xact_commit() which is confusing.
- pg_stat_get_db_last_commit_lsn() is marked as STABLE which is
probably correct. But I would appreciate a second opinion on this.
- Wouldn't it be appropriate to add a test or two?
- `if (!XLogRecPtrIsInvalid(commit_lsn))` - I suggest adding
XLogRecPtrIsValid() macro for better readability

-- 
Best regards,
Aleksander Alekseev



Re: Add last_commit_lsn to pg_stat_database

От
vignesh C
Дата:
On Sat, 10 Jun 2023 at 07:57, James Coleman <jtc331@gmail.com> wrote:
>
> I've previously noted in "Add last commit LSN to
> pg_last_committed_xact()" [1] that it's not possible to monitor how
> many bytes of WAL behind a logical replication slot is (computing such
> is obviously trivial for physical slots) because the slot doesn't need
> to replicate beyond the last commit. In some cases it's possible for
> the current WAL location to be far beyond the last commit. A few
> examples:
>
> - An idle server with checkout_timeout at a lower value (so lots of
> empty WAL advance).
> - Very large transactions: particularly insidious because committing a
> 1 GB transaction after a small transaction may show almost zero time
> lag even though quite a bit of data needs to be processed and sent
> over the wire (so time to replay is significantly different from
> current lag).
> - A cluster with multiple databases complicates matters further,
> because while physical replication is cluster-wide, the LSNs that
> matter for logical replication are database specific.
>
> Since we don't expose the most recent commit's LSN there's no way to
> say "the WAL is currently 1250, the last commit happened at 1000, the
> slot has flushed up to 800, therefore there are at most 200 bytes
> replication needs to read through to catch up.
>
> In the aforementioned thread [1] I'd proposed a patch that added a SQL
> function pg_last_commit_lsn() to expose the most recent commit's LSN.
> Robert Haas didn't think the initial version's modifications to
> commit_ts made sense, and a subsequent approach adding the value to
> PGPROC didn't have strong objections, from what I can see, but it also
> didn't generate any enthusiasm.
>
> As I was thinking about how to improve things, I realized that this
> information (since it's for monitoring anyway) fits more naturally
> into the stats system. I'd originally thought of exposing it in
> pg_stat_wal, but that's per-cluster rather than per-database (indeed,
> this is a flaw I hadn't considered in the original patch), so I think
> pg_stat_database is the correct location.
>
> I've attached a patch to track the latest commit's LSN in pg_stat_database.

I have changed the status of commitfest entry to "Returned with
Feedback" as Aleksander's comments have not yet been resolved. Please
feel free to post an updated version of the patch and update the
commitfest entry accordingly.

Regards,
Vignesh



Re: Add last_commit_lsn to pg_stat_database

От
James Coleman
Дата:
Hello,

Thanks for reviewing!

On Tue, Sep 19, 2023 at 8:16 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> > [...]
> > As I was thinking about how to improve things, I realized that this
> > information (since it's for monitoring anyway) fits more naturally
> > into the stats system. I'd originally thought of exposing it in
> > pg_stat_wal, but that's per-cluster rather than per-database (indeed,
> > this is a flaw I hadn't considered in the original patch), so I think
> > pg_stat_database is the correct location.
> >
> > I've attached a patch to track the latest commit's LSN in pg_stat_database.
>
> Thanks for the patch. It was marked as "Needs Review" so I decided to
> take a brief look.
>
> All in all the code seems to be fine but I have a couple of nitpicks:
>
> - If you are modifying pg_stat_database the corresponding changes to
> the documentation are needed.

Updated.

> - pg_stat_get_db_last_commit_lsn() has the same description as
> pg_stat_get_db_xact_commit() which is confusing.

I've fixed this.

> - pg_stat_get_db_last_commit_lsn() is marked as STABLE which is
> probably correct. But I would appreciate a second opinion on this.

Sounds good.

> - Wouldn't it be appropriate to add a test or two?

Added.

> - `if (!XLogRecPtrIsInvalid(commit_lsn))` - I suggest adding
> XLogRecPtrIsValid() macro for better readability

We have 36 usages of !XLogRecPtrIsInvalid(...) already, so I think we
should avoid making this change in this patch.

v2 is attached.

Regards,
James Coleman

Вложения

Re: Add last_commit_lsn to pg_stat_database

От
James Coleman
Дата:
On Sun, Jan 14, 2024 at 6:01 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Sat, 10 Jun 2023 at 07:57, James Coleman <jtc331@gmail.com> wrote:
> >
> > I've previously noted in "Add last commit LSN to
> > pg_last_committed_xact()" [1] that it's not possible to monitor how
> > many bytes of WAL behind a logical replication slot is (computing such
> > is obviously trivial for physical slots) because the slot doesn't need
> > to replicate beyond the last commit. In some cases it's possible for
> > the current WAL location to be far beyond the last commit. A few
> > examples:
> >
> > - An idle server with checkout_timeout at a lower value (so lots of
> > empty WAL advance).
> > - Very large transactions: particularly insidious because committing a
> > 1 GB transaction after a small transaction may show almost zero time
> > lag even though quite a bit of data needs to be processed and sent
> > over the wire (so time to replay is significantly different from
> > current lag).
> > - A cluster with multiple databases complicates matters further,
> > because while physical replication is cluster-wide, the LSNs that
> > matter for logical replication are database specific.
> >
> > Since we don't expose the most recent commit's LSN there's no way to
> > say "the WAL is currently 1250, the last commit happened at 1000, the
> > slot has flushed up to 800, therefore there are at most 200 bytes
> > replication needs to read through to catch up.
> >
> > In the aforementioned thread [1] I'd proposed a patch that added a SQL
> > function pg_last_commit_lsn() to expose the most recent commit's LSN.
> > Robert Haas didn't think the initial version's modifications to
> > commit_ts made sense, and a subsequent approach adding the value to
> > PGPROC didn't have strong objections, from what I can see, but it also
> > didn't generate any enthusiasm.
> >
> > As I was thinking about how to improve things, I realized that this
> > information (since it's for monitoring anyway) fits more naturally
> > into the stats system. I'd originally thought of exposing it in
> > pg_stat_wal, but that's per-cluster rather than per-database (indeed,
> > this is a flaw I hadn't considered in the original patch), so I think
> > pg_stat_database is the correct location.
> >
> > I've attached a patch to track the latest commit's LSN in pg_stat_database.
>
> I have changed the status of commitfest entry to "Returned with
> Feedback" as Aleksander's comments have not yet been resolved. Please
> feel free to post an updated version of the patch and update the
> commitfest entry accordingly.

Thanks for reminding me; I'd lost track of this patch.

Regards,
James Coleman



Re: Add last_commit_lsn to pg_stat_database

От
Peter Smith
Дата:
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review", but it seems like
there was some CFbot test failure last time it was run [1]. Please
have a look and post an updated version if necessary.

======
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4355

Kind Regards,
Peter Smith.



Re: Add last_commit_lsn to pg_stat_database

От
James Coleman
Дата:
On Sun, Jan 21, 2024 at 10:26 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 2024-01 Commitfest.
>
> Hi, This patch has a CF status of "Needs Review", but it seems like
> there was some CFbot test failure last time it was run [1]. Please
> have a look and post an updated version if necessary.
>
> ======
> [1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4355
>
> Kind Regards,
> Peter Smith.

Updated patch attached,

Thanks,
James Coleman

Вложения

Re: Add last_commit_lsn to pg_stat_database

От
Tomas Vondra
Дата:
Hi James,

Thanks for the updated patch. I don't have a clear opinion on the
feature and whether this is the way to implement it, but I have two
simple questions.

1) Do we really need to modify RecordTransactionCommitPrepared and
XactLogCommitRecord to return the LSN of the commit record? Can't we
just use XactLastRecEnd? It's good enough for advancing
replorigin_session_origin_lsn, why wouldn't it be good enough for this?


2) Earlier in this thread it was claimed the function returning the
last_commit_lsn is STABLE, but I have to admit it's not clear to me why
that's the case :-( All the pg_stat_get_db_* functions are marked as
stable, so I guess it's thanks to the pgstat system ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add last_commit_lsn to pg_stat_database

От
Michael Paquier
Дата:
On Sun, Feb 18, 2024 at 02:28:06AM +0100, Tomas Vondra wrote:
> Thanks for the updated patch. I don't have a clear opinion on the
> feature and whether this is the way to implement it, but I have two
> simple questions.

Some users I know of would be really happy to be able to get this
information for each database in a single view, so that feels natural
to plug the information into pg_stat_database.

> 1) Do we really need to modify RecordTransactionCommitPrepared and
> XactLogCommitRecord to return the LSN of the commit record? Can't we
> just use XactLastRecEnd? It's good enough for advancing
> replorigin_session_origin_lsn, why wouldn't it be good enough for this?

Or XactLastCommitEnd?  The changes in AtEOXact_PgStat() are not really
attractive for what's a static variable in all the backends.

> 2) Earlier in this thread it was claimed the function returning the
> last_commit_lsn is STABLE, but I have to admit it's not clear to me why
> that's the case :-( All the pg_stat_get_db_* functions are marked as
> stable, so I guess it's thanks to the pgstat system ...

The consistency of the shared stats data depends on
stats_fetch_consistency.  The default of "cache" makes sure that the
values don't change across a scan, until the end of a transaction.

I have not paid much attention about that until now, but note that it
would not be the case of "none" where the data is retrieved each time
it is requested.  So it seems to me that these functions should be
actually volatile, not stable, because they could deliver different
values across the same scan as an effect of the design behind
pgstat_fetch_entry() and a non-default stats_fetch_consistency.  I may
be missing something, of course.
--
Michael

Вложения

Re: Add last_commit_lsn to pg_stat_database

От
Tomas Vondra
Дата:
On 2/19/24 07:57, Michael Paquier wrote:
> On Sun, Feb 18, 2024 at 02:28:06AM +0100, Tomas Vondra wrote:
>> Thanks for the updated patch. I don't have a clear opinion on the
>> feature and whether this is the way to implement it, but I have two
>> simple questions.
> 
> Some users I know of would be really happy to be able to get this
> information for each database in a single view, so that feels natural
> to plug the information into pg_stat_database.
> 

OK

>> 1) Do we really need to modify RecordTransactionCommitPrepared and
>> XactLogCommitRecord to return the LSN of the commit record? Can't we
>> just use XactLastRecEnd? It's good enough for advancing
>> replorigin_session_origin_lsn, why wouldn't it be good enough for this?
> 
> Or XactLastCommitEnd?

But that's not set in RecordTransactionCommitPrepared (or twophase.c in
general), and the patch seems to need that.

> The changes in AtEOXact_PgStat() are not really
> attractive for what's a static variable in all the backends.
> 

I'm sorry, I'm not sure what "changes not being attractive" means :-(

>> 2) Earlier in this thread it was claimed the function returning the
>> last_commit_lsn is STABLE, but I have to admit it's not clear to me why
>> that's the case :-( All the pg_stat_get_db_* functions are marked as
>> stable, so I guess it's thanks to the pgstat system ...
> 
> The consistency of the shared stats data depends on
> stats_fetch_consistency.  The default of "cache" makes sure that the
> values don't change across a scan, until the end of a transaction.
> 
> I have not paid much attention about that until now, but note that it
> would not be the case of "none" where the data is retrieved each time
> it is requested.  So it seems to me that these functions should be
> actually volatile, not stable, because they could deliver different
> values across the same scan as an effect of the design behind
> pgstat_fetch_entry() and a non-default stats_fetch_consistency.  I may
> be missing something, of course.

Right. If I do this:

create or replace function get_db_lsn(oid) returns pg_lsn as $$
declare
  v_lsn pg_lsn;
begin
  select last_commit_lsn into v_lsn from pg_stat_database
   where datid = $1;
  return v_lsn;
end; $$ language plpgsql;

select min(l), max(l) from (select i, get_db_lsn(16384) AS l from
generate_series(1,100000) s(i)) foo;

and run this concurrently with a pgbench on the same database (with OID
16384), I get:

- stats_fetch_consistency=cache: always the same min/max value

- stats_fetch_consistency=none: min != max

Which would suggest you're right and this should be VOLATILE and not
STABLE. But then again - the existing pg_stat_get_db_* functions are all
marked as STABLE, so (a) is that correct and (b) why should this
function be marked different?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add last_commit_lsn to pg_stat_database

От
Michael Paquier
Дата:
On Mon, Feb 19, 2024 at 10:26:43AM +0100, Tomas Vondra wrote:
> On 2/19/24 07:57, Michael Paquier wrote:
> > On Sun, Feb 18, 2024 at 02:28:06AM +0100, Tomas Vondra wrote:
>>> 1) Do we really need to modify RecordTransactionCommitPrepared and
>>> XactLogCommitRecord to return the LSN of the commit record? Can't we
>>> just use XactLastRecEnd? It's good enough for advancing
>>> replorigin_session_origin_lsn, why wouldn't it be good enough for this?
>>
>> Or XactLastCommitEnd?
>
> But that's not set in RecordTransactionCommitPrepared (or twophase.c in
> general), and the patch seems to need that.

Hmm.  Okay.

> > The changes in AtEOXact_PgStat() are not really
> > attractive for what's a static variable in all the backends.
>
> I'm sorry, I'm not sure what "changes not being attractive" means :-(

It just means that I am not much a fan of the signature changes done
for RecordTransactionCommit() and AtEOXact_PgStat_Database(), adding a
dependency to a specify LSN.  Your suggestion to switching to
XactLastRecEnd should avoid that.

> - stats_fetch_consistency=cache: always the same min/max value
>
> - stats_fetch_consistency=none: min != max
>
> Which would suggest you're right and this should be VOLATILE and not
> STABLE. But then again - the existing pg_stat_get_db_* functions are all
> marked as STABLE, so (a) is that correct and (b) why should this
> function be marked different?

This can look like an oversight of 5891c7a8ed8f to me.  I've skimmed
through the threads related to this commit and messages around [1]
explain why this GUC exists and why we have both behaviors, but I'm
not seeing a discussion about the volatibility of these functions.
The current situation is not bad either, the default makes these
functions stable, and I'd like to assume that users of this GUC know
what they do.  Perhaps Andres or Horiguchi-san can comment on that.

https://www.postgresql.org/message-id/382908.1616343275@sss.pgh.pa.us
--
Michael

Вложения

Re: Add last_commit_lsn to pg_stat_database

От
Heikki Linnakangas
Дата:
> I've previously noted in "Add last commit LSN to
> pg_last_committed_xact()" [1] that it's not possible to monitor how
> many bytes of WAL behind a logical replication slot is (computing such
> is obviously trivial for physical slots) because the slot doesn't need
> to replicate beyond the last commit. In some cases it's possible for
> the current WAL location to be far beyond the last commit. A few
> examples:
> 
> - An idle server with checkout_timeout at a lower value (so lots of
> empty WAL advance).
> - Very large transactions: particularly insidious because committing a
> 1 GB transaction after a small transaction may show almost zero time
> lag even though quite a bit of data needs to be processed and sent
> over the wire (so time to replay is significantly different from
> current lag).
> - A cluster with multiple databases complicates matters further,
> because while physical replication is cluster-wide, the LSNs that
> matter for logical replication are database specific.
> 
> 
> Since we don't expose the most recent commit's LSN there's no way to
> say "the WAL is currently 1250, the last commit happened at 1000, the
> slot has flushed up to 800, therefore there are at most 200 bytes
> replication needs to read through to catch up.

I'm not sure I fully understand the problem. What are you doing 
currently to measure the lag? If you look at pg_replication_slots today, 
confirmed_flush_lsn advances also when you do pg_switch_wal(), so 
looking at the diff between confirmed_flush_lsn and pg_current_wal_lsn() 
works, no?

And on the other hand, even if you expose the database's last commit 
LSN, you can have an publication that includes only a subset of tables. 
Or commits that don't write to any table at all. So I'm not sure why the 
database's last commit LSN is relevant. Getting the last LSN that did 
something that needs to be replicated through the publication might be 
useful, but that's not what what this patch does.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Add last_commit_lsn to pg_stat_database

От
Cary Huang
Дата:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hello

I think it is convenient to know the last commit LSN of a database for troubleshooting and monitoring purposes similar
tothe "pd_lsn" field in a page header that records the last LSN that modifies this page. I am not sure if it can help
determinethe WAL location to resume / catch up in logical replication as the "confirmed_flush_lsn" and "restart_lsn" in
alogical replication slot are already there to figure out the resume / catchup point as I understand. Anyhow, I had a
lookat the patch, in general it looks good to me. Also ran it against the CI bot, which also turned out fine. Just one
smallcomment though. The patch supports the recording of last commit lsn from 2 phase commit as well, but the test does
notseem to have a test on 2 phase commit. In my opinion, it should test whether the last commit lsn increments when a
preparedtransaction is committed in addition to a regular transaction.
 

thank you
-----------------
Cary Huang
www.highgo.ca

Re: Add last_commit_lsn to pg_stat_database

От
Kirill Reshke
Дата:

Hi James,

There are some review in the thread that need to be addressed.
it seems that we need to mark this entry "Waiting on Author" and move to the next CF [0].

Thanks

[0] https://commitfest.postgresql.org/47/4355/


On Sat, 10 Jun 2023 at 05:27, James Coleman <jtc331@gmail.com> wrote:
I've previously noted in "Add last commit LSN to
pg_last_committed_xact()" [1] that it's not possible to monitor how
many bytes of WAL behind a logical replication slot is (computing such
is obviously trivial for physical slots) because the slot doesn't need
to replicate beyond the last commit. In some cases it's possible for
the current WAL location to be far beyond the last commit. A few
examples:

- An idle server with checkout_timeout at a lower value (so lots of
empty WAL advance).
- Very large transactions: particularly insidious because committing a
1 GB transaction after a small transaction may show almost zero time
lag even though quite a bit of data needs to be processed and sent
over the wire (so time to replay is significantly different from
current lag).
- A cluster with multiple databases complicates matters further,
because while physical replication is cluster-wide, the LSNs that
matter for logical replication are database specific.

Since we don't expose the most recent commit's LSN there's no way to
say "the WAL is currently 1250, the last commit happened at 1000, the
slot has flushed up to 800, therefore there are at most 200 bytes
replication needs to read through to catch up.

In the aforementioned thread [1] I'd proposed a patch that added a SQL
function pg_last_commit_lsn() to expose the most recent commit's LSN.
Robert Haas didn't think the initial version's modifications to
commit_ts made sense, and a subsequent approach adding the value to
PGPROC didn't have strong objections, from what I can see, but it also
didn't generate any enthusiasm.

As I was thinking about how to improve things, I realized that this
information (since it's for monitoring anyway) fits more naturally
into the stats system. I'd originally thought of exposing it in
pg_stat_wal, but that's per-cluster rather than per-database (indeed,
this is a flaw I hadn't considered in the original patch), so I think
pg_stat_database is the correct location.

I've attached a patch to track the latest commit's LSN in pg_stat_database.

Regards,
James Coleman

1: https://www.postgresql.org/message-id/flat/CAAaqYe9QBiAu+j8rBun_JKBRe-3HeKLUhfVVsYfsxQG0VqLXsA@mail.gmail.com

Re: Add last_commit_lsn to pg_stat_database

От
James Coleman
Дата:
On Thu, Mar 7, 2024 at 10:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> > I've previously noted in "Add last commit LSN to
> > pg_last_committed_xact()" [1] that it's not possible to monitor how
> > many bytes of WAL behind a logical replication slot is (computing such
> > is obviously trivial for physical slots) because the slot doesn't need
> > to replicate beyond the last commit. In some cases it's possible for
> > the current WAL location to be far beyond the last commit. A few
> > examples:
> >
> > - An idle server with checkout_timeout at a lower value (so lots of
> > empty WAL advance).
> > - Very large transactions: particularly insidious because committing a
> > 1 GB transaction after a small transaction may show almost zero time
> > lag even though quite a bit of data needs to be processed and sent
> > over the wire (so time to replay is significantly different from
> > current lag).
> > - A cluster with multiple databases complicates matters further,
> > because while physical replication is cluster-wide, the LSNs that
> > matter for logical replication are database specific.
> >
> >
> > Since we don't expose the most recent commit's LSN there's no way to
> > say "the WAL is currently 1250, the last commit happened at 1000, the
> > slot has flushed up to 800, therefore there are at most 200 bytes
> > replication needs to read through to catch up.
>
> I'm not sure I fully understand the problem. What are you doing
> currently to measure the lag? If you look at pg_replication_slots today,
> confirmed_flush_lsn advances also when you do pg_switch_wal(), so
> looking at the diff between confirmed_flush_lsn and pg_current_wal_lsn()
> works, no?

No, it's not that simple because of the "large, uncommitted
transaction" case I noted in the OP. Suppose I have a batch system,
and a job in that system has written 1 GB of data but not yet
committed (or rolled back). In this case confirmed_flush_lsn cannot
advance, correct?

And so having a "last commit LSN" allows me to know how far back the
"last possibly replicatable write"

> And on the other hand, even if you expose the database's last commit
> LSN, you can have an publication that includes only a subset of tables.
> Or commits that don't write to any table at all. So I'm not sure why the
> database's last commit LSN is relevant. Getting the last LSN that did
> something that needs to be replicated through the publication might be
> useful, but that's not what what this patch does.

I think that's fine,  because as you noted earlier the
confirmed_flush_lsn could advance beyond that point anyway (if there's
nothing to replicate), but in the case where:

1. confirmed_flush_lsn is not advancing, and
2. last_commit_lsn is not advancing, and
3. pg_current_wal_lsn() has advanced a lot

then you can probably infer that there's a large amount of data that
simply cannot be completed by the subscriber, and so there's no "real"
delay. It also gives you an idea of how much data you will need to
churn through (even if not replicated) once the transaction commits.

Certainly understanding the data here will be simplest in the case
where 1.) there's a single database and 2.) all tables are in the
replication set, but I don't think the value is limited to that
situation either.

Regards,
James Coleman



Re: Add last_commit_lsn to pg_stat_database

От
James Coleman
Дата:
On Mon, Feb 19, 2024 at 3:56 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Feb 19, 2024 at 10:26:43AM +0100, Tomas Vondra wrote:
> > On 2/19/24 07:57, Michael Paquier wrote:
> > > On Sun, Feb 18, 2024 at 02:28:06AM +0100, Tomas Vondra wrote:
> >>> 1) Do we really need to modify RecordTransactionCommitPrepared and
> >>> XactLogCommitRecord to return the LSN of the commit record? Can't we
> >>> just use XactLastRecEnd? It's good enough for advancing
> >>> replorigin_session_origin_lsn, why wouldn't it be good enough for this?
> >>
> >> Or XactLastCommitEnd?
> >
> > But that's not set in RecordTransactionCommitPrepared (or twophase.c in
> > general), and the patch seems to need that.
>
> Hmm.  Okay.
>
> > > The changes in AtEOXact_PgStat() are not really
> > > attractive for what's a static variable in all the backends.
> >
> > I'm sorry, I'm not sure what "changes not being attractive" means :-(
>
> It just means that I am not much a fan of the signature changes done
> for RecordTransactionCommit() and AtEOXact_PgStat_Database(), adding a
> dependency to a specify LSN.  Your suggestion to switching to
> XactLastRecEnd should avoid that.

Attached is an updated patch that uses XactLastCommitEnd and therefore
avoids all of those signature changes.

We can't use XactLastCommitEnd because it gets set to 0 immediately
after RecordTransactionCommit() sets XactLastCommitEnd to its value.

I added a test for two-phase commit, as well, and in so doing I
discovered something that I found curious: when I do "COMMIT PREPARED
't1'", not only does RecordTransactionCommitPrepared() get called, but
eventually RecordTransactionCommit() is called as well before the
command returns (despite the comments for those functions describing
them as two equivalents you need to change at the same time).

So it appears that we don't need to set XactLastCommitEnd in
RecordTransactionCommitPrepared() for this to work, and, indeed,
adding some logging to verify, the value of XactLastRecEnd we'd use to
update XactLastCommitEnd is the same at the end of both of those
functions during COMMIT PREPARED.

I'd like to have someone weigh in on whether relying on
RecordTransactionCommit() being called during COMMIT PREPARED is
correct or not (if perchance there are non-obvious reasons why we
shouldn't).

Regards,
James Coleman

Вложения

Re: Add last_commit_lsn to pg_stat_database

От
Kyotaro Horiguchi
Дата:
At Tue, 20 Feb 2024 07:56:28 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Mon, Feb 19, 2024 at 10:26:43AM +0100, Tomas Vondra wrote:
> It just means that I am not much a fan of the signature changes done
> for RecordTransactionCommit() and AtEOXact_PgStat_Database(), adding a
> dependency to a specify LSN.  Your suggestion to switching to
> XactLastRecEnd should avoid that.
> 
> > - stats_fetch_consistency=cache: always the same min/max value
> > 
> > - stats_fetch_consistency=none: min != max
> > 
> > Which would suggest you're right and this should be VOLATILE and not
> > STABLE. But then again - the existing pg_stat_get_db_* functions are all
> > marked as STABLE, so (a) is that correct and (b) why should this
> > function be marked different?
> 
> This can look like an oversight of 5891c7a8ed8f to me.  I've skimmed
> through the threads related to this commit and messages around [1]
> explain why this GUC exists and why we have both behaviors, but I'm
> not seeing a discussion about the volatibility of these functions.
> The current situation is not bad either, the default makes these
> functions stable, and I'd like to assume that users of this GUC know
> what they do.  Perhaps Andres or Horiguchi-san can comment on that.
> 
> https://www.postgresql.org/message-id/382908.1616343275@sss.pgh.pa.us

I agree that we cannot achieve, nor do we need, perfect MVCC behavior,
and that completely volatile behavior is unusable. I believe the
functions are kept marked as stable, as this is the nearest and most
usable volatility for the default behavior, since function volatility
is not switchable on-the-fly. This approach seems least trouble-prone
to me.

The consistency of the functions are discussed here.

https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-STATS-VIEWS

> This is a feature, not a bug, because ... Conversely, if it's known
> that statistics are only accessed once, caching accessed statistics is
> unnecessary and can be avoided by setting stats_fetch_consistency to
> none.

It seems to me that this description already implies such an
incongruity in the functions' behavior from the "stable" behavior, but
we might want to explicitly mention that incongruity.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center