Обсуждение: pg_walinspect - a new extension to get raw WAL data and WAL stats
Hi, While working on one of the internal features, we found that it is a bit difficult to run pg_waldump for a normal user to know WAL info and stats of a running postgres database instance in the cloud. Many a times users or DBAs or developers would want to get and analyze following: 1) raw WAL record associated with an LSN or raw WAL records between a start LSN and end LSN for feeding to some other functionality 2) WAL statistics associated with an LSN or between start LSN and end LSN for debugging or analytical purposes. The WAL stats are the number of inserts, updates, deletes, index inserts, commits, checkpoints, aborts, wal record sizes, FPI (Full Page Image) count etc. which are basically everything that we get with pg_waldump --stats option plus some other information as we may feel will be useful. An available option is to use pg_waldump, a standalone program emitting human readable WAL info into a standard output/file. This works well when users have access to the system on which postgres is running. But for a postgres database instance running in the cloud environments, starting the pg_waldump, fetching and presenting its output to the users in a structured way may be a bit hard to do. How about we create a new extension, called pg_walinspect (synonymous to pageinspect extension) with a bunch of SQL-callable functions that get the raw WAL records or stats out of a running postgres database instance in a more structured way that is easily consumable by all the users or DBAs or developers? We can also provide these functionalities into the core postgres (in xlogfuncs.c) instead of a new extension, but we would like it to be pluggable so that the functions will be used only if required. [1] shows a rough sketch of the functions that the new pg_walinspect extension can provide. These are not exhaustive; we can add/remove/modify as we move further. We would like to invite more thoughts from the hackers. Credits: Thanks to Satya Narlapuram, Chen Liang (for some initial work), Tianyu Zhang and Ashutosh Sharma (copied in cc) for internal discussions. [1] a) bytea pg_get_wal_record(pg_lsn lsn); and bytea pg_get_wal_record(pg_lsn lsn, text wal_dir); - Returns a single row of raw WAL record of bytea type. WAL data is read from pg_wal or specified wal_dir directory. b) bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn); and bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn, text wal_dir); - Returns multiple rows of raw WAL records of bytea type, one row per each WAL record. WAL data is read from pg_wal or specified wal_dir directory. CREATE TYPE walinspect_stats_type AS (stat1, stat2, stat3 …. statN); c) walinspect_stats_type pg_get_wal_stats(pg_lsn lsn); and walinspect_stats_type pg_get_wal_stats(pg_lsn lsn, text wal_dir); - Returns a single row of WAL record’s stats of walinspect_stats_type type. WAL data is read from pg_wal or specified wal_dir directory. d) walinspect_stats_type[] pg_get_wal_stats(pg_lsn start_lsn, pg_lsn end_lsn); and walinspect_stats_type[] pg_get_wal_stats(pg_lsn start_lsn, pg_lsn end_lsn, text wal_dir); - Returns multiple rows of WAL record stats of walinspect_stats_type type, one row per each WAL record. WAL data is read from pg_wal or specified wal_dir directory. e) walinspect_stats_type pg_get_wal_stats(bytea wal_record); - Returns a single row of provided WAL record (wal_record) stats. f) walinspect_stats_type pg_get_wal_stats_aggr(pg_lsn start_lsn, pg_lsn end_lsn); and walinspect_stats_type pg_get_wal_stats_aggr(pg_lsn start_lsn, pg_lsn end_lsn, text wal_dir); - Returns a single row of aggregated stats of all the WAL records between start_lsn and end_lsn. WAL data is read from pg_wal or specified wal_dir directory. CREATE TYPE walinspect_lsn_range_type AS (pg_lsn start_lsn, pg_lsn end_lsn); g) walinspect_lsn_range_type walinspect_get_lsn_range(text wal_dir); - Returns a single row of start LSN and end LSN of the WAL records available under pg_wal or specified wal_dir directory. Regards, Bharath Rupireddy.
On 9/8/21, 6:49 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > How about we create a new extension, called pg_walinspect (synonymous > to pageinspect extension) with a bunch of SQL-callable functions that > get the raw WAL records or stats out of a running postgres database > instance in a more structured way that is easily consumable by all the > users or DBAs or developers? We can also provide these functionalities > into the core postgres (in xlogfuncs.c) instead of a new extension, > but we would like it to be pluggable so that the functions will be > used only if required. +1 Nathan
On Thu, Sep 09, 2021 at 10:49:46PM +0000, Bossart, Nathan wrote: > +1 A backend approach has the advantage that you can use the proper locks to make sure that a segment is not recycled or removed by a concurrent checkpoint, so that would be reliable. -- Michael
Вложения
On 9/10/21 12:49 AM, Bossart, Nathan wrote: > On 9/8/21, 6:49 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: >> How about we create a new extension, called pg_walinspect (synonymous >> to pageinspect extension) with a bunch of SQL-callable functions that >> get the raw WAL records or stats out of a running postgres database >> instance in a more structured way that is easily consumable by all the >> users or DBAs or developers? We can also provide these functionalities >> into the core postgres (in xlogfuncs.c) instead of a new extension, >> but we would like it to be pluggable so that the functions will be >> used only if required. > +1 > > Nathan > +1 Bertrand
On Fri, Sep 10, 2021 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Sep 09, 2021 at 10:49:46PM +0000, Bossart, Nathan wrote: > > +1 > > A backend approach has the advantage that you can use the proper locks > to make sure that a segment is not recycled or removed by a concurrent > checkpoint, so that would be reliable. Thanks for sharing your thoughts. IMO, using locks for showing WAL stats isn't a good way, because these new functions may block the checkpointer from removing/recycling the WAL files. We don't want to do that. If at all, user has asked stats of an LSN/range of LSNs if it is/they are available in the pg_wal directory, we provide the info otherwise we can throw warnings/errors. This behaviour is pretty much in sycn with what pg_waldump does right now. And, some users may not need these new functions at all, so in such cases going with an extension way makes it more usable. Regards, Bharath Rupireddy.
On Wed, Sep 8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote: > Hi, > > While working on one of the internal features, we found that it is a > bit difficult to run pg_waldump for a normal user to know WAL info and > stats of a running postgres database instance in the cloud. Many a > times users or DBAs or developers would want to get and analyze > following: Uh, are we going to implement everything that is only available at the command line as an extension just for people who are using managed cloud services where the command line is not available and the cloud provider has not made that information accessible? Seems like this might lead to a lot of duplicated effort. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 10/05/21 18:07, Bruce Momjian wrote: > Uh, are we going to implement everything that is only available at the > command line as an extension just for people who are using managed cloud > services One extension that runs a curated menu of command-line tools for you and returns their stdout? Regards, -Chap
On 10/5/21 15:07, Bruce Momjian wrote: > On Wed, Sep 8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote: >> While working on one of the internal features, we found that it is a >> bit difficult to run pg_waldump for a normal user to know WAL info and >> stats of a running postgres database instance in the cloud. Many a >> times users or DBAs or developers would want to get and analyze >> following: > > Uh, are we going to implement everything that is only available at the > command line as an extension just for people who are using managed cloud > services where the command line is not available and the cloud provider > has not made that information accessible? Seems like this might lead to > a lot of duplicated effort. No. For most command line utilities, there's no reason to expose them in SQL or they already are exposed in SQL. For example, everything in pg_controldata is already available via SQL functions. Specifically exposing pg_waldump functionality in SQL could speed up finding bugs in the PG logical replication code. We found and fixed a few over this past year, but there are more lurking out there. Having the extension in core is actually the only way to avoid duplicated effort, by having shared code which the pg_dump binary and the extension both wrap or call. -Jeremy -- http://about.me/jeremy_schneider
On Tue, Oct 5, 2021 at 06:22:19PM -0400, Chapman Flack wrote: > On 10/05/21 18:07, Bruce Momjian wrote: > > Uh, are we going to implement everything that is only available at the > > command line as an extension just for people who are using managed cloud > > services > > One extension that runs a curated menu of command-line tools for you > and returns their stdout? Yes, that would make sense, and something the cloud service providers would write. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Oct 5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote: > On 10/5/21 15:07, Bruce Momjian wrote: > > On Wed, Sep 8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote: > >> While working on one of the internal features, we found that it is a > >> bit difficult to run pg_waldump for a normal user to know WAL info and > >> stats of a running postgres database instance in the cloud. Many a > >> times users or DBAs or developers would want to get and analyze > >> following: > > > > Uh, are we going to implement everything that is only available at the > > command line as an extension just for people who are using managed cloud > > services where the command line is not available and the cloud provider > > has not made that information accessible? Seems like this might lead to > > a lot of duplicated effort. > > No. For most command line utilities, there's no reason to expose them in > SQL or they already are exposed in SQL. For example, everything in > pg_controldata is already available via SQL functions. That's a good example. > Specifically exposing pg_waldump functionality in SQL could speed up > finding bugs in the PG logical replication code. We found and fixed a > few over this past year, but there are more lurking out there. Uh, why is pg_waldump more important than other command line tool information? > Having the extension in core is actually the only way to avoid > duplicated effort, by having shared code which the pg_dump binary and > the extension both wrap or call. Uh, aren't you duplicating code by having pg_waldump as a command-line tool and an extension? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 10/5/21 17:43, Bruce Momjian wrote: > On Tue, Oct 5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote: >> Specifically exposing pg_waldump functionality in SQL could speed up >> finding bugs in the PG logical replication code. We found and fixed a >> few over this past year, but there are more lurking out there. > > Uh, why is pg_waldump more important than other command line tool > information? Going down the list of all other utilities in src/bin: * pg_amcheck, pg_config, pg_controldata: already available in SQL * psql, pgbench, pg_dump: already available as client-side apps * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl, pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any possible use case outside server OS access, most of these are too low level and don't even make sense in SQL * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL, don't feel any deep interest myself Speaking selfishly, there are a few reasons I would be specifically interested in pg_waldump (the only remaining one on the list). . First, to better support existing features around logical replication and decoding. In particular, it seems inconsistent to me that all the replication management SQL functions take LSNs as arguments - and yet there's no SQL-based way to find the LSNs that you are supposed to pass into these functions. https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION Over the past few years I've been pulled in to help several large PG users who ran into these bugs, and it's very painful - because the only real remediation is to drop and recreate the replication slot, which means either re-copying all the data to the downstream system or figuring out a way to resync it. Some PG users have 3rd party tools like HVR which can do row-by-row resync (IIUC), but no matter how you slice it, we're talking about a lot of pain for people replicating large data sets between multiple systems. In most cases, the only/best option even with very large tables is to just make a fresh copy of the data - which can translate to a business outage of hours or even days. My favorite example is the SQL function pg_replication_slot_advance() - this could really help PG users find less painful solutions to broken decoding, however it's not really possible to /know/ an LSN to advance to without inspecting WAL. ISTM there's a strong use case here for a SQL interface on WAL inspection. . Second: debugging and troubleshooting logical replication and decoding bugs. I helped track down a few logical replication bugs and get fixed into code at postgresql.org this past year. (But I give credit to others who are much better at C than I am, and who did a lot more work than I did on these bugs!) Logical decoding bugs are some of the hardest to fix - because all you have is a WAL stream, but you don't know the SQL or workload patterns or PG code paths which created that WAL stream. Dumping the WAL - knowing which objects and which types of operations are involved and stats like number of changes or number of subtransactions - this helps identify which transaction and what SQL in the application triggered the failure, which can help find short-term workarounds. Businesses need those short-term workarounds so they can keep going while we work on finding and fixing bugs, which can take some time. This is another place where I think a SQL interface to WAL would be helpful to PG users. Especially the ability to filter and trace a single transaction through a large number of WAL files in the directory. . Third: statistics on WAL - especially full page writes. Giving users the full power of SQL allows much more sophisticated analysis of the WAL records. Personally, I'd probably find myself importing all the WAL stats into the DB anyway because SQL is my preferred data manipulation language. >> Having the extension in core is actually the only way to avoid >> duplicated effort, by having shared code which the pg_dump binary and >> the extension both wrap or call. > > Uh, aren't you duplicating code by having pg_waldump as a command-line > tool and an extension? Well this whole conversation is just theoretical anyway until the code is shared. :) But if Bharath is writing functions to decode WAL, then wouldn't we just have pg_waldump use these same functions in order to avoid duplicating code? Bharath - was some code already posted and I just missed it? Looking at the proposed API from the initial email, I like that there's both stats functionality and WAL record inspection functionality (similar to pg_waldump). I like that there's the ability to pull the records as raw bytea data, however I think we're also going to want an ability in v1 of the patch to decode it (similar to pageinspect heap_page_item_attrs, etc). Another feature that might be interesting down the road would be the ability to provide filtering of WAL records for security purposes. For example, allowing a user to only dump raw WAL records for one particular database, or maybe excluding WAL records that change system catalogs or the like. But I probably wouldn't start here, personally. Now then.... as Blaise Pascal said in 1657 (and as was also said by Winston Churchill, Mark Twain, etc).... "I'm sorry I wrote you such a long letter; I didn't have time to write a short one." -Jeremy -- http://about.me/jeremy_schneider
On Wed, Oct 6, 2021 at 09:56:34AM -0700, Jeremy Schneider wrote: > On 10/5/21 17:43, Bruce Momjian wrote: > > On Tue, Oct 5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote: > >> Specifically exposing pg_waldump functionality in SQL could speed up > >> finding bugs in the PG logical replication code. We found and fixed a > >> few over this past year, but there are more lurking out there. > > > > Uh, why is pg_waldump more important than other command line tool > > information? > > Going down the list of all other utilities in src/bin: > > * pg_amcheck, pg_config, pg_controldata: already available in SQL > * psql, pgbench, pg_dump: already available as client-side apps > * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl, > pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any > possible use case outside server OS access, most of these are too low > level and don't even make sense in SQL > * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL, > don't feel any deep interest myself > > Speaking selfishly, there are a few reasons I would be specifically > interested in pg_waldump (the only remaining one on the list). This is the analysis I was looking for to understand if copying the features of command-line tools in extensions was a wise direction. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 2021-Oct-06, Jeremy Schneider wrote: > Well this whole conversation is just theoretical anyway until the code > is shared. :) But if Bharath is writing functions to decode WAL, then > wouldn't we just have pg_waldump use these same functions in order to > avoid duplicating code? Actually, a lot of the code is already shared, since the rmgrdesc routines are in src/backend. Keep in mind that it was there before pg_xlogdump existed, to support WAL_DEBUG. When pg_xlogdump was added, what we did was allow that backend-only code be compilable in a frontend environment. Also, we already have xlogreader. So pg_waldump itself is mostly scaffolding to let the frontend environment get argument values to pass to backend-enabled code. The only really interesting, novel thing is the --stats mode ... and I bet you can write that with some SQL-level aggregation of the raw data, no need for any C code. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
On Wed, Oct 6, 2021 at 10:26 PM Jeremy Schneider <schneider@ardentperf.com> wrote: > > On 10/5/21 17:43, Bruce Momjian wrote: > > On Tue, Oct 5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote: > >> Specifically exposing pg_waldump functionality in SQL could speed up > >> finding bugs in the PG logical replication code. We found and fixed a > >> few over this past year, but there are more lurking out there. > > > > Uh, why is pg_waldump more important than other command line tool > > information? > > Going down the list of all other utilities in src/bin: > > * pg_amcheck, pg_config, pg_controldata: already available in SQL > * psql, pgbench, pg_dump: already available as client-side apps > * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl, > pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any > possible use case outside server OS access, most of these are too low > level and don't even make sense in SQL > * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL, > don't feel any deep interest myself > > Speaking selfishly, there are a few reasons I would be specifically > interested in pg_waldump (the only remaining one on the list). Thanks Jeremy for the analysis. > First, to better support existing features around logical replication > and decoding. > > In particular, it seems inconsistent to me that all the replication > management SQL functions take LSNs as arguments - and yet there's no > SQL-based way to find the LSNs that you are supposed to pass into these > functions. > > https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION > > Over the past few years I've been pulled in to help several large PG > users who ran into these bugs, and it's very painful - because the only > real remediation is to drop and recreate the replication slot, which > means either re-copying all the data to the downstream system or > figuring out a way to resync it. Some PG users have 3rd party tools like > HVR which can do row-by-row resync (IIUC), but no matter how you slice > it, we're talking about a lot of pain for people replicating large data > sets between multiple systems. In most cases, the only/best option even > with very large tables is to just make a fresh copy of the data - which > can translate to a business outage of hours or even days. > > My favorite example is the SQL function pg_replication_slot_advance() - > this could really help PG users find less painful solutions to broken > decoding, however it's not really possible to /know/ an LSN to advance > to without inspecting WAL. ISTM there's a strong use case here for a SQL > interface on WAL inspection. > > Second: debugging and troubleshooting logical replication and decoding bugs. > > I helped track down a few logical replication bugs and get fixed into > code at postgresql.org this past year. (But I give credit to others who > are much better at C than I am, and who did a lot more work than I did > on these bugs!) > > Logical decoding bugs are some of the hardest to fix - because all you > have is a WAL stream, but you don't know the SQL or workload patterns or > PG code paths which created that WAL stream. > > Dumping the WAL - knowing which objects and which types of operations > are involved and stats like number of changes or number of > subtransactions - this helps identify which transaction and what SQL in > the application triggered the failure, which can help find short-term > workarounds. Businesses need those short-term workarounds so they can > keep going while we work on finding and fixing bugs, which can take some > time. This is another place where I think a SQL interface to WAL would > be helpful to PG users. Especially the ability to filter and trace a > single transaction through a large number of WAL files in the directory. > > Third: statistics on WAL - especially full page writes. Giving users the > full power of SQL allows much more sophisticated analysis of the WAL > records. Personally, I'd probably find myself importing all the WAL > stats into the DB anyway because SQL is my preferred data manipulation > language. Just to add to the above points, with the new extension pg_walinspect we will have following advantages: 1) Usability - SQL callable functions will be easier to use for the users/admins/developers. 2) Access Control - we can provide better access control for the WAL data/stats. 3) Emitting the actual WAL data(as bytea structure) and stats via SQL callable functions will help to analyze and answer questions like how much WAL data is being generated in the system, what kind of WAL data it is, how many FPWs are happening and so on. Jermey has already given more realistic use cases. 4) I came across this - there's a similar capability in SQL server - https://www.mssqltips.com/sqlservertip/3076/how-to-read-the-sql-server-database-transaction-log/ > >> Having the extension in core is actually the only way to avoid > >> duplicated effort, by having shared code which the pg_dump binary and > >> the extension both wrap or call. > > > > Uh, aren't you duplicating code by having pg_waldump as a command-line > > tool and an extension? > > Well this whole conversation is just theoretical anyway until the code > is shared. :) But if Bharath is writing functions to decode WAL, then > wouldn't we just have pg_waldump use these same functions in order to > avoid duplicating code? > > Bharath - was some code already posted and I just missed it? > > Looking at the proposed API from the initial email, I like that there's > both stats functionality and WAL record inspection functionality > (similar to pg_waldump). I like that there's the ability to pull the > records as raw bytea data, however I think we're also going to want an > ability in v1 of the patch to decode it (similar to pageinspect > heap_page_item_attrs, etc). I'm yet to start working on the patch. I will be doing it soon. > Another feature that might be interesting down the road would be the > ability to provide filtering of WAL records for security purposes. For > example, allowing a user to only dump raw WAL records for one particular > database, or maybe excluding WAL records that change system catalogs or > the like. But I probably wouldn't start here, personally. +1. Regards, Bharath Rupireddy.
On Thu, Oct 7, 2021 at 10:43 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Looking at the proposed API from the initial email, I like that there's
> > both stats functionality and WAL record inspection functionality
> > (similar to pg_waldump). I like that there's the ability to pull the
> > records as raw bytea data, however I think we're also going to want an
> > ability in v1 of the patch to decode it (similar to pageinspect
> > heap_page_item_attrs, etc).
>
> I'm yet to start working on the patch. I will be doing it soon.
Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the documentation part now, I will be doing it a bit later.
Please feel free to review and provide your thoughts.
Regards,
Bharath Rupireddy.
> > Looking at the proposed API from the initial email, I like that there's
> > both stats functionality and WAL record inspection functionality
> > (similar to pg_waldump). I like that there's the ability to pull the
> > records as raw bytea data, however I think we're also going to want an
> > ability in v1 of the patch to decode it (similar to pageinspect
> > heap_page_item_attrs, etc).
>
> I'm yet to start working on the patch. I will be doing it soon.
Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the documentation part now, I will be doing it a bit later.
Please feel free to review and provide your thoughts.
Regards,
Bharath Rupireddy.
Вложения
On Thu, Nov 18, 2021 at 6:43 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Oct 7, 2021 at 10:43 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > Looking at the proposed API from the initial email, I like that there's > > > both stats functionality and WAL record inspection functionality > > > (similar to pg_waldump). I like that there's the ability to pull the > > > records as raw bytea data, however I think we're also going to want an > > > ability in v1 of the patch to decode it (similar to pageinspect > > > heap_page_item_attrs, etc). > > > > I'm yet to start working on the patch. I will be doing it soon. > > Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the documentation partnow, I will be doing it a bit later. > > Please feel free to review and provide your thoughts. The v1 patch set was failing to compile on Windows. Here's the v2 patch set fixing that. Regards, Bharath Rupireddy.
Вложения
On Thu, Nov 25, 2021 at 3:49 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the documentationpart now, I will be doing it a bit later. > > > > Please feel free to review and provide your thoughts. > > The v1 patch set was failing to compile on Windows. Here's the v2 > patch set fixing that. I forgot to specify this: the v1 patch set was failing to compile on Windows with errors shown at [1]. Thanks to Julien Rouhaud who suggested to use PGDLLIMPORT in an off-list discussion. [1] (Link target) -> pg_walinspect.obj : error LNK2001: unresolved external symbol forkNames [C:\Users\bhara\postgres\pg_walinspect.vcxproj] pg_walinspect.obj : error LNK2001: unresolved external symbol pg_comp_crc32c [C:\Users\bhara\postgres\pg_walinspect.vcxproj] pg_walinspect.obj : error LNK2001: unresolved external symbol wal_segment_size [C:\Users\bhara\postgres\pg_walinspect.vcxproj] pg_walinspect.obj : error LNK2001: unresolved external symbol RmgrTable [C:\Users\bhara\postgres\pg_walinspect.vcxproj] .\Release\pg_walinspect\pg_walinspect.dll : fatal error LNK1120: 4 unresolved externals [C:\Users\bhara\postgres\pg_walinspect.vcxproj] 5 Error(s) Regards, Bharath Rupireddy.
On Thu, Nov 25, 2021 at 5:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Nov 25, 2021 at 3:49 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the documentationpart now, I will be doing it a bit later. > > > > > > Please feel free to review and provide your thoughts. > > > > The v1 patch set was failing to compile on Windows. Here's the v2 > > patch set fixing that. > > I forgot to specify this: the v1 patch set was failing to compile on > Windows with errors shown at [1]. Thanks to Julien Rouhaud who > suggested to use PGDLLIMPORT in an off-list discussion. > > [1] (Link target) -> > pg_walinspect.obj : error LNK2001: unresolved external symbol > forkNames [C:\Users\bhara\postgres\pg_walinspect.vcxproj] > pg_walinspect.obj : error LNK2001: unresolved external symbol > pg_comp_crc32c [C:\Users\bhara\postgres\pg_walinspect.vcxproj] > pg_walinspect.obj : error LNK2001: unresolved external symbol > wal_segment_size [C:\Users\bhara\postgres\pg_walinspect.vcxproj] > pg_walinspect.obj : error LNK2001: unresolved external symbol > RmgrTable [C:\Users\bhara\postgres\pg_walinspect.vcxproj] > .\Release\pg_walinspect\pg_walinspect.dll : fatal error LNK1120: 4 > unresolved externals [C:\Users\bhara\postgres\pg_walinspect.vcxproj] > > 5 Error(s) Here's the v3 patch-set with fixes for the compiler warnings reported in the cf bot at https://cirrus-ci.com/task/4979131497578496?logs=gcc_warning#L506. Please review. Regards, Bharath Rupireddy.
Вложения
So I looked at this patch and I have the same basic question as Bruce. Do we really want to expose every binary tool associated with Postgres as an extension? Obviously this is tempting for cloud provider users which is not an unreasonable argument. But it does have consequences. 1) Some things like pg_waldump are running code that is not normally under user control. This could have security issues or reliability issues. On that front I'm especially concerned that pg_verify_raw_wal_record() for example would let an attacker feed arbitrary hand crafted xlog records into the parser which is not normally something a user can do. If they feed it something it's not expecting it might be easy to cause a crash and server restart. There's also a bit of concern about data retention. Generally in Postgres when rows are deleted there's very weak guarantees about the data really being wiped. We definitely don't wipe it from disk of course. And things like pageinspect could expose it long after it's been deleted. But one might imagine after pageinspect shows it's gone and/or after a vacuum full the data is actually purged. But then something like pg_walinspect would make even that insufficient. 2) There's no documentation. I'm guessing you hesitated to write documentation until the interface is settled but actually sometimes writing documentation helps expose things in the interface that look strange when you try to explain them. 3) And the interface does look a bit strange. Like what's the deal with pg_get_wal_record_info_2() ? I gather it's just a SRF version of pg_get_wal_record_info() but that's a strange name. And then what's the point of pg_get_wal_record_info() at all? Why wouldn't the SRF be sufficient even for the specific case of a single record? And the stats functions seem a bit out of place to me. If the SRF returned the data in the right format the user should be able to do aggregate queries to generate these stats easily enough. If anything a simple SQL function to do the aggregations could be provided. Now this is starting to get into the realm of bikeshedding but... Some of the code is taken straight from pg_waldump and does things like: + appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u", But that's kind of out of place for an SQL interface. It makes it hard to write queries since things like the relid, block number etc are in the string. If I wanted to use these functions I would expect to be doing something like "select all the decoded records pertaining to block n". All that said, I don't want to gatekeep based on this kind of criticism. The existing code is based on pg_waldump and if we want an extension to expose that then that's a reasonable place to start. We can work on a better format for the data later it doesn't mean we shouldn't start with something we have today. 4) This isn't really an issue with your patch at all but why on earth do we have a bitvector for WAL compression methods?! Like, what does it mean to have multiple compression methods set? That should just be a separate field with values for each type of compression surely? I suppose this raises the issue of what happens if someone fixes that. They'll now have to update pg_waldump *and* pg_walinspect. I don't think that would actually be a lot of work but it's definitely more than just one. Also, perhaps they should be in the same contrib directory so at least people won't forget there are two.
Additionally I've looked at the tests and I'm not sure but I don't think this arrangement is going to work. I don't have the time to run CLOBBER_CACHE and CLOBBER_CACHE_ALWAYS tests but I know they run *really* slowly. So the test can't just do a CHECKPOINT and then trust that the next few transactions will still be in the wal to decode later. There could have been many more timed checkpoints in between. I think the way to do it is to create either a backup label or a replication slot. Then you can inspect the lsn of the label or slot and decode all transactions between that point and the current point. I also think you should try to have a wider set of wal records in that range to test decoding records with and without full page writes, with DDL records, etc. I do like that the tests don't actually have the decoded record info in the test though. But they can do a minimal effort to check that the records they think they're testing are actually being tested. Insert into a temporary table and then run a few queries with WHERE clauses to test for a heap insert, btree insert test the right relid is present, and test that a full page write is present (if full page writes are enabled I guess). You don't need an exhaustive set of checks because you're not testing that wal logging works properly, just that the tests aren't accidentally passing because they're not finding any interesting records.
On Tue, Feb 1, 2022 at 3:10 AM Greg Stark <stark@mit.edu> wrote: > > So I looked at this patch and I have the same basic question as Bruce. Thanks a lot for the comments. > Do we really want to expose every binary tool associated with Postgres > as an extension? Obviously this is tempting for cloud provider users > which is not an unreasonable argument. But it does have consequences. Perhaps not every tool needs to be exposed, but given the advantages that pg_walinspect can provide it's a good candidate to have it as a core extension. Some of the advantages are - debugging, WAL analysis, feeding WAL stats and info to dashboards to show customers and answer their queries, RCA etc., for educational purposes - one can understand the WAL structure, stats, different record types etc. Another nice thing is getting raw WAL data out of the running server (of course all the users can't get it only the allowed ones, currently users with pg_monitor role, if required we can change it to some other predefined role). For instance, the raw WAL data can be fed to external page repair tools to apply on a raw page (one can get this from pageinspect extension). > 1) Some things like pg_waldump are running code that is not normally > under user control. This could have security issues or reliability > issues. I understand this and also I think the same concern applies to pageinspect tool which exposes getting raw page data. The pg_walinspect functions are currently accessible by the users with pg_monitor role, if required we can change this to some other predefined role. > On that front I'm especially concerned that pg_verify_raw_wal_record() > for example would let an attacker feed arbitrary hand crafted xlog > records into the parser which is not normally something a user can do. > If they feed it something it's not expecting it might be easy to cause > a crash and server restart. This function does nothing (no writes) to the server but just checking the CRC of the WAL record. If at all one can make the server crash with an input, then that should be a problem with the server code which needs to be fixed. But IMO this function doesn't have a concern as such. > There's also a bit of concern about data retention. Generally in > Postgres when rows are deleted there's very weak guarantees about the > data really being wiped. We definitely don't wipe it from disk of > course. And things like pageinspect could expose it long after it's > been deleted. But one might imagine after pageinspect shows it's gone > and/or after a vacuum full the data is actually purged. But then > something like pg_walinspect would make even that insufficient. The idea of pg_walinspect is to get the WAL info, data and stats out of a running postgres server, if the WAL isn't available, the functions would say that. > 2) There's no documentation. I'm guessing you hesitated to write > documentation until the interface is settled but actually sometimes > writing documentation helps expose things in the interface that look > strange when you try to explain them. I will send out the new patch set with documentation soon. > 3) And the interface does look a bit strange. Like what's the deal > with pg_get_wal_record_info_2() ? I gather it's just a SRF version of > pg_get_wal_record_info() but that's a strange name. And then what's > the point of pg_get_wal_record_info() at all? Why wouldn't the SRF be > sufficient even for the specific case of a single record? I agree, pg_get_wal_record_info_2 is a poor naming. pg_get_wal_record_info_2 takes range of LSN (start and end) to give the wal info, whereas pg_get_wal_record_info just takes one LSN. Maybe I will change pg_get_wal_record_info_2 to pg_get_wal_record_info_range or pg_get_wal_records_info or someother namign is better? If the suggestion is to overload pg_get_wal_record_info one with single LSN and another with start and end LSNs, I'm okay with that too. Otherwise, I can have pg_get_wal_record_info with start and end LSN (end LSN default to NULL) and return setof record. > And the stats functions seem a bit out of place to me. If the SRF > returned the data in the right format the user should be able to do > aggregate queries to generate these stats easily enough. If anything a > simple SQL function to do the aggregations could be provided. > > Now this is starting to get into the realm of bikeshedding but... Some > of the code is taken straight from pg_waldump and does things like: > > + appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u", > > But that's kind of out of place for an SQL interface. It makes it hard > to write queries since things like the relid, block number etc are in > the string. If I wanted to use these functions I would expect to be > doing something like "select all the decoded records pertaining to > block n". I will think more about this and change it in the next version of the patch set, perhaps I will add more columns to the functions. > All that said, I don't want to gatekeep based on this kind of > criticism. The existing code is based on pg_waldump and if we want an > extension to expose that then that's a reasonable place to start. We > can work on a better format for the data later it doesn't mean we > shouldn't start with something we have today. IMO, we can always extend the functions in future, once the pg_walinspect extension gets in with minimum number of much-required and basic functions. > I suppose this raises the issue of what happens if someone fixes that. > They'll now have to update pg_waldump *and* pg_walinspect. I don't > think that would actually be a lot of work but it's definitely more > than just one. Also, perhaps they should be in the same contrib > directory so at least people won't forget there are two. Currently, all the tools are placed in src/bin and extensions are in contrib directory. I don't think we ever keep the extension in src/bin or vice versa. Having said, this maybe we can add comments on having to change/fix in both pg_waldump and pg_walinspect. We also have to deal with this situation in some of the existing tools such as pg_controldata. Regards, Bharath Rupireddy.
On Mon, Jan 31, 2022 at 4:40 PM Greg Stark <stark@mit.edu> wrote: > So I looked at this patch and I have the same basic question as Bruce. > Do we really want to expose every binary tool associated with Postgres > as an extension? Obviously this is tempting for cloud provider users > which is not an unreasonable argument. But it does have consequences. > > 1) Some things like pg_waldump are running code that is not normally > under user control. This could have security issues or reliability > issues. For what it's worth, I am generally in favor of having something like this in PostgreSQL. I think it's wrong of us to continue assuming that everyone has command-line access. Even when that's true, it's not necessarily convenient. If you choose to use a relational database, you may be the sort of person who likes SQL. And if you are, you may want to have the database tell you what's going on via SQL rather than command-line tools or operating system utilities. Imagine if we didn't have pg_stat_activity and you had to get that information by running a separate binary. Would anyone like that? Why is this case any different? A few years ago we exposed data from pg_control via SQL and similar concerns were raised - but it turns out to be pretty useful. I don't know why this shouldn't be equally useful. Sure, there's some duplication in functionality, but it's not a huge maintenance burden for the project, and people (including me) like having it available. I think the same things will be true here. If decoding WAL causes security problems, that's something we better fix, because WAL is constantly decoded on standbys and via logical decoding on systems all over the place. I agree that we can't let users supply their own hand-crafted WAL records to be decoded without causing more trouble than we can handle, but if it's not safe to decode the WAL the system generated than we are in a lot of trouble already. I hasten to say that I'm not endorsing every detail or indeed any detail of the proposed patch, and some of the concerns you mention later sound well-founded to me. But I disagree with the idea that we shouldn't have both a command-line utility that roots through files on disk and an SQL interface that works with a running system. -- Robert Haas EDB: http://www.enterprisedb.com
On Sun, Feb 6, 2022 at 9:15 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jan 31, 2022 at 4:40 PM Greg Stark <stark@mit.edu> wrote: > > So I looked at this patch and I have the same basic question as Bruce. > > Do we really want to expose every binary tool associated with Postgres > > as an extension? Obviously this is tempting for cloud provider users > > which is not an unreasonable argument. But it does have consequences. > > > > 1) Some things like pg_waldump are running code that is not normally > > under user control. This could have security issues or reliability > > issues. > > For what it's worth, I am generally in favor of having something like > this in PostgreSQL. I think it's wrong of us to continue assuming that > everyone has command-line access. Even when that's true, it's not > necessarily convenient. If you choose to use a relational database, > you may be the sort of person who likes SQL. And if you are, you may > want to have the database tell you what's going on via SQL rather than > command-line tools or operating system utilities. Imagine if we didn't > have pg_stat_activity and you had to get that information by running a > separate binary. Would anyone like that? Why is this case any > different? > > A few years ago we exposed data from pg_control via SQL and similar > concerns were raised - but it turns out to be pretty useful. I don't > know why this shouldn't be equally useful. Sure, there's some > duplication in functionality, but it's not a huge maintenance burden > for the project, and people (including me) like having it available. I > think the same things will be true here. > > If decoding WAL causes security problems, that's something we better > fix, because WAL is constantly decoded on standbys and via logical > decoding on systems all over the place. I agree that we can't let > users supply their own hand-crafted WAL records to be decoded without > causing more trouble than we can handle, but if it's not safe to > decode the WAL the system generated than we are in a lot of trouble > already. > > I hasten to say that I'm not endorsing every detail or indeed any > detail of the proposed patch, and some of the concerns you mention > later sound well-founded to me. But I disagree with the idea that we > shouldn't have both a command-line utility that roots through files on > disk and an SQL interface that works with a running system. Thanks Robert for your comments. > + appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u", > > But that's kind of out of place for an SQL interface. It makes it hard > to write queries since things like the relid, block number etc are in > the string. If I wanted to use these functions I would expect to be > doing something like "select all the decoded records pertaining to > block n". Thanks Greg for your review of the patches. Since there can be multiple blkref for WAL record type HEAP2 (for multi inserts basically) [1], I couldn't find a better way to break it and represent it as a non-text column. IMO this is simpler and users can easily find out answers to "how many WAL records my relation generated between lsn1 and lsn2 or how many WAL records of type Heap exist and so on?", see [2]. I've also added a test case to just show this in 0002 patch. Here's the v4 patch set that has the following changes along with Greg's review comments addressed: 1) Added documentation as 0003 patch. 2) Removed CHECKPOINT commands from tests as it is unnecessary. 3) Added input validation code and tests. 4) A few more comments have been added. 5) Currently, only superusers can create the extension, but users with the pg_monitor role can use the functions. 6) Test cases are basic yet they cover all the functions, error cases with input validations, I don't think we need to add many more test cases as suggested upthread, but I'm open to add a few more if I miss any use-case. Please review the v4 patch set further and let me know your thoughts. [1] rmgr: Heap2 len (rec/tot): 64/ 8256, tx: 0, lsn: 0/014A9070, prev 0/014A8FF8, desc: VISIBLE cutoff xid 709 flags 0x01, blkref #0: rel 1663/12757/16384 fork vm blk 0 FPW, blkref #1: rel 1663/12757/16384 blk 0 rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 0/014AB0C8, prev 0/014A9070, desc: VISIBLE cutoff xid 709 flags 0x01, blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel 1663/12757/16384 blk 1 rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 0/014AB108, prev 0/014AB0C8, desc: VISIBLE cutoff xid 709 flags 0x01, blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel 1663/12757/16384 blk 2 rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 0/014AB148, prev 0/014AB108, desc: VISIBLE cutoff xid 709 flags 0x01, blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel 1663/12757/16384 blk 3 rmgr: Heap2 len (rec/tot): 59/ 59, tx: 0, lsn: 0/014AB188, prev 0/014AB148, desc: VISIBLE cutoff xid 709 flags 0x01, blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel 1663/12757/16384 blk 4 [2] postgres=# select count(*) from pg_get_wal_records_info('0/13C0A98', '0/0157A160') where block_ref like '%16384%' and rmgr like 'Heap'; count ------- 10100 (1 row) postgres=# select count(*) from t1; count ------- 10100 (1 row) postgres=# postgres=# select count(*) from pg_get_wal_records_info('0/13C0A98', '0/0157A160') where block_ref like '%FPW%'; count ------- 78 (1 row) postgres=# Regards, Bharath Rupireddy.
Вложения
On Sun, Feb 6, 2022 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote: > For what it's worth, I am generally in favor of having something like > this in PostgreSQL. I think it's wrong of us to continue assuming that > everyone has command-line access. Even when that's true, it's not > necessarily convenient. If you choose to use a relational database, > you may be the sort of person who likes SQL. And if you are, you may > want to have the database tell you what's going on via SQL rather than > command-line tools or operating system utilities. Imagine if we didn't > have pg_stat_activity and you had to get that information by running a > separate binary. Would anyone like that? Why is this case any > different? +1. An SQL interface is significantly easier to work with. Especially because it can use the built-in LSN type, pg_lsn. I don't find the slippery slope argument convincing. There aren't that many other things that are like pg_waldump, but haven't already been exposed via an SQL interface. Offhand, I can't think of any. -- Peter Geoghegan
On 2/6/22 10:45, Robert Haas wrote: > For what it's worth, I am generally in favor of having something like > this in PostgreSQL. I think it's wrong of us to continue assuming that > everyone has command-line access. Even when that's true, it's not > necessarily convenient. If you choose to use a relational database, > you may be the sort of person who likes SQL. Almost completely off topic, but this reminded me of an incident about 30 years ago at my first gig as an SA/DBA. There was an application programmer who insisted on loading a set of values from a text file into a temp table (it was Ingres, anyone remember that?). Why? Because he knew how to write "Select * from mytable order by mycol" but didn't know how to drive the Unix sort utility at the command line. When I was unable to restrain myself from smiling at this he got very angry and yelled at me loudly. So, yes, some people do like SQL and hate the command line. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, Feb 10, 2022 at 9:55 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Sun, Feb 6, 2022 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote: > > For what it's worth, I am generally in favor of having something like > > this in PostgreSQL. I think it's wrong of us to continue assuming that > > everyone has command-line access. Even when that's true, it's not > > necessarily convenient. If you choose to use a relational database, > > you may be the sort of person who likes SQL. And if you are, you may > > want to have the database tell you what's going on via SQL rather than > > command-line tools or operating system utilities. Imagine if we didn't > > have pg_stat_activity and you had to get that information by running a > > separate binary. Would anyone like that? Why is this case any > > different? > > +1. An SQL interface is significantly easier to work with. Especially > because it can use the built-in LSN type, pg_lsn. > > I don't find the slippery slope argument convincing. There aren't that > many other things that are like pg_waldump, but haven't already been > exposed via an SQL interface. Offhand, I can't think of any. On Sat, Feb 12, 2022 at 4:03 AM Andrew Dunstan <andrew@dunslane.net> wrote: > > Almost completely off topic, but this reminded me of an incident about > 30 years ago at my first gig as an SA/DBA. There was an application > programmer who insisted on loading a set of values from a text file into > a temp table (it was Ingres, anyone remember that?). Why? Because he > knew how to write "Select * from mytable order by mycol" but didn't know > how to drive the Unix sort utility at the command line. When I was > unable to restrain myself from smiling at this he got very angry and > yelled at me loudly. > > So, yes, some people do like SQL and hate the command line. Thanks a lot for the comments. I'm looking forward to the review of the latest v4 patches posted at [1]. [1] https://www.postgresql.org/message-id/CALj2ACUS9%2B54QGPtUjk76dcYW-AMKp3hPe-U%2BpQo2-GpE4kjtA%40mail.gmail.com Regards, Bharath Rupireddy.
Here are few comments: +/* + * Verify the authenticity of the given raw WAL record. + */ +Datum +pg_verify_raw_wal_record(PG_FUNCTION_ARGS) +{ Do we really need this function? I see that whenever the record is read, we verify it. So could there be a scenario where any of these functions would return an invalid WAL record? -- Should we add a function that returns the pointer to the first and probably the last WAL record in the WAL segment? This would help users to inspect the wal records in the entire wal segment if they wish to do so. -- +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); I think we should allow all these functions to be executed in wait and *nowait* mode. If a user specifies nowait mode, the function should return if no WAL data is present, rather than waiting for new WAL data to become available, default behaviour could be anything you like. -- +Datum +pg_get_wal_records_info(PG_FUNCTION_ARGS) +{ +#define PG_GET_WAL_RECORDS_INFO_COLS 10 We could probably have another variant of this function that would work even if the end pointer is not specified, in which case the default end pointer would be the last WAL record in the WAL segment. Currently it mandates the use of an end pointer which slightly reduces flexibility. -- + +/* + * Get the first valid raw WAL record lsn. + */ +Datum +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS) I think this function should return a pointer to the nearest valid WAL record which can be the previous WAL record to the LSN entered by the user or the next WAL record. If a user unknowingly enters an lsn that does not exist then in such cases we should probably return the lsn of the previous WAL record instead of hanging or waiting for the new WAL record to arrive. -- Another important point I would like to mention here is - have we made an attempt to ensure that we try to share as much of code with pg_waldump as possible so that if any changes happens in the pg_waldump in future it gets applied here as well and additionally it will also reduce the code duplication. I haven't yet looked into the code in detail. I will have a look at it asap. thanks. -- With Regards, Ashutosh Sharma. On Sat, Feb 12, 2022 at 5:03 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Feb 10, 2022 at 9:55 PM Peter Geoghegan <pg@bowt.ie> wrote: > > > > On Sun, Feb 6, 2022 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > For what it's worth, I am generally in favor of having something like > > > this in PostgreSQL. I think it's wrong of us to continue assuming that > > > everyone has command-line access. Even when that's true, it's not > > > necessarily convenient. If you choose to use a relational database, > > > you may be the sort of person who likes SQL. And if you are, you may > > > want to have the database tell you what's going on via SQL rather than > > > command-line tools or operating system utilities. Imagine if we didn't > > > have pg_stat_activity and you had to get that information by running a > > > separate binary. Would anyone like that? Why is this case any > > > different? > > > > +1. An SQL interface is significantly easier to work with. Especially > > because it can use the built-in LSN type, pg_lsn. > > > > I don't find the slippery slope argument convincing. There aren't that > > many other things that are like pg_waldump, but haven't already been > > exposed via an SQL interface. Offhand, I can't think of any. > > On Sat, Feb 12, 2022 at 4:03 AM Andrew Dunstan <andrew@dunslane.net> wrote: > > > > Almost completely off topic, but this reminded me of an incident about > > 30 years ago at my first gig as an SA/DBA. There was an application > > programmer who insisted on loading a set of values from a text file into > > a temp table (it was Ingres, anyone remember that?). Why? Because he > > knew how to write "Select * from mytable order by mycol" but didn't know > > how to drive the Unix sort utility at the command line. When I was > > unable to restrain myself from smiling at this he got very angry and > > yelled at me loudly. > > > > So, yes, some people do like SQL and hate the command line. > > Thanks a lot for the comments. I'm looking forward to the review of > the latest v4 patches posted at [1]. > > [1] https://www.postgresql.org/message-id/CALj2ACUS9%2B54QGPtUjk76dcYW-AMKp3hPe-U%2BpQo2-GpE4kjtA%40mail.gmail.com > > Regards, > Bharath Rupireddy.
On Mon, Feb 14, 2022 at 8:32 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Here are few comments: Thanks for reviewing the patches. > +/* > + * Verify the authenticity of the given raw WAL record. > + */ > +Datum > +pg_verify_raw_wal_record(PG_FUNCTION_ARGS) > +{ > > > Do we really need this function? I see that whenever the record is > read, we verify it. So could there be a scenario where any of these > functions would return an invalid WAL record? Yes, this function can be useful. Imagine a case where raw WAL records are fetched from one server using pg_get_wal_record_info and sent over the network to another server (for fixing some of the corrupted data pages or for whatever reasons), using pg_verify_raw_wal_record one can verify authenticity. > Should we add a function that returns the pointer to the first and > probably the last WAL record in the WAL segment? This would help users > to inspect the wal records in the entire wal segment if they wish to > do so. Good point. One can do this already with pg_get_wal_records_info and pg_walfile_name_offset. Usually, the LSN format itself can give an idea about the WAL file it is in. postgres=# select lsn, pg_walfile_name_offset(lsn) from pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc limit 1; lsn | pg_walfile_name_offset -----------+------------------------------- 0/5000038 | (000000010000000000000005,56) (1 row) postgres=# select lsn, pg_walfile_name_offset(lsn) from pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc limit 1; lsn | pg_walfile_name_offset -----------+------------------------------------- 0/5FFFFC0 | (000000010000000000000005,16777152) (1 row) Having said that, we can always add a function or a view (with the above sort of queries) to pg_walinspect - given an LSN can give the valid start record in that wal file (by following previous lsn links) and valid end record lsn. IMO, that's not required now, maybe later once the initial version of pg_walinspect gets committed, as we already have a way to achieve what we wanted here. > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > I think we should allow all these functions to be executed in wait and > *nowait* mode. If a user specifies nowait mode, the function should > return if no WAL data is present, rather than waiting for new WAL data > to become available, default behaviour could be anything you like. Currently, pg_walinspect uses read_local_xlog_page which waits in the while(1) loop if a future LSN is specified. As read_local_xlog_page is an implementation of XLogPageReadCB, which doesn't have a wait/nowait parameter, if we really need a wait/nowait mode behaviour, we need to do extra things(either add a backend-level global wait variable, set before XLogReadRecord, if set, read_local_xlog_page can just exit without waiting and reset after the XLogReadRecord or add an extra bool wait variable to XLogReaderState and use it in read_local_xlog_page). Another problem with the wait mode is - wait until when? Because we don't want to wait forever by specifying a really really future LSN, maybe you could think of adding a timeout (if the future LSN hasn't generated the given timeout, then just return). As I said upthread, I think all of these functions can be parked to future pg_walinspect versions once it gets committed with most-useful functions as proposed in the v4 patch set. > +Datum > +pg_get_wal_records_info(PG_FUNCTION_ARGS) > +{ > +#define PG_GET_WAL_RECORDS_INFO_COLS 10 > > > We could probably have another variant of this function that would > work even if the end pointer is not specified, in which case the > default end pointer would be the last WAL record in the WAL segment. > Currently it mandates the use of an end pointer which slightly reduces > flexibility. Last WAL record in the WAL segment may not be of much use(one can figure out the last valid WAL record in a wal file as mentioned above), but the WAL records info till the latest current flush LSN of the server would be a useful functionality. But that too, can be found using something like "select lsn, prev_lsn, resource_manager from pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());" > + > +/* > + * Get the first valid raw WAL record lsn. > + */ > +Datum > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS) > > > I think this function should return a pointer to the nearest valid WAL > record which can be the previous WAL record to the LSN entered by the > user or the next WAL record. If a user unknowingly enters an lsn that > does not exist then in such cases we should probably return the lsn of > the previous WAL record instead of hanging or waiting for the new WAL > record to arrive. Is it useful? If there's a strong reason, how about naming pg_get_next_valid_wal_record_lsn returning the next valid wal record LSN and pg_get_previous_valid_wal_record_lsn returning the previous valid wal record LSN ? If you think having two functions is too much, then, how about pg_get_first_valid_wal_record_lsn returning both the next valid wal record LSN and its previous wal record LSN? > Another important point I would like to mention here is - have we made > an attempt to ensure that we try to share as much of code with > pg_waldump as possible so that if any changes happens in the > pg_waldump in future it gets applied here as well and additionally it > will also reduce the code duplication. I tried, please have a look at the patch. Also, I added a note at the beginning of pg_walinspect and pg_waldump to consider fixing issues/changing the code in both the places also. > I haven't yet looked into the code in detail. I will have a look at it > asap. thanks. That will be great. Regards, Bharath Rupireddy.
On Tue, Feb 15, 2022 at 2:31 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > +/* > > + * Verify the authenticity of the given raw WAL record. > > + */ > > +Datum > > +pg_verify_raw_wal_record(PG_FUNCTION_ARGS) > > +{ > > > > > > Do we really need this function? I see that whenever the record is > > read, we verify it. So could there be a scenario where any of these > > functions would return an invalid WAL record? > > Yes, this function can be useful. Imagine a case where raw WAL records > are fetched from one server using pg_get_wal_record_info and sent over > the network to another server (for fixing some of the corrupted data > pages or for whatever reasons), using pg_verify_raw_wal_record one can > verify authenticity. As I also said before, and so did Greg, I think giving the user a way to supply WAL records that we will then try to decode is never going to be OK. It's going to be a recipe for security bugs and crash bugs, and there's no compelling use case for it that I can see. I support this patch set only to the extent that it decodes locally generated WAL read directly from the WAL stream. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Feb 16, 2022 at 1:57 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Feb 15, 2022 at 2:31 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > +/* > > > + * Verify the authenticity of the given raw WAL record. > > > + */ > > > +Datum > > > +pg_verify_raw_wal_record(PG_FUNCTION_ARGS) > > > +{ > > > > > > > > > Do we really need this function? I see that whenever the record is > > > read, we verify it. So could there be a scenario where any of these > > > functions would return an invalid WAL record? > > > > Yes, this function can be useful. Imagine a case where raw WAL records > > are fetched from one server using pg_get_wal_record_info and sent over > > the network to another server (for fixing some of the corrupted data > > pages or for whatever reasons), using pg_verify_raw_wal_record one can > > verify authenticity. > > As I also said before, and so did Greg, I think giving the user a way > to supply WAL records that we will then try to decode is never going > to be OK. It's going to be a recipe for security bugs and crash bugs, > and there's no compelling use case for it that I can see. I support > this patch set only to the extent that it decodes locally generated > WAL read directly from the WAL stream. Agreed, I will remove pg_verify_raw_wal_record function in the next version of the patch set. Thanks. Regards, Bharath Rupireddy.
On Wed, Feb 16, 2022 at 1:01 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Feb 14, 2022 at 8:32 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > Here are few comments: > > Thanks for reviewing the patches. > > > +/* > > + * Verify the authenticity of the given raw WAL record. > > + */ > > +Datum > > +pg_verify_raw_wal_record(PG_FUNCTION_ARGS) > > +{ > > > > > > Do we really need this function? I see that whenever the record is > > read, we verify it. So could there be a scenario where any of these > > functions would return an invalid WAL record? > > Yes, this function can be useful. Imagine a case where raw WAL records > are fetched from one server using pg_get_wal_record_info and sent over > the network to another server (for fixing some of the corrupted data > pages or for whatever reasons), using pg_verify_raw_wal_record one can > verify authenticity. > I don't think that's the use case of this patch. Unless there is some other valid reason, I would suggest you remove it. > > Should we add a function that returns the pointer to the first and > > probably the last WAL record in the WAL segment? This would help users > > to inspect the wal records in the entire wal segment if they wish to > > do so. > > Good point. One can do this already with pg_get_wal_records_info and > pg_walfile_name_offset. Usually, the LSN format itself can give an > idea about the WAL file it is in. > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc > limit 1; > lsn | pg_walfile_name_offset > -----------+------------------------------- > 0/5000038 | (000000010000000000000005,56) > (1 row) > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc > limit 1; > lsn | pg_walfile_name_offset > -----------+------------------------------------- > 0/5FFFFC0 | (000000010000000000000005,16777152) > (1 row) > The workaround you are suggesting is not very user friendly and FYKI pg_wal_records_info simply hangs at times when we specify the higher and lower limit of lsn in a wal file. To make things easier for the end users I would suggest we add a function that can return a valid first and last lsn in a walfile. The output of this function can be used to inspect the wal records in the entire wal file if they wish to do so and I am sure they will. So, it should be something like this: select first_valid_lsn, last_valid_lsn from pg_get_first_last_valid_wal_record('wal-segment-name'); And above function can directly be used with pg_get_wal_records_info() like select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment')); I think this is a pretty basic ASK that we expect to be present in the module like this. > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > > > I think we should allow all these functions to be executed in wait and > > *nowait* mode. If a user specifies nowait mode, the function should > > return if no WAL data is present, rather than waiting for new WAL data > > to become available, default behaviour could be anything you like. > > Currently, pg_walinspect uses read_local_xlog_page which waits in the > while(1) loop if a future LSN is specified. As read_local_xlog_page is > an implementation of XLogPageReadCB, which doesn't have a wait/nowait > parameter, if we really need a wait/nowait mode behaviour, we need to > do extra things(either add a backend-level global wait variable, set > before XLogReadRecord, if set, read_local_xlog_page can just exit > without waiting and reset after the XLogReadRecord or add an extra > bool wait variable to XLogReaderState and use it in > read_local_xlog_page). > I am not asking to do any changes in the backend code. Please check - how pg_waldump does this when a user requests to stop once the endptr has reached. If not for all functions at least for a few functions we can do this if it is doable. > > > +Datum > > +pg_get_wal_records_info(PG_FUNCTION_ARGS) > > +{ > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10 > > > > > > We could probably have another variant of this function that would > > work even if the end pointer is not specified, in which case the > > default end pointer would be the last WAL record in the WAL segment. > > Currently it mandates the use of an end pointer which slightly reduces > > flexibility. > > Last WAL record in the WAL segment may not be of much use(one can > figure out the last valid WAL record in a wal file as mentioned > above), but the WAL records info till the latest current flush LSN of > the server would be a useful functionality. But that too, can be found > using something like "select lsn, prev_lsn, resource_manager from > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());" > What if a user wants to inspect all the valid wal records from a startptr (startlsn) and he doesn't know the endptr? Why should he/she be mandated to get the endptr and supply it to this function? I don't think we should force users to do that. I think this is again a very basic ASK that can be done in this version itself. It is not at all any advanced thing that we can think of doing in the future. > > + > > +/* > > + * Get the first valid raw WAL record lsn. > > + */ > > +Datum > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS) > > > > > > I think this function should return a pointer to the nearest valid WAL > > record which can be the previous WAL record to the LSN entered by the > > user or the next WAL record. If a user unknowingly enters an lsn that > > does not exist then in such cases we should probably return the lsn of > > the previous WAL record instead of hanging or waiting for the new WAL > > record to arrive. > > Is it useful? It is useful in the same way as returning the next valid wal pointer is. Why should a user wait for the next valid wal pointer to be available instead the function should identify the previous valid wal record and return it and put an appropriate message to the user. If there's a strong reason, how about naming > pg_get_next_valid_wal_record_lsn returning the next valid wal record > LSN and pg_get_previous_valid_wal_record_lsn returning the previous > valid wal record LSN ? If you think having two functions is too much, > then, how about pg_get_first_valid_wal_record_lsn returning both the > next valid wal record LSN and its previous wal record LSN? > The latter one looks better. -- With Regards, Ashutosh Sharma.
On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > I don't think that's the use case of this patch. Unless there is some > other valid reason, I would suggest you remove it. Removed the function pg_verify_raw_wal_record. Robert and Greg also voted for removal upthread. > > > Should we add a function that returns the pointer to the first and > > > probably the last WAL record in the WAL segment? This would help users > > > to inspect the wal records in the entire wal segment if they wish to > > > do so. > > > > Good point. One can do this already with pg_get_wal_records_info and > > pg_walfile_name_offset. Usually, the LSN format itself can give an > > idea about the WAL file it is in. > > > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc > > limit 1; > > lsn | pg_walfile_name_offset > > -----------+------------------------------- > > 0/5000038 | (000000010000000000000005,56) > > (1 row) > > > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc > > limit 1; > > lsn | pg_walfile_name_offset > > -----------+------------------------------------- > > 0/5FFFFC0 | (000000010000000000000005,16777152) > > (1 row) > > > > The workaround you are suggesting is not very user friendly and FYKI > pg_wal_records_info simply hangs at times when we specify the higher > and lower limit of lsn in a wal file. > > To make things easier for the end users I would suggest we add a > function that can return a valid first and last lsn in a walfile. The > output of this function can be used to inspect the wal records in the > entire wal file if they wish to do so and I am sure they will. So, it > should be something like this: > > select first_valid_lsn, last_valid_lsn from > pg_get_first_last_valid_wal_record('wal-segment-name'); > > And above function can directly be used with pg_get_wal_records_info() like > > select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment')); > > I think this is a pretty basic ASK that we expect to be present in the > module like this. Added a new function that returns the first and last valid WAL record LSN of a given WAL file. > > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > > > > > I think we should allow all these functions to be executed in wait and > > > *nowait* mode. If a user specifies nowait mode, the function should > > > return if no WAL data is present, rather than waiting for new WAL data > > > to become available, default behaviour could be anything you like. > > > > Currently, pg_walinspect uses read_local_xlog_page which waits in the > > while(1) loop if a future LSN is specified. As read_local_xlog_page is > > an implementation of XLogPageReadCB, which doesn't have a wait/nowait > > parameter, if we really need a wait/nowait mode behaviour, we need to > > do extra things(either add a backend-level global wait variable, set > > before XLogReadRecord, if set, read_local_xlog_page can just exit > > without waiting and reset after the XLogReadRecord or add an extra > > bool wait variable to XLogReaderState and use it in > > read_local_xlog_page). > > > > I am not asking to do any changes in the backend code. Please check - > how pg_waldump does this when a user requests to stop once the endptr > has reached. If not for all functions at least for a few functions we > can do this if it is doable. I've added a new function read_local_xlog_page_2 (similar to read_local_xlog_page but works in wait and no wait mode) and the callers can specify whether to wait or not wait using private_data. Actually, I wanted to use the private_data structure of read_local_xlog_page but the logical decoding already has context as private_data, that is why I had to have a new function. I know it creates a bit of duplicate code, but its cleaner than using backend-local variables or additional flags in XLogReaderState or adding wait/no-wait boolean to page_read callback. Any other suggestions are welcome here. With this, I'm able to have wait/no wait versions for any functions. But for now, I'm having wait/no wait for two functions (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more sense. > > > +Datum > > > +pg_get_wal_records_info(PG_FUNCTION_ARGS) > > > +{ > > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10 > > > > > > > > > We could probably have another variant of this function that would > > > work even if the end pointer is not specified, in which case the > > > default end pointer would be the last WAL record in the WAL segment. > > > Currently it mandates the use of an end pointer which slightly reduces > > > flexibility. > > > > Last WAL record in the WAL segment may not be of much use(one can > > figure out the last valid WAL record in a wal file as mentioned > > above), but the WAL records info till the latest current flush LSN of > > the server would be a useful functionality. But that too, can be found > > using something like "select lsn, prev_lsn, resource_manager from > > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());" > > > > What if a user wants to inspect all the valid wal records from a > startptr (startlsn) and he doesn't know the endptr? Why should he/she > be mandated to get the endptr and supply it to this function? I don't > think we should force users to do that. I think this is again a very > basic ASK that can be done in this version itself. It is not at all > any advanced thing that we can think of doing in the future. Agreed. Added new functions that emits wal records info/stats till the end of the WAL at the moment. > > > + > > > +/* > > > + * Get the first valid raw WAL record lsn. > > > + */ > > > +Datum > > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS) > > > > > > > > > I think this function should return a pointer to the nearest valid WAL > > > record which can be the previous WAL record to the LSN entered by the > > > user or the next WAL record. If a user unknowingly enters an lsn that > > > does not exist then in such cases we should probably return the lsn of > > > the previous WAL record instead of hanging or waiting for the new WAL > > > record to arrive. > > > > Is it useful? > > It is useful in the same way as returning the next valid wal pointer > is. Why should a user wait for the next valid wal pointer to be > available instead the function should identify the previous valid wal > record and return it and put an appropriate message to the user. > > If there's a strong reason, how about naming > > pg_get_next_valid_wal_record_lsn returning the next valid wal record > > LSN and pg_get_previous_valid_wal_record_lsn returning the previous > > valid wal record LSN ? If you think having two functions is too much, > > then, how about pg_get_first_valid_wal_record_lsn returning both the > > next valid wal record LSN and its previous wal record LSN? > > > > The latter one looks better. Modified. Attaching v5 patch set, please review it further. Regards, Bharath Rupireddy.
Вложения
Some review comments on v5 patch (v5-0001-pg_walinspect.patch) +-- +-- pg_get_wal_records_info() +-- +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, + IN end_lsn pg_lsn, + IN wait_for_wal boolean DEFAULT false, + OUT lsn pg_lsn, What does the wait_for_wal flag mean here when one has already specified the start and end lsn? AFAIU, If a user has specified a start and stop LSN, it means that the user knows the extent to which he/she wants to display the WAL records in which case we need to stop once the end lsn has reached . So what is the meaning of wait_for_wal flag? Does it look sensible to have the wait_for_wal flag here? To me it doesn't. == +-- +-- pg_get_wal_records_info_till_end_of_wal() +-- +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn, + OUT lsn pg_lsn, + OUT prev_lsn pg_lsn, + OUT xid xid, Why is this function required? Is pg_get_wal_records_info() alone not enough? I think it is. See if we can make end_lsn optional in pg_get_wal_records_info() and lets just have it alone. I think it can do the job of pg_get_wal_records_info_till_end_of_wal function. == +-- +-- pg_get_wal_stats_till_end_of_wal() +-- +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn, + OUT resource_manager text, + OUT count int8, Above comment applies to this function as well. Isn't pg_get_wal_stats() enough? == + if (loc <= read_upto) + break; + + /* Let's not wait for WAL to be available if indicated */ + if (loc > read_upto && + state->private_data != NULL) + { Why loc > read_upto? The first if condition is (loc <= read_upto) followed by the second if condition - (loc > read_upto). Is the first if condition (loc <= read_upto) not enough to indicate that loc > read_upto? == +#define IsEndOfWALReached(state) \ + (state->private_data != NULL && \ + (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->no_wait == true) && \ + (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->reached_end_of_wal == true)) I think we should either use state or xlogreader. First line says state->private_data and second line xlogreader->private_data. == + (((ReadLocalXLOGPage2Private *) xlogreader->private_data)->reached_end_of_wal == true)) + There is a new patch coming to make the end of WAL messages less scary. It introduces the EOW flag in xlogreaderstate maybe we can use that instead of introducing new flags in private area to represent the end of WAL. == +/* + * XLogReaderRoutine->page_read callback for reading local xlog files + * + * This function is same as read_local_xlog_page except that it works in both + * wait and no wait mode. The callers can specify about waiting in private_data + * of XLogReaderState. + */ +int +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, + int reqLen, XLogRecPtr targetRecPtr, char *cur_page) +{ + XLogRecPtr read_upto, Do we really need this function? Can't we make use of an existing WAL reader function - read_local_xlog_page()? -- With Regards, Ashutosh Sharma. On Fri, Feb 25, 2022 at 4:33 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > I don't think that's the use case of this patch. Unless there is some > > other valid reason, I would suggest you remove it. > > Removed the function pg_verify_raw_wal_record. Robert and Greg also > voted for removal upthread. > > > > > Should we add a function that returns the pointer to the first and > > > > probably the last WAL record in the WAL segment? This would help users > > > > to inspect the wal records in the entire wal segment if they wish to > > > > do so. > > > > > > Good point. One can do this already with pg_get_wal_records_info and > > > pg_walfile_name_offset. Usually, the LSN format itself can give an > > > idea about the WAL file it is in. > > > > > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc > > > limit 1; > > > lsn | pg_walfile_name_offset > > > -----------+------------------------------- > > > 0/5000038 | (000000010000000000000005,56) > > > (1 row) > > > > > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc > > > limit 1; > > > lsn | pg_walfile_name_offset > > > -----------+------------------------------------- > > > 0/5FFFFC0 | (000000010000000000000005,16777152) > > > (1 row) > > > > > > > The workaround you are suggesting is not very user friendly and FYKI > > pg_wal_records_info simply hangs at times when we specify the higher > > and lower limit of lsn in a wal file. > > > > To make things easier for the end users I would suggest we add a > > function that can return a valid first and last lsn in a walfile. The > > output of this function can be used to inspect the wal records in the > > entire wal file if they wish to do so and I am sure they will. So, it > > should be something like this: > > > > select first_valid_lsn, last_valid_lsn from > > pg_get_first_last_valid_wal_record('wal-segment-name'); > > > > And above function can directly be used with pg_get_wal_records_info() like > > > > select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment')); > > > > I think this is a pretty basic ASK that we expect to be present in the > > module like this. > > Added a new function that returns the first and last valid WAL record > LSN of a given WAL file. > > > > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > > > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > > > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > > > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > > > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > > > > > > > I think we should allow all these functions to be executed in wait and > > > > *nowait* mode. If a user specifies nowait mode, the function should > > > > return if no WAL data is present, rather than waiting for new WAL data > > > > to become available, default behaviour could be anything you like. > > > > > > Currently, pg_walinspect uses read_local_xlog_page which waits in the > > > while(1) loop if a future LSN is specified. As read_local_xlog_page is > > > an implementation of XLogPageReadCB, which doesn't have a wait/nowait > > > parameter, if we really need a wait/nowait mode behaviour, we need to > > > do extra things(either add a backend-level global wait variable, set > > > before XLogReadRecord, if set, read_local_xlog_page can just exit > > > without waiting and reset after the XLogReadRecord or add an extra > > > bool wait variable to XLogReaderState and use it in > > > read_local_xlog_page). > > > > > > > I am not asking to do any changes in the backend code. Please check - > > how pg_waldump does this when a user requests to stop once the endptr > > has reached. If not for all functions at least for a few functions we > > can do this if it is doable. > > I've added a new function read_local_xlog_page_2 (similar to > read_local_xlog_page but works in wait and no wait mode) and the > callers can specify whether to wait or not wait using private_data. > Actually, I wanted to use the private_data structure of > read_local_xlog_page but the logical decoding already has context as > private_data, that is why I had to have a new function. I know it > creates a bit of duplicate code, but its cleaner than using > backend-local variables or additional flags in XLogReaderState or > adding wait/no-wait boolean to page_read callback. Any other > suggestions are welcome here. > > With this, I'm able to have wait/no wait versions for any functions. > But for now, I'm having wait/no wait for two functions > (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more > sense. > > > > > +Datum > > > > +pg_get_wal_records_info(PG_FUNCTION_ARGS) > > > > +{ > > > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10 > > > > > > > > > > > > We could probably have another variant of this function that would > > > > work even if the end pointer is not specified, in which case the > > > > default end pointer would be the last WAL record in the WAL segment. > > > > Currently it mandates the use of an end pointer which slightly reduces > > > > flexibility. > > > > > > Last WAL record in the WAL segment may not be of much use(one can > > > figure out the last valid WAL record in a wal file as mentioned > > > above), but the WAL records info till the latest current flush LSN of > > > the server would be a useful functionality. But that too, can be found > > > using something like "select lsn, prev_lsn, resource_manager from > > > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());" > > > > > > > What if a user wants to inspect all the valid wal records from a > > startptr (startlsn) and he doesn't know the endptr? Why should he/she > > be mandated to get the endptr and supply it to this function? I don't > > think we should force users to do that. I think this is again a very > > basic ASK that can be done in this version itself. It is not at all > > any advanced thing that we can think of doing in the future. > > Agreed. Added new functions that emits wal records info/stats till the > end of the WAL at the moment. > > > > > + > > > > +/* > > > > + * Get the first valid raw WAL record lsn. > > > > + */ > > > > +Datum > > > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS) > > > > > > > > > > > > I think this function should return a pointer to the nearest valid WAL > > > > record which can be the previous WAL record to the LSN entered by the > > > > user or the next WAL record. If a user unknowingly enters an lsn that > > > > does not exist then in such cases we should probably return the lsn of > > > > the previous WAL record instead of hanging or waiting for the new WAL > > > > record to arrive. > > > > > > Is it useful? > > > > It is useful in the same way as returning the next valid wal pointer > > is. Why should a user wait for the next valid wal pointer to be > > available instead the function should identify the previous valid wal > > record and return it and put an appropriate message to the user. > > > > If there's a strong reason, how about naming > > > pg_get_next_valid_wal_record_lsn returning the next valid wal record > > > LSN and pg_get_previous_valid_wal_record_lsn returning the previous > > > valid wal record LSN ? If you think having two functions is too much, > > > then, how about pg_get_first_valid_wal_record_lsn returning both the > > > next valid wal record LSN and its previous wal record LSN? > > > > > > > The latter one looks better. > > Modified. > > Attaching v5 patch set, please review it further. > > Regards, > Bharath Rupireddy.
On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Some review comments on v5 patch (v5-0001-pg_walinspect.patch) Thanks for reviewing. > +-- > +-- pg_get_wal_records_info() > +-- > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, > + IN end_lsn pg_lsn, > + IN wait_for_wal boolean DEFAULT false, > + OUT lsn pg_lsn, > > What does the wait_for_wal flag mean here when one has already > specified the start and end lsn? AFAIU, If a user has specified a > start and stop LSN, it means that the user knows the extent to which > he/she wants to display the WAL records in which case we need to stop > once the end lsn has reached . So what is the meaning of wait_for_wal > flag? Does it look sensible to have the wait_for_wal flag here? To me > it doesn't. Users can always specify a future end_lsn and set wait_for_wal to true, then the pg_get_wal_records_info/pg_get_wal_stats functions can wait for the WAL. IMO, this is useful. If you remember you were okay with wait/nowait versions for some of the functions upthread [1]. I'm not going to retain this behaviour for both pg_get_wal_records_info/pg_get_wal_stats as it is similar to pg_waldump's --follow option. > == > > +-- > +-- pg_get_wal_records_info_till_end_of_wal() > +-- > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn, > + OUT lsn pg_lsn, > + OUT prev_lsn pg_lsn, > + OUT xid xid, > > Why is this function required? Is pg_get_wal_records_info() alone not > enough? I think it is. See if we can make end_lsn optional in > pg_get_wal_records_info() and lets just have it alone. I think it can > do the job of pg_get_wal_records_info_till_end_of_wal function. > > == > > +-- > +-- pg_get_wal_stats_till_end_of_wal() > +-- > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn, > + OUT resource_manager text, > + OUT count int8, > > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough? I'm doing the following input validations for these functions to not cause any issues with invalid LSN. If I were to have the default value for end_lsn as 0/0, I can't perform input validations right? That is the reason I'm having separate functions {pg_get_wal_records_info, pg_get_wal_stats}_till_end_of_wal() versions. /* Validate input. */ if (XLogRecPtrIsInvalid(start_lsn)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid WAL record start LSN"))); if (XLogRecPtrIsInvalid(end_lsn)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid WAL record end LSN"))); if (start_lsn >= end_lsn) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("WAL record start LSN must be less than end LSN"))); > == > > > + if (loc <= read_upto) > + break; > + > + /* Let's not wait for WAL to be available if > indicated */ > + if (loc > read_upto && > + state->private_data != NULL) > + { > > Why loc > read_upto? The first if condition is (loc <= read_upto) > followed by the second if condition - (loc > read_upto). Is the first > if condition (loc <= read_upto) not enough to indicate that loc > > read_upto? Yeah, that's unnecessary, I improved the comment there and removed loc > read_upto. > == > > +#define IsEndOfWALReached(state) \ > + (state->private_data != NULL && \ > + (((ReadLocalXLOGPage2Private *) > xlogreader->private_data)->no_wait == true) && \ > + (((ReadLocalXLOGPage2Private *) > xlogreader->private_data)->reached_end_of_wal == true)) > > I think we should either use state or xlogreader. First line says > state->private_data and second line xlogreader->private_data. I've changed it to use state instead of xlogreader. > == > > + (((ReadLocalXLOGPage2Private *) > xlogreader->private_data)->reached_end_of_wal == true)) > + > > There is a new patch coming to make the end of WAL messages less > scary. It introduces the EOW flag in xlogreaderstate maybe we can use > that instead of introducing new flags in private area to represent the > end of WAL. Yeah that would be great. But we never know which one gets committed first. Until then it's not good to have dependencies on two "on-going" patches. Later, we can change. > == > > +/* > + * XLogReaderRoutine->page_read callback for reading local xlog files > + * > + * This function is same as read_local_xlog_page except that it works in both > + * wait and no wait mode. The callers can specify about waiting in private_data > + * of XLogReaderState. > + */ > +int > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, > + int reqLen, XLogRecPtr > targetRecPtr, char *cur_page) > +{ > + XLogRecPtr read_upto, > > Do we really need this function? Can't we make use of an existing WAL > reader function - read_local_xlog_page()? I clearly explained the reasons upthread [2]. Please let me know if you have more thoughts/doubts here, we can connect offlist. Attaching v6 patch set with above review comments addressed. Please review it further. [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); I think we should allow all these functions to be executed in wait and *nowait* mode. If a user specifies nowait mode, the function should return if no WAL data is present, rather than waiting for new WAL data to become available, default behaviour could be anything you like. [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com I've added a new function read_local_xlog_page_2 (similar to read_local_xlog_page but works in wait and no wait mode) and the callers can specify whether to wait or not wait using private_data. Actually, I wanted to use the private_data structure of read_local_xlog_page but the logical decoding already has context as private_data, that is why I had to have a new function. I know it creates a bit of duplicate code, but its cleaner than using backend-local variables or additional flags in XLogReaderState or adding wait/no-wait boolean to page_read callback. Any other suggestions are welcome here. With this, I'm able to have wait/no wait versions for any functions. But for now, I'm having wait/no wait for two functions (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more sense. Regards, Bharath Rupireddy.
Вложения
On Wed, Mar 2, 2022 at 10:37 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > Some review comments on v5 patch (v5-0001-pg_walinspect.patch) > > Thanks for reviewing. > > > +-- > > +-- pg_get_wal_records_info() > > +-- > > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, > > + IN end_lsn pg_lsn, > > + IN wait_for_wal boolean DEFAULT false, > > + OUT lsn pg_lsn, > > > > What does the wait_for_wal flag mean here when one has already > > specified the start and end lsn? AFAIU, If a user has specified a > > start and stop LSN, it means that the user knows the extent to which > > he/she wants to display the WAL records in which case we need to stop > > once the end lsn has reached . So what is the meaning of wait_for_wal > > flag? Does it look sensible to have the wait_for_wal flag here? To me > > it doesn't. > > Users can always specify a future end_lsn and set wait_for_wal to > true, then the pg_get_wal_records_info/pg_get_wal_stats functions can > wait for the WAL. IMO, this is useful. If you remember you were okay > with wait/nowait versions for some of the functions upthread [1]. I'm > not going to retain this behaviour for both > pg_get_wal_records_info/pg_get_wal_stats as it is similar to > pg_waldump's --follow option. > It is not at all similar to pg_waldumps behaviour. Please check the behaviour of pg_waldump properly. Does it wait for any wal records when a user has specified a stop pointer? It doesn't and it shouldn't. I mean does it even make sense to wait for the WAL when a stop pointer is specified? And it's quite understandable that if a user has asked pg_walinspect to stop at a certain point, it must. Also, What if there are already WAL records after the stop pointer, in that case does it even make sense to have a wait flag. WHat would be the meaning of the wait flag in that case? Further, have you checked wait_for_wal flag behaviour, is it even working? > > > > +-- > > +-- pg_get_wal_records_info_till_end_of_wal() > > +-- > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn, > > + OUT lsn pg_lsn, > > + OUT prev_lsn pg_lsn, > > + OUT xid xid, > > > > Why is this function required? Is pg_get_wal_records_info() alone not > > enough? I think it is. See if we can make end_lsn optional in > > pg_get_wal_records_info() and lets just have it alone. I think it can > > do the job of pg_get_wal_records_info_till_end_of_wal function. > > > > == > > > > +-- > > +-- pg_get_wal_stats_till_end_of_wal() > > +-- > > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn, > > + OUT resource_manager text, > > + OUT count int8, > > > > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough? > > I'm doing the following input validations for these functions to not > cause any issues with invalid LSN. If I were to have the default value > for end_lsn as 0/0, I can't perform input validations right? That is > the reason I'm having separate functions {pg_get_wal_records_info, > pg_get_wal_stats}_till_end_of_wal() versions. > You can do it. Please check pg_waldump to understand how it is done there. You cannot have multiple functions doing different things when one single function can do all the job. > > == > > > > > > + if (loc <= read_upto) > > + break; > > + > > + /* Let's not wait for WAL to be available if > > indicated */ > > + if (loc > read_upto && > > + state->private_data != NULL) > > + { > > > > Why loc > read_upto? The first if condition is (loc <= read_upto) > > followed by the second if condition - (loc > read_upto). Is the first > > if condition (loc <= read_upto) not enough to indicate that loc > > > read_upto? > > Yeah, that's unnecessary, I improved the comment there and removed loc > > read_upto. > > > == > > > > +#define IsEndOfWALReached(state) \ > > + (state->private_data != NULL && \ > > + (((ReadLocalXLOGPage2Private *) > > xlogreader->private_data)->no_wait == true) && \ > > + (((ReadLocalXLOGPage2Private *) > > xlogreader->private_data)->reached_end_of_wal == true)) > > > > I think we should either use state or xlogreader. First line says > > state->private_data and second line xlogreader->private_data. > > I've changed it to use state instead of xlogreader. > > > == > > > > + (((ReadLocalXLOGPage2Private *) > > xlogreader->private_data)->reached_end_of_wal == true)) > > + > > > > There is a new patch coming to make the end of WAL messages less > > scary. It introduces the EOW flag in xlogreaderstate maybe we can use > > that instead of introducing new flags in private area to represent the > > end of WAL. > > Yeah that would be great. But we never know which one gets committed > first. Until then it's not good to have dependencies on two "on-going" > patches. Later, we can change. > > > == > > > > +/* > > + * XLogReaderRoutine->page_read callback for reading local xlog files > > + * > > + * This function is same as read_local_xlog_page except that it works in both > > + * wait and no wait mode. The callers can specify about waiting in private_data > > + * of XLogReaderState. > > + */ > > +int > > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, > > + int reqLen, XLogRecPtr > > targetRecPtr, char *cur_page) > > +{ > > + XLogRecPtr read_upto, > > > > Do we really need this function? Can't we make use of an existing WAL > > reader function - read_local_xlog_page()? > > I clearly explained the reasons upthread [2]. Please let me know if > you have more thoughts/doubts here, we can connect offlist. > > Attaching v6 patch set with above review comments addressed. Please > review it further. > > [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > I think we should allow all these functions to be executed in wait and > *nowait* mode. If a user specifies nowait mode, the function should > return if no WAL data is present, rather than waiting for new WAL data > to become available, default behaviour could be anything you like. > > [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com > I've added a new function read_local_xlog_page_2 (similar to > read_local_xlog_page but works in wait and no wait mode) and the > callers can specify whether to wait or not wait using private_data. > Actually, I wanted to use the private_data structure of > read_local_xlog_page but the logical decoding already has context as > private_data, that is why I had to have a new function. I know it > creates a bit of duplicate code, but its cleaner than using > backend-local variables or additional flags in XLogReaderState or > adding wait/no-wait boolean to page_read callback. Any other > suggestions are welcome here. > > With this, I'm able to have wait/no wait versions for any functions. > But for now, I'm having wait/no wait for two functions > (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more > sense. > > Regards, > Bharath Rupireddy.
Hi. +#ifdef FRONTEND +/* + * Functions that are currently not needed in the backend, but are better + * implemented inside xlogreader.c because of the internal facilities available + * here. + */ + #endif /* FRONTEND */ Why didn't you remove the emptied #ifdef section? +int +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, + int reqLen, XLogRecPtr targetRecPtr, char *cur_page) The difference with the original function is this function has one additional if-block amid. I think we can insert the code directly in the original function. + /* + * We are trying to read future WAL. Let's not wait for WAL to be + * available if indicated. + */ + if (state->private_data != NULL) However, in the first place it seems to me there's not need for the function to take care of no_wait affairs. If, for expample, pg_get_wal_record_info() with no_wait = true, it is enough that the function identifies the bleeding edge of WAL then loop until the LSN. So I think no need for the new function, nor for any modification on the origical function. The changes will reduce the footprint of the patch largely, I think. At Wed, 2 Mar 2022 22:37:43 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > Some review comments on v5 patch (v5-0001-pg_walinspect.patch) > > Thanks for reviewing. > > > +-- > > +-- pg_get_wal_records_info() > > +-- > > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, > > + IN end_lsn pg_lsn, > > + IN wait_for_wal boolean DEFAULT false, > > + OUT lsn pg_lsn, > > > > What does the wait_for_wal flag mean here when one has already > > specified the start and end lsn? AFAIU, If a user has specified a > > start and stop LSN, it means that the user knows the extent to which > > he/she wants to display the WAL records in which case we need to stop > > once the end lsn has reached . So what is the meaning of wait_for_wal > > flag? Does it look sensible to have the wait_for_wal flag here? To me > > it doesn't. > > Users can always specify a future end_lsn and set wait_for_wal to > true, then the pg_get_wal_records_info/pg_get_wal_stats functions can > wait for the WAL. IMO, this is useful. If you remember you were okay > with wait/nowait versions for some of the functions upthread [1]. I'm > not going to retain this behaviour for both > pg_get_wal_records_info/pg_get_wal_stats as it is similar to > pg_waldump's --follow option. I agree to this for now. However, I prefer that NULL or invalid end_lsn is equivalent to pg_current_wal_lsn(). > > == > > > > +-- > > +-- pg_get_wal_records_info_till_end_of_wal() > > +-- > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn, > > + OUT lsn pg_lsn, > > + OUT prev_lsn pg_lsn, > > + OUT xid xid, > > > > Why is this function required? Is pg_get_wal_records_info() alone not > > enough? I think it is. See if we can make end_lsn optional in > > pg_get_wal_records_info() and lets just have it alone. I think it can > > do the job of pg_get_wal_records_info_till_end_of_wal function. I rather agree to Ashutosh. This feature can be covered by pg_get_wal_records(start_lsn, NULL, false). > > == > > > > +-- > > +-- pg_get_wal_stats_till_end_of_wal() > > +-- > > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn, > > + OUT resource_manager text, > > + OUT count int8, > > > > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough? > > I'm doing the following input validations for these functions to not > cause any issues with invalid LSN. If I were to have the default value > for end_lsn as 0/0, I can't perform input validations right? That is > the reason I'm having separate functions {pg_get_wal_records_info, > pg_get_wal_stats}_till_end_of_wal() versions. > > /* Validate input. */ > if (XLogRecPtrIsInvalid(start_lsn)) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("invalid WAL record start LSN"))); > > if (XLogRecPtrIsInvalid(end_lsn)) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("invalid WAL record end LSN"))); > > if (start_lsn >= end_lsn) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("WAL record start LSN must be less than end LSN"))); I don't think that validations are worth doing at least for the first two, as far as that value has a special meaning. I see it useful if pg_get_wal_records_info() means dump the all available records for the moment, or records of the last segment, page or something. > > == > > > > > > + if (loc <= read_upto) > > + break; > > + > > + /* Let's not wait for WAL to be available if > > indicated */ > > + if (loc > read_upto && > > + state->private_data != NULL) > > + { > > > > Why loc > read_upto? The first if condition is (loc <= read_upto) > > followed by the second if condition - (loc > read_upto). Is the first > > if condition (loc <= read_upto) not enough to indicate that loc > > > read_upto? > > Yeah, that's unnecessary, I improved the comment there and removed loc > > read_upto. > > > == > > > > +#define IsEndOfWALReached(state) \ > > + (state->private_data != NULL && \ > > + (((ReadLocalXLOGPage2Private *) > > xlogreader->private_data)->no_wait == true) && \ > > + (((ReadLocalXLOGPage2Private *) > > xlogreader->private_data)->reached_end_of_wal == true)) > > > > I think we should either use state or xlogreader. First line says > > state->private_data and second line xlogreader->private_data. > > I've changed it to use state instead of xlogreader. > > > == > > > > + (((ReadLocalXLOGPage2Private *) > > xlogreader->private_data)->reached_end_of_wal == true)) > > + > > > > There is a new patch coming to make the end of WAL messages less > > scary. It introduces the EOW flag in xlogreaderstate maybe we can use > > that instead of introducing new flags in private area to represent the > > end of WAL. > > Yeah that would be great. But we never know which one gets committed > first. Until then it's not good to have dependencies on two "on-going" > patches. Later, we can change. > > > == > > > > +/* > > + * XLogReaderRoutine->page_read callback for reading local xlog files > > + * > > + * This function is same as read_local_xlog_page except that it works in both > > + * wait and no wait mode. The callers can specify about waiting in private_data > > + * of XLogReaderState. > > + */ > > +int > > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, > > + int reqLen, XLogRecPtr > > targetRecPtr, char *cur_page) > > +{ > > + XLogRecPtr read_upto, > > > > Do we really need this function? Can't we make use of an existing WAL > > reader function - read_local_xlog_page()? > > I clearly explained the reasons upthread [2]. Please let me know if > you have more thoughts/doubts here, we can connect offlist. *I* also think the function is not needed, as explained above. Why do we need that function while we know how far we can read WAL records *before* calling the function? > Attaching v6 patch set with above review comments addressed. Please > review it further. > > [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > I think we should allow all these functions to be executed in wait and > *nowait* mode. If a user specifies nowait mode, the function should > return if no WAL data is present, rather than waiting for new WAL data > to become available, default behaviour could be anything you like. > > [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com > I've added a new function read_local_xlog_page_2 (similar to > read_local_xlog_page but works in wait and no wait mode) and the > callers can specify whether to wait or not wait using private_data. > Actually, I wanted to use the private_data structure of > read_local_xlog_page but the logical decoding already has context as > private_data, that is why I had to have a new function. I know it > creates a bit of duplicate code, but its cleaner than using > backend-local variables or additional flags in XLogReaderState or > adding wait/no-wait boolean to page_read callback. Any other > suggestions are welcome here. > > With this, I'm able to have wait/no wait versions for any functions. > But for now, I'm having wait/no wait for two functions > (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more > sense. > > Regards, > Bharath Rupireddy. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Here are a few comments. 1) > > > == > > > > > > +-- > > > +-- pg_get_wal_records_info_till_end_of_wal() > > > +-- > > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn, > > > + OUT lsn pg_lsn, > > > + OUT prev_lsn pg_lsn, > > > + OUT xid xid, > > > > > > Why is this function required? Is pg_get_wal_records_info() alone not > > > enough? I think it is. See if we can make end_lsn optional in > > > pg_get_wal_records_info() and lets just have it alone. I think it can > > > do the job of pg_get_wal_records_info_till_end_of_wal function. > > I rather agree to Ashutosh. This feature can be covered by > pg_get_wal_records(start_lsn, NULL, false). > I don't think that validations are worth doing at least for the first > two, as far as that value has a special meaning. I see it useful if > pg_get_wal_records_info() means dump the all available records for the > moment, or records of the last segment, page or something. > *I* also think the function is not needed, as explained above. Why do > we need that function while we know how far we can read WAL records > *before* calling the function? I agree with this. The function prototype comes first and the validation can be done accordingly. I feel we can even support 'pg_get_wal_record_info' with the same name. All 3 function's objectives are the same. So it is better to use the same name (pg_wal_record_info) with different prototypes. 2) The function 'pg_get_first_valid_wal_record_lsn' looks redundant as we are getting the same information from the function 'pg_get_first_and_last_valid_wal_record_lsn'. With this function, we can easily fetch the first lsn. So IMO we should remove 'pg_get_first_valid_wal_record_lsn'. 3) The word 'get' should be removed from the function name(*_get_*) as all the functions of the extension are used only to get the information. It will also sync with xlogfuncs's naming conventions like pg_current_wal_lsn, pg_walfile_name, etc. 4) The function names can be modified with lesser words by retaining the existing meaning. :s/pg_get_raw_wal_record/pg_wal_raw_record :s/pg_get_first_valid_wal_record_lsn/pg_wal_first_lsn :s/pg_get_first_and_last_valid_wal_record_lsn/pg_wal_first_and_last_lsn :s/pg_get_wal_record_info/pg_wal_record_info :s/pg_get_wal_stats/pg_wal_stats 5) Even 'pg_get_wal_stats' and 'pg_get_wal_stats_till_end_of_wal' can be clubbed as one function. The above comments are trying to simplify the extension APIs and to make it easy for the user to understand and use it. Thanks & Regards, Nitin Jadhav On Thu, Mar 3, 2022 at 8:20 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Hi. > > +#ifdef FRONTEND > +/* > + * Functions that are currently not needed in the backend, but are better > + * implemented inside xlogreader.c because of the internal facilities available > + * here. > + */ > + > #endif /* FRONTEND */ > > Why didn't you remove the emptied #ifdef section? > > +int > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, > + int reqLen, XLogRecPtr targetRecPtr, char *cur_page) > > The difference with the original function is this function has one > additional if-block amid. I think we can insert the code directly in > the original function. > > + /* > + * We are trying to read future WAL. Let's not wait for WAL to be > + * available if indicated. > + */ > + if (state->private_data != NULL) > > However, in the first place it seems to me there's not need for the > function to take care of no_wait affairs. > > If, for expample, pg_get_wal_record_info() with no_wait = true, it is > enough that the function identifies the bleeding edge of WAL then loop > until the LSN. So I think no need for the new function, nor for any > modification on the origical function. > > The changes will reduce the footprint of the patch largely, I think. > > At Wed, 2 Mar 2022 22:37:43 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > > > Some review comments on v5 patch (v5-0001-pg_walinspect.patch) > > > > Thanks for reviewing. > > > > > +-- > > > +-- pg_get_wal_records_info() > > > +-- > > > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, > > > + IN end_lsn pg_lsn, > > > + IN wait_for_wal boolean DEFAULT false, > > > + OUT lsn pg_lsn, > > > > > > What does the wait_for_wal flag mean here when one has already > > > specified the start and end lsn? AFAIU, If a user has specified a > > > start and stop LSN, it means that the user knows the extent to which > > > he/she wants to display the WAL records in which case we need to stop > > > once the end lsn has reached . So what is the meaning of wait_for_wal > > > flag? Does it look sensible to have the wait_for_wal flag here? To me > > > it doesn't. > > > > Users can always specify a future end_lsn and set wait_for_wal to > > true, then the pg_get_wal_records_info/pg_get_wal_stats functions can > > wait for the WAL. IMO, this is useful. If you remember you were okay > > with wait/nowait versions for some of the functions upthread [1]. I'm > > not going to retain this behaviour for both > > pg_get_wal_records_info/pg_get_wal_stats as it is similar to > > pg_waldump's --follow option. > > I agree to this for now. However, I prefer that NULL or invalid > end_lsn is equivalent to pg_current_wal_lsn(). > > > > == > > > > > > +-- > > > +-- pg_get_wal_records_info_till_end_of_wal() > > > +-- > > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn, > > > + OUT lsn pg_lsn, > > > + OUT prev_lsn pg_lsn, > > > + OUT xid xid, > > > > > > Why is this function required? Is pg_get_wal_records_info() alone not > > > enough? I think it is. See if we can make end_lsn optional in > > > pg_get_wal_records_info() and lets just have it alone. I think it can > > > do the job of pg_get_wal_records_info_till_end_of_wal function. > > I rather agree to Ashutosh. This feature can be covered by > pg_get_wal_records(start_lsn, NULL, false). > > > > == > > > > > > +-- > > > +-- pg_get_wal_stats_till_end_of_wal() > > > +-- > > > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn, > > > + OUT resource_manager text, > > > + OUT count int8, > > > > > > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough? > > > > I'm doing the following input validations for these functions to not > > cause any issues with invalid LSN. If I were to have the default value > > for end_lsn as 0/0, I can't perform input validations right? That is > > the reason I'm having separate functions {pg_get_wal_records_info, > > pg_get_wal_stats}_till_end_of_wal() versions. > > > > /* Validate input. */ > > if (XLogRecPtrIsInvalid(start_lsn)) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("invalid WAL record start LSN"))); > > > > if (XLogRecPtrIsInvalid(end_lsn)) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("invalid WAL record end LSN"))); > > > > if (start_lsn >= end_lsn) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("WAL record start LSN must be less than end LSN"))); > > I don't think that validations are worth doing at least for the first > two, as far as that value has a special meaning. I see it useful if > pg_get_wal_records_info() means dump the all available records for the > moment, or records of the last segment, page or something. > > > > == > > > > > > > > > + if (loc <= read_upto) > > > + break; > > > + > > > + /* Let's not wait for WAL to be available if > > > indicated */ > > > + if (loc > read_upto && > > > + state->private_data != NULL) > > > + { > > > > > > Why loc > read_upto? The first if condition is (loc <= read_upto) > > > followed by the second if condition - (loc > read_upto). Is the first > > > if condition (loc <= read_upto) not enough to indicate that loc > > > > read_upto? > > > > Yeah, that's unnecessary, I improved the comment there and removed loc > > > read_upto. > > > > > == > > > > > > +#define IsEndOfWALReached(state) \ > > > + (state->private_data != NULL && \ > > > + (((ReadLocalXLOGPage2Private *) > > > xlogreader->private_data)->no_wait == true) && \ > > > + (((ReadLocalXLOGPage2Private *) > > > xlogreader->private_data)->reached_end_of_wal == true)) > > > > > > I think we should either use state or xlogreader. First line says > > > state->private_data and second line xlogreader->private_data. > > > > I've changed it to use state instead of xlogreader. > > > > > == > > > > > > + (((ReadLocalXLOGPage2Private *) > > > xlogreader->private_data)->reached_end_of_wal == true)) > > > + > > > > > > There is a new patch coming to make the end of WAL messages less > > > scary. It introduces the EOW flag in xlogreaderstate maybe we can use > > > that instead of introducing new flags in private area to represent the > > > end of WAL. > > > > Yeah that would be great. But we never know which one gets committed > > first. Until then it's not good to have dependencies on two "on-going" > > patches. Later, we can change. > > > > > == > > > > > > +/* > > > + * XLogReaderRoutine->page_read callback for reading local xlog files > > > + * > > > + * This function is same as read_local_xlog_page except that it works in both > > > + * wait and no wait mode. The callers can specify about waiting in private_data > > > + * of XLogReaderState. > > > + */ > > > +int > > > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, > > > + int reqLen, XLogRecPtr > > > targetRecPtr, char *cur_page) > > > +{ > > > + XLogRecPtr read_upto, > > > > > > Do we really need this function? Can't we make use of an existing WAL > > > reader function - read_local_xlog_page()? > > > > I clearly explained the reasons upthread [2]. Please let me know if > > you have more thoughts/doubts here, we can connect offlist. > > *I* also think the function is not needed, as explained above. Why do > we need that function while we know how far we can read WAL records > *before* calling the function? > > > Attaching v6 patch set with above review comments addressed. Please > > review it further. > > > > [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > > > I think we should allow all these functions to be executed in wait and > > *nowait* mode. If a user specifies nowait mode, the function should > > return if no WAL data is present, rather than waiting for new WAL data > > to become available, default behaviour could be anything you like. > > > > [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com > > I've added a new function read_local_xlog_page_2 (similar to > > read_local_xlog_page but works in wait and no wait mode) and the > > callers can specify whether to wait or not wait using private_data. > > Actually, I wanted to use the private_data structure of > > read_local_xlog_page but the logical decoding already has context as > > private_data, that is why I had to have a new function. I know it > > creates a bit of duplicate code, but its cleaner than using > > backend-local variables or additional flags in XLogReaderState or > > adding wait/no-wait boolean to page_read callback. Any other > > suggestions are welcome here. > > > > With this, I'm able to have wait/no wait versions for any functions. > > But for now, I'm having wait/no wait for two functions > > (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more > > sense. > > > > Regards, > > Bharath Rupireddy. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center > >
I think we should also see if we can allow end users to input timeline information with the pg_walinspect functions. This will help the end users to get information about WAL records from previous timeline which can be helpful in case of restored servers. -- With Regards, Ashutosh Sharma. On Thu, Mar 3, 2022 at 8:20 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Hi. > > +#ifdef FRONTEND > +/* > + * Functions that are currently not needed in the backend, but are better > + * implemented inside xlogreader.c because of the internal facilities available > + * here. > + */ > + > #endif /* FRONTEND */ > > Why didn't you remove the emptied #ifdef section? > > +int > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, > + int reqLen, XLogRecPtr targetRecPtr, char *cur_page) > > The difference with the original function is this function has one > additional if-block amid. I think we can insert the code directly in > the original function. > > + /* > + * We are trying to read future WAL. Let's not wait for WAL to be > + * available if indicated. > + */ > + if (state->private_data != NULL) > > However, in the first place it seems to me there's not need for the > function to take care of no_wait affairs. > > If, for expample, pg_get_wal_record_info() with no_wait = true, it is > enough that the function identifies the bleeding edge of WAL then loop > until the LSN. So I think no need for the new function, nor for any > modification on the origical function. > > The changes will reduce the footprint of the patch largely, I think. > > At Wed, 2 Mar 2022 22:37:43 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > > > Some review comments on v5 patch (v5-0001-pg_walinspect.patch) > > > > Thanks for reviewing. > > > > > +-- > > > +-- pg_get_wal_records_info() > > > +-- > > > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, > > > + IN end_lsn pg_lsn, > > > + IN wait_for_wal boolean DEFAULT false, > > > + OUT lsn pg_lsn, > > > > > > What does the wait_for_wal flag mean here when one has already > > > specified the start and end lsn? AFAIU, If a user has specified a > > > start and stop LSN, it means that the user knows the extent to which > > > he/she wants to display the WAL records in which case we need to stop > > > once the end lsn has reached . So what is the meaning of wait_for_wal > > > flag? Does it look sensible to have the wait_for_wal flag here? To me > > > it doesn't. > > > > Users can always specify a future end_lsn and set wait_for_wal to > > true, then the pg_get_wal_records_info/pg_get_wal_stats functions can > > wait for the WAL. IMO, this is useful. If you remember you were okay > > with wait/nowait versions for some of the functions upthread [1]. I'm > > not going to retain this behaviour for both > > pg_get_wal_records_info/pg_get_wal_stats as it is similar to > > pg_waldump's --follow option. > > I agree to this for now. However, I prefer that NULL or invalid > end_lsn is equivalent to pg_current_wal_lsn(). > > > > == > > > > > > +-- > > > +-- pg_get_wal_records_info_till_end_of_wal() > > > +-- > > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn, > > > + OUT lsn pg_lsn, > > > + OUT prev_lsn pg_lsn, > > > + OUT xid xid, > > > > > > Why is this function required? Is pg_get_wal_records_info() alone not > > > enough? I think it is. See if we can make end_lsn optional in > > > pg_get_wal_records_info() and lets just have it alone. I think it can > > > do the job of pg_get_wal_records_info_till_end_of_wal function. > > I rather agree to Ashutosh. This feature can be covered by > pg_get_wal_records(start_lsn, NULL, false). > > > > == > > > > > > +-- > > > +-- pg_get_wal_stats_till_end_of_wal() > > > +-- > > > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn, > > > + OUT resource_manager text, > > > + OUT count int8, > > > > > > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough? > > > > I'm doing the following input validations for these functions to not > > cause any issues with invalid LSN. If I were to have the default value > > for end_lsn as 0/0, I can't perform input validations right? That is > > the reason I'm having separate functions {pg_get_wal_records_info, > > pg_get_wal_stats}_till_end_of_wal() versions. > > > > /* Validate input. */ > > if (XLogRecPtrIsInvalid(start_lsn)) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("invalid WAL record start LSN"))); > > > > if (XLogRecPtrIsInvalid(end_lsn)) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("invalid WAL record end LSN"))); > > > > if (start_lsn >= end_lsn) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("WAL record start LSN must be less than end LSN"))); > > I don't think that validations are worth doing at least for the first > two, as far as that value has a special meaning. I see it useful if > pg_get_wal_records_info() means dump the all available records for the > moment, or records of the last segment, page or something. > > > > == > > > > > > > > > + if (loc <= read_upto) > > > + break; > > > + > > > + /* Let's not wait for WAL to be available if > > > indicated */ > > > + if (loc > read_upto && > > > + state->private_data != NULL) > > > + { > > > > > > Why loc > read_upto? The first if condition is (loc <= read_upto) > > > followed by the second if condition - (loc > read_upto). Is the first > > > if condition (loc <= read_upto) not enough to indicate that loc > > > > read_upto? > > > > Yeah, that's unnecessary, I improved the comment there and removed loc > > > read_upto. > > > > > == > > > > > > +#define IsEndOfWALReached(state) \ > > > + (state->private_data != NULL && \ > > > + (((ReadLocalXLOGPage2Private *) > > > xlogreader->private_data)->no_wait == true) && \ > > > + (((ReadLocalXLOGPage2Private *) > > > xlogreader->private_data)->reached_end_of_wal == true)) > > > > > > I think we should either use state or xlogreader. First line says > > > state->private_data and second line xlogreader->private_data. > > > > I've changed it to use state instead of xlogreader. > > > > > == > > > > > > + (((ReadLocalXLOGPage2Private *) > > > xlogreader->private_data)->reached_end_of_wal == true)) > > > + > > > > > > There is a new patch coming to make the end of WAL messages less > > > scary. It introduces the EOW flag in xlogreaderstate maybe we can use > > > that instead of introducing new flags in private area to represent the > > > end of WAL. > > > > Yeah that would be great. But we never know which one gets committed > > first. Until then it's not good to have dependencies on two "on-going" > > patches. Later, we can change. > > > > > == > > > > > > +/* > > > + * XLogReaderRoutine->page_read callback for reading local xlog files > > > + * > > > + * This function is same as read_local_xlog_page except that it works in both > > > + * wait and no wait mode. The callers can specify about waiting in private_data > > > + * of XLogReaderState. > > > + */ > > > +int > > > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr, > > > + int reqLen, XLogRecPtr > > > targetRecPtr, char *cur_page) > > > +{ > > > + XLogRecPtr read_upto, > > > > > > Do we really need this function? Can't we make use of an existing WAL > > > reader function - read_local_xlog_page()? > > > > I clearly explained the reasons upthread [2]. Please let me know if > > you have more thoughts/doubts here, we can connect offlist. > > *I* also think the function is not needed, as explained above. Why do > we need that function while we know how far we can read WAL records > *before* calling the function? > > > Attaching v6 patch set with above review comments addressed. Please > > review it further. > > > > [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > > > I think we should allow all these functions to be executed in wait and > > *nowait* mode. If a user specifies nowait mode, the function should > > return if no WAL data is present, rather than waiting for new WAL data > > to become available, default behaviour could be anything you like. > > > > [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com > > I've added a new function read_local_xlog_page_2 (similar to > > read_local_xlog_page but works in wait and no wait mode) and the > > callers can specify whether to wait or not wait using private_data. > > Actually, I wanted to use the private_data structure of > > read_local_xlog_page but the logical decoding already has context as > > private_data, that is why I had to have a new function. I know it > > creates a bit of duplicate code, but its cleaner than using > > backend-local variables or additional flags in XLogReaderState or > > adding wait/no-wait boolean to page_read callback. Any other > > suggestions are welcome here. > > > > With this, I'm able to have wait/no wait versions for any functions. > > But for now, I'm having wait/no wait for two functions > > (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more > > sense. > > > > Regards, > > Bharath Rupireddy. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center
On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Added a new function that returns the first and last valid WAL record > LSN of a given WAL file. Sounds like fuzzy thinking to me. WAL records can cross file boundaries, and forgetting about that leads to all sorts of problems. Just give people one function that decodes a range of LSNs and call it good. Why do you need anything else? If people want to get the first record that begins in a segment or the first record any portion of which is in a particular segment or the last record that begins in a segment or the last record that ends in a segment or any other such thing, they can use a WHERE clause for that... and if you think they can't, then that should be good cause to rethink the return value of the one-and-only SRF that I think you need here. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 3, 2022 at 10:05 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > Added a new function that returns the first and last valid WAL record > > LSN of a given WAL file. > > Sounds like fuzzy thinking to me. WAL records can cross file > boundaries, and forgetting about that leads to all sorts of problems. > Just give people one function that decodes a range of LSNs and call it > good. Why do you need anything else? If people want to get the first > record that begins in a segment or the first record any portion of > which is in a particular segment or the last record that begins in a > segment or the last record that ends in a segment or any other such > thing, they can use a WHERE clause for that... and if you think they > can't, then that should be good cause to rethink the return value of > the one-and-only SRF that I think you need here. Thanks Robert. Thanks to others for your review comments. Here's the v7 patch set. These patches are based on the motive that "keep it simple and short yet effective and useful". With that in mind, I have not implemented the wait mode for any of the functions (as it doesn't look an effective use-case and requires adding a new page_read callback, instead throw error if future LSN is specified), also these functions will give WAL information on the current server's timeline. Having said that, I'm open to adding new functions in future once this initial version gets in, if there's a requirement and users ask for the new functions. Please review the v7 patch set and provide your thoughts. Regards, Bharath Rupireddy.
Вложения
Thanks Bharath for working on all my review comments. I took a quick look at the new version of the patch (v7-pg_walinspect.patch) and this version looks a lot better. I'll do some detailed review later (maybe next week or so) and let you know my further comments, if any. -- With Regards, Ashutosh Sharma. On Fri, Mar 4, 2022 at 3:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Mar 3, 2022 at 10:05 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > Added a new function that returns the first and last valid WAL record > > > LSN of a given WAL file. > > > > Sounds like fuzzy thinking to me. WAL records can cross file > > boundaries, and forgetting about that leads to all sorts of problems. > > Just give people one function that decodes a range of LSNs and call it > > good. Why do you need anything else? If people want to get the first > > record that begins in a segment or the first record any portion of > > which is in a particular segment or the last record that begins in a > > segment or the last record that ends in a segment or any other such > > thing, they can use a WHERE clause for that... and if you think they > > can't, then that should be good cause to rethink the return value of > > the one-and-only SRF that I think you need here. > > Thanks Robert. > > Thanks to others for your review comments. > > Here's the v7 patch set. These patches are based on the motive that > "keep it simple and short yet effective and useful". With that in > mind, I have not implemented the wait mode for any of the functions > (as it doesn't look an effective use-case and requires adding a new > page_read callback, instead throw error if future LSN is specified), > also these functions will give WAL information on the current server's > timeline. Having said that, I'm open to adding new functions in future > once this initial version gets in, if there's a requirement and users > ask for the new functions. > > Please review the v7 patch set and provide your thoughts. > > Regards, > Bharath Rupireddy.
On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote: > > Attaching v6 patch set with above review comments addressed. Please > review it further. * Don't issue WARNINGs or other messages for ordinary situations, like when pg_get_wal_records_info() hits the end of WAL. * It feels like the APIs that allow waiting for the end of WAL are slightly off. Can't you just do pg_get_wal_records_info(start_lsn, least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting behavior? Try to make the API more orthogonal, where a few basic functions can be combined to give you everything you need, rather than specifying extra parameters and issuing WARNINGs. I * In the docs, include some example output. I don't see any output in the tests, which makes sense because it's mostly non-deterministic, but it would be helpful to see sample output of at least pg_get_wal_records_info(). * Is pg_get_wal_stats() even necessary, or can you get the same information with a query over pg_get_wal_records_info()? For instance, if you want to group by transaction ID rather than rmgr, then pg_get_wal_stats() is useless. * Would be nice to have a pg_wal_file_is_valid() or similar, which would test that it exists, and the header matches the filename (e.g. if it was recycled but not used, that would count as invalid). I think pg_get_first_valid_wal_record_lsn() would make some cases look invalid even if the file is valid -- for example, if a wal record spans many wal segments, the segments might look invalid because they contain no complete records, but the file itself is still valid and contains valid wal data. * Is there a reason you didn't include the timeline ID in pg_get_wal_records_info()? * Can we mark this extension 'trusted'? I'm not 100% clear on the standards for that marker, but it seems reasonable for a database owner with the right privileges might want to install it. * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that function should require pg_read_server_files? Or at least pg_read_all_data? Regards, Jeff Davis
On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote: > > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote: > > > > Attaching v6 patch set with above review comments addressed. Please > > review it further. Thanks Jeff for reviewing it. I've posted the latest v7 patch-set upthread [1] which is having more simple-yet-useful-and-effective functions. > * Don't issue WARNINGs or other messages for ordinary situations, like > when pg_get_wal_records_info() hits the end of WAL. v7 patch-set [1] has no warnings, but the functions will error out if future LSN is specified. > * It feels like the APIs that allow waiting for the end of WAL are > slightly off. Can't you just do pg_get_wal_records_info(start_lsn, > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting > behavior? Try to make the API more orthogonal, where a few basic > functions can be combined to give you everything you need, rather than > specifying extra parameters and issuing WARNINGs. I v7 patch-set [1] onwards waiting mode has been removed for all of the functions, again to keep things simple-yet-useful-and-effective. However, we can always add new pg_walinspect functions that wait for future WAL in the next versions once basic stuff gets committed and if many users ask for it. > * In the docs, include some example output. I don't see any output in > the tests, which makes sense because it's mostly non-deterministic, but > it would be helpful to see sample output of at least > pg_get_wal_records_info(). +1. Added for pg_get_wal_records_info and pg_get_wal_stats. > * Is pg_get_wal_stats() even necessary, or can you get the same > information with a query over pg_get_wal_records_info()? For instance, > if you want to group by transaction ID rather than rmgr, then > pg_get_wal_stats() is useless. Yes, you are right pg_get_wal_stats provides WAL stats per resource manager which is similar to pg_waldump with --start, --end and --stats option. It provides more information than pg_get_wal_records_info and is a good way of getting stats than adding more columns to pg_get_wal_records_info, calculating percentage in sql and having group by clause. IMO, pg_get_wal_stats is more readable and useful. > * Would be nice to have a pg_wal_file_is_valid() or similar, which > would test that it exists, and the header matches the filename (e.g. if > it was recycled but not used, that would count as invalid). I think > pg_get_first_valid_wal_record_lsn() would make some cases look invalid > even if the file is valid -- for example, if a wal record spans many > wal segments, the segments might look invalid because they contain no > complete records, but the file itself is still valid and contains valid > wal data. Actually I haven't tried testing a single WAL record spanning many WAL files yet(I'm happy to try it if someone suggests such a use-case). In that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't have a problem because it just gives the next valid LSN and it's previous LSN using existing WAL reader API XLogFindNextRecord(). It opens up the WAL file segments using (some dots to connect - page_read/read_local_xlog_page, WALRead, segment_open/wal_segment_open). Thoughts? I don't think it's necessary to have a function pg_wal_file_is_valid() that given a WAL file name as input checks whether a WAL file exists or not, probably not in the core (xlogfuncs.c) too. These kinds of functions can open up challenges in terms of user input validation and may cause unnecessary problems, please see some related discussion [2]. > * Is there a reason you didn't include the timeline ID in > pg_get_wal_records_info()? I'm right now allowing the functions to read WAL from the current server's timeline which I have mentioned in the docs. The server's current timeline is available via pg_control_checkpoint()'s timeline_id. So, having timeline_id as a column doesn't make sense. Again this is to keep things simple-yet-useful-and-effective. However, we can add new pg_walinspect functions to read WAL from historic as well as current timelines in the next versions once basic stuff gets committed and if many users ask for it. + <para> + All the functions of this module will provide the WAL information using the + current server's timeline ID. + </para> > * Can we mark this extension 'trusted'? I'm not 100% clear on the > standards for that marker, but it seems reasonable for a database owner > with the right privileges might want to install it. 'trusted' extensions concept is added by commit 50fc694 [3]. Since pg_walinspect deals with WAL, we strictly want to control who creates and can execute functions exposed by it, so I don't know if 'trusted' is a good idea here. Also, pageinspect isn't a 'trusted' extension. > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > function should require pg_read_server_files? Or at least > pg_read_all_data? pg_read_all_data may not be the right choice, but pg_read_server_files is. However, does it sound good if some functions are allowed to be executed by users with a pg_monitor role and others pg_get_raw_wal_record by users with pg_read_server_files? Since the extension itself can be created by superusers, isn't the pg_get_raw_wal_record sort of safe with pg_mointor itself? If hackers don't agree, I'm happy to grant execution on pg_get_raw_wal_record() to the pg_read_server_files role. Attaching the v8 patch-set resolving above comments and some tests for checking function permissions. Please review it further. [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Wed Jan 29 18:42:43 2020 -0500 Invent "trusted" extensions, and remove the pg_pltemplate catalog. Regards, Bharath Rupireddy.
Вложения
On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis <pgsql@j-davis.com> wrote: > * Can we mark this extension 'trusted'? I'm not 100% clear on the > standards for that marker, but it seems reasonable for a database owner > with the right privileges might want to install it. I'm not clear on the standard either, exactly, but might not that allow the database owner to get a peek at what's happening in other databases? -- Robert Haas EDB: http://www.enterprisedb.com
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis <pgsql@j-davis.com> wrote: > > * Can we mark this extension 'trusted'? I'm not 100% clear on the > > standards for that marker, but it seems reasonable for a database owner > > with the right privileges might want to install it. > > I'm not clear on the standard either, exactly, but might not that > allow the database owner to get a peek at what's happening in other > databases? The standard is basically that all of the functions it brings are written to enforce the PG privilege system and you aren't able to use the extension to bypass those privileges. In some cases that means that the C-language functions installed have if(!superuser) ereport() calls throughout them- that's a fine answer, but it's perhaps not very helpful to mark those as trusted. In other cases, the C-language functions installed don't directly provide access to data, such as the PostGIS functions. I've not looked back on this thread, but I'd expect pg_walinspect to need those superuser checks and with those it *could* be marked as trusted, but that again brings into question how useful it is to mark it thusly. In an ideal world, we might have a pg_readwal predefined role which allows a role which was GRANT'd that role to be able to read WAL traffic, and then the pg_walinspect extension could check that the calling role has that predefined role, and other functions and extensions could also check that rather than any existing superuser checks. A cloud provider or such could then include in their setup of a new instance something like: GRANT pg_readwal TO admin_user WITH ADMIN OPTION; Presuming that there isn't anything that ends up in the WAL that's an issue for the admin_user to have access to. I certainly don't think we should allow either database owners or regular users on a system the ability to access the WAL traffic of the entire system. More forcefully- we should *not* be throwing more access rights towards $owners in general and should be thinking about how we can allow admins, providers, whomever, the ability to control what rights users are given. If they're all lumped under 'owner' then there's no way for people to provide granular access to just those things they wish and intend to. Thanks, Stephen
Вложения
At Thu, 10 Mar 2022 22:15:42 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote: > > > > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote: > > > > > > Attaching v6 patch set with above review comments addressed. Please > > > review it further. > > Thanks Jeff for reviewing it. I've posted the latest v7 patch-set > upthread [1] which is having more simple-yet-useful-and-effective > functions. > > > * Don't issue WARNINGs or other messages for ordinary situations, like > > when pg_get_wal_records_info() hits the end of WAL. > > v7 patch-set [1] has no warnings, but the functions will error out if > future LSN is specified. > > > * It feels like the APIs that allow waiting for the end of WAL are > > slightly off. Can't you just do pg_get_wal_records_info(start_lsn, > > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting > > behavior? Try to make the API more orthogonal, where a few basic > > functions can be combined to give you everything you need, rather than > > specifying extra parameters and issuing WARNINGs. I > > v7 patch-set [1] onwards waiting mode has been removed for all of the > functions, again to keep things simple-yet-useful-and-effective. > However, we can always add new pg_walinspect functions that wait for > future WAL in the next versions once basic stuff gets committed and if > many users ask for it. > > > * In the docs, include some example output. I don't see any output in > > the tests, which makes sense because it's mostly non-deterministic, but > > it would be helpful to see sample output of at least > > pg_get_wal_records_info(). > > +1. Added for pg_get_wal_records_info and pg_get_wal_stats. > > > * Is pg_get_wal_stats() even necessary, or can you get the same > > information with a query over pg_get_wal_records_info()? For instance, > > if you want to group by transaction ID rather than rmgr, then > > pg_get_wal_stats() is useless. > > Yes, you are right pg_get_wal_stats provides WAL stats per resource > manager which is similar to pg_waldump with --start, --end and --stats > option. It provides more information than pg_get_wal_records_info and > is a good way of getting stats than adding more columns to > pg_get_wal_records_info, calculating percentage in sql and having > group by clause. IMO, pg_get_wal_stats is more readable and useful. > > > * Would be nice to have a pg_wal_file_is_valid() or similar, which > > would test that it exists, and the header matches the filename (e.g. if > > it was recycled but not used, that would count as invalid). I think > > pg_get_first_valid_wal_record_lsn() would make some cases look invalid > > even if the file is valid -- for example, if a wal record spans many > > wal segments, the segments might look invalid because they contain no > > complete records, but the file itself is still valid and contains valid > > wal data. > > Actually I haven't tried testing a single WAL record spanning many WAL > files yet(I'm happy to try it if someone suggests such a use-case). In > that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't > have a problem because it just gives the next valid LSN and it's > previous LSN using existing WAL reader API XLogFindNextRecord(). It > opens up the WAL file segments using (some dots to connect - > page_read/read_local_xlog_page, WALRead, > segment_open/wal_segment_open). Thoughts? > > I don't think it's necessary to have a function pg_wal_file_is_valid() > that given a WAL file name as input checks whether a WAL file exists > or not, probably not in the core (xlogfuncs.c) too. These kinds of > functions can open up challenges in terms of user input validation and > may cause unnecessary problems, please see some related discussion > [2]. > > > * Is there a reason you didn't include the timeline ID in > > pg_get_wal_records_info()? > > I'm right now allowing the functions to read WAL from the current > server's timeline which I have mentioned in the docs. The server's > current timeline is available via pg_control_checkpoint()'s > timeline_id. So, having timeline_id as a column doesn't make sense. > Again this is to keep things simple-yet-useful-and-effective. However, > we can add new pg_walinspect functions to read WAL from historic as > well as current timelines in the next versions once basic stuff gets > committed and if many users ask for it. > > + <para> > + All the functions of this module will provide the WAL information using the > + current server's timeline ID. > + </para> > > > * Can we mark this extension 'trusted'? I'm not 100% clear on the > > standards for that marker, but it seems reasonable for a database owner > > with the right privileges might want to install it. > > 'trusted' extensions concept is added by commit 50fc694 [3]. Since > pg_walinspect deals with WAL, we strictly want to control who creates > and can execute functions exposed by it, so I don't know if 'trusted' > is a good idea here. Also, pageinspect isn't a 'trusted' extension. > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > > function should require pg_read_server_files? Or at least > > pg_read_all_data? > > pg_read_all_data may not be the right choice, but pg_read_server_files > is. However, does it sound good if some functions are allowed to be > executed by users with a pg_monitor role and others > pg_get_raw_wal_record by users with pg_read_server_files? Since the > extension itself can be created by superusers, isn't the > pg_get_raw_wal_record sort of safe with pg_mointor itself? > > If hackers don't agree, I'm happy to grant execution on > pg_get_raw_wal_record() to the pg_read_server_files role. > > Attaching the v8 patch-set resolving above comments and some tests for > checking function permissions. Please review it further. > > [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com > [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com > [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Wed Jan 29 18:42:43 2020 -0500 > > Invent "trusted" extensions, and remove the pg_pltemplate catalog. I played with this a bit, and would like to share some thoughts on it. It seems to me too rigorous that pg_get_wal_records_info/stats() reject future LSNs as end-LSN and I think WARNING or INFO and stop at the real end-of-WAL is more kind to users. I think the same with the restriction that start and end LSN are required to be different. The definition of end-lsn is fuzzy here. If I fed a future LSN to the functions, they tell me the beginning of the current insertion point in error message. On the other hand they don't accept the same value as end-LSN. I think it is right that they tell the current insertion point and they should take the end-LSN as the LSN to stop reading. I think pg_get_wal_stats() is worth to have but I think it should be implemented in SQL. Currently pg_get_wal_records_info() doesn't tell about FPI since pg_waldump doesn't but it is internally collected (of course!) and easily revealed. If we do that, the pg_get_wal_records_stats() would be reduced to the following SQL statement SELECT resource_manager resmgr, count(*) AS N, (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N", sum(total_length) AS "combined size", (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size", sum(fpi_len) AS fpilen, (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen" FROM pg_get_wal_records_info('0/1000000', '0/175DD7f') GROUP by resource_manager WINDOW tot AS () ORDER BY "combined size" desc; The only difference with pg_waldump is the statement above doesn't show lines for the resource managers that don't contained in the result of pg_get_wal_records_info(). But I don't think that matters. Sometimes the field description has very long (28kb long) content. It makes the result output almost unreadable and I had a bit hard time struggling with the output full of '-'s. I would like have a default limit on the length of such fields that can be long but I'm not sure we want that. The difference between pg_get_wal_record_info and _records_ other than the number of argument is the former accepts incorrect LSNs. The following works, pg_get_wal_record_info('0/1000000'); pg_get_wal_records_info('0/1000000'); but this doesn't pg_get_wal_records_info('0/1000000', '0/1000000'); > ERROR: WAL start LSN must be less than end LSN But the following works pg_get_wal_records_info('0/1000000', '0/1000029'); > 0/1000028 | 0/0 | 0 So I think we can consolidate the two functions as: - pg_get_wal_records_info('0/1000000'); (current behavior) find the first record and show all records thereafter. - pg_get_wal_records_info('0/1000000', '0/1000000'); finds the first record since the start lsn and show it. - pg_get_wal_records_info('0/1000000', '0/1000030'); finds the first record since the start lsn then show records up to the end-lsn. And about pg_get_raw_wal_record(). I don't see any use-case of the function alone on SQL interface. Even if we need to inspect broken WAL files, it needs profound knowledge of WAL format and tools that doesn't work on SQL interface. However like pageinspect, if we separate the WAL-record fetching and parsing it could be thought as useful. pg_get_wal_records_info woule be like: SELECT * FROM pg_walinspect_parse(raw) FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)); And pg_get_wal_stats woule be like: SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw)) FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn))); Regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Sorry, some minor non-syntactical corrections. At Fri, 11 Mar 2022 11:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > I played with this a bit, and would like to share some thoughts on it. > > It seems to me too rigorous that pg_get_wal_records_info/stats() > reject future LSNs as end-LSN and I think WARNING or INFO and stop at > the real end-of-WAL is more kind to users. I think the same with the > restriction that start and end LSN are required to be different. > > The definition of end-lsn is fuzzy here. If I fed a future LSN to the > functions, they tell me the beginning of the current insertion point > in error message. On the other hand they don't accept the same > value as end-LSN. I think it is right that they tell the current > insertion point and they should take the end-LSN as the LSN to stop > reading. > > I think pg_get_wal_stats() is worth to have but I think it should be > implemented in SQL. Currently pg_get_wal_records_info() doesn't tell > about FPI since pg_waldump doesn't but it is internally collected (of > course!) and easily revealed. If we do that, the > pg_get_wal_records_stats() would be reduced to the following SQL > statement > > SELECT resource_manager resmgr, > count(*) AS N, > (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N", > sum(total_length) AS "combined size", > (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size", > sum(fpi_len) AS fpilen, > (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen" > FROM pg_get_wal_records_info('0/1000000', '0/175DD7f') > GROUP by resource_manager > WINDOW tot AS () > ORDER BY "combined size" desc; > > The only difference with pg_waldump is the statement above doesn't > show lines for the resource managers that don't contained in the > result of pg_get_wal_records_info(). But I don't think that matters. > > > Sometimes the field description has very long (28kb long) content. It > makes the result output almost unreadable and I had a bit hard time > struggling with the output full of '-'s. I would like have a default > limit on the length of such fields that can be long but I'm not sure > we want that. > > - The difference between pg_get_wal_record_info and _records_ other than - the number of argument is the former accepts incorrect LSNs. The discussion is somewhat confused after some twists and turns.. It should be something like the following. pg_get_wal_record_info and pg_get_wal_records_info are almost same since the latter can show a single record. However it is a bit annoying to do that. Since, other than it doens't accept same LSNs for start and end, it doesn't show a record when there' no record in the specfied LSN range. But I don't think there's no usefulness of the behavior. The following works, pg_get_wal_record_info('0/1000000'); pg_get_wal_records_info('0/1000000'); but this doesn't pg_get_wal_records_info('0/1000000', '0/1000000'); > ERROR: WAL start LSN must be less than end LSN And the following shows no records. pg_get_wal_records_info('0/1000000', '0/1000001'); pg_get_wal_records_info('0/1000000', '0/1000028'); But the following works pg_get_wal_records_info('0/1000000', '0/1000029'); > 0/1000028 | 0/0 | 0 > So I think we can consolidate the two functions as: > > - pg_get_wal_records_info('0/1000000'); > > (current behavior) find the first record and show all records > thereafter. > > - pg_get_wal_records_info('0/1000000', '0/1000000'); > > finds the first record since the start lsn and show it. > > - pg_get_wal_records_info('0/1000000', '0/1000030'); > > finds the first record since the start lsn then show records up to > the end-lsn. > > > And about pg_get_raw_wal_record(). I don't see any use-case of the > function alone on SQL interface. Even if we need to inspect broken > WAL files, it needs profound knowledge of WAL format and tools that > doesn't work on SQL interface. > > However like pageinspect, if we separate the WAL-record fetching and > parsing it could be thought as useful. > > pg_get_wal_records_info woule be like: > > SELECT * FROM pg_walinspect_parse(raw) > FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)); > > And pg_get_wal_stats woule be like: > > SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw)) > FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn))); Regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Sorry, some minor non-syntactical corrections. > > At Fri, 11 Mar 2022 11:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > I played with this a bit, and would like to share some thoughts on it. > > > > It seems to me too rigorous that pg_get_wal_records_info/stats() > > reject future LSNs as end-LSN and I think WARNING or INFO and stop at > > the real end-of-WAL is more kind to users. I think the same with the > > restriction that start and end LSN are required to be different. > > > > The definition of end-lsn is fuzzy here. If I fed a future LSN to the > > functions, they tell me the beginning of the current insertion point > > in error message. On the other hand they don't accept the same > > value as end-LSN. I think it is right that they tell the current > > insertion point and they should take the end-LSN as the LSN to stop > > reading. > > > > I think pg_get_wal_stats() is worth to have but I think it should be > > implemented in SQL. Currently pg_get_wal_records_info() doesn't tell > > about FPI since pg_waldump doesn't but it is internally collected (of > > course!) and easily revealed. If we do that, the > > pg_get_wal_records_stats() would be reduced to the following SQL > > statement > > > > SELECT resource_manager resmgr, > > count(*) AS N, > > (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N", > > sum(total_length) AS "combined size", > > (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size", > > sum(fpi_len) AS fpilen, > > (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen" > > FROM pg_get_wal_records_info('0/1000000', '0/175DD7f') > > GROUP by resource_manager > > WINDOW tot AS () > > ORDER BY "combined size" desc; > > > > The only difference with pg_waldump is the statement above doesn't > > show lines for the resource managers that don't contained in the > > result of pg_get_wal_records_info(). But I don't think that matters. > > > > > > Sometimes the field description has very long (28kb long) content. It > > makes the result output almost unreadable and I had a bit hard time > > struggling with the output full of '-'s. I would like have a default > > limit on the length of such fields that can be long but I'm not sure > > we want that. > > > > > - The difference between pg_get_wal_record_info and _records_ other than > - the number of argument is the former accepts incorrect LSNs. > > The discussion is somewhat confused after some twists and turns.. It > should be something like the following. > > pg_get_wal_record_info and pg_get_wal_records_info are almost same > since the latter can show a single record. However it is a bit > annoying to do that. Since, other than it doens't accept same LSNs for > start and end, it doesn't show a record when there' no record in the > specfied LSN range. But I don't think there's no usefulness of the > behavior. > > The following works, > pg_get_wal_record_info('0/1000000'); This does work but it doesn't show any WARNING message for the start pointer adjustment. I think it should. > pg_get_wal_records_info('0/1000000'); > I think this is fine. It should be working because the user hasn't specified the end pointer so we assume the default end pointer is end-of-WAL. > but this doesn't > pg_get_wal_records_info('0/1000000', '0/1000000'); > > ERROR: WAL start LSN must be less than end LSN > I think this behaviour is fine. We cannot have the same start and end lsn pointers. > And the following shows no records. > pg_get_wal_records_info('0/1000000', '0/1000001'); > pg_get_wal_records_info('0/1000000', '0/1000028'); > I think we should be erroring out here saying - couldn't find any valid WAL record between given start and end lsn because there exists no valid wal records between the specified start and end lsn pointers. -- With Regards, Ashutosh Sharma.
On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > - The difference between pg_get_wal_record_info and _records_ other than > - the number of argument is the former accepts incorrect LSNs. > > The discussion is somewhat confused after some twists and turns.. It > should be something like the following. > > pg_get_wal_record_info and pg_get_wal_records_info are almost same > since the latter can show a single record. However it is a bit > annoying to do that. Since, other than it doens't accept same LSNs for > start and end, it doesn't show a record when there' no record in the > specfied LSN range. But I don't think there's no usefulness of the > behavior. > So, do you want the pg_get_wal_record_info function to be removed as we can use pg_get_wal_records_info() to achieve what it does? -- With Regards, Ashutosh Sharma.
On Fri, Mar 11, 2022 at 8:08 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > Attaching the v8 patch-set resolving above comments and some tests for > > checking function permissions. Please review it further. > > I played with this a bit, and would like to share some thoughts on it. Thanks a lot Kyotaro-san for reviewing. > It seems to me too rigorous that pg_get_wal_records_info/stats() > reject future LSNs as end-LSN and I think WARNING or INFO and stop at > the real end-of-WAL is more kind to users. I think the same with the > restriction that start and end LSN are required to be different. Throwing error on future LSNs is the same behaviour for all of the pg_walinspect function input LSNs. IMO it is a cleaner thing to do rather than confuse the users with different behaviours for each function. The principle is this - pg_walinspect functions can't show future WAL info. Having said that, I agree to make it a WARNING instead of ERROR, for the simple reason that ERROR aborts the txn and the applications can retry without aborting the txn. For instance, pg_terminate_backend emits a WARNING if the PID isn't a postgres process id. PS: WARNING may not be a better idea than ERROR if we turn pg_get_wal_stats a SQL function, see my response below. > The definition of end-lsn is fuzzy here. If I fed a future LSN to the > functions, they tell me the beginning of the current insertion point > in error message. On the other hand they don't accept the same > value as end-LSN. I think it is right that they tell the current > insertion point and they should take the end-LSN as the LSN to stop > reading. The future LSN is determined by this: if (!RecoveryInProgress()) available_lsn = GetFlushRecPtr(NULL); else available_lsn = GetXLogReplayRecPtr(NULL); GetFlushRecPtr returns last byte + 1 flushed meaning this is the end LSN currently known in the server, but it is not the start LSN of the last WAL record in the server. Same goes with GetXLogReplayRecPtr which gives lastReplayedEndRecPtr end+1 position. I picked GetFlushRecPtr and GetXLogReplayRecPtr to determine the future WAL LSN because this is how read_local_xlog_page determines to read WAL upto and goes for wait mode, but I wanted to avoid the wait mode completely for all the pg_walinspect functions (to keep things simple for now), hence doing the similar checks within the input validation code and emitting warning. And you are right when we emit something like below, users tend to use 0/15B6D68 (from the DETAIL message) as the end LSN. I don't want to ignore this DETAIL message altogether as it gives an idea where the server is. How about rephrasing the DETAIL message a bit, something like "Database system flushed the WAL up to WAL LSN %X/% X.'' or some other better phrasing? WARNING: WAL start LSN cannot be a future WAL LSN DETAIL: Last known WAL LSN on the database system is 0/15B6D68. If the users aren't sure about what's the end record LSN, they can just use pg_get_wal_records_info and pg_get_wal_stats without end LSN: select * from pg_get_wal_records_info('0/15B6D68'); select * from pg_get_wal_stats('0/15B6D68'); > I think pg_get_wal_stats() is worth to have but I think it should be > implemented in SQL. Currently pg_get_wal_records_info() doesn't tell > about FPI since pg_waldump doesn't but it is internally collected (of > course!) and easily revealed. If we do that, the > pg_get_wal_records_stats() would be reduced to the following SQL > statement > > SELECT resource_manager resmgr, > count(*) AS N, > (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N", > sum(total_length) AS "combined size", > (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size", > sum(fpi_len) AS fpilen, > (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen" > FROM pg_get_wal_records_info('0/1000000', '0/175DD7f') > GROUP by resource_manager > WINDOW tot AS () > ORDER BY "combined size" desc; > > The only difference with pg_waldump is the statement above doesn't > show lines for the resource managers that don't contained in the > result of pg_get_wal_records_info(). But I don't think that matters. Yeah, this is better. One problem with the above is when pg_get_wal_records_info emits a warning for future LSN. But this shouldn't stop us doing it via SQL. Instead I would let all the pg_walinspect functions emit errors as opposed to WARNING. Thoughts? postgres=# SELECT resource_manager, count(*) AS count, (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS count_percentage, sum(total_length) AS combined_size, (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS combined_size_percentage FROM pg_get_wal_records_info('0/10A3E50', '0/25B6F00') GROUP BY resource_manager WINDOW tot AS () ORDER BY combined_size desc; WARNING: WAL end LSN cannot be a future WAL LSN DETAIL: Last known WAL LSN on the database system is 0/15CAA70. resource_manager | count | count_percentage | combined_size | combined_size_percentage ------------------+-------+------------------+---------------+-------------------------- | 1 | 100.00 | | (1 row) > Sometimes the field description has very long (28kb long) content. It > makes the result output almost unreadable and I had a bit hard time > struggling with the output full of '-'s. I would like have a default > limit on the length of such fields that can be long but I'm not sure > we want that. Yeah, it's a text column, let's leave it as-is, if required users can always ignore the description columns. > And about pg_get_raw_wal_record(). I don't see any use-case of the > function alone on SQL interface. Even if we need to inspect broken > WAL files, it needs profound knowledge of WAL format and tools that > doesn't work on SQL interface. >However like pageinspect, if we separate the WAL-record fetching and > parsing it could be thought as useful. > SELECT * FROM pg_walinspect_parse(raw) > FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)); > > And pg_get_wal_stats woule be like: > > SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw)) > FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn))); Imagine pg_get_raw_wal_record function feeding raw WAL record to an external tool/extension that understands the WAL. Apart from this, I don't have a concrete reason either. I'm open to removing this function as well and adding it along with the raw WAL parsing function in future. I haven't thought about the raw WAL parsing functions for now. In fact, there are many functions we can add to pg_walinspect - functions with wait mode for future WAL, WAL parsing, function to return all the WAL record info/stats given a WAL file name, functions to return WAL info/stats from historic timelines as well, function to see if the given WAL file is valid and so on. We can park these functions for future versions of pg_walinspect once the extension itself with basic yet-useful-and-effective functions gets in. I will make a note of these functions and will work in future based on how pg_walinspect gets received by the users and community out there. Regards, Bharath Rupireddy.
Some comments on pg_walinspect-docc.patch this time: + <varlistentry> + <term> + <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length OUT int4, total_length OUT int4, description OUT text, block_ref OUT text, data OUT bytea, data_len OUT int4)</function> + </term> You may shorten this by mentioning just the function input parameters and specify "returns record" like shown below. So no need to specify all the OUT params. pg_get_wal_record_info(in_lsn pg_lsn) returns record. Please check the documentation for other functions for reference. == + <term> + <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length OUT int4, total_length OUT int4, description OUT text, block_ref OUT text, data OUT bytea, data_len OUT int4)</function> + </term> Same comment applies here as well. In the return type you can just mention - "returns setof record" like shown below: pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof records. You may also check for such optimizations at other places. I might have missed some. == +<screen> +postgres=# select prev_lsn, xid, resource_manager, length, total_length, block_ref from pg_get_wal_records_info('0/158A7F0', '0/1591400'); + prev_lsn | xid | resource_manager | length | total_length | block_ref +-----------+-----+------------------+--------+--------------+-------------------------------------------------------------------------- + 0/158A7B8 | 735 | Heap | 54 | 7838 | blkref #0: rel 1663/5/2619 blk 18 (FPW); hole: offset: 88, length: 408 + 0/158A7F0 | 735 | Btree | 53 | 8133 | blkref #0: rel 1663/5/2696 blk 1 (FPW); hole: offset: 1632, length: 112 + 0/158C6A8 | 735 | Heap | 53 | 873 | blkref #0: rel 1663/5/1259 blk 0 (FPW); hole: offset: 212, length: 7372 Instead of specifying column names in the targetlist I think it's better to use "*" so that it will display all the output columns. Also you may shorten the gap between start and end lsn to reduce the output size. == Any reason for not specifying author name in the .sgml file. Do you want me to add my name to the author? :) <para> Ashutosh Sharma <email>ashu.coek88@gmail.com</email> </para> </sect2> -- With Regards, Ashutosh Sharma. On Thu, Mar 10, 2022 at 10:15 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote: > > > > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote: > > > > > > Attaching v6 patch set with above review comments addressed. Please > > > review it further. > > Thanks Jeff for reviewing it. I've posted the latest v7 patch-set > upthread [1] which is having more simple-yet-useful-and-effective > functions. > > > * Don't issue WARNINGs or other messages for ordinary situations, like > > when pg_get_wal_records_info() hits the end of WAL. > > v7 patch-set [1] has no warnings, but the functions will error out if > future LSN is specified. > > > * It feels like the APIs that allow waiting for the end of WAL are > > slightly off. Can't you just do pg_get_wal_records_info(start_lsn, > > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting > > behavior? Try to make the API more orthogonal, where a few basic > > functions can be combined to give you everything you need, rather than > > specifying extra parameters and issuing WARNINGs. I > > v7 patch-set [1] onwards waiting mode has been removed for all of the > functions, again to keep things simple-yet-useful-and-effective. > However, we can always add new pg_walinspect functions that wait for > future WAL in the next versions once basic stuff gets committed and if > many users ask for it. > > > * In the docs, include some example output. I don't see any output in > > the tests, which makes sense because it's mostly non-deterministic, but > > it would be helpful to see sample output of at least > > pg_get_wal_records_info(). > > +1. Added for pg_get_wal_records_info and pg_get_wal_stats. > > > * Is pg_get_wal_stats() even necessary, or can you get the same > > information with a query over pg_get_wal_records_info()? For instance, > > if you want to group by transaction ID rather than rmgr, then > > pg_get_wal_stats() is useless. > > Yes, you are right pg_get_wal_stats provides WAL stats per resource > manager which is similar to pg_waldump with --start, --end and --stats > option. It provides more information than pg_get_wal_records_info and > is a good way of getting stats than adding more columns to > pg_get_wal_records_info, calculating percentage in sql and having > group by clause. IMO, pg_get_wal_stats is more readable and useful. > > > * Would be nice to have a pg_wal_file_is_valid() or similar, which > > would test that it exists, and the header matches the filename (e.g. if > > it was recycled but not used, that would count as invalid). I think > > pg_get_first_valid_wal_record_lsn() would make some cases look invalid > > even if the file is valid -- for example, if a wal record spans many > > wal segments, the segments might look invalid because they contain no > > complete records, but the file itself is still valid and contains valid > > wal data. > > Actually I haven't tried testing a single WAL record spanning many WAL > files yet(I'm happy to try it if someone suggests such a use-case). In > that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't > have a problem because it just gives the next valid LSN and it's > previous LSN using existing WAL reader API XLogFindNextRecord(). It > opens up the WAL file segments using (some dots to connect - > page_read/read_local_xlog_page, WALRead, > segment_open/wal_segment_open). Thoughts? > > I don't think it's necessary to have a function pg_wal_file_is_valid() > that given a WAL file name as input checks whether a WAL file exists > or not, probably not in the core (xlogfuncs.c) too. These kinds of > functions can open up challenges in terms of user input validation and > may cause unnecessary problems, please see some related discussion > [2]. > > > * Is there a reason you didn't include the timeline ID in > > pg_get_wal_records_info()? > > I'm right now allowing the functions to read WAL from the current > server's timeline which I have mentioned in the docs. The server's > current timeline is available via pg_control_checkpoint()'s > timeline_id. So, having timeline_id as a column doesn't make sense. > Again this is to keep things simple-yet-useful-and-effective. However, > we can add new pg_walinspect functions to read WAL from historic as > well as current timelines in the next versions once basic stuff gets > committed and if many users ask for it. > > + <para> > + All the functions of this module will provide the WAL information using the > + current server's timeline ID. > + </para> > > > * Can we mark this extension 'trusted'? I'm not 100% clear on the > > standards for that marker, but it seems reasonable for a database owner > > with the right privileges might want to install it. > > 'trusted' extensions concept is added by commit 50fc694 [3]. Since > pg_walinspect deals with WAL, we strictly want to control who creates > and can execute functions exposed by it, so I don't know if 'trusted' > is a good idea here. Also, pageinspect isn't a 'trusted' extension. > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > > function should require pg_read_server_files? Or at least > > pg_read_all_data? > > pg_read_all_data may not be the right choice, but pg_read_server_files > is. However, does it sound good if some functions are allowed to be > executed by users with a pg_monitor role and others > pg_get_raw_wal_record by users with pg_read_server_files? Since the > extension itself can be created by superusers, isn't the > pg_get_raw_wal_record sort of safe with pg_mointor itself? > > If hackers don't agree, I'm happy to grant execution on > pg_get_raw_wal_record() to the pg_read_server_files role. > > Attaching the v8 patch-set resolving above comments and some tests for > checking function permissions. Please review it further. > > [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com > [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com > [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Wed Jan 29 18:42:43 2020 -0500 > > Invent "trusted" extensions, and remove the pg_pltemplate catalog. > > Regards, > Bharath Rupireddy.
On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > - The difference between pg_get_wal_record_info and _records_ other than > - the number of argument is the former accepts incorrect LSNs. > > The discussion is somewhat confused after some twists and turns.. It > should be something like the following. > > pg_get_wal_record_info and pg_get_wal_records_info are almost same > since the latter can show a single record. However it is a bit > annoying to do that. Since, other than it doens't accept same LSNs for > start and end, it doesn't show a record when there' no record in the > specfied LSN range. But I don't think there's no usefulness of the > behavior. I would like to reassert the usability of pg_get_wal_record_info and pg_get_wal_records_info: pg_get_wal_record_info(lsn): if lsn is invalid i.e. '0/0' - throws an error if lsn is future lsn - throws an error if lsn looks okay, it figures out the next available valid WAL record and returns info about that pg_get_wal_records_info(start_lsn, end_lsn default null) -> if start and end lsns are provided no end_lsn would give the WAL records info till the end of WAL, if start_lsn is invalid i.e. '0/0' - throws an error if start_lsn is future lsn - throws an error if end_lsn isn't provided by the user - calculates the end_lsn as server's current flush lsn if end_lsn is provided by the user - throws an error if it's future LSN if start_lsn and end_lsn look okay, it returns info about all WAL records from the next available valid WAL record of start_lsn until end_lsn So, both pg_get_wal_record_info and pg_get_wal_records_info are necessary IMHO. Coming to the behaviour when input lsn is '0/1000000', it's an issue with XLogSegmentOffset(lsn, wal_segment_size) != 0 check, which I will fix in the next version. if (*first_record != lsn && XLogSegmentOffset(lsn, wal_segment_size) != 0) ereport(WARNING, (errmsg_plural("first record is after %X/%X, at %X/%X, skipping over %u byte", "first record is after %X/%X, at %X/%X, skipping over %u bytes", (*first_record - lsn), LSN_FORMAT_ARGS(lsn), LSN_FORMAT_ARGS(*first_record), (uint32) (*first_record - lsn)))); Regards, Bharath Rupireddy.
On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > It seems to me too rigorous that pg_get_wal_records_info/stats() > reject future LSNs as end-LSN and I think WARNING or INFO and stop at > the real end-of-WAL is more kind to users. I think the same with the > restriction that start and end LSN are required to be different. In his review just yesterday, Jeff suggested this: "Don't issue WARNINGs or other messages for ordinary situations, like when pg_get_wal_records_info() hits the end of WAL." I think he's entirely right, and I don't think any patch that does otherwise should get committed. It is worth remembering that the results of queries are often examined by something other than a human being sitting at a psql terminal. Any tool that uses this is going to want to understand what happened from the result set, not by parsing strings that may show up inside warning messages. I think that the right answer here is to have a function that returns one row per record parsed, and each row should also include the start and end LSN of the record. If for some reason the WAL records return start after the specified start LSN (e.g. because we skip over a page header) or end before the specified end LSN (e.g. because we reach end-of-WAL) the user can figure it out from looking at the LSNs in the output rows and comparing them to the LSNs provided as input. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, 2022-03-10 at 15:54 -0500, Stephen Frost wrote: > The standard is basically that all of the functions it brings are > written to enforce the PG privilege system and you aren't able to use > the extension to bypass those privileges. In some cases that means > that Every extension should follow that standard, right? If it doesn't (e.g. creating dangerous functions and granting them to public), then even superuser should not install it. > the C-language functions installed have if(!superuser) ereport() > calls I'm curious why not rely on the grant system where possible? I thought we were trying to get away from explicit superuser checks. > I've not looked back on this thread, but I'd expect pg_walinspect to > need those superuser checks and with those it *could* be marked as > trusted, but that again brings into question how useful it is to mark > it > thusly. As long as any functions are safely accessible to public or a predefined role, there is some utility for the 'trusted' marker. As this patch is currently written, pg_monitor has access these functions, though I don't think that's the right privilege level at least for pg_get_raw_wal_record(). > I certainly don't think we should allow either database owners or > regular users on a system the ability to access the WAL traffic of > the > entire system. Agreed. That was not what I intended by asking if it should be marked 'trusted'. The marker only allows the non-superuser to run the CREATE EXTENSION command; it's up to the extension script to decide whether any non-superusers can do anything at all with the extension. > More forcefully- we should *not* be throwing more access > rights towards $owners in general and should be thinking about how we > can allow admins, providers, whomever, the ability to control what > rights users are given. If they're all lumped under 'owner' then > there's no way for people to provide granular access to just those > things they wish and intend to. Not sure I understand, but that sounds like a larger discussion. Regards, Jeff Davis
On Fri, Mar 11, 2022 at 7:53 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Some comments on pg_walinspect-docc.patch this time: > > + <varlistentry> > + <term> > + <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn, > prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length > OUT int4, total_length OUT int4, description OUT text, block_ref OUT > text, data OUT bytea, data_len OUT int4)</function> > + </term> > > You may shorten this by mentioning just the function input parameters > and specify "returns record" like shown below. So no need to specify > all the OUT params. > > pg_get_wal_record_info(in_lsn pg_lsn) returns record. > > Please check the documentation for other functions for reference. > > == > > + <term> > + <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn > pg_lsn DEFAULT NULL, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, > resource_manager OUT text, length OUT int4, total_length OUT int4, > description OUT text, block_ref OUT text, data OUT bytea, data_len OUT > int4)</function> > + </term> > > Same comment applies here as well. In the return type you can just > mention - "returns setof record" like shown below: > > pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof records. > > You may also check for such optimizations at other places. I might > have missed some. I used the way verify_heapam shows the columns as it looks good IMO and we can't show sample outputs for all of the functions in the documentation. > == > > +<screen> > +postgres=# select prev_lsn, xid, resource_manager, length, > total_length, block_ref from pg_get_wal_records_info('0/158A7F0', > '0/1591400'); > + prev_lsn | xid | resource_manager | length | total_length | > block_ref > +-----------+-----+------------------+--------+--------------+-------------------------------------------------------------------------- > + 0/158A7B8 | 735 | Heap | 54 | 7838 | blkref > #0: rel 1663/5/2619 blk 18 (FPW); hole: offset: 88, length: 408 > + 0/158A7F0 | 735 | Btree | 53 | 8133 | blkref > #0: rel 1663/5/2696 blk 1 (FPW); hole: offset: 1632, length: 112 > + 0/158C6A8 | 735 | Heap | 53 | 873 | blkref > #0: rel 1663/5/1259 blk 0 (FPW); hole: offset: 212, length: 7372 > > Instead of specifying column names in the targetlist I think it's > better to use "*" so that it will display all the output columns. Also > you may shorten the gap between start and end lsn to reduce the output > size. All columns are giving huge output, especially because of data and description columns hence I'm not showing them in the sample output. > == > > Any reason for not specifying author name in the .sgml file. Do you > want me to add my name to the author? :) Haha. Thanks. I will add in the v9 patch set which I will post in a while. Regards, Bharath Rupireddy.
On Sat, Mar 12, 2022 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > It seems to me too rigorous that pg_get_wal_records_info/stats() > > reject future LSNs as end-LSN and I think WARNING or INFO and stop at > > the real end-of-WAL is more kind to users. I think the same with the > > restriction that start and end LSN are required to be different. > > In his review just yesterday, Jeff suggested this: "Don't issue > WARNINGs or other messages for ordinary situations, like when > pg_get_wal_records_info() hits the end of WAL." I think he's entirely > right, and I don't think any patch that does otherwise should get > committed. It is worth remembering that the results of queries are > often examined by something other than a human being sitting at a psql > terminal. Any tool that uses this is going to want to understand what > happened from the result set, not by parsing strings that may show up > inside warning messages. > > I think that the right answer here is to have a function that returns > one row per record parsed, and each row should also include the start > and end LSN of the record. If for some reason the WAL records return > start after the specified start LSN (e.g. because we skip over a page > header) or end before the specified end LSN (e.g. because we reach > end-of-WAL) the user can figure it out from looking at the LSNs in the > output rows and comparing them to the LSNs provided as input. Thanks Robert. I've removed the WARNING part and added end_lsn as suggested. Thanks Kyotaro-san, Ashutosh and Jeff for your review. I tried to address your review comments, if not all, but many. Here's v9 patch-set please review it further. Regards, Bharath Rupireddy.
Вложения
At Fri, 11 Mar 2022 15:39:13 -0500, Robert Haas <robertmhaas@gmail.com> wrote in > On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > It seems to me too rigorous that pg_get_wal_records_info/stats() > > reject future LSNs as end-LSN and I think WARNING or INFO and stop at > > the real end-of-WAL is more kind to users. I think the same with the > > restriction that start and end LSN are required to be different. > > In his review just yesterday, Jeff suggested this: "Don't issue > WARNINGs or other messages for ordinary situations, like when > pg_get_wal_records_info() hits the end of WAL." I think he's entirely > right, and I don't think any patch that does otherwise should get It depends on what we think is the "ordinary" here. If we don't expect that specified LSN range is not filled-out, the case above is ordinary and no need for any WARNING nor INFO. I'm fine with that definition here. > committed. It is worth remembering that the results of queries are > often examined by something other than a human being sitting at a psql > terminal. Any tool that uses this is going to want to understand what > happened from the result set, not by parsing strings that may show up > inside warning messages. Right. I don't think it is right that WARNING is required to evaluate the result. And I think that the WARNING like 'reached end-of-wal before end LSN' is a kind that is not required in evaluation of the result. Since each WAL row contains at least start LSN. > I think that the right answer here is to have a function that returns > one row per record parsed, and each row should also include the start > and end LSN of the record. If for some reason the WAL records return > start after the specified start LSN (e.g. because we skip over a page > header) or end before the specified end LSN (e.g. because we reach > end-of-WAL) the user can figure it out from looking at the LSNs in the > output rows and comparing them to the LSNs provided as input. I agree with you here. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Greetings, * Jeff Davis (pgsql@j-davis.com) wrote: > On Thu, 2022-03-10 at 15:54 -0500, Stephen Frost wrote: > > The standard is basically that all of the functions it brings are > > written to enforce the PG privilege system and you aren't able to use > > the extension to bypass those privileges. In some cases that means > > that > > Every extension should follow that standard, right? If it doesn't (e.g. > creating dangerous functions and granting them to public), then even > superuser should not install it. Every extension that's intended to be installed on a multi-user PG system where the admin cares about security in the database, sure. I disagree that this applies universally to every system or every extension. Those are standards that modules we distribute in contrib should meet but I don't know that we necessarily have to have them for, say, modules in test. > > the C-language functions installed have if(!superuser) ereport() > > calls > > I'm curious why not rely on the grant system where possible? I thought > we were trying to get away from explicit superuser checks. We don't yet have capabilities for everything. I agree that we should be getting away from explicit superuser checks and explained below how we might be able to in this case. > > I've not looked back on this thread, but I'd expect pg_walinspect to > > need those superuser checks and with those it *could* be marked as > > trusted, but that again brings into question how useful it is to mark > > it > > thusly. > > As long as any functions are safely accessible to public or a > predefined role, there is some utility for the 'trusted' marker. I'm not sure that I agree, though I'm also not sure that it's a useful thing to debate. Still, if all of the functions in a particular extension have explicit if (!superuser) ereport() checks in them, then while installing it is 'safe' and it could be marked as 'trusted' there's very little point in doing so as the only person who can get anything useful from those functions is a superuser, and a superuser can install non-trusted extensions anyway. How is it useful to mark such an extension as 'trusted'? > As this patch is currently written, pg_monitor has access these > functions, though I don't think that's the right privilege level at > least for pg_get_raw_wal_record(). Yeah, I agree that pg_monitor isn't the right thing for such a function to be checking. > > I certainly don't think we should allow either database owners or > > regular users on a system the ability to access the WAL traffic of > > the > > entire system. > > Agreed. That was not what I intended by asking if it should be marked > 'trusted'. The marker only allows the non-superuser to run the CREATE > EXTENSION command; it's up to the extension script to decide whether > any non-superusers can do anything at all with the extension. Sure. > > More forcefully- we should *not* be throwing more access > > rights towards $owners in general and should be thinking about how we > > can allow admins, providers, whomever, the ability to control what > > rights users are given. If they're all lumped under 'owner' then > > there's no way for people to provide granular access to just those > > things they wish and intend to. > > Not sure I understand, but that sounds like a larger discussion. The point I was trying to make is that it's better to move in the direction of things like pg_read_all_data rather than just declaring that the owner of a database can read all of the tables in that database, as an example. Maybe we want to implicitly have such privilege for the owner of the database too, but we should first make it something that's able to be GRANT'd out to non-owners so that it's not required that all of those privileges be given out together at once. Note that to be considered an 'owner' of an object you have to be a member of the role that owns the object, which means that said role is necessarily able to also become the owning role too. Lumping lots of privileges together- all the rights that being an 'owner' of the object conveys, plus the ability to also become that role directly and do things as that role, works actively against the general idea of 'least privilege'. Thanks, Stephen
Вложения
On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote: > > > As this patch is currently written, pg_monitor has access these > > functions, though I don't think that's the right privilege level at > > least for pg_get_raw_wal_record(). > > Yeah, I agree that pg_monitor isn't the right thing for such a function > to be checking. On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote: > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > function should require pg_read_server_files? Or at least > pg_read_all_data? The v9 patch set posted at [1] grants execution on pg_get_raw_wal_record() to the pg_monitor role. pg_read_all_data may not be the right choice, but pg_read_server_files is as these functions do read the WAL files on the server. If okay, I'm happy to grant execution on pg_get_raw_wal_record() to the pg_read_server_files role. Thoughts? [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com Regards, Bharath Rupireddy.
On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote: > > > > > As this patch is currently written, pg_monitor has access these > > > functions, though I don't think that's the right privilege level at > > > least for pg_get_raw_wal_record(). > > > > Yeah, I agree that pg_monitor isn't the right thing for such a function > > to be checking. > > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote: > > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > > function should require pg_read_server_files? Or at least > > pg_read_all_data? > > The v9 patch set posted at [1] grants execution on > pg_get_raw_wal_record() to the pg_monitor role. > > pg_read_all_data may not be the right choice, but pg_read_server_files > is as these functions do read the WAL files on the server. If okay, > I'm happy to grant execution on pg_get_raw_wal_record() to the > pg_read_server_files role. > > Thoughts? > > [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com Attaching v10 patch set which allows pg_get_raw_wal_record to be executed by either superuser or users with pg_read_server_files role, no other change from v9 patch set. Please review it further. Regards, Bharath Rupireddy.
Вложения
Greetings, * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote: > > > > > > > As this patch is currently written, pg_monitor has access these > > > > functions, though I don't think that's the right privilege level at > > > > least for pg_get_raw_wal_record(). > > > > > > Yeah, I agree that pg_monitor isn't the right thing for such a function > > > to be checking. > > > > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote: > > > > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > > > function should require pg_read_server_files? Or at least > > > pg_read_all_data? > > > > The v9 patch set posted at [1] grants execution on > > pg_get_raw_wal_record() to the pg_monitor role. > > > > pg_read_all_data may not be the right choice, but pg_read_server_files > > is as these functions do read the WAL files on the server. If okay, > > I'm happy to grant execution on pg_get_raw_wal_record() to the > > pg_read_server_files role. > > > > Thoughts? > > > > [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com > > Attaching v10 patch set which allows pg_get_raw_wal_record to be > executed by either superuser or users with pg_read_server_files role, > no other change from v9 patch set. In a quick look, that seems reasonable to me. If folks want to give out access to this function individually they're also able to do so, which is good. Doesn't seem worthwhile to introduce a new predefined role for this one function. Thanks, Stephen
Вложения
I can see that the pg_get_wal_records_info function shows the details of the WAL record whose existence is beyond the user specified stop/end lsn pointer. See below: ashu@postgres=# select * from pg_get_wal_records_info('0/01000028', '0/01000029'); -[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ start_lsn | 0/1000028 end_lsn | 0/100009F prev_lsn | 0/0 xid | 0 resource_manager | XLOG record_length | 114 fpi_length | 0 description | CHECKPOINT_SHUTDOWN redo 0/1000028; tli 1; prev tli 1; fpw true; xid 0:3; oid 10000; multi 1; offset 0; oldest xid 3 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown block_ref | data_length | 88 data | \x28000001000000000100000001000000010000000000000003000000000000001027000001000000000000000300000001000000010000000100000072550000a5c4316200000000000000000000000000000000ff7f0000 In this case, the end lsn pointer specified by the user is '0/01000029'. There is only one WAL record which starts before this specified end lsn pointer whose start pointer is at 01000028, but that WAL record ends at 0/100009F which is way beyond the specified end lsn. So, how come we are able to display the complete WAL record info? AFAIU, end lsn is the lsn pointer where you need to stop reading the WAL data. If that is true, then there exists no valid WAL record between the start and end lsn in this particular case. -- With Regards, Ashutosh Sharma. On Wed, Mar 16, 2022 at 7:56 PM Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > > On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote: > > > > > > > > > As this patch is currently written, pg_monitor has access these > > > > > functions, though I don't think that's the right privilege level at > > > > > least for pg_get_raw_wal_record(). > > > > > > > > Yeah, I agree that pg_monitor isn't the right thing for such a function > > > > to be checking. > > > > > > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote: > > > > > > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > > > > function should require pg_read_server_files? Or at least > > > > pg_read_all_data? > > > > > > The v9 patch set posted at [1] grants execution on > > > pg_get_raw_wal_record() to the pg_monitor role. > > > > > > pg_read_all_data may not be the right choice, but pg_read_server_files > > > is as these functions do read the WAL files on the server. If okay, > > > I'm happy to grant execution on pg_get_raw_wal_record() to the > > > pg_read_server_files role. > > > > > > Thoughts? > > > > > > [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com > > > > Attaching v10 patch set which allows pg_get_raw_wal_record to be > > executed by either superuser or users with pg_read_server_files role, > > no other change from v9 patch set. > > In a quick look, that seems reasonable to me. If folks want to give out > access to this function individually they're also able to do so, which > is good. Doesn't seem worthwhile to introduce a new predefined role for > this one function. > > Thanks, > > Stephen
At Wed, 16 Mar 2022 20:49:12 +0530, Ashutosh Sharma <ashu.coek88@gmail.com> wrote in > I can see that the pg_get_wal_records_info function shows the details > of the WAL record whose existence is beyond the user specified > stop/end lsn pointer. See below: > > ashu@postgres=# select * from pg_get_wal_records_info('0/01000028', > '0/01000029'); > -[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > start_lsn | 0/1000028 > end_lsn | 0/100009F > prev_lsn | 0/0 ... > record_length | 114 ... > In this case, the end lsn pointer specified by the user is > '0/01000029'. There is only one WAL record which starts before this > specified end lsn pointer whose start pointer is at 01000028, but that > WAL record ends at 0/100009F which is way beyond the specified end > lsn. So, how come we are able to display the complete WAL record info? > AFAIU, end lsn is the lsn pointer where you need to stop reading the > WAL data. If that is true, then there exists no valid WAL record > between the start and end lsn in this particular case. You're right considering how pg_waldump behaves. pg_waldump works almost the way as you described above. The record above actually ends at 1000099 and pg_waldump shows that record by specifying -s 0/1000028 -e 0/100009a, but not for -e 0/1000099. # I personally think the current behavior is fine, though.. It still suggests unspecifiable end-LSN.. > select * from pg_get_wal_records_info('4/4B28EB68', '4/4C000060'); > ERROR: cannot accept future end LSN > DETAIL: Last known WAL LSN on the database system is 4/4C000060. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Mar 16, 2022 at 8:49 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > I can see that the pg_get_wal_records_info function shows the details > of the WAL record whose existence is beyond the user specified > stop/end lsn pointer. See below: > > ashu@postgres=# select * from pg_get_wal_records_info('0/01000028', > '0/01000029'); > -[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > start_lsn | 0/1000028 > end_lsn | 0/100009F > prev_lsn | 0/0 > xid | 0 > resource_manager | XLOG > record_length | 114 > fpi_length | 0 > description | CHECKPOINT_SHUTDOWN redo 0/1000028; tli 1; prev tli > 1; fpw true; xid 0:3; oid 10000; multi 1; offset 0; oldest xid 3 in DB > 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; > oldest running xid 0; shutdown > block_ref | > data_length | 88 > data | > \x28000001000000000100000001000000010000000000000003000000000000001027000001000000000000000300000001000000010000000100000072550000a5c4316200000000000000000000000000000000ff7f0000 > > In this case, the end lsn pointer specified by the user is > '0/01000029'. There is only one WAL record which starts before this > specified end lsn pointer whose start pointer is at 01000028, but that > WAL record ends at 0/100009F which is way beyond the specified end > lsn. So, how come we are able to display the complete WAL record info? > AFAIU, end lsn is the lsn pointer where you need to stop reading the > WAL data. If that is true, then there exists no valid WAL record > between the start and end lsn in this particular case. Thanks Ashutosh, it's an edge case and I don't think we would want to show a WAL record that ends at LSN after the user specified end-lsn which doesn't look good. I fixed it in the v11 patch set. Now, the pg_get_wal_records_info will show records only upto user specified end_lsn, it doesn't show the last record which starts at LSN < end_lsn but ends at LSN > end_lsn, see [1]. Please review the v11 patch set further. [1] postgres=# select start_lsn, end_lsn, prev_lsn from pg_get_wal_records_info('0/01000028', '0/01000029'); start_lsn | end_lsn | prev_lsn -----------+---------+---------- (0 rows) postgres=# select start_lsn, end_lsn, prev_lsn from pg_get_wal_records_info('0/01000028', '0/100009F'); start_lsn | end_lsn | prev_lsn -----------+-----------+---------- 0/1000028 | 0/100009F | 0/0 (1 row) postgres=# select start_lsn, end_lsn, prev_lsn from pg_get_wal_records_info('0/01000028', '0/10000A0'); start_lsn | end_lsn | prev_lsn -----------+-----------+---------- 0/1000028 | 0/100009F | 0/0 (1 row) postgres=# select start_lsn, end_lsn, prev_lsn from pg_get_wal_records_info('0/01000028', '0/0100009E'); start_lsn | end_lsn | prev_lsn -----------+---------+---------- (0 rows) Regards, Bharath Rupireddy.
Вложения
On Thu, Mar 17, 2022 at 10:48 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > It still suggests unspecifiable end-LSN.. > > > select * from pg_get_wal_records_info('4/4B28EB68', '4/4C000060'); > > ERROR: cannot accept future end LSN > > DETAIL: Last known WAL LSN on the database system is 4/4C000060. Thanks Kyotaro-san. We can change the detail message to show (current flush lsn/last replayed lsn - 1), that's what I've done in v11 posted upthread at [1]. The problem is that all the pg_walinspect functions would wait for the first valid record in read_local_xlog_page() via InitXLogReaderState()->XLogFindNextRecord(), see[2]. We have two things to do: 1) Just document the behaviour "pg_walinspect functions will wait for the first valid WAL record if there is none found after the specified input LSN/start LSN.". This seems easier but some may see it as a problem. 2) Have read_local_xlog_page_2 which doesn't wait for future WAL LSN unlike read_local_xlog_page and like pg_waldump's WALDumpReadPage. It requires a new function read_local_xlog_page_2 that almost looks like read_local_xlog_page except wait (pg_usleep) loop, we can avoid code duplication by moving the read_local_xlog_page code to a static function read_local_xlog_page_guts(existing params, bool wait): read_local_xlog_page(params) read_local_xlog_page_guts(existing params, false); read_local_xlog_page_2(params) read_local_xlog_page_guts(existing params, true); read_local_xlog_page_guts: if (wait) wait for future wal; ---> existing pg_usleep code in read_local_xlog_page. else return; I'm fine either way, please let me know your thoughts on this? [1] https://www.postgresql.org/message-id/CALj2ACU8XjbYbMwh5x6hEUJdpRoG9%3DPO52_tuOSf1%3DMO7WtsmQ%40mail.gmail.com [2] postgres=# select pg_current_wal_flush_lsn(); pg_current_wal_flush_lsn -------------------------- 0/1624430 (1 row) postgres=# select * from pg_get_wal_record_info('0/1624430'); ERROR: cannot accept future input LSN DETAIL: Last known WAL LSN on the database system is 0/162442F. postgres=# select * from pg_get_wal_record_info('0/162442f'); ---> waits for the first valid record in read_local_xlog_page. Regards, Bharath Rupireddy.
Hi Bharath, Due to recent commits on master, the pg_walinpect module is not compiling. Kindly update the patch. pg_walinspect.c: In function ‘GetXLogRecordInfo’: pg_walinspect.c:362:39: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘max_block_id’ 362 | for (block_id = 0; block_id <= record->max_block_id; block_id++) | ^~ pg_walinspect.c:382:29: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 382 | uint8 bimg_info = record->blocks[block_id].bimg_info; | ^~ pg_walinspect.c:385:21: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 385 | fpi_len += record->blocks[block_id].bimg_len; | ^~ pg_walinspect.c:402:16: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 402 | record->blocks[block_id].hole_offset, | ^~ pg_walinspect.c:403:16: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 403 | record->blocks[block_id].hole_length, | ^~ pg_walinspect.c:405:16: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 405 | record->blocks[block_id].hole_length - | ^~ pg_walinspect.c:406:16: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 406 | record->blocks[block_id].bimg_len, | ^~ pg_walinspect.c:414:16: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 414 | record->blocks[block_id].hole_offset, | ^~ pg_walinspect.c:415:16: error: ‘XLogReaderState’ {aka ‘struct XLogReaderState’} has no member named ‘blocks’ 415 | record->blocks[block_id].hole_length); | ^~ make: *** [../../src/Makefile.global:941: pg_walinspect.o] Error 1 Thanks & Regards, Nitin Jadhav On Thu, Mar 17, 2022 at 1:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Mar 17, 2022 at 10:48 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > It still suggests unspecifiable end-LSN.. > > > > > select * from pg_get_wal_records_info('4/4B28EB68', '4/4C000060'); > > > ERROR: cannot accept future end LSN > > > DETAIL: Last known WAL LSN on the database system is 4/4C000060. > > Thanks Kyotaro-san. We can change the detail message to show (current > flush lsn/last replayed lsn - 1), that's what I've done in v11 posted > upthread at [1]. The problem is that all the pg_walinspect functions > would wait for the first valid record in read_local_xlog_page() via > InitXLogReaderState()->XLogFindNextRecord(), see[2]. > > We have two things to do: > 1) Just document the behaviour "pg_walinspect functions will wait for > the first valid WAL record if there is none found after the specified > input LSN/start LSN.". This seems easier but some may see it as a > problem. > 2) Have read_local_xlog_page_2 which doesn't wait for future WAL LSN > unlike read_local_xlog_page and like pg_waldump's WALDumpReadPage. It > requires a new function read_local_xlog_page_2 that almost looks like > read_local_xlog_page except wait (pg_usleep) loop, we can avoid code > duplication by moving the read_local_xlog_page code to a static > function read_local_xlog_page_guts(existing params, bool wait): > > read_local_xlog_page(params) > read_local_xlog_page_guts(existing params, false); > > read_local_xlog_page_2(params) > read_local_xlog_page_guts(existing params, true); > > read_local_xlog_page_guts: > if (wait) wait for future wal; ---> existing pg_usleep code in > read_local_xlog_page. > else return; > > I'm fine either way, please let me know your thoughts on this? > > [1] https://www.postgresql.org/message-id/CALj2ACU8XjbYbMwh5x6hEUJdpRoG9%3DPO52_tuOSf1%3DMO7WtsmQ%40mail.gmail.com > [2] > postgres=# select pg_current_wal_flush_lsn(); > pg_current_wal_flush_lsn > -------------------------- > 0/1624430 > (1 row) > > postgres=# select * from pg_get_wal_record_info('0/1624430'); > ERROR: cannot accept future input LSN > DETAIL: Last known WAL LSN on the database system is 0/162442F. > postgres=# select * from pg_get_wal_record_info('0/162442f'); ---> > waits for the first valid record in read_local_xlog_page. > > Regards, > Bharath Rupireddy. > >
Hi, First look at this patch, so I might be repeating stuff already commented on / discussed. On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote: > +-- > +-- pg_get_raw_wal_record() > +-- > +CREATE FUNCTION pg_get_raw_wal_record(IN in_lsn pg_lsn, > + OUT start_lsn pg_lsn, > + OUT end_lsn pg_lsn, > + OUT prev_lsn pg_lsn, > + OUT record_length int4, > + OUT record bytea > +) > +AS 'MODULE_PATHNAME', 'pg_get_raw_wal_record' > +LANGUAGE C CALLED ON NULL INPUT PARALLEL SAFE; What is raw about the function? Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a NULL lsn, does it? Also, that's the default, why is it restated here? > +REVOKE EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) FROM PUBLIC; > +GRANT EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) TO pg_read_server_files; > + > +-- > +-- pg_get_wal_record_info() > +-- > +CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn, > + OUT start_lsn pg_lsn, > + OUT end_lsn pg_lsn, > + OUT prev_lsn pg_lsn, > + OUT xid xid, > + OUT resource_manager text, > + OUT record_length int4, > + OUT fpi_length int4, > + OUT description text, > + OUT block_ref text, > + OUT data_length int4, > + OUT data bytea > +) > +AS 'MODULE_PATHNAME', 'pg_get_wal_record_info' > +LANGUAGE C CALLED ON NULL INPUT PARALLEL SAFE; > + > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC; > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor; I don't think it's appropriate for pg_monitor to see all the data in the WAL. > +-- > +-- pg_get_wal_stats() > +-- > +CREATE FUNCTION pg_get_wal_stats(IN start_lsn pg_lsn, > + IN end_lsn pg_lsn DEFAULT NULL, > + OUT resource_manager text, > + OUT count int8, > + OUT count_percentage float4, > + OUT record_length int8, > + OUT record_length_percentage float4, > + OUT fpi_length int8, > + OUT fpi_length_percentage float4 > + ) > +RETURNS SETOF record AS $$ > +SELECT resource_manager, > + count(*) AS cnt, > + CASE WHEN count(*) > 0 THEN (count(*) * 100 / sum(count(*)) OVER total)::numeric(5,2) ELSE 0 END AS "count_%", > + sum(record_length) AS trecl, > + CASE WHEN sum(record_length) > 0 THEN (sum(record_length) * 100 / sum(sum(record_length)) OVER total)::numeric(5,2)ELSE 0 END AS "trecl_%", > + sum(fpi_length) AS tfpil, > + CASE WHEN sum(fpi_length) > 0 THEN (sum(fpi_length) * 100 / sum(sum(fpi_length)) OVER total)::numeric(5,2) ELSE 0END AS "tfpil_%" > +FROM pg_get_wal_records_info(start_lsn, end_lsn) > +GROUP BY resource_manager > +WINDOW total AS (); > +$$ LANGUAGE SQL CALLED ON NULL INPUT PARALLEL SAFE; This seems like an exceedingly expensive way to compute this. Not just because of doing the grouping, window etc, but also because it's serializing the "data" field from pg_get_wal_records_info() just to never use it. With any appreciatable amount of data the return value pg_get_wal_records_info() will be serialized into a on-disk tuplestore. This is probably close to an order of magnitude slower than pg_waldump --stats. Which imo renders this largely useless. The column names don't seem great either. "tfpil"? > +/* > + * Module load callback. > + */ > +void > +_PG_init(void) > +{ > + /* Define custom GUCs and install hooks here, if any. */ > + > + /* > + * Have EmitWarningsOnPlaceholders("pg_walinspect"); if custom GUCs are > + * defined. > + */ > +} > + > +/* > + * Module unload callback. > + */ > +void > +_PG_fini(void) > +{ > + /* Uninstall hooks, if any. */ > +} Why have this stuff if it's not used? > +/* > + * Validate given LSN and return the LSN up to which the server has WAL. > + */ > +static XLogRecPtr > +ValidateInputLSN(XLogRecPtr lsn) > +{ > + XLogRecPtr curr_lsn; > + > + /* Validate input WAL LSN. */ > + if (XLogRecPtrIsInvalid(lsn)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid WAL LSN"))); > + > + /* > + * We determine the current LSN of the server similar to how page_read > + * callback read_local_xlog_page does. > + */ > + if (!RecoveryInProgress()) > + curr_lsn = GetFlushRecPtr(NULL); > + else > + curr_lsn = GetXLogReplayRecPtr(NULL); > + > + Assert(!XLogRecPtrIsInvalid(curr_lsn)); > + > + if (lsn >= curr_lsn) > + { > + /* > + * GetFlushRecPtr or GetXLogReplayRecPtr gives "end+1" LSN of the last > + * record flushed or replayed respectively. But let's use the LSN up > + * to "end" in user facing message. > + */ > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("cannot accept future input LSN"), > + errdetail("Last known WAL LSN on the database system is %X/%X.", > + LSN_FORMAT_ARGS(curr_lsn - 1)))); > + } > + return curr_lsn; > +} > + > +/* > + * Validate given start LSN and end LSN, return the new end LSN in case user > + * hasn't specified one. > + */ > +static XLogRecPtr > +ValidateStartAndEndLSNs(XLogRecPtr start_lsn, XLogRecPtr end_lsn) > +{ > + XLogRecPtr curr_lsn; > + > + /* Validate WAL start LSN. */ > + if (XLogRecPtrIsInvalid(start_lsn)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid WAL start LSN"))); > + > + if (!RecoveryInProgress()) > + curr_lsn = GetFlushRecPtr(NULL); > + else > + curr_lsn = GetXLogReplayRecPtr(NULL); > + > + if (start_lsn >= curr_lsn) > + { > + /* > + * GetFlushRecPtr or GetXLogReplayRecPtr gives "end+1" LSN of the last > + * record flushed or replayed respectively. But let's use the LSN up > + * to "end" in user facing message. > + */ > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("cannot accept future start LSN"), > + errdetail("Last known WAL LSN on the database system is %X/%X.", > + LSN_FORMAT_ARGS(curr_lsn - 1)))); > + } > + /* > + * If end_lsn is specified, let's ensure that it's not a future LSN i.e. > + * something the database system doesn't know about. > + */ > + if (!XLogRecPtrIsInvalid(end_lsn) && > + (end_lsn >= curr_lsn)) > + { > + /* > + * GetFlushRecPtr or GetXLogReplayRecPtr gives "end+1" LSN of the last > + * record flushed or replayed respectively. But let's use the LSN up > + * to "end" in user facing message. > + */ > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("cannot accept future end LSN"), > + errdetail("Last known WAL LSN on the database system is %X/%X.", > + LSN_FORMAT_ARGS(curr_lsn - 1)))); > + } > + > + /* > + * When end_lsn is not specified let's read up to the last WAL position > + * known to be on the server. > + */ > + if (XLogRecPtrIsInvalid(end_lsn)) > + end_lsn = curr_lsn; > + > + Assert(!XLogRecPtrIsInvalid(end_lsn)); > + > + if (start_lsn >= end_lsn) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("WAL start LSN must be less than end LSN"))); > + > + return end_lsn; > +} These two functions are largely redundant, that doesn't seem great. > +Datum > +pg_get_raw_wal_record(PG_FUNCTION_ARGS) > +{ > +#define PG_GET_RAW_WAL_RECORD_COLS 5 > + XLogRecPtr lsn; > + XLogRecord *record; > + XLogRecPtr first_record; > + XLogReaderState *xlogreader; > + bytea *raw_record; > + uint32 rec_len; > + char *raw_record_data; > + TupleDesc tupdesc; > + Datum result; > + HeapTuple tuple; > + Datum values[PG_GET_RAW_WAL_RECORD_COLS]; > + bool nulls[PG_GET_RAW_WAL_RECORD_COLS]; > + int i = 0; > + > + lsn = PG_GETARG_LSN(0); > + > + (void) ValidateInputLSN(lsn); > + > + /* Build a tuple descriptor for our result type. */ > + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) > + elog(ERROR, "return type must be a row type"); > + > + xlogreader = InitXLogReaderState(lsn, &first_record); > + > + Assert(xlogreader); > + > + record = ReadNextXLogRecord(xlogreader, first_record); > + > + rec_len = XLogRecGetTotalLen(xlogreader); > + > + Assert(rec_len > 0); > + Most of this has another copy in pg_get_wal_record_info(). Can more of this be deduplicated? > +/* > + * Get WAL record info. > + */ > +static void > +GetXLogRecordInfo(XLogReaderState *record, XLogRecPtr lsn, > + Datum *values, bool *nulls, uint32 ncols) > +{ > + const char *id; > + const RmgrData *desc; > + uint32 fpi_len = 0; > + RelFileNode rnode; > + ForkNumber forknum; > + BlockNumber blk; > + int block_id; > + StringInfoData rec_desc; > + StringInfoData rec_blk_ref; > + StringInfoData temp; > + bytea *data; > + char *main_data; > + uint32 main_data_len; > + int i = 0; > + > + desc = &RmgrTable[XLogRecGetRmid(record)]; > + initStringInfo(&rec_desc); > + id = desc->rm_identify(XLogRecGetInfo(record)); > + > + if (id == NULL) > + appendStringInfo(&rec_desc, "UNKNOWN (%x) ", XLogRecGetInfo(record) & ~XLR_INFO_MASK); > + else > + appendStringInfo(&rec_desc, "%s ", id); > + > + initStringInfo(&temp); > + desc->rm_desc(&temp, record); > + appendStringInfo(&rec_desc, "%s", temp.data); > + pfree(temp.data); > + initStringInfo(&rec_blk_ref); This seems unnecessarily wasteful. You serialize into one stringinfo, just to then copy that stringinfo into another stringinfo. Just to then allocate yet another stringinfo. > + /* Block references (detailed format). */ This comment seems copied from pg_waldump, but doesn't make sense here, because there's no short format. > + for (block_id = 0; block_id <= record->max_block_id; block_id++) > + { > + if (!XLogRecHasBlockRef(record, block_id)) > + continue; > + > + XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blk); > + > + if (forknum != MAIN_FORKNUM) > + appendStringInfo(&rec_blk_ref, > + "blkref #%u: rel %u/%u/%u fork %s blk %u", > + block_id, rnode.spcNode, rnode.dbNode, > + rnode.relNode, get_forkname(forknum), blk); > + else > + appendStringInfo(&rec_blk_ref, > + "blkref #%u: rel %u/%u/%u blk %u", > + block_id, rnode.spcNode, rnode.dbNode, > + rnode.relNode, blk); > + > + if (XLogRecHasBlockImage(record, block_id)) > + { > + uint8 bimg_info = record->blocks[block_id].bimg_info; > + > + /* Calculate the amount of FPI data in the record. */ > + fpi_len += record->blocks[block_id].bimg_len; > + > + if (BKPIMAGE_COMPRESSED(bimg_info)) > + { > + const char *method; > + > + if ((bimg_info & BKPIMAGE_COMPRESS_PGLZ) != 0) > + method = "pglz"; > + else if ((bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0) > + method = "lz4"; > + else > + method = "unknown"; > + > + appendStringInfo(&rec_blk_ref, " (FPW%s); hole: offset: %u, length: %u, " > + "compression saved: %u, method: %s", > + XLogRecBlockImageApply(record, block_id) ? > + "" : " for WAL verification", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].hole_length, > + BLCKSZ - > + record->blocks[block_id].hole_length - > + record->blocks[block_id].bimg_len, > + method); > + } > + else > + { > + appendStringInfo(&rec_blk_ref, " (FPW%s); hole: offset: %u, length: %u", > + XLogRecBlockImageApply(record, block_id) ? > + "" : " for WAL verification", > + record->blocks[block_id].hole_offset, > + record->blocks[block_id].hole_length); > + } > + } > + } To me duplicating this much code from waldump seems like a bad idea from a maintainability POV. > +/* > + * Get info and data of all WAL records between start LSN and end LSN. > + */ > +static void > +GetWALRecordsInfoInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn, > + XLogRecPtr end_lsn) > +{ > +#define PG_GET_WAL_RECORDS_INFO_COLS 11 > + XLogRecPtr first_record; > + XLogReaderState *xlogreader; > + ReturnSetInfo *rsinfo; > + TupleDesc tupdesc; > + Tuplestorestate *tupstore; > + MemoryContext per_query_ctx; > + MemoryContext oldcontext; > + Datum values[PG_GET_WAL_RECORDS_INFO_COLS]; > + bool nulls[PG_GET_WAL_RECORDS_INFO_COLS]; > + > + rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; > + > + /* Check to see if caller supports us returning a tuplestore. */ > + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("set-valued function called in context that cannot accept a set"))); > + if (!(rsinfo->allowedModes & SFRM_Materialize)) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("materialize mode required, but it is not allowed in this context"))); > + > + /* Build a tuple descriptor for our result type. */ > + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) > + elog(ERROR, "return type must be a row type"); > + > + /* Build tuplestore to hold the result rows. */ > + per_query_ctx = rsinfo->econtext->ecxt_per_query_memory; > + oldcontext = MemoryContextSwitchTo(per_query_ctx); > + tupstore = tuplestore_begin_heap(true, false, work_mem); > + rsinfo->returnMode = SFRM_Materialize; > + rsinfo->setResult = tupstore; > + rsinfo->setDesc = tupdesc; This should likely use the infrastructure introduced in 5b81703787bfc1e6072c8e37125eba0c5598b807. > + for (;;) > + { > + (void) ReadNextXLogRecord(xlogreader, first_record); > + > + /* > + * Let's not show the record info if it is spanning more than the > + * end_lsn. EndRecPtr is "end+1" of the last read record, hence > + * use "end" here. > + */ > + if ((xlogreader->EndRecPtr - 1) <= end_lsn) > + { > + GetXLogRecordInfo(xlogreader, xlogreader->currRecPtr, values, nulls, > + PG_GET_WAL_RECORDS_INFO_COLS); > + > + tuplestore_putvalues(tupstore, tupdesc, values, nulls); > + } > + > + /* Exit loop if read up to end_lsn. */ > + if (xlogreader->EndRecPtr >= end_lsn) > + break; Seems weird to have both of these conditions separately. Greetings, Andres Freund
On Fri, Mar 18, 2022 at 8:07 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > Hi Bharath, > > Due to recent commits on master, the pg_walinpect module is not > compiling. Kindly update the patch. Thanks Nitin. Here's an updated v12 patch-set. I will respond to Andres comments in a while. Regards, Bharath Rupireddy.
Вложения
On Sat, Mar 19, 2022 at 5:18 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > First look at this patch, so I might be repeating stuff already commented on / > discussed. Thanks for taking a look at the patch. > On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote: > > +-- > > +-- pg_get_raw_wal_record() > > What is raw about the function? It right now gives data starting from the output of XLogReadRecord upto XLogRecGetTotalLen(xlogreader); length. Given that XLogReadRecord returns a pointer to the decoded record's header, I'm not sure it's the right choice. Actually, this function's intention(not an immediate use-case though), is to feed the WAL record to another function and then, say, repair a corrupted page given a base data page. As I said upthread, I'm open to removing this function for now, when a realistic need comes we can add it back. It also raised some concerns around the security and permissions. Thoughts? > Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a > NULL lsn, does it? Also, that's the default, why is it restated here? pg_get_wal_records_info needed that option (if end_lsn being the default, providing WAL info upto the end of WAL). Also, we can emit better error message ("invalid WAL start LSN") instead of generic one. I wanted to keep error message and code same across all the functions hence CALLED ON NULL INPUT option for pg_get_raw_wal_record. > > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC; > > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor; > > I don't think it's appropriate for pg_monitor to see all the data in the WAL. How about pg_read_server_files or some other? > > +-- pg_get_wal_stats() > > This seems like an exceedingly expensive way to compute this. Not just because > of doing the grouping, window etc, but also because it's serializing the > "data" field from pg_get_wal_records_info() just to never use it. With any > appreciatable amount of data the return value pg_get_wal_records_info() will > be serialized into a on-disk tuplestore. > > This is probably close to an order of magnitude slower than pg_waldump > --stats. Which imo renders this largely useless. Yeah that's true. Do you suggest having pg_get_wal_stats() a c-function like in v8 patch [1]? SEe some numbers at [2] with pg_get_wal_stats using pg_get_wal_records_info and pg_get_wal_records_info as a direct c-function like in v8 patch [1]. A direct c-function always fares better (84 msec vs 1400msec). > The column names don't seem great either. "tfpil"? "total fpi length" - tfpil wanted to keep it short - it's just an internal column name isn't it? The actual column name the user sees is fpi_length. > > +void > > +_PG_init(void) > > > +void > > +_PG_fini(void) > > Why have this stuff if it's not used? I kept it as a placeholder for future code additions, for instance test_decoding.c and ssl_passphrase_func.c has empty _PG_init(), _PG_fini(). If okay, I can mention there like "placeholder for now", otherwise I can remove it. > > +static XLogRecPtr > > +ValidateInputLSN(XLogRecPtr lsn) > > > +static XLogRecPtr > > +ValidateStartAndEndLSNs(XLogRecPtr start_lsn, XLogRecPtr end_lsn) > > +{ > > These two functions are largely redundant, that doesn't seem great. I will modify it in the next version. > > +Datum > > +pg_get_raw_wal_record(PG_FUNCTION_ARGS) > > Most of this has another copy in pg_get_wal_record_info(). Can more of this be > deduplicated? I will do, if we decide on whether or not to have the pg_get_raw_wal_record function at all? Please see my comments above. > > + initStringInfo(&temp); > > + desc->rm_desc(&temp, record); > > + appendStringInfo(&rec_desc, "%s", temp.data); > > + pfree(temp.data); > > + initStringInfo(&rec_blk_ref); > > This seems unnecessarily wasteful. You serialize into one stringinfo, just to > then copy that stringinfo into another stringinfo. Just to then allocate yet > another stringinfo. Yeah, I will remove it. Looks like all the rm_desc callbacks append to the passed-in buffer and not reset it, so we should be good. > > + /* Block references (detailed format). */ > > This comment seems copied from pg_waldump, but doesn't make sense here, > because there's no short format. Yes, I will remove it. > > + for (block_id = 0; block_id <= record->max_block_id; block_id++) > > + { > > To me duplicating this much code from waldump seems like a bad idea from a > maintainability POV. Even if we were to put the above code from pg_walinspect and pg_waldump into, say, walutils.c or some other existing file, there we had to make if (pg_walinspect) appendStringInfo else if (pg_waldump) printf() sort of thing, isn't it clumsy? Also, unnecessary if conditions need to be executed for every record. For maintainability, I added a note in pg_walinspect.c and pg_waldump.c to consider fixing things in both places (of course this might sound dumbest way of doing it, IMHO, it's sensible, given the if(pg_walinspect)-else if(pg_waldump) sorts of code that we need to do in the common functions). Thoughts? > > +/* > > + * Get info and data of all WAL records between start LSN and end LSN. > > + */ > > +static void > > +GetWALRecordsInfoInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn, > > This should likely use the infrastructure introduced in 5b81703787bfc1e6072c8e37125eba0c5598b807. Yes, I will change it. > > + for (;;) > > + { > > + (void) ReadNextXLogRecord(xlogreader, first_record); > > + > > + /* > > + * Let's not show the record info if it is spanning more than the > > + * end_lsn. EndRecPtr is "end+1" of the last read record, hence > > + * use "end" here. > > + */ > > + if ((xlogreader->EndRecPtr - 1) <= end_lsn) > > + { > > + GetXLogRecordInfo(xlogreader, xlogreader->currRecPtr, values, nulls, > > + PG_GET_WAL_RECORDS_INFO_COLS); > > + > > + tuplestore_putvalues(tupstore, tupdesc, values, nulls); > > + } > > + > > + /* Exit loop if read up to end_lsn. */ > > + if (xlogreader->EndRecPtr >= end_lsn) > > + break; > > Seems weird to have both of these conditions separately. Yeah. It is to handle some edge cases to print the WAL upto end_lsn and avoid waiting in read_local_xlog_page. I will change it. Actually, there's an open point as specified in [3]. Any thoughts on it? [1] https://www.postgresql.org/message-id/CALj2ACWhcbW_s4BXLyCpLWcCppZN9ncTXHbn4dv1F0Vpe0kxqA%40mail.gmail.com [2] with pg_get_wal_stats using pg_get_wal_stats: Time: 1394.919 ms (00:01.395) Time: 1403.199 ms (00:01.403) Time: 1408.138 ms (00:01.408) Time: 1397.670 ms (00:01.398) with pg_get_wal_stats as a c-function like in v8 patch [1]: Time: 84.319 ms Time: 84.303 ms Time: 84.208 ms Time: 84.452 ms use case: create extension pg_walinspect; create table foo(col int); insert into foo select * from generate_series(1, 100000); update foo set col = col*2+1; delete from foo; \timing on select * from pg_get_wal_stats('0/01000028'); \timing off output: postgres=# select * from pg_get_wal_stats('0/01000028'); resource_manager | count | count_percentage | record_length | record_length_percentage | fpi_length | fpi_length_percentage ------------------+--------+------------------+---------------+--------------------------+------------+----------------------- Storage | 13 | 0 | 546 | 0 | 0 | 0 CLOG | 1 | 0 | 30 | 0 | 0 | 0 Database | 2 | 0 | 84 | 0 | 0 | 0 Btree | 13078 | 3.1 | 1486990 | 4.97 | 461512 | 23.13 Heap | 404835 | 95.84 | 26354653 | 88.17 | 456576 | 22.88 Transaction | 721 | 0.17 | 178933 | 0.6 | 0 | 0 Heap2 | 3056 | 0.72 | 1131836 | 3.79 | 376932 | 18.89 Standby | 397 | 0.09 | 23226 | 0.08 | 0 | 0 XLOG | 316 | 0.07 | 716027 | 2.4 | 700164 | 35.09 (9 rows) [3] https://www.postgresql.org/message-id/CALj2ACVBST5Us6-eDz4q_Gem3rUHSC7AYNOB7tmp9Yqq6PHsXw%40mail.gmail.com Regards, Bharath Rupireddy.
Hi, On 2022-03-22 21:57:51 +0530, Bharath Rupireddy wrote: > On Sat, Mar 19, 2022 at 5:18 AM Andres Freund <andres@anarazel.de> wrote: > > On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote: > > > +-- > > > +-- pg_get_raw_wal_record() > > > > What is raw about the function? > > It right now gives data starting from the output of XLogReadRecord > upto XLogRecGetTotalLen(xlogreader); length. Given that XLogReadRecord > returns a pointer to the decoded record's header, I'm not sure it's > the right choice. Actually, this function's intention(not an immediate > use-case though), is to feed the WAL record to another function and > then, say, repair a corrupted page given a base data page. > > As I said upthread, I'm open to removing this function for now, when a > realistic need comes we can add it back. It also raised some concerns > around the security and permissions. Thoughts? I'm ok with having it with appropriate permissions, I just don't like the name. > > Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a > > NULL lsn, does it? Also, that's the default, why is it restated here? > > pg_get_wal_records_info needed that option (if end_lsn being the > default, providing WAL info upto the end of WAL). Also, we can emit > better error message ("invalid WAL start LSN") instead of generic one. > I wanted to keep error message and code same across all the functions > hence CALLED ON NULL INPUT option for pg_get_raw_wal_record. I think it should be strict if it behaves strict. I fail to see what consistency in error messages is worth here. And I'd probably just create two different functions for begin and begin & end LSN and mark those as strict as well. > > > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC; > > > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor; > > > > I don't think it's appropriate for pg_monitor to see all the data in the WAL. > > How about pg_read_server_files or some other? That seems more appropriate. > > > +-- pg_get_wal_stats() > > > > This seems like an exceedingly expensive way to compute this. Not just because > > of doing the grouping, window etc, but also because it's serializing the > > "data" field from pg_get_wal_records_info() just to never use it. With any > > appreciatable amount of data the return value pg_get_wal_records_info() will > > be serialized into a on-disk tuplestore. > > > > This is probably close to an order of magnitude slower than pg_waldump > > --stats. Which imo renders this largely useless. > > Yeah that's true. Do you suggest having pg_get_wal_stats() a > c-function like in v8 patch [1]? Yes. > SEe some numbers at [2] with pg_get_wal_stats using > pg_get_wal_records_info and pg_get_wal_records_info as a direct > c-function like in v8 patch [1]. A direct c-function always fares > better (84 msec vs 1400msec). That indeed makes the view as is pretty much useless. And it'd probably be worse in a workload with longer records / many FPIs. > > > +void > > > +_PG_init(void) > > > > > +void > > > +_PG_fini(void) > > > > Why have this stuff if it's not used? > > I kept it as a placeholder for future code additions, for instance > test_decoding.c and ssl_passphrase_func.c has empty _PG_init(), > _PG_fini(). If okay, I can mention there like "placeholder for now", > otherwise I can remove it. That's not comparable, the test_decoding case has it as a placeholder because it serves as a template to create further output plugins. Something not the case here. So please remove. > > > + for (block_id = 0; block_id <= record->max_block_id; block_id++) > > > + { > > > > To me duplicating this much code from waldump seems like a bad idea from a > > maintainability POV. > > Even if we were to put the above code from pg_walinspect and > pg_waldump into, say, walutils.c or some other existing file, there we > had to make if (pg_walinspect) appendStringInfo else if (pg_waldump) > printf() sort of thing, isn't it clumsy? Why is that needed? Just use the stringinfo in both places? You're outputting the exact same thing in both places right now. There's already a stringinfo in XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogddump was written), so you could just convert at least the relevant printfs in XLogDumpDisplayRecord(). > Also, unnecessary if > conditions need to be executed for every record. For maintainability, > I added a note in pg_walinspect.c and pg_waldump.c to consider fixing > things in both places (of course this might sound dumbest way of doing > it, IMHO, it's sensible, given the if(pg_walinspect)-else > if(pg_waldump) sorts of code that we need to do in the common > functions). Thoughts? IMO we shouldn't merge this with as much duplication as there is right now, the notes don't change that it's a PITA to maintain. > Yeah. It is to handle some edge cases to print the WAL upto end_lsn > and avoid waiting in read_local_xlog_page. I will change it. > > Actually, there's an open point as specified in [3]. Any thoughts on it? Seems more user-friendly to wait - it's otherwise hard for a user to know what LSN to put in. Greetings, Andres Freund
At Tue, 22 Mar 2022 11:00:06 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2022-03-22 21:57:51 +0530, Bharath Rupireddy wrote: > > > This is probably close to an order of magnitude slower than pg_waldump > > > --stats. Which imo renders this largely useless. > > > > Yeah that's true. Do you suggest having pg_get_wal_stats() a > > c-function like in v8 patch [1]? > > Yes. > > > SEe some numbers at [2] with pg_get_wal_stats using > > pg_get_wal_records_info and pg_get_wal_records_info as a direct > > c-function like in v8 patch [1]. A direct c-function always fares > > better (84 msec vs 1400msec). > > That indeed makes the view as is pretty much useless. And it'd probably be > worse in a workload with longer records / many FPIs. FWIW agreed. The SQL version is too slow.. > > > > + for (block_id = 0; block_id <= record->max_block_id; block_id++) > > > > + { > > > > > > To me duplicating this much code from waldump seems like a bad idea from a > > > maintainability POV. > > > > Even if we were to put the above code from pg_walinspect and > > pg_waldump into, say, walutils.c or some other existing file, there we > > had to make if (pg_walinspect) appendStringInfo else if (pg_waldump) > > printf() sort of thing, isn't it clumsy? > > Why is that needed? Just use the stringinfo in both places? You're outputting > the exact same thing in both places right now. There's already a stringinfo in > XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogddump was > written), so you could just convert at least the relevant printfs in > XLogDumpDisplayRecord(). > > Also, unnecessary if > > conditions need to be executed for every record. For maintainability, > > I added a note in pg_walinspect.c and pg_waldump.c to consider fixing > > things in both places (of course this might sound dumbest way of doing > > it, IMHO, it's sensible, given the if(pg_walinspect)-else > > if(pg_waldump) sorts of code that we need to do in the common > > functions). Thoughts? > > IMO we shouldn't merge this with as much duplication as there is right now, > the notes don't change that it's a PITA to maintain. The two places emit different outputs but the only difference is the delimiter between two blockrefs. (By the way, the current code forgets to insert a delimiter there). So even if the function took "bool is_waldump", it is used only when appending a line delimiter. It would be nicer if the "bool is_waldump" were "char *delimiter". Others might think differently, though.. So, the function looks like this. StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter, uint32 &fpi_len); > > Yeah. It is to handle some edge cases to print the WAL upto end_lsn > > and avoid waiting in read_local_xlog_page. I will change it. > > > > Actually, there's an open point as specified in [3]. Any thoughts on it? > > Seems more user-friendly to wait - it's otherwise hard for a user to know what > LSN to put in. I'm not sure it is user-friendly that the function "freeze"s for a reason uncertain to the user.. Even if any results are accumulated before waiting, all of them vanishes by entering Ctrl-C to release the "freeze". About the usefulness of the waiting behavior, it depends on what we think the function's major use cases are. Robert (AFAIU) thinks it as a simple WAL dumper that is intended to use in some automated mechanism. The start/end LSNs simply identifies the records to emit. No warning/errors and no waits except for apparently invalid inputs. I thought it as a means by which to manually inspect wal on SQL interface but don't have a strong opinion on the waiting behavior. (Because I can avoid that by giving a valid LSN pair to the function if I don't want it to "freeze".) Anyway, the opinions here on the interface are described as follows. A. as a diag interface for human use. 1. If the whole region is filled with records, return them all. 2. If start-LSN is too past, starts from the first available record. 3-1. If start-LSN is in futnre, wait for the record to come. 4-1. If end-LSN is in future, waits for new records. 5-1. If end-LSN is too past, error out? B. as a simple WAL dumper 1. If the whole region is filled with records, return them all. 2. If start-LSN is too past, starts from the first available record. 3-2. If start-LSN is in futnre, returns nothig. 4-2. If end-LSN is in future, ends with the last available record. 5-2. If end-LSN is too past, returns northing. 1 and 2 are uncontroversial. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 23 Mar 2022 11:51:25 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > The two places emit different outputs but the only difference is the > delimiter between two blockrefs. (By the way, the current code forgets > to insert a delimiter there). So even if the function took "bool > is_waldump", it is used only when appending a line delimiter. It > would be nicer if the "bool is_waldump" were "char *delimiter". > Others might think differently, though.. > > So, the function looks like this. > > StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter, > uint32 &fpi_len); By the way, xlog_block_info@xlogrecovery.c has the subset of the function. So the function can be shared with the callers of xlog_block_info but I'm not sure it is not too-much... StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter, bool fpw_detail, uint32 &fpi_len); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Mar 22, 2022 at 11:30 PM Andres Freund <andres@anarazel.de> wrote: > > > > Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a > > > NULL lsn, does it? Also, that's the default, why is it restated here? > > > > pg_get_wal_records_info needed that option (if end_lsn being the > > default, providing WAL info upto the end of WAL). Also, we can emit > > better error message ("invalid WAL start LSN") instead of generic one. > > I wanted to keep error message and code same across all the functions > > hence CALLED ON NULL INPUT option for pg_get_raw_wal_record. > > I think it should be strict if it behaves strict. I fail to see what > consistency in error messages is worth here. And I'd probably just create two > different functions for begin and begin & end LSN and mark those as strict as > well. I'm okay with changing them to be STRICT. Right now, the behaviour of pg_get_wal_records_info is this: CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn, IN end_lsn pg_lsn DEFAULT NULL, select pg_get_wal_records_info(start_lsn, end_lsn); if start_lsn is future, then errors out if end_lsn is future, then errors out otherwise, returns WAL records info between start_lsn and end_lsn select pg_get_wal_records_info(start_lsn); if start_lsn is future, then errors out sets end_lsn = current server lsn and returns WAL records info between start_lsn and end_lsn Same is true for pg_get_wal_stats. Getting WAL records info provided start_lsn until end-of-WAL is a basic ask and a good function to have. Now, if I were to make pg_get_wal_records_info STRICT, then I would need to have another function like pg_get_wal_records_info_till_end_of_wal/pg_get_wal_stats_till_end_of_wal much like ones in few of my initial patches upthread. Is it okay to have these functions pg_get_wal_records_info(start_lsn, end_lsn)/pg_get_wal_stats(start_lsn, end_lsn) and pg_get_wal_records_info_till_end_of_wal(start_lsn)/pg_get_wal_stats_till_end_of_wal(start_lsn)? This way, it will be more clear to the user actually than to stuff more than one behaviour in a single function with default values. Please let me know your thoughts. > > Yeah. It is to handle some edge cases to print the WAL upto end_lsn > > and avoid waiting in read_local_xlog_page. I will change it. > > > > Actually, there's an open point as specified in [3]. Any thoughts on it? > > Seems more user-friendly to wait - it's otherwise hard for a user to know what > LSN to put in. I agree with Kyotaro-san that the wait behavior isn't a good choice, because CTRL+C would not emit the accumulated info/stats unlike pg_waldump. Also, with wait behaviour it's easy for a user to trick the server with an unreasonably futuristic WAL LSN, say F/FFFFFFFF. Also, if we use pg_walinspect functions, say, within a WAL monitoring app, the wait behaviour isn't good there as it might look like the functions hanging the app. We might think about adding a timeout for waiting, but that doesn't seem an elegant way. Users/Apps can easily figure out the LSNs to get WAL info/stats - either they can use pg_current_wal_XXXX or by looking at the control file or server logs or pg_stat_replication, what not. LSNs are everywhere within the postgres eco-system. Instead, the functions simply can figure out what's current server LSN at-the-moment and choose to error out if any of the provided input LSN is beyond that as it's being done currently. This looks simpler and user-friendly. On Wed, Mar 23, 2022 at 8:27 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Wed, 23 Mar 2022 11:51:25 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > The two places emit different outputs but the only difference is the > > delimiter between two blockrefs. (By the way, the current code forgets > > to insert a delimiter there). So even if the function took "bool > > is_waldump", it is used only when appending a line delimiter. It > > would be nicer if the "bool is_waldump" were "char *delimiter". > > Others might think differently, though.. > > > > So, the function looks like this. > > > > StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter, > > uint32 &fpi_len); > > By the way, xlog_block_info@xlogrecovery.c has the subset of the > function. So the function can be shared with the callers of > xlog_block_info but I'm not sure it is not too-much... > > StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter, > bool fpw_detail, uint32 &fpi_len); > Yes, putting them in a common function is a good idea. I'm thinking something like below. StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter, uint32 *fpi_len, bool detailed_format) I will try to put the common functions in xlogreader.h/.c, so that both pg_waldump and pg_walinspect can make use of it. Thoughts? Regards, Bharath Rupireddy.
On Tue, Mar 22, 2022 at 11:30 PM Andres Freund <andres@anarazel.de> wrote: > > > > To me duplicating this much code from waldump seems like a bad idea from a > > > maintainability POV. > > > > Even if we were to put the above code from pg_walinspect and > > pg_waldump into, say, walutils.c or some other existing file, there we > > had to make if (pg_walinspect) appendStringInfo else if (pg_waldump) > > printf() sort of thing, isn't it clumsy? > > Why is that needed? Just use the stringinfo in both places? You're outputting > the exact same thing in both places right now. There's already a stringinfo in > XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogddump was > written), so you could just convert at least the relevant printfs in > XLogDumpDisplayRecord(). > > > Also, unnecessary if > > conditions need to be executed for every record. For maintainability, > > I added a note in pg_walinspect.c and pg_waldump.c to consider fixing > > things in both places (of course this might sound dumbest way of doing > > it, IMHO, it's sensible, given the if(pg_walinspect)-else > > if(pg_waldump) sorts of code that we need to do in the common > > functions). Thoughts? > > IMO we shouldn't merge this with as much duplication as there is right now, > the notes don't change that it's a PITA to maintain. Here's a refactoring patch that basically moves the pg_waldump's functions and stats structures to xlogreader.h/.c so that the pg_walinspect can reuse them. If it looks okay, I will send the pg_walinspect patches based on it. Regards, Bharath Rupireddy.
Вложения
At Wed, 23 Mar 2022 21:36:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > Here's a refactoring patch that basically moves the pg_waldump's > functions and stats structures to xlogreader.h/.c so that the > pg_walinspect can reuse them. If it looks okay, I will send the > pg_walinspect patches based on it. +void +XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter, + uint32 *fpi_len, bool detailed_format, + StringInfo buf) ... + if (detailed_format && delimiter) + appendStringInfoChar(buf, '\n'); It is odd that the variable "delimiter" is used as a bool in the function, though it is a "char *", which I meant that it is used as delimiter string (assuming that you might want to insert ", " between two blkref descriptions). +get_forkname(ForkNumber num) forkNames[] is public and used in reinit.c. I think we don't need this function. +#define MAX_XLINFO_TYPES 16 ... + XLogRecStats rmgr_stats[RM_NEXT_ID]; + XLogRecStats record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES]; +} XLogStats; + This doesn't seem to be a part of xlogreader. Couldn't we add a new module "xlogstats"? XLogRecGetBlockRefInfo also doesn't seem to me as a part of xlogreader, the xlogstats looks like a better place. +#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \ + tot_len_pct, total_count, \ It doesn't need to be a macro. However in the first place I don't think it is useful to have. Rather it may be harmful since it doesn't reduce complexity much but instead just hides details. If we want to avoid tedious repetitions of the same statements, a macro like the following may work. #define CALC_PCT (num, denom) ((denom) == 0 ? 0.0 ? 100.0 * (num) / (denom)) ... > n_pct = CALC_PCT(n, total_count); > rec_len_pct = CALC_PCT(rec_len, total_rec_len); > fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len); > tot_len_pct = CALC_PCT(tot_len, total_len); But it is not seem that different if we directly write out the detail. > n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count); > rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len); > fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len); > tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > +void > +XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter, > + uint32 *fpi_len, bool detailed_format, > + StringInfo buf) > ... > + if (detailed_format && delimiter) > + appendStringInfoChar(buf, '\n'); > > It is odd that the variable "delimiter" is used as a bool in the > function, though it is a "char *", which I meant that it is used as > delimiter string (assuming that you might want to insert ", " between > two blkref descriptions). I'm passing NULL if the delimiter isn't required (for pg_walinspect) and I wanted to check if it's passed, so I was using the delimiter in the condition. However, I now changed it to delimiter != NULL. > +get_forkname(ForkNumber num) > > forkNames[] is public and used in reinit.c. I think we don't need > this function. Yes. I removed it. > +#define MAX_XLINFO_TYPES 16 > ... > + XLogRecStats rmgr_stats[RM_NEXT_ID]; > + XLogRecStats record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES]; > +} XLogStats; > + > > This doesn't seem to be a part of xlogreader. Couldn't we add a new > module "xlogstats"? XLogRecGetBlockRefInfo also doesn't seem to me as > a part of xlogreader, the xlogstats looks like a better place. I'm not sure if it's worth adding new files xlogstats.h/.c just for 2 structures, 1 macro, and 2 functions with no plan to add new stats structures or functions. Since xlogreader is the one that reads the WAL, and is being included by both backend and other modules (tools and extensions) IMO it's the right place. However, I can specify in xlogreader that if at all the new stats related structures or functions are going to be added, it's good to move them into a new header and .c file. Thoughts? > +#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \ > + tot_len_pct, total_count, \ > > It doesn't need to be a macro. However in the first place I don't > think it is useful to have. Rather it may be harmful since it doesn't > reduce complexity much but instead just hides details. If we want to > avoid tedious repetitions of the same statements, a macro like the > following may work. > > #define CALC_PCT (num, denom) ((denom) == 0 ? 0.0 ? 100.0 * (num) / (denom)) > ... > > n_pct = CALC_PCT(n, total_count); > > rec_len_pct = CALC_PCT(rec_len, total_rec_len); > > fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len); > > tot_len_pct = CALC_PCT(tot_len, total_len); > > But it is not seem that different if we directly write out the detail. > > > n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count); > > rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len); > > fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len); > > tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len); I removed the XLOG_GET_STATS_PERCENTAGE macro. Attaching v14 patch-set here with. It has bunch of other changes along with the above: 1) Used STRICT for all the functions and introduced _till_end_of_wal versions for pg_get_wal_records_info and pg_get_wal_stats. 2) Most of the code is reused between pg_walinspect and pg_waldump and also within pg_walinspect. 3) Added read_local_xlog_page_no_wait without duplicating the code so that the pg_walinspect functions don't wait even while finding the first valid WAL record. 4) No function waits for future WAL lsn even to find the first valid WAL record. 5) Addressed the review comments raised upthread by Andres. I hope this version makes the patch cleaner, please review it further. Regards, Bharath Rupireddy.
Вложения
Hi, On 2022-03-24 15:02:29 +0530, Bharath Rupireddy wrote: > On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi > > This doesn't seem to be a part of xlogreader. Couldn't we add a new > > module "xlogstats"? XLogRecGetBlockRefInfo also doesn't seem to me as > > a part of xlogreader, the xlogstats looks like a better place. > > I'm not sure if it's worth adding new files xlogstats.h/.c just for 2 > structures, 1 macro, and 2 functions with no plan to add new stats > structures or functions. Since xlogreader is the one that reads the > WAL, and is being included by both backend and other modules (tools > and extensions) IMO it's the right place. However, I can specify in > xlogreader that if at all the new stats related structures or > functions are going to be added, it's good to move them into a new > header and .c file. I don't like that location for XLogRecGetBlockRefInfo(). How about putting it in xlogdesc.c - that kind of fits? And what do you think about creating src/backend/access/rmgrdesc/stats.c for XLogRecStoreStats()? It's not a perfect location, but not too bad either. XLogRecGetLen() would be ok in xlogreader, but stats.c also would work? Greetings, Andres Freund
On Fri, Mar 25, 2022 at 12:18 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-03-24 15:02:29 +0530, Bharath Rupireddy wrote: > > On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi > > > This doesn't seem to be a part of xlogreader. Couldn't we add a new > > > module "xlogstats"? XLogRecGetBlockRefInfo also doesn't seem to me as > > > a part of xlogreader, the xlogstats looks like a better place. > > > > I'm not sure if it's worth adding new files xlogstats.h/.c just for 2 > > structures, 1 macro, and 2 functions with no plan to add new stats > > structures or functions. Since xlogreader is the one that reads the > > WAL, and is being included by both backend and other modules (tools > > and extensions) IMO it's the right place. However, I can specify in > > xlogreader that if at all the new stats related structures or > > functions are going to be added, it's good to move them into a new > > header and .c file. > > I don't like that location for XLogRecGetBlockRefInfo(). How about putting it > in xlogdesc.c - that kind of fits? Done. > And what do you think about creating src/backend/access/rmgrdesc/stats.c for > XLogRecStoreStats()? It's not a perfect location, but not too bad either. > > XLogRecGetLen() would be ok in xlogreader, but stats.c also would work? I've added a new xlogstats.c/.h (as suggested by Kyotaro-san as well) file under src/backend/access/transam/. I don't think the new file fits well under rmgrdesc. Attaching v15 patch-set, please have a look at it. Regards, Bharath Rupireddy.
Вложения
Hi Bharath,
First look at the patch, bear with me if any of the following comments are repeated.
1. With pg_get_wal_record(lsn), say a WAL record start, end lsn range contains the specified LSN, wouldn't it be more meaningful to show the corresponding WAL record.
For example, upon providing '0/17335E7' as input, and I see get the WAL record ('0/1733618', '0/173409F') as output and not the one with start and end lsn as ('0/17335E0', '0/1733617').
With pg_walfile_name(lsn), we can find the WAL segment file name that contains the specified LSN.
2. I see the following output for pg_get_wal_record. Need to have a look at the spaces I suppose.
rkn=# select * from pg_get_wal_record('0/4041728');
start_lsn | end_lsn | prev_lsn | record_length |
record
-----------+-----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------
0/4041728 | 0/40421AF | 0/40416F0 | 2670 | \x6e0a0000d2020000f016040400000000000a0000fef802b400007f7f00000000408fe738a25500000300000000000000010000007f06000000
4000003b0a00000000000012000000100101007f7f7f7f0885e738a25500003000c815380a03000885e738a255000000007f7f7f7f7f7f0000000078674301296000003000f8150020042000000000709e1203909
dba01c89c8601609cd00030986008f8956804d202000000000000000000000000120006001f00030820ffff5f04000000000001400000010000000000000004000000000080bf0200030000000000000000006100
0000610000000000000000000000000000000000000000000000000000000000000000000000330100000000000000bc02000001000000010000000000803f00000000000000b0060000010000000000000017000
start_lsn | end_lsn | prev_lsn | record_length |
record
-----------+-----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------
0/4041728 | 0/40421AF | 0/40416F0 | 2670 | \x6e0a0000d2020000f016040400000000000a0000fef802b400007f7f00000000408fe738a25500000300000000000000010000007f06000000
4000003b0a00000000000012000000100101007f7f7f7f0885e738a25500003000c815380a03000885e738a255000000007f7f7f7f7f7f0000000078674301296000003000f8150020042000000000709e1203909
dba01c89c8601609cd00030986008f8956804d202000000000000000000000000120006001f00030820ffff5f04000000000001400000010000000000000004000000000080bf0200030000000000000000006100
0000610000000000000000000000000000000000000000000000000000000000000000000000330100000000000000bc02000001000000010000000000803f00000000000000b0060000010000000000000017000
3. Should these functions be running in standby mode too? We do not allow WAL control functions to be executed during recovery right?
4. If the wal segment corresponding to the start lsn is removed, but there are WAL records which could be read in the specified input lsn range, would it be better to output the existing WAL records displaying a message that it is a partial list of WAL records and the WAL files corresponding to the rest are already removed, rather than erroring out saying "requested WAL segment has already been removed"?
5. Following are very minor comments in the code
- Correct the function description by removing "return the LSN up to which the server has WAL" for IsFutureLSN
- In GetXLogRecordInfo, good to have pfree in place for rec_desc, rec_blk_ref, data
- In GetXLogRecordInfo, can avoid calling XLogRecGetInfo(record) multiple times by capturing in a variable
- In GetWALDetailsGuts, setting end_lsn could be done in single if else and similarly we can club the if statements verifying if the start lsn is a future lsn.
Thanks,
RKN
On Fri, Mar 25, 2022 at 8:37 PM RKN Sai Krishna <rknsaiforpostgres@gmail.com> wrote: > > Hi Bharath, > > First look at the patch, bear with me if any of the following comments are repeated. Thanks RKN, for playing around with the patches. > 1. With pg_get_wal_record(lsn), say a WAL record start, end lsn range contains the specified LSN, wouldn't it be more meaningfulto show the corresponding WAL record. In general, all the functions will first look for a first valid WAL record from the given input lsn/start lsn(XLogFindNextRecord) and then give info of all the valid records including the first valid WAL record until either the given end lsn or till end of WAL depending on the function used. > For example, upon providing '0/17335E7' as input, and I see get the WAL record ('0/1733618', '0/173409F') as output andnot the one with start and end lsn as ('0/17335E0', '0/1733617'). If '0/17335E7' is an LSN containing a valid WAL record, pg_get_wal_record gives the info of that, otherwise if there's any next valid WAL record, it finds and gives that info. '0/17335E0' is before '0/17335E7' the input lsn, so it doesn't show that record, but the next valid record. All the pg_walinspect functions don't look for the nearest valid WAL record (could be previous to input lsn or next to input lsn), but they look for the next valid WAL record. This is because the xlogreader infra now has no API for backward iteration from a given LSN ( it has XLogFindNextRecord and XLogReadRecord which scans the WAL in forward direction). But, it's a good idea to have XLogFindPreviousRecord and XLogReadPreviousRecord versions (as we have links for previous WAL record in each WAL record) but that's a separate discussion. > With pg_walfile_name(lsn), we can find the WAL segment file name that contains the specified LSN. Yes. > 2. I see the following output for pg_get_wal_record. Need to have a look at the spaces I suppose. I believe this is something psql does for larger column outputs for pretty-display. When used in a non-psql client, the column values are returned properly. Nothing to do with the pg_walinspect patches here. > 3. Should these functions be running in standby mode too? We do not allow WAL control functions to be executed during recoveryright? There are functions that can be executable during recovery pg_last_wal_receive_lsn, pg_last_wal_replay_lsn. The pg_walinspect functions are useful even in recovery and I don't see a strong reason to not support them. Hence, I'm right now supporting them. > 4. If the wal segment corresponding to the start lsn is removed, but there are WAL records which could be read in the specifiedinput lsn range, would it be better to output the existing WAL records displaying a message that it is a partiallist of WAL records and the WAL files corresponding to the rest are already removed, rather than erroring out saying"requested WAL segment has already been removed"? "requested WAL segment %s has already been removed" is a common error across the xlogreader infra (see wal_segment_open) and I don't want to invent a new behaviour. And all the segment_open callbacks report an error when they are not finding the WAL file that they are looking for. > 5. Following are very minor comments in the code > > Correct the function description by removing "return the LSN up to which the server has WAL" for IsFutureLSN That's fine, because it actually returns curr_lsn via the function param curr_lsn. However, I modified the comment a bit. > In GetXLogRecordInfo, good to have pfree in place for rec_desc, rec_blk_ref, data No, we are just returning pointer to the string, not deep copying, see CStringGetTextDatum. All the functions get executed within a function's memory context and after handing off the results to the client that gets deleted, deallocating all the memory. > In GetXLogRecordInfo, can avoid calling XLogRecGetInfo(record) multiple times by capturing in a variable XLogRecGetInfo is not a function, it's a macro, so that's fine. #define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info) > In GetWALDetailsGuts, setting end_lsn could be done in single if else and similarly we can club the if statements verifyingif the start lsn is a future lsn. The existing if conditions are: if (IsFutureLSN(start_lsn, &curr_lsn)) if (!till_end_of_wal && end_lsn >= curr_lsn) if (till_end_of_wal) if (start_lsn >= end_lsn) I clubbed them like this: if (!till_end_of_wal) if (IsFutureLSN(start_lsn, &curr_lsn)) if (!till_end_of_wal && end_lsn >= curr_lsn) else if (till_end_of_wal) Other if conditions are serving different purposes, so I'm leaving them as-is. Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch, others remain the same. Regards, Bharath Rupireddy.
Вложения
On Sat, 2022-03-26 at 10:31 +0530, Bharath Rupireddy wrote: > Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch, > others remain the same. I looked more closely at this patch. * It seems that pg_get_wal_record() is not returning the correct raw data for the record. I tested with pg_logical_emit_message, and the message isn't there. pg_get_wal_record_info() uses XLogRecordGetData(), which seems closer to what I expect. * I'm a little unclear on the purpose of pg_get_wal_record(). What does it offer that the other functions don't? * I don't think we need the stats at all. We can run GROUP BY queries on the results of pg_get_wal_records_info(). * Include the xlinfo portion of the wal record in addition to the rmgr, like pg_waldump --stats=record shows. That way we can GROUP BY that as well. * I don't think we need the changes to xlogutils.c. You calculate the end pointer based on the flush pointer, anyway, so we should never need to wait (and when I take it out, the tests still pass). I think we can radically simplify it without losing functionality, unless I'm missing something. 1. Eliminate pg_get_wal_record(), pg_get_wal_records_info_till_end_of_wal(), pg_get_wal_stats(), pg_get_wal_stats_till_end_of_wal(). 2. Rename pg_get_wal_record_info -> pg_get_wal_record 3. Rename pg_get_wal_records_info -> pg_get_wal_records 4. For pg_get_wal_records, if end_lsn is NULL, read until the end of WAL. 5. For pg_get_wal_record and pg_get_wal_records, also return the xlinfo using rm_identify() if available. 6. Remove changes to xlogutils. 7. Remove the refactor to pull the stats out to a separate file, because stats aren't needed. 8. With only two functions in the API, it may even make sense to just make it a part of postgres rather than a separate module. However, I'm a little behind on this discussion thread, so perhaps I'm missing some important context. I'll try to catch up soon. Regards, Jeff Davis
Hi, On 2022-04-01 16:35:38 -0700, Jeff Davis wrote: > * I don't think we need the stats at all. We can run GROUP BY queries > on the results of pg_get_wal_records_info(). It's well over an order of magnitude slower. And I don't see how that can be avoided. That makes it practically useless. See numbers at the bottom of https://postgr.es/m/CALj2ACUvU2fGLw7keEpxZhGAoMQ9vrCPX-13hexQPoR%2BQRbuOw%40mail.gmail.com Greetings, Andres Freund
On Sat, Apr 2, 2022 at 5:05 AM Jeff Davis <pgsql@j-davis.com> wrote: > > On Sat, 2022-03-26 at 10:31 +0530, Bharath Rupireddy wrote: > > Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch, > > others remain the same. > > I looked more closely at this patch. Thanks Jeff for reviewing this. > * It seems that pg_get_wal_record() is not returning the correct raw > data for the record. I tested with pg_logical_emit_message, and the > message isn't there. pg_get_wal_record_info() uses XLogRecordGetData(), > which seems closer to what I expect. > > * I'm a little unclear on the purpose of pg_get_wal_record(). What does > it offer that the other functions don't? My intention is to return the overall undecoded WAL record [5] i.e. the data starting from XLogReadRecord's output [6] till length XLogRecGetTotalLen(xlogreader);. Please see [7], where Andres agreed to have this function, I also mentioned a possible use-case there. pg_get_wal_record_info returns the main data of the WAL record (xl_heap_delete, xl_heap_insert, xl_heap_multi_insert, xl_heap_update and so on). > * I don't think we need the stats at all. We can run GROUP BY queries > on the results of pg_get_wal_records_info(). As identified in [1], SQL-version of stats function is way slower in normal cases, hence it was agreed (by Andres, Kyotaro-san and myself) to have a C-function for stats. > * Include the xlinfo portion of the wal record in addition to the rmgr, > like pg_waldump --stats=record shows. That way we can GROUP BY that as > well. > > 5. For pg_get_wal_record and pg_get_wal_records, also return the xlinfo > using rm_identify() if available. Yes, that's already part of the description column (much like pg_waldump does) and the users can still do it with GROUP BY and HAVING clauses, see [4]. > * I don't think we need the changes to xlogutils.c. You calculate the > end pointer based on the flush pointer, anyway, so we should never need > to wait (and when I take it out, the tests still pass). > > 6. Remove changes to xlogutils. As mentioned in [1], read_local_xlog_page_no_wait required because the functions can still wait in read_local_xlog_page for WAL while finding the first valid record after the given input LSN (the use case is simple - just provide input LSN closer to server's current flush LSN, may be off by 3 or 4 bytes). Also, I tried to keep the changes minimal with the read_local_xlog_page_guts static function. IMO, that shouldn't be a problem. > I think we can radically simplify it without losing functionality, > unless I'm missing something. > > 1. Eliminate pg_get_wal_record(), > pg_get_wal_records_info_till_end_of_wal(), pg_get_wal_stats(), > pg_get_wal_stats_till_end_of_wal(). > > 4. For pg_get_wal_records, if end_lsn is NULL, read until the end of > WAL. It's pretty much clear to the users with till_end_of_wal functions instead of cooking many things into the same functions with default values for input LSNs as NULL which also requires the functions to be "CALLED ON NULL INPUT" types which isn't good. This was also suggested by Andres, see [2], and I agree with it. > 2. Rename pg_get_wal_record_info -> pg_get_wal_record > > 3. Rename pg_get_wal_records_info -> pg_get_wal_records As these functions aren't returning the WAL record data, but info about it (decoded data), I would like to retain the function names as-is. > 8. With only two functions in the API, it may even make sense to just > make it a part of postgres rather than a separate module. As said above, I would like to have till_end_of_wal versions. Firstly, pg_walinspect functions may not be needed by everyone, the extension provides a way for the users to install if required. Also, many hackers have suggested new functions [3], but, right now the idea is to get pg_walinspect onboard with simple-yet-useful functions and then think of extending it with new functions later. [1] https://www.postgresql.org/message-id/CALj2ACUvU2fGLw7keEpxZhGAoMQ9vrCPX-13hexQPoR%2BQRbuOw%40mail.gmail.com [2] https://www.postgresql.org/message-id/20220322180006.hgbsoldgbljyrcm7%40alap3.anarazel.de [3] There are many functions we can add to pg_walinspect - functions with wait mode for future WAL, WAL parsing, function to return all the WAL record info/stats given a WAL file name, functions to return WAL info/stats from historic timelines as well, function to see if the given WAL file is valid and so on. [4] postgres=# select count(resource_manager), description, from pg_get_wal_records_info('0/14E0568', '0/14F2568') group by description having description like '%INSERT_LEAF%'; count | description -------+--------------------- 7 | INSERT_LEAF off 108 1 | INSERT_LEAF off 111 1 | INSERT_LEAF off 135 1 | INSERT_LEAF off 142 3 | INSERT_LEAF off 143 1 | INSERT_LEAF off 144 1 | INSERT_LEAF off 145 1 | INSERT_LEAF off 146 1 | INSERT_LEAF off 274 1 | INSERT_LEAF off 405 (10 rows) [5] /* * The overall layout of an XLOG record is: * Fixed-size header (XLogRecord struct) * XLogRecordBlockHeader struct * XLogRecordBlockHeader struct * ... * XLogRecordDataHeader[Short|Long] struct * block data * block data * ... * main data [6] XLogRecord * XLogReadRecord(XLogReaderState *state, char **errormsg) { decoded = XLogNextRecord(state, errormsg); if (decoded) { /* * This function returns a pointer to the record's header, not the * actual decoded record. The caller will access the decoded record * through the XLogRecGetXXX() macros, which reach the decoded * recorded as xlogreader->record. */ Assert(state->record == decoded); return &decoded->header; } [7] https://www.postgresql.org/message-id/20220322180006.hgbsoldgbljyrcm7%40alap3.anarazel.de Regards, Bharath Rupireddy.
On Mon, 2022-04-04 at 09:15 +0530, Bharath Rupireddy wrote: > My intention is to return the overall undecoded WAL record [5] i.e. > the data starting from XLogReadRecord's output [6] till length > XLogRecGetTotalLen(xlogreader);. Please see [7], where Andres agreed > to have this function, I also mentioned a possible use-case there. The current patch does not actually do this: it's returning a pointer into the DecodedXLogRecord struct, which doesn't have the raw bytes of the WAL record. To return the raw bytes of the record is not entirely trivial: it seems we have to look in the decoded record and either find a pointer into readBuf, or readRecordBuf, depending on whether the record crosses a boundary or not. If we find a good way to do this I'm fine keeping the function, but if not, we can leave it for v16. > pg_get_wal_record_info returns the main data of the WAL record > (xl_heap_delete, xl_heap_insert, xl_heap_multi_insert, xl_heap_update > and so on). We also discussed just removing the main data from the output here. It's not terribly useful, and could be arbitrarily large. Similar to how we leave out the backup block data and images. > As identified in [1], SQL-version of stats function is way slower in > normal cases, hence it was agreed (by Andres, Kyotaro-san and myself) > to have a C-function for stats.a pointer into Now I agree. We should also have an equivalent of "pg_waldump -- stats=record" though, too. > Yes, that's already part of the description column (much like > pg_waldump does) and the users can still do it with GROUP BY and > HAVING clauses, see [4]. I still think an extra column for the results of rm_identify() would make sense. Not critical, but seems useful. > As mentioned in [1], read_local_xlog_page_no_wait required because > the > functions can still wait in read_local_xlog_page for WAL while > finding > the first valid record after the given input LSN (the use case is > simple - just provide input LSN closer to server's current flush LSN, > may be off by 3 or 4 bytes). Did you look into using XLogReadAhead() rather than XLogReadRecord()? > It's pretty much clear to the users with till_end_of_wal functions > instead of cooking many things into the same functions with default > values for input LSNs as NULL which also requires the functions to be > "CALLED ON NULL INPUT" types which isn't good. This was also > suggested > by Andres, see [2], and I agree with it. OK, it's a matter of taste I suppose. I don't have a strong opinion. > > 2. Rename pg_get_wal_record_info -> pg_get_wal_record > > > > 3. Rename pg_get_wal_records_info -> pg_get_wal_records > > As these functions aren't returning the WAL record data, but info > about it (decoded data), I would like to retain the function names > as-is. The name pg_get_wal_records_info bothers me slightly, but I don't have a better suggestion. One other thought: functions like pg_logical_emit_message() return an LSN, but if you feed that into pg_walinspect you get the *next* record. That makes sense because pg_logical_emit_message() returns the result of XLogInsertRecord(), which is the end of the last inserted record. But it can be slightly annoying/confusing. I don't have any particular suggestion, but maybe it's worth a mention in the docs or something? Regards, Jeff Davis
On Wed, Apr 6, 2022 at 10:32 AM Jeff Davis <pgsql@j-davis.com> wrote: > > On Mon, 2022-04-04 at 09:15 +0530, Bharath Rupireddy wrote: > > My intention is to return the overall undecoded WAL record [5] i.e. > > the data starting from XLogReadRecord's output [6] till length > > XLogRecGetTotalLen(xlogreader);. Please see [7], where Andres agreed > > to have this function, I also mentioned a possible use-case there. > > The current patch does not actually do this: it's returning a pointer > into the DecodedXLogRecord struct, which doesn't have the raw bytes of > the WAL record. > > To return the raw bytes of the record is not entirely trivial: it seems > we have to look in the decoded record and either find a pointer into > readBuf, or readRecordBuf, depending on whether the record crosses a > boundary or not. If we find a good way to do this I'm fine keeping the > function, but if not, we can leave it for v16. With no immediate use of raw WAL data without a WAL record parsing function, I'm dropping that function for now. > > pg_get_wal_record_info returns the main data of the WAL record > > (xl_heap_delete, xl_heap_insert, xl_heap_multi_insert, xl_heap_update > > and so on). > > We also discussed just removing the main data from the output here. > It's not terribly useful, and could be arbitrarily large. Similar to > how we leave out the backup block data and images. Done. > > As identified in [1], SQL-version of stats function is way slower in > > normal cases, hence it was agreed (by Andres, Kyotaro-san and myself) > > to have a C-function for stats.a pointer into > > Now I agree. We should also have an equivalent of "pg_waldump -- > stats=record" though, too. Added a parameter per_record (with default being false, emitting per-rmgr stats) to pg_get_wal_stats and pg_get_wal_stats_till_end_of_wal, when set returns per-record stats, much like pg_waldump. > > Yes, that's already part of the description column (much like > > pg_waldump does) and the users can still do it with GROUP BY and > > HAVING clauses, see [4]. > > I still think an extra column for the results of rm_identify() would > make sense. Not critical, but seems useful. Added rm_identify as record_type column in pg_get_wal_record_info, pg_get_wal_records_info, pg_get_wal_record_info_till_end_of_wal. Removed the rm_identify from the description column as it's unnecessary now here. > > As mentioned in [1], read_local_xlog_page_no_wait required because > > the > > functions can still wait in read_local_xlog_page for WAL while > > finding > > the first valid record after the given input LSN (the use case is > > simple - just provide input LSN closer to server's current flush LSN, > > may be off by 3 or 4 bytes). > > Did you look into using XLogReadAhead() rather than XLogReadRecord()? I don't think XLogReadAhead will help either, as it calls page_read callback, XLogReadAhead->XLogDecodeNextRecord->ReadPageInternal->page_read->read_local_xlog_page (which again waits for future WAL). Per our internal discussion, I'm keeping the read_local_xlog_page_no_wait as it offers a better solution without much code duplication. > The name pg_get_wal_records_info bothers me slightly, but I don't have > a better suggestion. IMO, pg_get_wal_records_info looks okay, hence didn't change it. > One other thought: functions like pg_logical_emit_message() return an > LSN, but if you feed that into pg_walinspect you get the *next* record. > That makes sense because pg_logical_emit_message() returns the result > of XLogInsertRecord(), which is the end of the last inserted record. > But it can be slightly annoying/confusing. I don't have any particular > suggestion, but maybe it's worth a mention in the docs or something? Yes, all the pg_walinspect functions would find the next valid WAL record to the input/start LSN and start returning the details from then. IMO, the descriptions of these functions have already specified it: pg_get_wal_record_info Gets WAL record information of a given LSN. If the given LSN isn't containing a valid WAL record, it gives the information of the next available valid WAL record. This function emits an error if a future (the all other functions say this: Gets information/statistics of all the valid WAL records between/from Attaching v17 patch-set with the above review comments addressed. Please have a look at it. Regards, Bharath Rupireddy.
Вложения
On Wed, Apr 6, 2022 at 2:15 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Attaching v17 patch-set with the above review comments addressed. > Please have a look at it. Had to rebase because of 5c279a6d350 (Custom WAL Resource Managers.). Please find v18 patch-set. Regards, Bharath Rupireddy.
Вложения
On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > I am facing the below doc build failure on my machine due to this work: ./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml"> Tabs appear in SGML/XML files make: *** [check-tabs] Error 1 The attached patch fixes this for me. -- With Regards, Amit Kapila.
Вложения
On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > I am facing the below doc build failure on my machine due to this work: > > ./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml"> > Tabs appear in SGML/XML files > make: *** [check-tabs] Error 1 > > The attached patch fixes this for me. Thanks. It looks like there's a TAB in between. Your patch LGTM. I'm wondering why this hasn't been caught in the build farm members (or it may have been found but I'm missing to locate it.). Can you please provide me with the doc build command to catch these kinds of errors? Regards, Bharath Rupireddy.
On Mon, Apr 11, 2022 at 6:33 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > I am facing the below doc build failure on my machine due to this work: > > > > ./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml"> > > Tabs appear in SGML/XML files > > make: *** [check-tabs] Error 1 > > > > The attached patch fixes this for me. > > Thanks. It looks like there's a TAB in between. Your patch LGTM. > > I'm wondering why this hasn't been caught in the build farm members > (or it may have been found but I'm missing to locate it.). > > Can you please provide me with the doc build command to catch these > kinds of errors? > Nothing special. In the doc/src/sgml, I did make clean followed by make check. -- With Regards, Amit Kapila.