Обсуждение: pg_walinspect - a new extension to get raw WAL data and WAL stats

Поиск

Список

Период

Сортировка

pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

08 сентября 2021 г., 16:48:08

Hi,

While working on one of the internal features, we found that it is a
bit difficult to run pg_waldump for a normal user to know WAL info and
stats of a running postgres database instance in the cloud. Many a
times users or DBAs or developers would want to get and analyze
following:
1) raw WAL record associated with an LSN or raw WAL records between a
start LSN and end LSN for feeding to some other functionality
2) WAL statistics associated with an LSN or between start LSN and end
LSN for debugging or analytical purposes. The WAL stats are the number
of inserts, updates, deletes, index inserts, commits, checkpoints,
aborts, wal record sizes, FPI (Full Page Image) count etc. which are
basically everything that we get with pg_waldump --stats option plus
some other information as we may feel will be useful.

An available option is to use pg_waldump, a standalone program
emitting human readable WAL info into a standard output/file. This
works well when users have access to the system on which postgres is
running. But for a postgres database instance running in the cloud
environments, starting the pg_waldump, fetching and presenting its
output to the users in a structured way may be a bit hard to do.

How about we create a new extension, called pg_walinspect (synonymous
to pageinspect extension) with a bunch of SQL-callable functions that
get the raw WAL records or stats out of a running postgres database
instance in a more structured way that is easily consumable by all the
users or DBAs or developers? We can also provide these functionalities
into the core postgres (in xlogfuncs.c) instead of a new extension,
but we would like it to be pluggable so that the functions will be
used only if required.

[1] shows a rough sketch of the functions that the new pg_walinspect
extension can provide. These are not exhaustive; we can
add/remove/modify as we move further.

We would like to invite more thoughts from the hackers.

Credits: Thanks to Satya Narlapuram, Chen Liang (for some initial
work), Tianyu Zhang and Ashutosh Sharma (copied in cc) for internal
discussions.

[1]
a)  bytea pg_get_wal_record(pg_lsn lsn); and bytea
pg_get_wal_record(pg_lsn lsn, text wal_dir); - Returns a single row of
raw WAL record of bytea type. WAL data is read from pg_wal or
specified wal_dir directory.

b)  bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn); and
bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn, text
wal_dir); - Returns multiple rows of raw WAL records of bytea type,
one row per each WAL record. WAL data is read from pg_wal or specified
wal_dir directory.

CREATE TYPE walinspect_stats_type AS (stat1, stat2, stat3 …. statN);
c)  walinspect_stats_type  pg_get_wal_stats(pg_lsn lsn); and
walinspect_stats_type  pg_get_wal_stats(pg_lsn lsn, text wal_dir); -
Returns a single row of WAL record’s stats of walinspect_stats_type
type. WAL data is read from pg_wal or specified wal_dir directory.

d)  walinspect_stats_type[] pg_get_wal_stats(pg_lsn start_lsn, pg_lsn
end_lsn); and walinspect_stats_type[] pg_get_wal_stats(pg_lsn
start_lsn, pg_lsn end_lsn, text wal_dir); - Returns multiple rows of
WAL record stats of walinspect_stats_type type, one row per each WAL
record. WAL data is read from pg_wal or specified wal_dir directory.

e)  walinspect_stats_type  pg_get_wal_stats(bytea wal_record); -
Returns a single row of provided WAL record (wal_record) stats.

f)  walinspect_stats_type pg_get_wal_stats_aggr(pg_lsn start_lsn,
pg_lsn end_lsn); and walinspect_stats_type
pg_get_wal_stats_aggr(pg_lsn start_lsn, pg_lsn end_lsn, text wal_dir);
- Returns a single row of aggregated stats of all the WAL records
between start_lsn and end_lsn. WAL data is read from pg_wal or
specified wal_dir directory.

CREATE TYPE walinspect_lsn_range_type AS (pg_lsn start_lsn, pg_lsn end_lsn);
g)  walinspect_lsn_range_type   walinspect_get_lsn_range(text
wal_dir); - Returns a single row of start LSN and end LSN of the WAL
records available under pg_wal or specified wal_dir directory.

Regards,
Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

"Bossart, Nathan"

Дата:

10 сентября 2021 г., 01:49:46

On 9/8/21, 6:49 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> How about we create a new extension, called pg_walinspect (synonymous
> to pageinspect extension) with a bunch of SQL-callable functions that
> get the raw WAL records or stats out of a running postgres database
> instance in a more structured way that is easily consumable by all the
> users or DBAs or developers? We can also provide these functionalities
> into the core postgres (in xlogfuncs.c) instead of a new extension,
> but we would like it to be pluggable so that the functions will be
> used only if required.

+1

Nathan

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Michael Paquier

Дата:

10 сентября 2021 г., 04:51:21

On Thu, Sep 09, 2021 at 10:49:46PM +0000, Bossart, Nathan wrote:
> +1

A backend approach has the advantage that you can use the proper locks
to make sure that a segment is not recycled or removed by a concurrent
checkpoint, so that would be reliable.
--
Michael

Вложения

signature.asc

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

"Drouvot, Bertrand"

Дата:

10 сентября 2021 г., 10:04:10

On 9/10/21 12:49 AM, Bossart, Nathan wrote:
> On 9/8/21, 6:49 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
>> How about we create a new extension, called pg_walinspect (synonymous
>> to pageinspect extension) with a bunch of SQL-callable functions that
>> get the raw WAL records or stats out of a running postgres database
>> instance in a more structured way that is easily consumable by all the
>> users or DBAs or developers? We can also provide these functionalities
>> into the core postgres (in xlogfuncs.c) instead of a new extension,
>> but we would like it to be pluggable so that the functions will be
>> used only if required.
> +1
>
> Nathan
>
+1

Bertrand

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

10 сентября 2021 г., 17:29:48

On Fri, Sep 10, 2021 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 09, 2021 at 10:49:46PM +0000, Bossart, Nathan wrote:
> > +1
>
> A backend approach has the advantage that you can use the proper locks
> to make sure that a segment is not recycled or removed by a concurrent
> checkpoint, so that would be reliable.

Thanks for sharing your thoughts. IMO, using locks for showing WAL
stats isn't a good way, because these new functions may block the
checkpointer from removing/recycling the WAL files.  We don't want to
do that. If at all, user has asked stats of  an LSN/range of LSNs if
it is/they are available in the pg_wal directory, we provide the info
otherwise we can throw warnings/errors. This behaviour is pretty much
in sycn with what pg_waldump does right now.

And, some users may not need these new functions at all, so in such
cases going with an extension way makes it more usable.

Regards,
Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bruce Momjian

Дата:

06 октября 2021 г., 01:07:07

On Wed, Sep  8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote:
> Hi,
> 
> While working on one of the internal features, we found that it is a
> bit difficult to run pg_waldump for a normal user to know WAL info and
> stats of a running postgres database instance in the cloud. Many a
> times users or DBAs or developers would want to get and analyze
> following:

Uh, are we going to implement everything that is only available at the
command line as an extension just for people who are using managed cloud
services where the command line is not available and the cloud provider
has not made that information accessible?  Seems like this might lead to
a lot of duplicated effort.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Chapman Flack

Дата:

06 октября 2021 г., 01:22:19

On 10/05/21 18:07, Bruce Momjian wrote:
> Uh, are we going to implement everything that is only available at the
> command line as an extension just for people who are using managed cloud
> services

One extension that runs a curated menu of command-line tools for you
and returns their stdout?

Regards,
-Chap

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Jeremy Schneider

Дата:

06 октября 2021 г., 01:30:07

On 10/5/21 15:07, Bruce Momjian wrote:
> On Wed, Sep  8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote:
>> While working on one of the internal features, we found that it is a
>> bit difficult to run pg_waldump for a normal user to know WAL info and
>> stats of a running postgres database instance in the cloud. Many a
>> times users or DBAs or developers would want to get and analyze
>> following:
> 
> Uh, are we going to implement everything that is only available at the
> command line as an extension just for people who are using managed cloud
> services where the command line is not available and the cloud provider
> has not made that information accessible?  Seems like this might lead to
> a lot of duplicated effort.

No. For most command line utilities, there's no reason to expose them in
SQL or they already are exposed in SQL. For example, everything in
pg_controldata is already available via SQL functions.

Specifically exposing pg_waldump functionality in SQL could speed up
finding bugs in the PG logical replication code. We found and fixed a
few over this past year, but there are more lurking out there.

Having the extension in core is actually the only way to avoid
duplicated effort, by having shared code which the pg_dump binary and
the extension both wrap or call.

-Jeremy

-- 
http://about.me/jeremy_schneider

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bruce Momjian

Дата:

06 октября 2021 г., 03:38:51

On Tue, Oct  5, 2021 at 06:22:19PM -0400, Chapman Flack wrote:
> On 10/05/21 18:07, Bruce Momjian wrote:
> > Uh, are we going to implement everything that is only available at the
> > command line as an extension just for people who are using managed cloud
> > services
> 
> One extension that runs a curated menu of command-line tools for you
> and returns their stdout?

Yes, that would make sense, and something the cloud service providers
would write.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bruce Momjian

Дата:

06 октября 2021 г., 03:43:10

On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
> On 10/5/21 15:07, Bruce Momjian wrote:
> > On Wed, Sep  8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote:
> >> While working on one of the internal features, we found that it is a
> >> bit difficult to run pg_waldump for a normal user to know WAL info and
> >> stats of a running postgres database instance in the cloud. Many a
> >> times users or DBAs or developers would want to get and analyze
> >> following:
> > 
> > Uh, are we going to implement everything that is only available at the
> > command line as an extension just for people who are using managed cloud
> > services where the command line is not available and the cloud provider
> > has not made that information accessible?  Seems like this might lead to
> > a lot of duplicated effort.
> 
> No. For most command line utilities, there's no reason to expose them in
> SQL or they already are exposed in SQL. For example, everything in
> pg_controldata is already available via SQL functions.

That's a good example.

> Specifically exposing pg_waldump functionality in SQL could speed up
> finding bugs in the PG logical replication code. We found and fixed a
> few over this past year, but there are more lurking out there.

Uh, why is pg_waldump more important than other command line tool
information?

> Having the extension in core is actually the only way to avoid
> duplicated effort, by having shared code which the pg_dump binary and
> the extension both wrap or call.

Uh, aren't you duplicating code by having pg_waldump as a command-line
tool and an extension?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Jeremy Schneider

Дата:

06 октября 2021 г., 19:56:34

On 10/5/21 17:43, Bruce Momjian wrote:
> On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
>> Specifically exposing pg_waldump functionality in SQL could speed up
>> finding bugs in the PG logical replication code. We found and fixed a
>> few over this past year, but there are more lurking out there.
> 
> Uh, why is pg_waldump more important than other command line tool
> information?

Going down the list of all other utilities in src/bin:

* pg_amcheck, pg_config, pg_controldata: already available in SQL
* psql, pgbench, pg_dump: already available as client-side apps
* initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl,
pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any
possible use case outside server OS access, most of these are too low
level and don't even make sense in SQL
* pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL,
don't feel any deep interest myself

Speaking selfishly, there are a few reasons I would be specifically
interested in pg_waldump (the only remaining one on the list).

.

First, to better support existing features around logical replication
and decoding.

In particular, it seems inconsistent to me that all the replication
management SQL functions take LSNs as arguments - and yet there's no
SQL-based way to find the LSNs that you are supposed to pass into these
functions.

https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION

Over the past few years I've been pulled in to help several large PG
users who ran into these bugs, and it's very painful - because the only
real remediation is to drop and recreate the replication slot, which
means either re-copying all the data to the downstream system or
figuring out a way to resync it. Some PG users have 3rd party tools like
HVR which can do row-by-row resync (IIUC), but no matter how you slice
it, we're talking about a lot of pain for people replicating large data
sets between multiple systems. In most cases, the only/best option even
with very large tables is to just make a fresh copy of the data - which
can translate to a business outage of hours or even days.

My favorite example is the SQL function pg_replication_slot_advance() -
this could really help PG users find less painful solutions to broken
decoding, however it's not really possible to /know/ an LSN to advance
to without inspecting WAL. ISTM there's a strong use case here for a SQL
interface on WAL inspection.

.

Second: debugging and troubleshooting logical replication and decoding bugs.

I helped track down a few logical replication bugs and get fixed into
code at postgresql.org this past year. (But I give credit to others who
are much better at C than I am, and who did a lot more work than I did
on these bugs!)

Logical decoding bugs are some of the hardest to fix - because all you
have is a WAL stream, but you don't know the SQL or workload patterns or
PG code paths which created that WAL stream.

Dumping the WAL - knowing which objects and which types of operations
are involved and stats like number of changes or number of
subtransactions - this helps identify which transaction and what SQL in
the application triggered the failure, which can help find short-term
workarounds. Businesses need those short-term workarounds so they can
keep going while we work on finding and fixing bugs, which can take some
time. This is another place where I think a SQL interface to WAL would
be helpful to PG users. Especially the ability to filter and trace a
single transaction through a large number of WAL files in the directory.

.

Third: statistics on WAL - especially full page writes. Giving users the
full power of SQL allows much more sophisticated analysis of the WAL
records. Personally, I'd probably find myself importing all the WAL
stats into the DB anyway because SQL is my preferred data manipulation
language.

>> Having the extension in core is actually the only way to avoid
>> duplicated effort, by having shared code which the pg_dump binary and
>> the extension both wrap or call.
> 
> Uh, aren't you duplicating code by having pg_waldump as a command-line
> tool and an extension?

Well this whole conversation is just theoretical anyway until the code
is shared.  :)  But if Bharath is writing functions to decode WAL, then
wouldn't we just have pg_waldump use these same functions in order to
avoid duplicating code?

Bharath - was some code already posted and I just missed it?

Looking at the proposed API from the initial email, I like that there's
both stats functionality and WAL record inspection functionality
(similar to pg_waldump). I like that there's the ability to pull the
records as raw bytea data, however I think we're also going to want an
ability in v1 of the patch to decode it (similar to pageinspect
heap_page_item_attrs, etc).

Another feature that might be interesting down the road would be the
ability to provide filtering of WAL records for security purposes. For
example, allowing a user to only dump raw WAL records for one particular
database, or maybe excluding WAL records that change system catalogs or
the like. But I probably wouldn't start here, personally.

Now then.... as Blaise Pascal said in 1657 (and as was also said by
Winston Churchill, Mark Twain, etc).... "I'm sorry I wrote you such a
long letter; I didn't have time to write a short one."

-Jeremy

-- 
http://about.me/jeremy_schneider

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bruce Momjian

Дата:

06 октября 2021 г., 20:19:23

On Wed, Oct  6, 2021 at 09:56:34AM -0700, Jeremy Schneider wrote:
> On 10/5/21 17:43, Bruce Momjian wrote:
> > On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
> >> Specifically exposing pg_waldump functionality in SQL could speed up
> >> finding bugs in the PG logical replication code. We found and fixed a
> >> few over this past year, but there are more lurking out there.
> > 
> > Uh, why is pg_waldump more important than other command line tool
> > information?
> 
> Going down the list of all other utilities in src/bin:
> 
> * pg_amcheck, pg_config, pg_controldata: already available in SQL
> * psql, pgbench, pg_dump: already available as client-side apps
> * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl,
> pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any
> possible use case outside server OS access, most of these are too low
> level and don't even make sense in SQL
> * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL,
> don't feel any deep interest myself
> 
> Speaking selfishly, there are a few reasons I would be specifically
> interested in pg_waldump (the only remaining one on the list).

This is the analysis I was looking for to understand if copying the
features of command-line tools in extensions was a wise direction.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Alvaro Herrera

Дата:

06 октября 2021 г., 20:23:33

On 2021-Oct-06, Jeremy Schneider wrote:

> Well this whole conversation is just theoretical anyway until the code
> is shared.  :)  But if Bharath is writing functions to decode WAL, then
> wouldn't we just have pg_waldump use these same functions in order to
> avoid duplicating code?

Actually, a lot of the code is already shared, since the rmgrdesc
routines are in src/backend.  Keep in mind that it was there before
pg_xlogdump existed, to support WAL_DEBUG.  When pg_xlogdump was added,
what we did was allow that backend-only code be compilable in a frontend
environment.  Also, we already have xlogreader.

So pg_waldump itself is mostly scaffolding to let the frontend
environment get argument values to pass to backend-enabled code.  The
only really interesting, novel thing is the --stats mode ... and I bet
you can write that with some SQL-level aggregation of the raw data, no
need for any C code.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

07 октября 2021 г., 08:13:14

On Wed, Oct 6, 2021 at 10:26 PM Jeremy Schneider
<schneider@ardentperf.com> wrote:
>
> On 10/5/21 17:43, Bruce Momjian wrote:
> > On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
> >> Specifically exposing pg_waldump functionality in SQL could speed up
> >> finding bugs in the PG logical replication code. We found and fixed a
> >> few over this past year, but there are more lurking out there.
> >
> > Uh, why is pg_waldump more important than other command line tool
> > information?
>
> Going down the list of all other utilities in src/bin:
>
> * pg_amcheck, pg_config, pg_controldata: already available in SQL
> * psql, pgbench, pg_dump: already available as client-side apps
> * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl,
> pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any
> possible use case outside server OS access, most of these are too low
> level and don't even make sense in SQL
> * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL,
> don't feel any deep interest myself
>
> Speaking selfishly, there are a few reasons I would be specifically
> interested in pg_waldump (the only remaining one on the list).

Thanks Jeremy for the analysis.

> First, to better support existing features around logical replication
> and decoding.
>
> In particular, it seems inconsistent to me that all the replication
> management SQL functions take LSNs as arguments - and yet there's no
> SQL-based way to find the LSNs that you are supposed to pass into these
> functions.
>
> https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION
>
> Over the past few years I've been pulled in to help several large PG
> users who ran into these bugs, and it's very painful - because the only
> real remediation is to drop and recreate the replication slot, which
> means either re-copying all the data to the downstream system or
> figuring out a way to resync it. Some PG users have 3rd party tools like
> HVR which can do row-by-row resync (IIUC), but no matter how you slice
> it, we're talking about a lot of pain for people replicating large data
> sets between multiple systems. In most cases, the only/best option even
> with very large tables is to just make a fresh copy of the data - which
> can translate to a business outage of hours or even days.
>
> My favorite example is the SQL function pg_replication_slot_advance() -
> this could really help PG users find less painful solutions to broken
> decoding, however it's not really possible to /know/ an LSN to advance
> to without inspecting WAL. ISTM there's a strong use case here for a SQL
> interface on WAL inspection.
>
> Second: debugging and troubleshooting logical replication and decoding bugs.
>
> I helped track down a few logical replication bugs and get fixed into
> code at postgresql.org this past year. (But I give credit to others who
> are much better at C than I am, and who did a lot more work than I did
> on these bugs!)
>
> Logical decoding bugs are some of the hardest to fix - because all you
> have is a WAL stream, but you don't know the SQL or workload patterns or
> PG code paths which created that WAL stream.
>
> Dumping the WAL - knowing which objects and which types of operations
> are involved and stats like number of changes or number of
> subtransactions - this helps identify which transaction and what SQL in
> the application triggered the failure, which can help find short-term
> workarounds. Businesses need those short-term workarounds so they can
> keep going while we work on finding and fixing bugs, which can take some
> time. This is another place where I think a SQL interface to WAL would
> be helpful to PG users. Especially the ability to filter and trace a
> single transaction through a large number of WAL files in the directory.
>
> Third: statistics on WAL - especially full page writes. Giving users the
> full power of SQL allows much more sophisticated analysis of the WAL
> records. Personally, I'd probably find myself importing all the WAL
> stats into the DB anyway because SQL is my preferred data manipulation
> language.

Just to add to the above points, with the new extension pg_walinspect
we will have following advantages:
1) Usability - SQL callable functions will be easier to use for the
users/admins/developers.
2) Access Control - we can provide better access control for the WAL data/stats.
3) Emitting the actual WAL data(as bytea structure) and stats via SQL
callable functions will help to analyze and answer questions like how
much WAL data is being generated in the system, what kind of WAL data
it is, how many FPWs are happening and so on. Jermey has already given
more realistic use cases.
4) I came across this -  there's a similar capability in SQL server -
https://www.mssqltips.com/sqlservertip/3076/how-to-read-the-sql-server-database-transaction-log/

> >> Having the extension in core is actually the only way to avoid
> >> duplicated effort, by having shared code which the pg_dump binary and
> >> the extension both wrap or call.
> >
> > Uh, aren't you duplicating code by having pg_waldump as a command-line
> > tool and an extension?
>
> Well this whole conversation is just theoretical anyway until the code
> is shared.  :)  But if Bharath is writing functions to decode WAL, then
> wouldn't we just have pg_waldump use these same functions in order to
> avoid duplicating code?
>
> Bharath - was some code already posted and I just missed it?
>
> Looking at the proposed API from the initial email, I like that there's
> both stats functionality and WAL record inspection functionality
> (similar to pg_waldump). I like that there's the ability to pull the
> records as raw bytea data, however I think we're also going to want an
> ability in v1 of the patch to decode it (similar to pageinspect
> heap_page_item_attrs, etc).

I'm yet to start working on the patch. I will be doing it soon.

> Another feature that might be interesting down the road would be the
> ability to provide filtering of WAL records for security purposes. For
> example, allowing a user to only dump raw WAL records for one particular
> database, or maybe excluding WAL records that change system catalogs or
> the like. But I probably wouldn't start here, personally.

+1.

Regards,
Bharath Rupireddy.

От

Bharath Rupireddy

Дата:

04 января 2022 г., 19:31:42

On Thu, Nov 25, 2021 at 5:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Nov 25, 2021 at 3:49 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the
documentationpart now, I will be doing it a bit later.
 
> > >
> > > Please feel free to review and provide your thoughts.
> >
> > The v1 patch set was failing to compile on Windows. Here's the v2
> > patch set fixing that.
>
> I forgot to specify this: the v1 patch set was failing to compile on
> Windows with errors shown at [1]. Thanks to Julien Rouhaud who
> suggested to use PGDLLIMPORT in an off-list discussion.
>
> [1] (Link target) ->
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> forkNames [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> pg_comp_crc32c [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> wal_segment_size [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> RmgrTable [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   .\Release\pg_walinspect\pg_walinspect.dll : fatal error LNK1120: 4
> unresolved externals [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>
>     5 Error(s)

Here's the v3 patch-set with fixes for the compiler warnings reported
in the cf bot at
https://cirrus-ci.com/task/4979131497578496?logs=gcc_warning#L506.

Please review.

Regards,
Bharath Rupireddy.

On Sun, Feb 6, 2022 at 9:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jan 31, 2022 at 4:40 PM Greg Stark <stark@mit.edu> wrote:
> > So I looked at this patch and I have the same basic question as Bruce.
> > Do we really want to expose every binary tool associated with Postgres
> > as an extension? Obviously this is tempting for cloud provider users
> > which is not an unreasonable argument. But it does have consequences.
> >
> > 1) Some things like pg_waldump are running code that is not normally
> > under user control. This could have security issues or reliability
> > issues.
>
> For what it's worth, I am generally in favor of having something like
> this in PostgreSQL. I think it's wrong of us to continue assuming that
> everyone has command-line access. Even when that's true, it's not
> necessarily convenient. If you choose to use a relational database,
> you may be the sort of person who likes SQL. And if you are, you may
> want to have the database tell you what's going on via SQL rather than
> command-line tools or operating system utilities. Imagine if we didn't
> have pg_stat_activity and you had to get that information by running a
> separate binary. Would anyone like that? Why is this case any
> different?
>
> A few years ago we exposed data from pg_control via SQL and similar
> concerns were raised - but it turns out to be pretty useful. I don't
> know why this shouldn't be equally useful. Sure, there's some
> duplication in functionality, but it's not a huge maintenance burden
> for the project, and people (including me) like having it available. I
> think the same things will be true here.
>
> If decoding WAL causes security problems, that's something we better
> fix, because WAL is constantly decoded on standbys and via logical
> decoding on systems all over the place. I agree that we can't let
> users supply their own hand-crafted WAL records to be decoded without
> causing more trouble than we can handle, but if it's not safe to
> decode the WAL the system generated than we are in a lot of trouble
> already.
>
> I hasten to say that I'm not endorsing every detail or indeed any
> detail of the proposed patch, and some of the concerns you mention
> later sound well-founded to me. But I disagree with the idea that we
> shouldn't have both a command-line utility that roots through files on
> disk and an SQL interface that works with a running system.

Thanks Robert for your comments.

> + appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u",
>
> But that's kind of out of place for an SQL interface. It makes it hard
> to write queries since things like the relid, block number etc are in
> the string. If I wanted to use these functions I would expect to be
> doing something like "select all the decoded records pertaining to
> block n".

Thanks Greg for your review of the patches. Since there can be
multiple blkref for WAL record type HEAP2 (for multi inserts
basically) [1], I couldn't find a better way to break it and represent
it as a non-text column. IMO this is simpler and users can easily find
out answers to "how many WAL records my relation generated between
lsn1 and lsn2 or how many WAL records of type Heap exist and so on?",
see [2]. I've also added a test case to just show this in 0002 patch.

Here's the v4 patch set that has the following changes along with
Greg's review comments addressed:

1) Added documentation as 0003 patch.
2) Removed CHECKPOINT commands from tests as it is unnecessary.
3) Added input validation code and tests.
4) A few more comments have been added.
5) Currently, only superusers can create the extension, but users with
the pg_monitor role can use the functions.
6) Test cases are basic yet they cover all the functions, error cases
with input validations, I don't think we need to add many more test
cases as suggested upthread, but I'm open to add a few more if I miss
any use-case.

Please review the v4 patch set further and let me know your thoughts.

[1]
rmgr: Heap2       len (rec/tot):     64/  8256, tx:          0, lsn:
0/014A9070, prev 0/014A8FF8, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0 FPW, blkref #1: rel
1663/12757/16384 blk 0
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB0C8, prev 0/014A9070, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 1
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB108, prev 0/014AB0C8, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 2
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB148, prev 0/014AB108, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 3
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB188, prev 0/014AB148, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 4

[2]
postgres=# select count(*) from pg_get_wal_records_info('0/13C0A98',
'0/0157A160') where block_ref like '%16384%' and rmgr like 'Heap';
 count
-------
 10100
(1 row)

postgres=# select count(*) from t1;
 count
-------
 10100
(1 row)

postgres=#

postgres=# select count(*) from pg_get_wal_records_info('0/13C0A98',
'0/0157A160') where block_ref like '%FPW%';
 count
-------
    78
(1 row)

postgres=#

Regards,
Bharath Rupireddy.

On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> I don't think that's the use case of this patch. Unless there is some
> other valid reason, I would suggest you remove it.

Removed the function pg_verify_raw_wal_record. Robert and Greg also
voted for removal upthread.

> > > Should we add a function that returns the pointer to the first and
> > > probably the last WAL record in the WAL segment? This would help users
> > > to inspect the wal records in the entire wal segment if they wish to
> > > do so.
> >
> > Good point. One can do this already with pg_get_wal_records_info and
> > pg_walfile_name_offset. Usually, the LSN format itself can give an
> > idea about the WAL file it is in.
> >
> > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
> > limit 1;
> >     lsn    |    pg_walfile_name_offset
> > -----------+-------------------------------
> >  0/5000038 | (000000010000000000000005,56)
> > (1 row)
> >
> > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
> > limit 1;
> >     lsn    |       pg_walfile_name_offset
> > -----------+-------------------------------------
> >  0/5FFFFC0 | (000000010000000000000005,16777152)
> > (1 row)
> >
>
> The workaround you are suggesting is not very user friendly and FYKI
> pg_wal_records_info simply hangs at times when we specify the higher
> and lower limit of lsn in a wal file.
>
> To make things easier for the end users I would suggest we add a
> function that can return a valid first and last lsn in a walfile. The
> output of this function can be used to inspect the wal records in the
> entire wal file if they wish to do so and I am sure they will. So, it
> should be something like this:
>
> select first_valid_lsn, last_valid_lsn from
> pg_get_first_last_valid_wal_record('wal-segment-name');
>
> And above function can directly be used with pg_get_wal_records_info() like
>
> select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment'));
>
> I think this is a pretty basic ASK that we expect to be present in the
> module like this.

Added a new function that returns the first and last valid WAL record
LSN of a given WAL file.

> > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> > >
> > > I think we should allow all these functions to be executed in wait and
> > > *nowait* mode. If a user specifies nowait mode, the function should
> > > return if no WAL data is present, rather than waiting for new WAL data
> > > to become available, default behaviour could be anything you like.
> >
> > Currently, pg_walinspect uses read_local_xlog_page which waits in the
> > while(1) loop if a future LSN is specified. As read_local_xlog_page is
> > an implementation of XLogPageReadCB, which doesn't have a wait/nowait
> > parameter, if we really need a wait/nowait mode behaviour, we need to
> > do extra things(either add a backend-level global wait variable, set
> > before XLogReadRecord, if set, read_local_xlog_page can just exit
> > without waiting and reset after the XLogReadRecord or add an extra
> > bool wait variable to XLogReaderState and use it in
> > read_local_xlog_page).
> >
>
> I am not asking to do any changes in the backend code. Please check -
> how pg_waldump does this when a user requests to stop once the endptr
> has reached. If not for all functions at least for a few functions we
> can do this if it is doable.

I've added a new function read_local_xlog_page_2 (similar to
read_local_xlog_page but works in wait and no wait mode) and the
callers can specify whether to wait or not wait using private_data.
Actually, I wanted to use the private_data structure of
read_local_xlog_page but the logical decoding already has context as
private_data, that is why I had to have a new function. I know it
creates a bit of duplicate code, but its cleaner than using
backend-local variables or additional flags in XLogReaderState or
adding wait/no-wait boolean to page_read callback. Any other
suggestions are welcome here.

With this, I'm able to have wait/no wait versions for any functions.
But for now, I'm having wait/no wait for two functions
(pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
sense.

> > > +Datum
> > > +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> > > +{
> > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10
> > >
> > >
> > > We could probably have another variant of this function that would
> > > work even if the end pointer is not specified, in which case the
> > > default end pointer would be the last WAL record in the WAL segment.
> > > Currently it mandates the use of an end pointer which slightly reduces
> > > flexibility.
> >
> > Last WAL record in the WAL segment may not be of much use(one can
> > figure out the last valid WAL record in a wal file as mentioned
> > above), but the WAL records info till the latest current flush LSN of
> > the server would be a useful functionality. But that too, can be found
> > using something like "select lsn, prev_lsn, resource_manager from
> > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"
> >
>
> What if a user wants to inspect all the valid wal records from a
> startptr (startlsn) and he doesn't know the endptr? Why should he/she
> be mandated to get the endptr and supply it to this function? I don't
> think we should force users to do that. I think this is again a very
> basic ASK that can be done in this version itself. It is not at all
> any advanced thing that we can think of doing in the future.

Agreed. Added new functions that emits wal records info/stats till the
end of the WAL at the moment.

> > > +
> > > +/*
> > > + * Get the first valid raw WAL record lsn.
> > > + */
> > > +Datum
> > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
> > >
> > >
> > > I think this function should return a pointer to the nearest valid WAL
> > > record which can be the previous WAL record to the LSN entered by the
> > > user or the next WAL record. If a user unknowingly enters an lsn that
> > > does not exist then in such cases we should probably return the lsn of
> > > the previous WAL record instead of hanging or waiting for the new WAL
> > > record to arrive.
> >
> > Is it useful?
>
> It is useful in the same way as returning the next valid wal pointer
> is. Why should a user wait for the next valid wal pointer to be
> available instead the function should identify the previous valid wal
> record and return it and put an appropriate message to the user.
>
> If there's a strong reason, how about naming
> > pg_get_next_valid_wal_record_lsn returning the next valid wal record
> > LSN and pg_get_previous_valid_wal_record_lsn returning the previous
> > valid wal record LSN ? If you think having two functions is too much,
> > then, how about pg_get_first_valid_wal_record_lsn returning both the
> > next valid wal record LSN and its previous wal record LSN?
> >
>
> The latter one looks better.

Modified.

Attaching v5 patch set, please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Ashutosh Sharma

Дата:

02 марта 2022 г., 17:41:52

Some review comments on v5 patch (v5-0001-pg_walinspect.patch)

+--
+-- pg_get_wal_records_info()
+--
+CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
+    IN end_lsn pg_lsn,
+    IN wait_for_wal boolean DEFAULT false,
+    OUT lsn pg_lsn,

What does the wait_for_wal flag mean here when one has already
specified the start and end lsn? AFAIU, If a user has specified a
start and stop LSN, it means that the user knows the extent to which
he/she wants to display the WAL records in which case we need to stop
once the end lsn has reached . So what is the meaning of wait_for_wal
flag? Does it look sensible to have the wait_for_wal flag here? To me
it doesn't.

==

+--
+-- pg_get_wal_records_info_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
+    OUT lsn pg_lsn,
+    OUT prev_lsn pg_lsn,
+    OUT xid xid,

Why is this function required? Is pg_get_wal_records_info() alone not
enough? I think it is. See if we can make end_lsn optional in
pg_get_wal_records_info() and lets just have it alone. I think it can
do the job of pg_get_wal_records_info_till_end_of_wal function.

==

+--
+-- pg_get_wal_stats_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
+    OUT resource_manager text,
+    OUT count int8,

Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?

==


+                       if (loc <= read_upto)
+                               break;
+
+                       /* Let's not wait for WAL to be available if
indicated */
+                       if (loc > read_upto &&
+                               state->private_data != NULL)
+                       {

Why loc > read_upto? The first if condition is (loc <= read_upto)
followed by the second if condition - (loc > read_upto). Is the first
if condition (loc <= read_upto) not enough to indicate that loc >
read_upto?

==

+#define IsEndOfWALReached(state) \
+               (state->private_data != NULL && \
+               (((ReadLocalXLOGPage2Private *)
xlogreader->private_data)->no_wait == true) && \
+               (((ReadLocalXLOGPage2Private *)
xlogreader->private_data)->reached_end_of_wal == true))


I think we should either use state or xlogreader. First line says
state->private_data and second line xlogreader->private_data.

==

+               (((ReadLocalXLOGPage2Private *)
xlogreader->private_data)->reached_end_of_wal == true))
+

There is a new patch coming to make the end of WAL messages less
scary. It introduces the EOW flag in xlogreaderstate maybe we can use
that instead of introducing new flags in private area to represent the
end of WAL.

==

+/*
+ * XLogReaderRoutine->page_read callback for reading local xlog files
+ *
+ * This function is same as read_local_xlog_page except that it works in both
+ * wait and no wait mode. The callers can specify about waiting in private_data
+ * of XLogReaderState.
+ */
+int
+read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
+                                          int reqLen, XLogRecPtr
targetRecPtr, char *cur_page)
+{
+       XLogRecPtr      read_upto,

Do we really need this function? Can't we make use of an existing WAL
reader function - read_local_xlog_page()?

--
With Regards,
Ashutosh Sharma.

On Fri, Feb 25, 2022 at 4:33 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > I don't think that's the use case of this patch. Unless there is some
> > other valid reason, I would suggest you remove it.
>
> Removed the function pg_verify_raw_wal_record. Robert and Greg also
> voted for removal upthread.
>
> > > > Should we add a function that returns the pointer to the first and
> > > > probably the last WAL record in the WAL segment? This would help users
> > > > to inspect the wal records in the entire wal segment if they wish to
> > > > do so.
> > >
> > > Good point. One can do this already with pg_get_wal_records_info and
> > > pg_walfile_name_offset. Usually, the LSN format itself can give an
> > > idea about the WAL file it is in.
> > >
> > > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
> > > limit 1;
> > >     lsn    |    pg_walfile_name_offset
> > > -----------+-------------------------------
> > >  0/5000038 | (000000010000000000000005,56)
> > > (1 row)
> > >
> > > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
> > > limit 1;
> > >     lsn    |       pg_walfile_name_offset
> > > -----------+-------------------------------------
> > >  0/5FFFFC0 | (000000010000000000000005,16777152)
> > > (1 row)
> > >
> >
> > The workaround you are suggesting is not very user friendly and FYKI
> > pg_wal_records_info simply hangs at times when we specify the higher
> > and lower limit of lsn in a wal file.
> >
> > To make things easier for the end users I would suggest we add a
> > function that can return a valid first and last lsn in a walfile. The
> > output of this function can be used to inspect the wal records in the
> > entire wal file if they wish to do so and I am sure they will. So, it
> > should be something like this:
> >
> > select first_valid_lsn, last_valid_lsn from
> > pg_get_first_last_valid_wal_record('wal-segment-name');
> >
> > And above function can directly be used with pg_get_wal_records_info() like
> >
> > select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment'));
> >
> > I think this is a pretty basic ASK that we expect to be present in the
> > module like this.
>
> Added a new function that returns the first and last valid WAL record
> LSN of a given WAL file.
>
> > > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> > > >
> > > > I think we should allow all these functions to be executed in wait and
> > > > *nowait* mode. If a user specifies nowait mode, the function should
> > > > return if no WAL data is present, rather than waiting for new WAL data
> > > > to become available, default behaviour could be anything you like.
> > >
> > > Currently, pg_walinspect uses read_local_xlog_page which waits in the
> > > while(1) loop if a future LSN is specified. As read_local_xlog_page is
> > > an implementation of XLogPageReadCB, which doesn't have a wait/nowait
> > > parameter, if we really need a wait/nowait mode behaviour, we need to
> > > do extra things(either add a backend-level global wait variable, set
> > > before XLogReadRecord, if set, read_local_xlog_page can just exit
> > > without waiting and reset after the XLogReadRecord or add an extra
> > > bool wait variable to XLogReaderState and use it in
> > > read_local_xlog_page).
> > >
> >
> > I am not asking to do any changes in the backend code. Please check -
> > how pg_waldump does this when a user requests to stop once the endptr
> > has reached. If not for all functions at least for a few functions we
> > can do this if it is doable.
>
> I've added a new function read_local_xlog_page_2 (similar to
> read_local_xlog_page but works in wait and no wait mode) and the
> callers can specify whether to wait or not wait using private_data.
> Actually, I wanted to use the private_data structure of
> read_local_xlog_page but the logical decoding already has context as
> private_data, that is why I had to have a new function. I know it
> creates a bit of duplicate code, but its cleaner than using
> backend-local variables or additional flags in XLogReaderState or
> adding wait/no-wait boolean to page_read callback. Any other
> suggestions are welcome here.
>
> With this, I'm able to have wait/no wait versions for any functions.
> But for now, I'm having wait/no wait for two functions
> (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
> sense.
>
> > > > +Datum
> > > > +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> > > > +{
> > > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10
> > > >
> > > >
> > > > We could probably have another variant of this function that would
> > > > work even if the end pointer is not specified, in which case the
> > > > default end pointer would be the last WAL record in the WAL segment.
> > > > Currently it mandates the use of an end pointer which slightly reduces
> > > > flexibility.
> > >
> > > Last WAL record in the WAL segment may not be of much use(one can
> > > figure out the last valid WAL record in a wal file as mentioned
> > > above), but the WAL records info till the latest current flush LSN of
> > > the server would be a useful functionality. But that too, can be found
> > > using something like "select lsn, prev_lsn, resource_manager from
> > > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"
> > >
> >
> > What if a user wants to inspect all the valid wal records from a
> > startptr (startlsn) and he doesn't know the endptr? Why should he/she
> > be mandated to get the endptr and supply it to this function? I don't
> > think we should force users to do that. I think this is again a very
> > basic ASK that can be done in this version itself. It is not at all
> > any advanced thing that we can think of doing in the future.
>
> Agreed. Added new functions that emits wal records info/stats till the
> end of the WAL at the moment.
>
> > > > +
> > > > +/*
> > > > + * Get the first valid raw WAL record lsn.
> > > > + */
> > > > +Datum
> > > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
> > > >
> > > >
> > > > I think this function should return a pointer to the nearest valid WAL
> > > > record which can be the previous WAL record to the LSN entered by the
> > > > user or the next WAL record. If a user unknowingly enters an lsn that
> > > > does not exist then in such cases we should probably return the lsn of
> > > > the previous WAL record instead of hanging or waiting for the new WAL
> > > > record to arrive.
> > >
> > > Is it useful?
> >
> > It is useful in the same way as returning the next valid wal pointer
> > is. Why should a user wait for the next valid wal pointer to be
> > available instead the function should identify the previous valid wal
> > record and return it and put an appropriate message to the user.
> >
> > If there's a strong reason, how about naming
> > > pg_get_next_valid_wal_record_lsn returning the next valid wal record
> > > LSN and pg_get_previous_valid_wal_record_lsn returning the previous
> > > valid wal record LSN ? If you think having two functions is too much,
> > > then, how about pg_get_first_valid_wal_record_lsn returning both the
> > > next valid wal record LSN and its previous wal record LSN?
> > >
> >
> > The latter one looks better.
>
> Modified.
>
> Attaching v5 patch set, please review it further.
>
> Regards,
> Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

02 марта 2022 г., 20:07:43

On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Some review comments on v5 patch (v5-0001-pg_walinspect.patch)

Thanks for reviewing.

> +--
> +-- pg_get_wal_records_info()
> +--
> +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> +    IN end_lsn pg_lsn,
> +    IN wait_for_wal boolean DEFAULT false,
> +    OUT lsn pg_lsn,
>
> What does the wait_for_wal flag mean here when one has already
> specified the start and end lsn? AFAIU, If a user has specified a
> start and stop LSN, it means that the user knows the extent to which
> he/she wants to display the WAL records in which case we need to stop
> once the end lsn has reached . So what is the meaning of wait_for_wal
> flag? Does it look sensible to have the wait_for_wal flag here? To me
> it doesn't.

Users can always specify a future end_lsn and set wait_for_wal to
true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
wait for the WAL. IMO, this is useful. If you remember you were okay
with wait/nowait versions for some of the functions upthread [1]. I'm
not going to retain this behaviour for both
pg_get_wal_records_info/pg_get_wal_stats as it is similar to
pg_waldump's --follow option.

> ==
>
> +--
> +-- pg_get_wal_records_info_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> +    OUT lsn pg_lsn,
> +    OUT prev_lsn pg_lsn,
> +    OUT xid xid,
>
> Why is this function required? Is pg_get_wal_records_info() alone not
> enough? I think it is. See if we can make end_lsn optional in
> pg_get_wal_records_info() and lets just have it alone. I think it can
> do the job of pg_get_wal_records_info_till_end_of_wal function.
>
> ==
>
> +--
> +-- pg_get_wal_stats_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> +    OUT resource_manager text,
> +    OUT count int8,
>
> Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?

I'm doing the following input validations for these functions to not
cause any issues with invalid LSN. If I were to have the default value
for end_lsn as 0/0, I can't perform input validations right? That is
the reason I'm having separate functions {pg_get_wal_records_info,
pg_get_wal_stats}_till_end_of_wal() versions.

    /* Validate input. */
    if (XLogRecPtrIsInvalid(start_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid WAL record start LSN")));

    if (XLogRecPtrIsInvalid(end_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid WAL record end LSN")));

    if (start_lsn >= end_lsn)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("WAL record start LSN must be less than end LSN")));

> ==
>
>
> +                       if (loc <= read_upto)
> +                               break;
> +
> +                       /* Let's not wait for WAL to be available if
> indicated */
> +                       if (loc > read_upto &&
> +                               state->private_data != NULL)
> +                       {
>
> Why loc > read_upto? The first if condition is (loc <= read_upto)
> followed by the second if condition - (loc > read_upto). Is the first
> if condition (loc <= read_upto) not enough to indicate that loc >
> read_upto?

Yeah, that's unnecessary, I improved the comment there and removed loc
> read_upto.

> ==
>
> +#define IsEndOfWALReached(state) \
> +               (state->private_data != NULL && \
> +               (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->no_wait == true) && \
> +               (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
>
> I think we should either use state or xlogreader. First line says
> state->private_data and second line xlogreader->private_data.

I've changed it to use state instead of xlogreader.

> ==
>
> +               (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
> +
>
> There is a new patch coming to make the end of WAL messages less
> scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> that instead of introducing new flags in private area to represent the
> end of WAL.

Yeah that would be great. But we never know which one gets committed
first. Until then it's not good to have dependencies on two "on-going"
patches. Later, we can change.

> ==
>
> +/*
> + * XLogReaderRoutine->page_read callback for reading local xlog files
> + *
> + * This function is same as read_local_xlog_page except that it works in both
> + * wait and no wait mode. The callers can specify about waiting in private_data
> + * of XLogReaderState.
> + */
> +int
> +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> +                                          int reqLen, XLogRecPtr
> targetRecPtr, char *cur_page)
> +{
> +       XLogRecPtr      read_upto,
>
> Do we really need this function? Can't we make use of an existing WAL
> reader function - read_local_xlog_page()?

I clearly explained the reasons upthread [2]. Please let me know if
you have more thoughts/doubts here, we can connect offlist.

Attaching v6 patch set with above review comments addressed. Please
review it further.

[1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
+PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
+PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
+PG_FUNCTION_INFO_V1(pg_get_wal_records_info);

I think we should allow all these functions to be executed in wait and
*nowait* mode. If a user specifies nowait mode, the function should
return if no WAL data is present, rather than waiting for new WAL data
to become available, default behaviour could be anything you like.

[2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
I've added a new function read_local_xlog_page_2 (similar to
read_local_xlog_page but works in wait and no wait mode) and the
callers can specify whether to wait or not wait using private_data.
Actually, I wanted to use the private_data structure of
read_local_xlog_page but the logical decoding already has context as
private_data, that is why I had to have a new function. I know it
creates a bit of duplicate code, but its cleaner than using
backend-local variables or additional flags in XLogReaderState or
adding wait/no-wait boolean to page_read callback. Any other
suggestions are welcome here.

With this, I'm able to have wait/no wait versions for any functions.
But for now, I'm having wait/no wait for two functions
(pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
sense.

Regards,
Bharath Rupireddy.

On Thu, Mar 3, 2022 at 10:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Added a new function that returns the first and last valid WAL record
> > LSN of a given WAL file.
>
> Sounds like fuzzy thinking to me. WAL records can cross file
> boundaries, and forgetting about that leads to all sorts of problems.
> Just give people one function that decodes a range of LSNs and call it
> good. Why do you need anything else? If people want to get the first
> record that begins in a segment or the first record any portion of
> which is in a particular segment or the last record that begins in a
> segment or the last record that ends in a segment or any other such
> thing, they can use a WHERE clause for that... and if you think they
> can't, then that should be good cause to rethink the return value of
> the one-and-only SRF that I think you need here.

Thanks Robert.

Thanks to others for your review comments.

Here's the v7 patch set. These patches are based on the motive that
"keep it simple and short yet effective and useful". With that in
mind, I have not implemented the wait mode for any of the functions
(as it doesn't look an effective use-case and requires adding a new
page_read callback, instead throw error if future LSN is specified),
also these functions will give WAL information on the current server's
timeline. Having said that, I'm open to adding new functions in future
once this initial version gets in, if there's a requirement and users
ask for the new functions.

Please review the v7 patch set and provide your thoughts.

Regards,
Bharath Rupireddy.

On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> >
> > Attaching v6 patch set with above review comments addressed. Please
> > review it further.

Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
upthread [1] which is having more simple-yet-useful-and-effective
functions.

> * Don't issue WARNINGs or other messages for ordinary situations, like
> when pg_get_wal_records_info() hits the end of WAL.

v7 patch-set [1] has no warnings, but the functions will error out if
future LSN is specified.

> * It feels like the APIs that allow waiting for the end of WAL are
> slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> behavior? Try to make the API more orthogonal, where a few basic
> functions can be combined to give you everything you need, rather than
> specifying extra parameters and issuing WARNINGs. I

v7 patch-set [1] onwards waiting mode has been removed for all of the
functions, again to keep things simple-yet-useful-and-effective.
However, we can always add new pg_walinspect functions that wait for
future WAL in the next versions once basic stuff gets committed and if
many users ask for it.

> * In the docs, include some example output. I don't see any output in
> the tests, which makes sense because it's mostly non-deterministic, but
> it would be helpful to see sample output of at least
> pg_get_wal_records_info().

+1. Added for pg_get_wal_records_info and pg_get_wal_stats.

> * Is pg_get_wal_stats() even necessary, or can you get the same
> information with a query over pg_get_wal_records_info()? For instance,
> if you want to group by transaction ID rather than rmgr, then
> pg_get_wal_stats() is useless.

Yes, you are right pg_get_wal_stats provides WAL stats per resource
manager which is similar to pg_waldump with --start, --end and --stats
option. It provides more information than pg_get_wal_records_info and
is a good way of getting stats than adding more columns to
pg_get_wal_records_info, calculating percentage in sql and having
group by clause. IMO, pg_get_wal_stats is more readable and useful.

> * Would be nice to have a pg_wal_file_is_valid() or similar, which
> would test that it exists, and the header matches the filename (e.g. if
> it was recycled but not used, that would count as invalid). I think
> pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> even if the file is valid -- for example, if a wal record spans many
> wal segments, the segments might look invalid because they contain no
> complete records, but the file itself is still valid and contains valid
> wal data.

Actually I haven't tried testing a single WAL record spanning many WAL
files yet(I'm happy to try it if someone suggests such a use-case). In
that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
have a problem because it just gives the next valid LSN and it's
previous LSN using existing WAL reader API XLogFindNextRecord(). It
opens up the WAL file segments using (some dots to connect -
page_read/read_local_xlog_page, WALRead,
segment_open/wal_segment_open). Thoughts?

I don't think it's necessary to have a function pg_wal_file_is_valid()
that given a WAL file name as input checks whether a WAL file exists
or not, probably not in the core (xlogfuncs.c) too. These kinds of
functions can open up challenges in terms of user input validation and
may cause unnecessary problems, please see some related discussion
[2].

> * Is there a reason you didn't include the timeline ID in
> pg_get_wal_records_info()?

I'm right now allowing the functions to read WAL from the current
server's timeline which I have mentioned in the docs. The server's
current timeline is available via pg_control_checkpoint()'s
timeline_id. So, having timeline_id as a column doesn't make sense.
Again this is to keep things simple-yet-useful-and-effective. However,
we can add new pg_walinspect functions to read WAL from historic as
well as current timelines in the next versions once basic stuff gets
committed and if many users ask for it.

+ <para>
+  All the functions of this module will provide the WAL information using the
+  current server's timeline ID.
+ </para>

> * Can we mark this extension 'trusted'? I'm not 100% clear on the
> standards for that marker, but it seems reasonable for a database owner
> with the right privileges might want to install it.

'trusted' extensions concept is added by commit 50fc694 [3]. Since
pg_walinspect deals with WAL, we strictly want to control who creates
and can execute functions exposed by it, so I don't know if 'trusted'
is a good idea here. Also, pageinspect isn't a 'trusted' extension.

> * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> function should require pg_read_server_files? Or at least
> pg_read_all_data?

pg_read_all_data may not be the right choice, but pg_read_server_files
is. However, does it sound good if some functions are allowed to be
executed by users with a pg_monitor role and others
pg_get_raw_wal_record by users with pg_read_server_files? Since the
extension itself can be created by superusers, isn't the
pg_get_raw_wal_record sort of safe with pg_mointor itself?

 If hackers don't agree, I'm happy to grant execution on
pg_get_raw_wal_record() to the pg_read_server_files role.

Attaching the v8 patch-set resolving above comments and some tests for
checking function permissions. Please review it further.

[1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
[3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Jan 29 18:42:43 2020 -0500

    Invent "trusted" extensions, and remove the pg_pltemplate catalog.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Robert Haas

Дата:

10 марта 2022 г., 23:00:35

On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
> * Can we mark this extension 'trusted'? I'm not 100% clear on the
> standards for that marker, but it seems reasonable for a database owner
> with the right privileges might want to install it.

I'm not clear on the standard either, exactly, but might not that
allow the database owner to get a peek at what's happening in other
databases?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Stephen Frost

Дата:

10 марта 2022 г., 23:54:24

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
>
> I'm not clear on the standard either, exactly, but might not that
> allow the database owner to get a peek at what's happening in other
> databases?

The standard is basically that all of the functions it brings are
written to enforce the PG privilege system and you aren't able to use
the extension to bypass those privileges.  In some cases that means that
the C-language functions installed have if(!superuser) ereport() calls
throughout them- that's a fine answer, but it's perhaps not very helpful
to mark those as trusted.  In other cases, the C-language functions
installed don't directly provide access to data, such as the PostGIS
functions.

I've not looked back on this thread, but I'd expect pg_walinspect to
need those superuser checks and with those it *could* be marked as
trusted, but that again brings into question how useful it is to mark it
thusly.

In an ideal world, we might have a pg_readwal predefined role which
allows a role which was GRANT'd that role to be able to read WAL
traffic, and then the pg_walinspect extension could check that the
calling role has that predefined role, and other functions and
extensions could also check that rather than any existing superuser
checks.  A cloud provider or such could then include in their setup of a
new instance something like:

GRANT pg_readwal TO admin_user WITH ADMIN OPTION;

Presuming that there isn't anything that ends up in the WAL that's an
issue for the admin_user to have access to.

I certainly don't think we should allow either database owners or
regular users on a system the ability to access the WAL traffic of the
entire system.  More forcefully- we should *not* be throwing more access
rights towards $owners in general and should be thinking about how we
can allow admins, providers, whomever, the ability to control what
rights users are given.  If they're all lumped under 'owner' then
there's no way for people to provide granular access to just those
things they wish and intend to.

Thanks,

Stephen

Вложения

signature.asc

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Kyotaro Horiguchi

Дата:

11 марта 2022 г., 05:38:22

At Thu, 10 Mar 2022 22:15:42 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> > >
> > > Attaching v6 patch set with above review comments addressed. Please
> > > review it further.
> 
> Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
> upthread [1] which is having more simple-yet-useful-and-effective
> functions.
> 
> > * Don't issue WARNINGs or other messages for ordinary situations, like
> > when pg_get_wal_records_info() hits the end of WAL.
> 
> v7 patch-set [1] has no warnings, but the functions will error out if
> future LSN is specified.
> 
> > * It feels like the APIs that allow waiting for the end of WAL are
> > slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> > behavior? Try to make the API more orthogonal, where a few basic
> > functions can be combined to give you everything you need, rather than
> > specifying extra parameters and issuing WARNINGs. I
> 
> v7 patch-set [1] onwards waiting mode has been removed for all of the
> functions, again to keep things simple-yet-useful-and-effective.
> However, we can always add new pg_walinspect functions that wait for
> future WAL in the next versions once basic stuff gets committed and if
> many users ask for it.
> 
> > * In the docs, include some example output. I don't see any output in
> > the tests, which makes sense because it's mostly non-deterministic, but
> > it would be helpful to see sample output of at least
> > pg_get_wal_records_info().
> 
> +1. Added for pg_get_wal_records_info and pg_get_wal_stats.
> 
> > * Is pg_get_wal_stats() even necessary, or can you get the same
> > information with a query over pg_get_wal_records_info()? For instance,
> > if you want to group by transaction ID rather than rmgr, then
> > pg_get_wal_stats() is useless.
> 
> Yes, you are right pg_get_wal_stats provides WAL stats per resource
> manager which is similar to pg_waldump with --start, --end and --stats
> option. It provides more information than pg_get_wal_records_info and
> is a good way of getting stats than adding more columns to
> pg_get_wal_records_info, calculating percentage in sql and having
> group by clause. IMO, pg_get_wal_stats is more readable and useful.
> 
> > * Would be nice to have a pg_wal_file_is_valid() or similar, which
> > would test that it exists, and the header matches the filename (e.g. if
> > it was recycled but not used, that would count as invalid). I think
> > pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> > even if the file is valid -- for example, if a wal record spans many
> > wal segments, the segments might look invalid because they contain no
> > complete records, but the file itself is still valid and contains valid
> > wal data.
> 
> Actually I haven't tried testing a single WAL record spanning many WAL
> files yet(I'm happy to try it if someone suggests such a use-case). In
> that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
> have a problem because it just gives the next valid LSN and it's
> previous LSN using existing WAL reader API XLogFindNextRecord(). It
> opens up the WAL file segments using (some dots to connect -
> page_read/read_local_xlog_page, WALRead,
> segment_open/wal_segment_open). Thoughts?
> 
> I don't think it's necessary to have a function pg_wal_file_is_valid()
> that given a WAL file name as input checks whether a WAL file exists
> or not, probably not in the core (xlogfuncs.c) too. These kinds of
> functions can open up challenges in terms of user input validation and
> may cause unnecessary problems, please see some related discussion
> [2].
> 
> > * Is there a reason you didn't include the timeline ID in
> > pg_get_wal_records_info()?
> 
> I'm right now allowing the functions to read WAL from the current
> server's timeline which I have mentioned in the docs. The server's
> current timeline is available via pg_control_checkpoint()'s
> timeline_id. So, having timeline_id as a column doesn't make sense.
> Again this is to keep things simple-yet-useful-and-effective. However,
> we can add new pg_walinspect functions to read WAL from historic as
> well as current timelines in the next versions once basic stuff gets
> committed and if many users ask for it.
> 
> + <para>
> +  All the functions of this module will provide the WAL information using the
> +  current server's timeline ID.
> + </para>
> 
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
> 
> 'trusted' extensions concept is added by commit 50fc694 [3]. Since
> pg_walinspect deals with WAL, we strictly want to control who creates
> and can execute functions exposed by it, so I don't know if 'trusted'
> is a good idea here. Also, pageinspect isn't a 'trusted' extension.
> 
> > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > function should require pg_read_server_files? Or at least
> > pg_read_all_data?
> 
> pg_read_all_data may not be the right choice, but pg_read_server_files
> is. However, does it sound good if some functions are allowed to be
> executed by users with a pg_monitor role and others
> pg_get_raw_wal_record by users with pg_read_server_files? Since the
> extension itself can be created by superusers, isn't the
> pg_get_raw_wal_record sort of safe with pg_mointor itself?
> 
>  If hackers don't agree, I'm happy to grant execution on
> pg_get_raw_wal_record() to the pg_read_server_files role.
> 
> Attaching the v8 patch-set resolving above comments and some tests for
> checking function permissions. Please review it further.
> 
> [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
> [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Wed Jan 29 18:42:43 2020 -0500
> 
>     Invent "trusted" extensions, and remove the pg_pltemplate catalog.

I played with this a bit, and would like to share some thoughts on it.

It seems to me too rigorous that pg_get_wal_records_info/stats()
reject future LSNs as end-LSN and I think WARNING or INFO and stop at
the real end-of-WAL is more kind to users.  I think the same with the
restriction that start and end LSN are required to be different.

The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
functions, they tell me the beginning of the current insertion point
in error message.  On the other hand they don't accept the same
value as end-LSN.  I think it is right that they tell the current
insertion point and they should take the end-LSN as the LSN to stop
reading.

I think pg_get_wal_stats() is worth to have but I think it should be
implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
about FPI since pg_waldump doesn't but it is internally collected (of
course!) and easily revealed. If we do that, the
pg_get_wal_records_stats() would be reduced to the following SQL
statement

SELECT resource_manager resmgr,
       count(*) AS N,
       (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
       sum(total_length) AS "combined size",
       (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
       sum(fpi_len) AS fpilen,
       (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
       FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
          GROUP by resource_manager
       WINDOW tot AS ()
       ORDER BY "combined size" desc;

The only difference with pg_waldump is the statement above doesn't
show lines for the resource managers that don't contained in the
result of pg_get_wal_records_info(). But I don't think that matters.


Sometimes the field description has very long (28kb long) content. It
makes the result output almost unreadable and I had a bit hard time
struggling with the output full of '-'s.  I would like have a default
limit on the length of such fields that can be long but I'm not sure
we want that.


The difference between pg_get_wal_record_info and _records_ other than
the number of argument is the former accepts incorrect LSNs.

The following works,
 pg_get_wal_record_info('0/1000000');
 pg_get_wal_records_info('0/1000000');

but this doesn't
 pg_get_wal_records_info('0/1000000', '0/1000000');
> ERROR:  WAL start LSN must be less than end LSN

But the following works
 pg_get_wal_records_info('0/1000000', '0/1000029');
> 0/1000028 | 0/0      |   0

So I think we can consolidate the two functions as:

- pg_get_wal_records_info('0/1000000');

  (current behavior) find the first record and show all records
  thereafter.

- pg_get_wal_records_info('0/1000000', '0/1000000');

  finds the first record since the start lsn and show it.

- pg_get_wal_records_info('0/1000000', '0/1000030');

  finds the first record since the start lsn then show records up to
  the end-lsn.


And about pg_get_raw_wal_record(). I don't see any use-case of the
function alone on SQL interface.  Even if we need to inspect broken
WAL files, it needs profound knowledge of WAL format and tools that
doesn't work on SQL interface.

However like pageinspect, if we separate the WAL-record fetching and
parsing it could be thought as useful.

pg_get_wal_records_info woule be like:

SELECT * FROM pg_walinspect_parse(raw)
 FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));

And pg_get_wal_stats woule be like:

SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
 FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));

Regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Kyotaro Horiguchi

Дата:

11 марта 2022 г., 05:52:49

Sorry, some minor non-syntactical corrections.

At Fri, 11 Mar 2022 11:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> I played with this a bit, and would like to share some thoughts on it.
> 
> It seems to me too rigorous that pg_get_wal_records_info/stats()
> reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> the real end-of-WAL is more kind to users.  I think the same with the
> restriction that start and end LSN are required to be different.
> 
> The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
> functions, they tell me the beginning of the current insertion point
> in error message.  On the other hand they don't accept the same
> value as end-LSN.  I think it is right that they tell the current
> insertion point and they should take the end-LSN as the LSN to stop
> reading.
> 
> I think pg_get_wal_stats() is worth to have but I think it should be
> implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
> about FPI since pg_waldump doesn't but it is internally collected (of
> course!) and easily revealed. If we do that, the
> pg_get_wal_records_stats() would be reduced to the following SQL
> statement
> 
> SELECT resource_manager resmgr,
>        count(*) AS N,
>        (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
>        sum(total_length) AS "combined size",
>        (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
>        sum(fpi_len) AS fpilen,
>        (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
>        FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
>           GROUP by resource_manager
>        WINDOW tot AS ()
>        ORDER BY "combined size" desc;
> 
> The only difference with pg_waldump is the statement above doesn't
> show lines for the resource managers that don't contained in the
> result of pg_get_wal_records_info(). But I don't think that matters.
> 
> 
> Sometimes the field description has very long (28kb long) content. It
> makes the result output almost unreadable and I had a bit hard time
> struggling with the output full of '-'s.  I would like have a default
> limit on the length of such fields that can be long but I'm not sure
> we want that.
> 
> 
- The difference between pg_get_wal_record_info and _records_ other than
- the number of argument is the former accepts incorrect LSNs.

The discussion is somewhat confused after some twists and turns..  It
should be something like the following.

pg_get_wal_record_info and pg_get_wal_records_info are almost same
since the latter can show a single record.  However it is a bit
annoying to do that. Since, other than it doens't accept same LSNs for
start and end, it doesn't show a record when there' no record in the
specfied LSN range.  But I don't think there's no usefulness of the
behavior.

The following works,
 pg_get_wal_record_info('0/1000000');
 pg_get_wal_records_info('0/1000000');

but this doesn't
 pg_get_wal_records_info('0/1000000', '0/1000000');
> ERROR:  WAL start LSN must be less than end LSN

And the following shows no records.
 pg_get_wal_records_info('0/1000000', '0/1000001');
 pg_get_wal_records_info('0/1000000', '0/1000028');

But the following works
  pg_get_wal_records_info('0/1000000', '0/1000029');
> 0/1000028 | 0/0      |   0



> So I think we can consolidate the two functions as:
> 
> - pg_get_wal_records_info('0/1000000');
> 
>   (current behavior) find the first record and show all records
>   thereafter.
> 
> - pg_get_wal_records_info('0/1000000', '0/1000000');
> 
>   finds the first record since the start lsn and show it.
> 
> - pg_get_wal_records_info('0/1000000', '0/1000030');
> 
>   finds the first record since the start lsn then show records up to
>   the end-lsn.
> 
> 
> And about pg_get_raw_wal_record(). I don't see any use-case of the
> function alone on SQL interface.  Even if we need to inspect broken
> WAL files, it needs profound knowledge of WAL format and tools that
> doesn't work on SQL interface.
> 
> However like pageinspect, if we separate the WAL-record fetching and
> parsing it could be thought as useful.
> 
> pg_get_wal_records_info woule be like:
> 
> SELECT * FROM pg_walinspect_parse(raw)
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));
> 
> And pg_get_wal_stats woule be like:
> 
> SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));


Regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Ashutosh Sharma

Дата:

11 марта 2022 г., 14:45:09

On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Sorry, some minor non-syntactical corrections.
>
> At Fri, 11 Mar 2022 11:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > I played with this a bit, and would like to share some thoughts on it.
> >
> > It seems to me too rigorous that pg_get_wal_records_info/stats()
> > reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> > the real end-of-WAL is more kind to users.  I think the same with the
> > restriction that start and end LSN are required to be different.
> >
> > The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
> > functions, they tell me the beginning of the current insertion point
> > in error message.  On the other hand they don't accept the same
> > value as end-LSN.  I think it is right that they tell the current
> > insertion point and they should take the end-LSN as the LSN to stop
> > reading.
> >
> > I think pg_get_wal_stats() is worth to have but I think it should be
> > implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
> > about FPI since pg_waldump doesn't but it is internally collected (of
> > course!) and easily revealed. If we do that, the
> > pg_get_wal_records_stats() would be reduced to the following SQL
> > statement
> >
> > SELECT resource_manager resmgr,
> >        count(*) AS N,
> >          (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
> >          sum(total_length) AS "combined size",
> >          (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
> >          sum(fpi_len) AS fpilen,
> >          (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
> >          FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
> >          GROUP by resource_manager
> >          WINDOW tot AS ()
> >          ORDER BY "combined size" desc;
> >
> > The only difference with pg_waldump is the statement above doesn't
> > show lines for the resource managers that don't contained in the
> > result of pg_get_wal_records_info(). But I don't think that matters.
> >
> >
> > Sometimes the field description has very long (28kb long) content. It
> > makes the result output almost unreadable and I had a bit hard time
> > struggling with the output full of '-'s.  I would like have a default
> > limit on the length of such fields that can be long but I'm not sure
> > we want that.
> >
> >
> - The difference between pg_get_wal_record_info and _records_ other than
> - the number of argument is the former accepts incorrect LSNs.
>
> The discussion is somewhat confused after some twists and turns..  It
> should be something like the following.
>
> pg_get_wal_record_info and pg_get_wal_records_info are almost same
> since the latter can show a single record.  However it is a bit
> annoying to do that. Since, other than it doens't accept same LSNs for
> start and end, it doesn't show a record when there' no record in the
> specfied LSN range.  But I don't think there's no usefulness of the
> behavior.
>
> The following works,
>  pg_get_wal_record_info('0/1000000');

This does work but it doesn't show any WARNING message for the start
pointer adjustment. I think it should.

>  pg_get_wal_records_info('0/1000000');
>

I think this is fine. It should be working because the user hasn't
specified the end pointer so we assume the default end pointer is
end-of-WAL.

> but this doesn't
>  pg_get_wal_records_info('0/1000000', '0/1000000');
> > ERROR:  WAL start LSN must be less than end LSN
>

I think this behaviour is fine. We cannot have the same start and end
lsn pointers.

> And the following shows no records.
>  pg_get_wal_records_info('0/1000000', '0/1000001');
>  pg_get_wal_records_info('0/1000000', '0/1000028');
>

I think we should be erroring out here saying - couldn't find any
valid WAL record between given start and end lsn because there exists
no valid wal records between the specified start and end lsn pointers.

--
With Regards,
Ashutosh Sharma.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Ashutosh Sharma

Дата:

11 марта 2022 г., 15:05:50

On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> - The difference between pg_get_wal_record_info and _records_ other than
> - the number of argument is the former accepts incorrect LSNs.
>
> The discussion is somewhat confused after some twists and turns..  It
> should be something like the following.
>
> pg_get_wal_record_info and pg_get_wal_records_info are almost same
> since the latter can show a single record.  However it is a bit
> annoying to do that. Since, other than it doens't accept same LSNs for
> start and end, it doesn't show a record when there' no record in the
> specfied LSN range.  But I don't think there's no usefulness of the
> behavior.
>

So, do you want the pg_get_wal_record_info function to be removed as
we can use pg_get_wal_records_info() to achieve what it does?

--
With Regards,
Ashutosh Sharma.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

11 марта 2022 г., 15:15:17

On Fri, Mar 11, 2022 at 8:08 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> > Attaching the v8 patch-set resolving above comments and some tests for
> > checking function permissions. Please review it further.
>
> I played with this a bit, and would like to share some thoughts on it.

Thanks a lot Kyotaro-san for reviewing.

> It seems to me too rigorous that pg_get_wal_records_info/stats()
> reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> the real end-of-WAL is more kind to users.  I think the same with the
> restriction that start and end LSN are required to be different.

Throwing error on future LSNs is the same behaviour for all of the
pg_walinspect function input LSNs. IMO it is a cleaner thing to do
rather than confuse the users with different behaviours for each
function. The principle is this - pg_walinspect functions can't show
future WAL info. Having said that, I agree to make it a WARNING
instead of ERROR, for the simple reason that ERROR aborts the txn and
the applications can retry without aborting the txn. For instance,
pg_terminate_backend emits a WARNING if the PID isn't a postgres
process id.

PS: WARNING may not be a better idea than ERROR if we turn
pg_get_wal_stats a SQL function, see my response below.

> The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
> functions, they tell me the beginning of the current insertion point
> in error message.  On the other hand they don't accept the same
> value as end-LSN.  I think it is right that they tell the current
> insertion point and they should take the end-LSN as the LSN to stop
> reading.

The future LSN is determined by this:

if (!RecoveryInProgress())
available_lsn = GetFlushRecPtr(NULL);
else
available_lsn = GetXLogReplayRecPtr(NULL);

GetFlushRecPtr returns last byte + 1 flushed meaning this is the end
LSN currently known in the server, but it is not the start LSN of the
last WAL record in the server. Same goes with GetXLogReplayRecPtr
which gives lastReplayedEndRecPtr end+1 position. I picked
GetFlushRecPtr and GetXLogReplayRecPtr to determine the future WAL LSN
because this is how read_local_xlog_page determines to read WAL upto
and goes for wait mode, but I wanted to avoid the wait mode completely
for all the pg_walinspect functions (to keep things simple for now),
hence doing the similar checks within the input validation code and
emitting warning.

And you are right when we emit something like below, users tend to use
0/15B6D68 (from the DETAIL message) as the end LSN. I don't want to
ignore this DETAIL message altogether as it gives an idea where the
server is. How about rephrasing the DETAIL message a bit, something
like "Database system flushed the WAL up to WAL LSN %X/% X.'' or some
other better phrasing?

WARNING:  WAL start LSN cannot be a future WAL LSN
DETAIL:  Last known WAL LSN on the database system is 0/15B6D68.

If the users aren't sure about what's the end record LSN, they can
just use pg_get_wal_records_info and pg_get_wal_stats without end LSN:
select * from pg_get_wal_records_info('0/15B6D68');
select * from pg_get_wal_stats('0/15B6D68');

> I think pg_get_wal_stats() is worth to have but I think it should be
> implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
> about FPI since pg_waldump doesn't but it is internally collected (of
> course!) and easily revealed. If we do that, the
> pg_get_wal_records_stats() would be reduced to the following SQL
> statement
>
> SELECT resource_manager resmgr,
>        count(*) AS N,
>            (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
>            sum(total_length) AS "combined size",
>            (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
>            sum(fpi_len) AS fpilen,
>            (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
>            FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
>            GROUP by resource_manager
>            WINDOW tot AS ()
>            ORDER BY "combined size" desc;
>
> The only difference with pg_waldump is the statement above doesn't
> show lines for the resource managers that don't contained in the
> result of pg_get_wal_records_info(). But I don't think that matters.

Yeah, this is better. One problem with the above is when
pg_get_wal_records_info emits a warning for future LSN. But this
shouldn't stop us doing it via SQL. Instead I would let all the
pg_walinspect functions emit errors as opposed to WARNING. Thoughts?

postgres=# SELECT resource_manager, count(*) AS count,
(count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS count_percentage,
sum(total_length) AS combined_size,
(sum(total_length) * 100 / sum(sum(total_length)) OVER
tot)::numeric(5,2) AS combined_size_percentage
FROM pg_get_wal_records_info('0/10A3E50', '0/25B6F00')
GROUP BY resource_manager
WINDOW tot AS ()
ORDER BY combined_size desc;
WARNING:  WAL end LSN cannot be a future WAL LSN
DETAIL:  Last known WAL LSN on the database system is 0/15CAA70.
 resource_manager | count | count_percentage | combined_size |
combined_size_percentage
------------------+-------+------------------+---------------+--------------------------
                  |     1 |           100.00 |               |
(1 row)

> Sometimes the field description has very long (28kb long) content. It
> makes the result output almost unreadable and I had a bit hard time
> struggling with the output full of '-'s.  I would like have a default
> limit on the length of such fields that can be long but I'm not sure
> we want that.

Yeah, it's a text column, let's leave it as-is, if required users can
always ignore the description columns.

> And about pg_get_raw_wal_record(). I don't see any use-case of the
> function alone on SQL interface.  Even if we need to inspect broken
> WAL files, it needs profound knowledge of WAL format and tools that
> doesn't work on SQL interface.
>However like pageinspect, if we separate the WAL-record fetching and
> parsing it could be thought as useful.
> SELECT * FROM pg_walinspect_parse(raw)
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));
>
> And pg_get_wal_stats woule be like:
>
> SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));

Imagine pg_get_raw_wal_record function feeding raw WAL record to an
external tool/extension that understands the WAL. Apart from this, I
don't have a concrete reason either. I'm open to removing this
function as well and adding it along with the raw WAL parsing function
in future.

I haven't thought about the raw WAL parsing functions for now. In
fact, there are many functions we can add to pg_walinspect - functions
with wait mode for future WAL, WAL parsing, function to return all the
WAL record info/stats given a WAL file name, functions to return WAL
info/stats from historic timelines as well, function to see if the
given WAL file is valid and so on. We can park these functions for
future versions of pg_walinspect once the extension itself with basic
yet-useful-and-effective functions gets in. I will make a note of
these functions and will work in future based on how pg_walinspect
gets received by the users and community out there.

Regards,
Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Ashutosh Sharma

Дата:

11 марта 2022 г., 17:23:06

Some comments on pg_walinspect-docc.patch this time:

+   <varlistentry>
+    <term>
+     <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn,
prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length
OUT int4, total_length OUT int4, description OUT text, block_ref OUT
text, data OUT bytea, data_len OUT int4)</function>
+    </term>

You may shorten this by mentioning just the function input parameters
and specify "returns record" like shown below. So no need to specify
all the OUT params.

pg_get_wal_record_info(in_lsn pg_lsn) returns record.

Please check the documentation for other functions for reference.

==

+    <term>
+     <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn
pg_lsn DEFAULT NULL, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid,
resource_manager OUT text, length OUT int4, total_length OUT int4,
description OUT text, block_ref OUT text, data OUT bytea, data_len OUT
int4)</function>
+    </term>

Same comment applies here as well. In the return type you can just
mention - "returns setof record" like shown below:

pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof records.

You may also check for such optimizations at other places. I might
have missed some.

==

+<screen>
+postgres=# select prev_lsn, xid, resource_manager, length,
total_length, block_ref from pg_get_wal_records_info('0/158A7F0',
'0/1591400');
+ prev_lsn  | xid | resource_manager | length | total_length |
                       block_ref

+-----------+-----+------------------+--------+--------------+--------------------------------------------------------------------------
+ 0/158A7B8 | 735 | Heap             |     54 |         7838 | blkref
#0: rel 1663/5/2619 blk 18 (FPW); hole: offset: 88, length: 408
+ 0/158A7F0 | 735 | Btree            |     53 |         8133 | blkref
#0: rel 1663/5/2696 blk 1 (FPW); hole: offset: 1632, length: 112
+ 0/158C6A8 | 735 | Heap             |     53 |          873 | blkref
#0: rel 1663/5/1259 blk 0 (FPW); hole: offset: 212, length: 7372

Instead of specifying column names in the targetlist I think it's
better to use "*" so that it will display all the output columns. Also
you may shorten the gap between start and end lsn to reduce the output
size.

==

Any reason for not specifying author name in the .sgml file. Do you
want me to add my name to the author? :)

  <para>
   Ashutosh Sharma <email>ashu.coek88@gmail.com</email>
  </para>
 </sect2>

--
With Regards,
Ashutosh Sharma.

On Thu, Mar 10, 2022 at 10:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> > >
> > > Attaching v6 patch set with above review comments addressed. Please
> > > review it further.
>
> Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
> upthread [1] which is having more simple-yet-useful-and-effective
> functions.
>
> > * Don't issue WARNINGs or other messages for ordinary situations, like
> > when pg_get_wal_records_info() hits the end of WAL.
>
> v7 patch-set [1] has no warnings, but the functions will error out if
> future LSN is specified.
>
> > * It feels like the APIs that allow waiting for the end of WAL are
> > slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> > behavior? Try to make the API more orthogonal, where a few basic
> > functions can be combined to give you everything you need, rather than
> > specifying extra parameters and issuing WARNINGs. I
>
> v7 patch-set [1] onwards waiting mode has been removed for all of the
> functions, again to keep things simple-yet-useful-and-effective.
> However, we can always add new pg_walinspect functions that wait for
> future WAL in the next versions once basic stuff gets committed and if
> many users ask for it.
>
> > * In the docs, include some example output. I don't see any output in
> > the tests, which makes sense because it's mostly non-deterministic, but
> > it would be helpful to see sample output of at least
> > pg_get_wal_records_info().
>
> +1. Added for pg_get_wal_records_info and pg_get_wal_stats.
>
> > * Is pg_get_wal_stats() even necessary, or can you get the same
> > information with a query over pg_get_wal_records_info()? For instance,
> > if you want to group by transaction ID rather than rmgr, then
> > pg_get_wal_stats() is useless.
>
> Yes, you are right pg_get_wal_stats provides WAL stats per resource
> manager which is similar to pg_waldump with --start, --end and --stats
> option. It provides more information than pg_get_wal_records_info and
> is a good way of getting stats than adding more columns to
> pg_get_wal_records_info, calculating percentage in sql and having
> group by clause. IMO, pg_get_wal_stats is more readable and useful.
>
> > * Would be nice to have a pg_wal_file_is_valid() or similar, which
> > would test that it exists, and the header matches the filename (e.g. if
> > it was recycled but not used, that would count as invalid). I think
> > pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> > even if the file is valid -- for example, if a wal record spans many
> > wal segments, the segments might look invalid because they contain no
> > complete records, but the file itself is still valid and contains valid
> > wal data.
>
> Actually I haven't tried testing a single WAL record spanning many WAL
> files yet(I'm happy to try it if someone suggests such a use-case). In
> that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
> have a problem because it just gives the next valid LSN and it's
> previous LSN using existing WAL reader API XLogFindNextRecord(). It
> opens up the WAL file segments using (some dots to connect -
> page_read/read_local_xlog_page, WALRead,
> segment_open/wal_segment_open). Thoughts?
>
> I don't think it's necessary to have a function pg_wal_file_is_valid()
> that given a WAL file name as input checks whether a WAL file exists
> or not, probably not in the core (xlogfuncs.c) too. These kinds of
> functions can open up challenges in terms of user input validation and
> may cause unnecessary problems, please see some related discussion
> [2].
>
> > * Is there a reason you didn't include the timeline ID in
> > pg_get_wal_records_info()?
>
> I'm right now allowing the functions to read WAL from the current
> server's timeline which I have mentioned in the docs. The server's
> current timeline is available via pg_control_checkpoint()'s
> timeline_id. So, having timeline_id as a column doesn't make sense.
> Again this is to keep things simple-yet-useful-and-effective. However,
> we can add new pg_walinspect functions to read WAL from historic as
> well as current timelines in the next versions once basic stuff gets
> committed and if many users ask for it.
>
> + <para>
> +  All the functions of this module will provide the WAL information using the
> +  current server's timeline ID.
> + </para>
>
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
>
> 'trusted' extensions concept is added by commit 50fc694 [3]. Since
> pg_walinspect deals with WAL, we strictly want to control who creates
> and can execute functions exposed by it, so I don't know if 'trusted'
> is a good idea here. Also, pageinspect isn't a 'trusted' extension.
>
> > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > function should require pg_read_server_files? Or at least
> > pg_read_all_data?
>
> pg_read_all_data may not be the right choice, but pg_read_server_files
> is. However, does it sound good if some functions are allowed to be
> executed by users with a pg_monitor role and others
> pg_get_raw_wal_record by users with pg_read_server_files? Since the
> extension itself can be created by superusers, isn't the
> pg_get_raw_wal_record sort of safe with pg_mointor itself?
>
>  If hackers don't agree, I'm happy to grant execution on
> pg_get_raw_wal_record() to the pg_read_server_files role.
>
> Attaching the v8 patch-set resolving above comments and some tests for
> checking function permissions. Please review it further.
>
> [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
> [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Wed Jan 29 18:42:43 2020 -0500
>
>     Invent "trusted" extensions, and remove the pg_pltemplate catalog.
>
> Regards,
> Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

11 марта 2022 г., 18:42:25

On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> >
> - The difference between pg_get_wal_record_info and _records_ other than
> - the number of argument is the former accepts incorrect LSNs.
>
> The discussion is somewhat confused after some twists and turns..  It
> should be something like the following.
>
> pg_get_wal_record_info and pg_get_wal_records_info are almost same
> since the latter can show a single record.  However it is a bit
> annoying to do that. Since, other than it doens't accept same LSNs for
> start and end, it doesn't show a record when there' no record in the
> specfied LSN range.  But I don't think there's no usefulness of the
> behavior.

I would like to reassert the usability of pg_get_wal_record_info and
pg_get_wal_records_info:

pg_get_wal_record_info(lsn):
if lsn is invalid i.e. '0/0' - throws an error
if lsn is future lsn - throws an error
if lsn looks okay, it figures out the next available valid WAL record
and returns info about that

pg_get_wal_records_info(start_lsn, end_lsn default null) -> if start
and end lsns are provided  no end_lsn would give the WAL records info
till the end of WAL,
if start_lsn is invalid i.e. '0/0' - throws an error
if start_lsn is future lsn - throws an error
if end_lsn isn't provided by the user - calculates the end_lsn as
server's current flush lsn
if end_lsn is provided by the user - throws an error if it's future LSN
if start_lsn and end_lsn look okay, it returns info about all WAL
records from the next available valid WAL record of start_lsn until
end_lsn

So, both pg_get_wal_record_info and pg_get_wal_records_info are necessary IMHO.

Coming to the behaviour when input lsn is '0/1000000', it's an issue
with XLogSegmentOffset(lsn, wal_segment_size) != 0 check, which I will
fix in the next version.

    if (*first_record != lsn && XLogSegmentOffset(lsn, wal_segment_size) != 0)
        ereport(WARNING,
                (errmsg_plural("first record is after %X/%X, at %X/%X,
skipping over %u byte",
                               "first record is after %X/%X, at %X/%X,
skipping over %u bytes",
                               (*first_record - lsn),
                               LSN_FORMAT_ARGS(lsn),
                               LSN_FORMAT_ARGS(*first_record),
                               (uint32) (*first_record - lsn))));

Regards,
Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Robert Haas

Дата:

11 марта 2022 г., 23:39:13

On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> It seems to me too rigorous that pg_get_wal_records_info/stats()
> reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> the real end-of-WAL is more kind to users.  I think the same with the
> restriction that start and end LSN are required to be different.

In his review just yesterday, Jeff suggested this: "Don't issue
WARNINGs or other messages for ordinary situations, like when
pg_get_wal_records_info() hits the end of WAL." I think he's entirely
right, and I don't think any patch that does otherwise should get
committed. It is worth remembering that the results of queries are
often examined by something other than a human being sitting at a psql
terminal. Any tool that uses this is going to want to understand what
happened from the result set, not by parsing strings that may show up
inside warning messages.

I think that the right answer here is to have a function that returns
one row per record parsed, and each row should also include the start
and end LSN of the record. If for some reason the WAL records return
start after the specified start LSN (e.g. because we skip over a page
header) or end before the specified end LSN (e.g. because we reach
end-of-WAL) the user can figure it out from looking at the LSNs in the
output rows and comparing them to the LSNs provided as input.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Jeff Davis

Дата:

12 марта 2022 г., 06:24:12

On Thu, 2022-03-10 at 15:54 -0500, Stephen Frost wrote:
> The standard is basically that all of the functions it brings are
> written to enforce the PG privilege system and you aren't able to use
> the extension to bypass those privileges.  In some cases that means
> that

Every extension should follow that standard, right? If it doesn't (e.g.
creating dangerous functions and granting them to public), then even
superuser should not install it.

> the C-language functions installed have if(!superuser) ereport()
> calls

I'm curious why not rely on the grant system where possible? I thought
we were trying to get away from explicit superuser checks.

> I've not looked back on this thread, but I'd expect pg_walinspect to
> need those superuser checks and with those it *could* be marked as
> trusted, but that again brings into question how useful it is to mark
> it
> thusly.

As long as any functions are safely accessible to public or a
predefined role, there is some utility for the 'trusted' marker.

As this patch is currently written, pg_monitor has access these
functions, though I don't think that's the right privilege level at
least for pg_get_raw_wal_record().

> I certainly don't think we should allow either database owners or
> regular users on a system the ability to access the WAL traffic of
> the
> entire system.

Agreed. That was not what I intended by asking if it should be marked
'trusted'. The marker only allows the non-superuser to run the CREATE
EXTENSION command; it's up to the extension script to decide whether
any non-superusers can do anything at all with the extension.

>   More forcefully- we should *not* be throwing more access
> rights towards $owners in general and should be thinking about how we
> can allow admins, providers, whomever, the ability to control what
> rights users are given.  If they're all lumped under 'owner' then
> there's no way for people to provide granular access to just those
> things they wish and intend to.

Not sure I understand, but that sounds like a larger discussion.

Regards,
    Jeff Davis

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

12 марта 2022 г., 14:43:12

On Fri, Mar 11, 2022 at 7:53 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Some comments on pg_walinspect-docc.patch this time:
>
> +   <varlistentry>
> +    <term>
> +     <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn,
> prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length
> OUT int4, total_length OUT int4, description OUT text, block_ref OUT
> text, data OUT bytea, data_len OUT int4)</function>
> +    </term>
>
> You may shorten this by mentioning just the function input parameters
> and specify "returns record" like shown below. So no need to specify
> all the OUT params.
>
> pg_get_wal_record_info(in_lsn pg_lsn) returns record.
>
> Please check the documentation for other functions for reference.
>
> ==
>
> +    <term>
> +     <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn
> pg_lsn DEFAULT NULL, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid,
> resource_manager OUT text, length OUT int4, total_length OUT int4,
> description OUT text, block_ref OUT text, data OUT bytea, data_len OUT
> int4)</function>
> +    </term>
>
> Same comment applies here as well. In the return type you can just
> mention - "returns setof record" like shown below:
>
> pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof records.
>
> You may also check for such optimizations at other places. I might
> have missed some.

I used the way verify_heapam shows the columns as it looks good IMO
and we can't show sample outputs for all of the functions in the
documentation.

> ==
>
> +<screen>
> +postgres=# select prev_lsn, xid, resource_manager, length,
> total_length, block_ref from pg_get_wal_records_info('0/158A7F0',
> '0/1591400');
> + prev_lsn  | xid | resource_manager | length | total_length |
>                        block_ref
>
+-----------+-----+------------------+--------+--------------+--------------------------------------------------------------------------
> + 0/158A7B8 | 735 | Heap             |     54 |         7838 | blkref
> #0: rel 1663/5/2619 blk 18 (FPW); hole: offset: 88, length: 408
> + 0/158A7F0 | 735 | Btree            |     53 |         8133 | blkref
> #0: rel 1663/5/2696 blk 1 (FPW); hole: offset: 1632, length: 112
> + 0/158C6A8 | 735 | Heap             |     53 |          873 | blkref
> #0: rel 1663/5/1259 blk 0 (FPW); hole: offset: 212, length: 7372
>
> Instead of specifying column names in the targetlist I think it's
> better to use "*" so that it will display all the output columns. Also
> you may shorten the gap between start and end lsn to reduce the output
> size.

All columns are giving huge output, especially because of data and
description columns hence I'm not showing them in the sample output.

> ==
>
> Any reason for not specifying author name in the .sgml file. Do you
> want me to add my name to the author? :)

Haha. Thanks. I will add in the v9 patch set which I will post in a while.

Regards,
Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

12 марта 2022 г., 14:43:31

On Sat, Mar 12, 2022 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > It seems to me too rigorous that pg_get_wal_records_info/stats()
> > reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> > the real end-of-WAL is more kind to users.  I think the same with the
> > restriction that start and end LSN are required to be different.
>
> In his review just yesterday, Jeff suggested this: "Don't issue
> WARNINGs or other messages for ordinary situations, like when
> pg_get_wal_records_info() hits the end of WAL." I think he's entirely
> right, and I don't think any patch that does otherwise should get
> committed. It is worth remembering that the results of queries are
> often examined by something other than a human being sitting at a psql
> terminal. Any tool that uses this is going to want to understand what
> happened from the result set, not by parsing strings that may show up
> inside warning messages.
>
> I think that the right answer here is to have a function that returns
> one row per record parsed, and each row should also include the start
> and end LSN of the record. If for some reason the WAL records return
> start after the specified start LSN (e.g. because we skip over a page
> header) or end before the specified end LSN (e.g. because we reach
> end-of-WAL) the user can figure it out from looking at the LSNs in the
> output rows and comparing them to the LSNs provided as input.

Thanks Robert. I've removed the WARNING part and added end_lsn as suggested.

Thanks Kyotaro-san, Ashutosh and Jeff for your review. I tried to
address your review comments, if not all, but many.

Here's v9 patch-set please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Kyotaro Horiguchi

Дата:

14 марта 2022 г., 04:58:03

At Fri, 11 Mar 2022 15:39:13 -0500, Robert Haas <robertmhaas@gmail.com> wrote in 
> On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > It seems to me too rigorous that pg_get_wal_records_info/stats()
> > reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> > the real end-of-WAL is more kind to users.  I think the same with the
> > restriction that start and end LSN are required to be different.
> 
> In his review just yesterday, Jeff suggested this: "Don't issue
> WARNINGs or other messages for ordinary situations, like when
> pg_get_wal_records_info() hits the end of WAL." I think he's entirely
> right, and I don't think any patch that does otherwise should get

It depends on what we think is the "ordinary" here.  If we don't
expect that specified LSN range is not filled-out, the case above is
ordinary and no need for any WARNING nor INFO.  I'm fine with that
definition here.

> committed. It is worth remembering that the results of queries are
> often examined by something other than a human being sitting at a psql
> terminal. Any tool that uses this is going to want to understand what
> happened from the result set, not by parsing strings that may show up
> inside warning messages.

Right. I don't think it is right that WARNING is required to evaluate
the result. And I think that the WARNING like 'reached end-of-wal
before end LSN' is a kind that is not required in evaluation of the
result. Since each WAL row contains at least start LSN.

> I think that the right answer here is to have a function that returns
> one row per record parsed, and each row should also include the start
> and end LSN of the record. If for some reason the WAL records return
> start after the specified start LSN (e.g. because we skip over a page
> header) or end before the specified end LSN (e.g. because we reach
> end-of-WAL) the user can figure it out from looking at the LSNs in the
> output rows and comparing them to the LSNs provided as input.

I agree with you here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Stephen Frost

Дата:

14 марта 2022 г., 17:55:54

Greetings,

* Jeff Davis (pgsql@j-davis.com) wrote:
> On Thu, 2022-03-10 at 15:54 -0500, Stephen Frost wrote:
> > The standard is basically that all of the functions it brings are
> > written to enforce the PG privilege system and you aren't able to use
> > the extension to bypass those privileges.  In some cases that means
> > that
>
> Every extension should follow that standard, right? If it doesn't (e.g.
> creating dangerous functions and granting them to public), then even
> superuser should not install it.

Every extension that's intended to be installed on a multi-user PG
system where the admin cares about security in the database, sure.  I
disagree that this applies universally to every system or every
extension.  Those are standards that modules we distribute in contrib
should meet but I don't know that we necessarily have to have them for,
say, modules in test.

> > the C-language functions installed have if(!superuser) ereport()
> > calls
>
> I'm curious why not rely on the grant system where possible? I thought
> we were trying to get away from explicit superuser checks.

We don't yet have capabilities for everything.  I agree that we should
be getting away from explicit superuser checks and explained below how
we might be able to in this case.

> > I've not looked back on this thread, but I'd expect pg_walinspect to
> > need those superuser checks and with those it *could* be marked as
> > trusted, but that again brings into question how useful it is to mark
> > it
> > thusly.
>
> As long as any functions are safely accessible to public or a
> predefined role, there is some utility for the 'trusted' marker.

I'm not sure that I agree, though I'm also not sure that it's a useful
thing to debate.  Still, if all of the functions in a particular
extension have explicit if (!superuser) ereport() checks in them, then
while installing it is 'safe' and it could be marked as 'trusted'
there's very little point in doing so as the only person who can get
anything useful from those functions is a superuser, and a superuser can
install non-trusted extensions anyway.  How is it useful to mark such an
extension as 'trusted'?

> As this patch is currently written, pg_monitor has access these
> functions, though I don't think that's the right privilege level at
> least for pg_get_raw_wal_record().

Yeah, I agree that pg_monitor isn't the right thing for such a function
to be checking.

> > I certainly don't think we should allow either database owners or
> > regular users on a system the ability to access the WAL traffic of
> > the
> > entire system.
>
> Agreed. That was not what I intended by asking if it should be marked
> 'trusted'. The marker only allows the non-superuser to run the CREATE
> EXTENSION command; it's up to the extension script to decide whether
> any non-superusers can do anything at all with the extension.

Sure.

> >   More forcefully- we should *not* be throwing more access
> > rights towards $owners in general and should be thinking about how we
> > can allow admins, providers, whomever, the ability to control what
> > rights users are given.  If they're all lumped under 'owner' then
> > there's no way for people to provide granular access to just those
> > things they wish and intend to.
>
> Not sure I understand, but that sounds like a larger discussion.

The point I was trying to make is that it's better to move in the
direction of things like pg_read_all_data rather than just declaring
that the owner of a database can read all of the tables in that
database, as an example.  Maybe we want to implicitly have such
privilege for the owner of the database too, but we should first make it
something that's able to be GRANT'd out to non-owners so that it's not
required that all of those privileges be given out together at once.

Note that to be considered an 'owner' of an object you have to be a
member of the role that owns the object, which means that said role is
necessarily able to also become the owning role too.  Lumping lots of
privileges together- all the rights that being an 'owner' of the object
conveys, plus the ability to also become that role directly and do
things as that role, works actively against the general idea of 'least
privilege'.

Thanks,

Stephen

Вложения

signature.asc

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

15 марта 2022 г., 04:51:37

On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> > As this patch is currently written, pg_monitor has access these
> > functions, though I don't think that's the right privilege level at
> > least for pg_get_raw_wal_record().
>
> Yeah, I agree that pg_monitor isn't the right thing for such a function
> to be checking.

On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> function should require pg_read_server_files? Or at least
> pg_read_all_data?

The v9 patch set posted at [1] grants execution on
pg_get_raw_wal_record() to the pg_monitor role.

pg_read_all_data may not be the right choice, but pg_read_server_files
is as these functions do read the WAL files on the server. If okay,
I'm happy to grant execution on pg_get_raw_wal_record() to the
pg_read_server_files role.

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com

Regards,
Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

16 марта 2022 г., 10:41:11

On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> >
> > > As this patch is currently written, pg_monitor has access these
> > > functions, though I don't think that's the right privilege level at
> > > least for pg_get_raw_wal_record().
> >
> > Yeah, I agree that pg_monitor isn't the right thing for such a function
> > to be checking.
>
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > function should require pg_read_server_files? Or at least
> > pg_read_all_data?
>
> The v9 patch set posted at [1] grants execution on
> pg_get_raw_wal_record() to the pg_monitor role.
>
> pg_read_all_data may not be the right choice, but pg_read_server_files
> is as these functions do read the WAL files on the server. If okay,
> I'm happy to grant execution on pg_get_raw_wal_record() to the
> pg_read_server_files role.
>
> Thoughts?
>
> [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com

Attaching v10 patch set which allows pg_get_raw_wal_record to be
executed by either superuser or users with pg_read_server_files role,
no other change from v9 patch set.

Please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Stephen Frost

Дата:

16 марта 2022 г., 17:26:59

Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> > >
> > > > As this patch is currently written, pg_monitor has access these
> > > > functions, though I don't think that's the right privilege level at
> > > > least for pg_get_raw_wal_record().
> > >
> > > Yeah, I agree that pg_monitor isn't the right thing for such a function
> > > to be checking.
> >
> > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > >
> > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > > function should require pg_read_server_files? Or at least
> > > pg_read_all_data?
> >
> > The v9 patch set posted at [1] grants execution on
> > pg_get_raw_wal_record() to the pg_monitor role.
> >
> > pg_read_all_data may not be the right choice, but pg_read_server_files
> > is as these functions do read the WAL files on the server. If okay,
> > I'm happy to grant execution on pg_get_raw_wal_record() to the
> > pg_read_server_files role.
> >
> > Thoughts?
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com
>
> Attaching v10 patch set which allows pg_get_raw_wal_record to be
> executed by either superuser or users with pg_read_server_files role,
> no other change from v9 patch set.

In a quick look, that seems reasonable to me.  If folks want to give out
access to this function individually they're also able to do so, which
is good.  Doesn't seem worthwhile to introduce a new predefined role for
this one function.

Thanks,

Stephen

Вложения

signature.asc

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Ashutosh Sharma

Дата:

16 марта 2022 г., 18:19:12

I can see that the pg_get_wal_records_info function shows the details
of the WAL record whose existence is beyond the user specified
stop/end lsn pointer. See below:

ashu@postgres=# select * from pg_get_wal_records_info('0/01000028',
'0/01000029');
-[ RECORD 1
]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
start_lsn        | 0/1000028
end_lsn          | 0/100009F
prev_lsn         | 0/0
xid              | 0
resource_manager | XLOG
record_length    | 114
fpi_length       | 0
description      | CHECKPOINT_SHUTDOWN redo 0/1000028; tli 1; prev tli
1; fpw true; xid 0:3; oid 10000; multi 1; offset 0; oldest xid 3 in DB
1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0;
oldest running xid 0; shutdown
block_ref        |
data_length      | 88
data             |

\x28000001000000000100000001000000010000000000000003000000000000001027000001000000000000000300000001000000010000000100000072550000a5c4316200000000000000000000000000000000ff7f0000

In this case, the end lsn pointer specified by the user is
'0/01000029'. There is only one WAL record which starts before this
specified end lsn pointer whose start pointer is at 01000028, but that
WAL record ends at 0/100009F which is way beyond the specified end
lsn. So, how come we are able to display the complete WAL record info?
AFAIU, end lsn is the lsn pointer where you need to stop reading the
WAL data. If that is true, then there exists no valid WAL record
between the start and end lsn in this particular case.

--
With Regards,
Ashutosh Sharma.

On Wed, Mar 16, 2022 at 7:56 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > >
> > > > > As this patch is currently written, pg_monitor has access these
> > > > > functions, though I don't think that's the right privilege level at
> > > > > least for pg_get_raw_wal_record().
> > > >
> > > > Yeah, I agree that pg_monitor isn't the right thing for such a function
> > > > to be checking.
> > >
> > > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > > >
> > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > > > function should require pg_read_server_files? Or at least
> > > > pg_read_all_data?
> > >
> > > The v9 patch set posted at [1] grants execution on
> > > pg_get_raw_wal_record() to the pg_monitor role.
> > >
> > > pg_read_all_data may not be the right choice, but pg_read_server_files
> > > is as these functions do read the WAL files on the server. If okay,
> > > I'm happy to grant execution on pg_get_raw_wal_record() to the
> > > pg_read_server_files role.
> > >
> > > Thoughts?
> > >
> > > [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com
> >
> > Attaching v10 patch set which allows pg_get_raw_wal_record to be
> > executed by either superuser or users with pg_read_server_files role,
> > no other change from v9 patch set.
>
> In a quick look, that seems reasonable to me.  If folks want to give out
> access to this function individually they're also able to do so, which
> is good.  Doesn't seem worthwhile to introduce a new predefined role for
> this one function.
>
> Thanks,
>
> Stephen

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Kyotaro Horiguchi

Дата:

17 марта 2022 г., 08:18:38

At Wed, 16 Mar 2022 20:49:12 +0530, Ashutosh Sharma <ashu.coek88@gmail.com> wrote in 
> I can see that the pg_get_wal_records_info function shows the details
> of the WAL record whose existence is beyond the user specified
> stop/end lsn pointer. See below:
> 
> ashu@postgres=# select * from pg_get_wal_records_info('0/01000028',
> '0/01000029');
> -[ RECORD 1
]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> start_lsn        | 0/1000028
> end_lsn          | 0/100009F
> prev_lsn         | 0/0
...
> record_length    | 114
...
> In this case, the end lsn pointer specified by the user is
> '0/01000029'. There is only one WAL record which starts before this
> specified end lsn pointer whose start pointer is at 01000028, but that
> WAL record ends at 0/100009F which is way beyond the specified end
> lsn. So, how come we are able to display the complete WAL record info?
> AFAIU, end lsn is the lsn pointer where you need to stop reading the
> WAL data. If that is true, then there exists no valid WAL record
> between the start and end lsn in this particular case.

You're right considering how pg_waldump behaves. pg_waldump works
almost the way as you described above.  The record above actually ends
at 1000099 and pg_waldump shows that record by specifying -s 0/1000028
-e 0/100009a, but not for -e 0/1000099.

# I personally think the current behavior is fine, though..


It still suggests unspecifiable end-LSN..

> select * from pg_get_wal_records_info('4/4B28EB68', '4/4C000060');
> ERROR:  cannot accept future end LSN
> DETAIL:  Last known WAL LSN on the database system is 4/4C000060.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

17 марта 2022 г., 10:55:35

On Wed, Mar 16, 2022 at 8:49 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> I can see that the pg_get_wal_records_info function shows the details
> of the WAL record whose existence is beyond the user specified
> stop/end lsn pointer. See below:
>
> ashu@postgres=# select * from pg_get_wal_records_info('0/01000028',
> '0/01000029');
> -[ RECORD 1
]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> start_lsn        | 0/1000028
> end_lsn          | 0/100009F
> prev_lsn         | 0/0
> xid              | 0
> resource_manager | XLOG
> record_length    | 114
> fpi_length       | 0
> description      | CHECKPOINT_SHUTDOWN redo 0/1000028; tli 1; prev tli
> 1; fpw true; xid 0:3; oid 10000; multi 1; offset 0; oldest xid 3 in DB
> 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0;
> oldest running xid 0; shutdown
> block_ref        |
> data_length      | 88
> data             |
>
\x28000001000000000100000001000000010000000000000003000000000000001027000001000000000000000300000001000000010000000100000072550000a5c4316200000000000000000000000000000000ff7f0000
>
> In this case, the end lsn pointer specified by the user is
> '0/01000029'. There is only one WAL record which starts before this
> specified end lsn pointer whose start pointer is at 01000028, but that
> WAL record ends at 0/100009F which is way beyond the specified end
> lsn. So, how come we are able to display the complete WAL record info?
> AFAIU, end lsn is the lsn pointer where you need to stop reading the
> WAL data. If that is true, then there exists no valid WAL record
> between the start and end lsn in this particular case.

Thanks Ashutosh, it's an edge case and I don't think we would want to
show a WAL record that ends at LSN after the user specified end-lsn
which doesn't look good. I fixed it in the v11 patch set. Now, the
pg_get_wal_records_info will show records only upto user specified
end_lsn, it doesn't show the last record which starts at LSN < end_lsn
but ends at LSN > end_lsn, see [1].

Please review the v11 patch set further.

[1]
postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/01000029');
 start_lsn | end_lsn | prev_lsn
-----------+---------+----------
(0 rows)

postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/100009F');
 start_lsn |  end_lsn  | prev_lsn
-----------+-----------+----------
 0/1000028 | 0/100009F | 0/0
(1 row)

postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/10000A0');
 start_lsn |  end_lsn  | prev_lsn
-----------+-----------+----------
 0/1000028 | 0/100009F | 0/0
(1 row)

postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/0100009E');
 start_lsn | end_lsn | prev_lsn
-----------+---------+----------
(0 rows)

Regards,
Bharath Rupireddy.

On Fri, Mar 18, 2022 at 8:07 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> Hi Bharath,
>
> Due to recent commits on master, the pg_walinpect module is not
> compiling. Kindly update the patch.

Thanks Nitin. Here's an updated v12 patch-set. I will respond to
Andres comments in a while.

Regards,
Bharath Rupireddy.

On Tue, Mar 22, 2022 at 11:30 PM Andres Freund <andres@anarazel.de> wrote:
>
> > > To me duplicating this much code from waldump seems like a bad idea from a
> > > maintainability POV.
> >
> > Even if we were to put the above code from pg_walinspect and
> > pg_waldump into, say, walutils.c or some other existing file, there we
> > had to make if (pg_walinspect) appendStringInfo else if (pg_waldump)
> > printf() sort of thing, isn't it clumsy?
>
> Why is that needed? Just use the stringinfo in both places? You're outputting
> the exact same thing in both places right now. There's already a stringinfo in
> XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogddump was
> written), so you could just convert at least the relevant printfs in
> XLogDumpDisplayRecord().
>
> > Also, unnecessary if
> > conditions need to be executed for every record. For maintainability,
> > I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
> > things in both places (of course this might sound dumbest way of doing
> > it, IMHO, it's sensible, given the if(pg_walinspect)-else
> > if(pg_waldump) sorts of code that we need to do in the common
> > functions). Thoughts?
>
> IMO we shouldn't merge this with as much duplication as there is right now,
> the notes don't change that it's a PITA to maintain.

Here's a refactoring patch that basically moves the pg_waldump's
functions and stats structures to xlogreader.h/.c so that the
pg_walinspect can reuse them. If it looks okay, I will send the
pg_walinspect patches based on it.

Regards,
Bharath Rupireddy.

Вложения

v13-0001-Refactor-pg_waldump-code.patch

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Kyotaro Horiguchi

Дата:

24 марта 2022 г., 07:52:46

At Wed, 23 Mar 2022 21:36:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Here's a refactoring patch that basically moves the pg_waldump's
> functions and stats structures to xlogreader.h/.c so that the
> pg_walinspect can reuse them. If it looks okay, I will send the
> pg_walinspect patches based on it.


+void
+XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter,
+                       uint32 *fpi_len, bool detailed_format,
+                       StringInfo buf)
...
+        if (detailed_format && delimiter)
+            appendStringInfoChar(buf, '\n');

It is odd that the variable "delimiter" is used as a bool in the
function, though it is a "char *", which I meant that it is used as
delimiter string (assuming that you might want to insert ", " between
two blkref descriptions).


+get_forkname(ForkNumber num)

forkNames[] is public and used in reinit.c.  I think we don't need
this function.


+#define MAX_XLINFO_TYPES 16
...
+    XLogRecStats    rmgr_stats[RM_NEXT_ID];
+    XLogRecStats    record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
+} XLogStats;
+

This doesn't seem to be a part of xlogreader.  Couldn't we add a new
module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
a part of xlogreader, the xlogstats looks like a better place.


+#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \
+                                  tot_len_pct, total_count, \

It doesn't need to be a macro. However in the first place I don't
think it is useful to have.  Rather it may be harmful since it doesn't
reduce complexity much but instead just hides details.  If we want to
avoid tedious repetitions of the same statements, a macro like the
following may work.

#define CALC_PCT (num, denom) ((denom) == 0 ? 0.0 ? 100.0 * (num) / (denom))
...
>   n_pct = CALC_PCT(n, total_count);
>   rec_len_pct = CALC_PCT(rec_len, total_rec_len);
>   fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len);
>   tot_len_pct = CALC_PCT(tot_len, total_len);

But it is not seem that different if we directly write out the detail.

>   n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count);
>   rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len);
>   fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len);
>   tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len);

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

24 марта 2022 г., 12:32:29

On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> +void
> +XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter,
> +                                          uint32 *fpi_len, bool detailed_format,
> +                                          StringInfo buf)
> ...
> +               if (detailed_format && delimiter)
> +                       appendStringInfoChar(buf, '\n');
>
> It is odd that the variable "delimiter" is used as a bool in the
> function, though it is a "char *", which I meant that it is used as
> delimiter string (assuming that you might want to insert ", " between
> two blkref descriptions).

I'm passing NULL if the delimiter isn't required (for pg_walinspect)
and I wanted to check if it's passed, so I was using the delimiter in
the condition. However, I now changed it to delimiter != NULL.

> +get_forkname(ForkNumber num)
>
> forkNames[] is public and used in reinit.c.  I think we don't need
> this function.

Yes. I removed it.

> +#define MAX_XLINFO_TYPES 16
> ...
> +       XLogRecStats    rmgr_stats[RM_NEXT_ID];
> +       XLogRecStats    record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
> +} XLogStats;
> +
>
> This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> a part of xlogreader, the xlogstats looks like a better place.

I'm not sure if it's worth adding new files xlogstats.h/.c just for 2
structures, 1 macro, and 2 functions with no plan to add new stats
structures or functions. Since xlogreader is the one that reads the
WAL, and is being included by both backend and other modules (tools
and extensions) IMO it's the right place. However, I can specify in
xlogreader that if at all the new stats related structures or
functions are going to be added, it's good to move them into a new
header and .c file.

Thoughts?

> +#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \
> +                                                                 tot_len_pct, total_count, \
>
> It doesn't need to be a macro. However in the first place I don't
> think it is useful to have.  Rather it may be harmful since it doesn't
> reduce complexity much but instead just hides details.  If we want to
> avoid tedious repetitions of the same statements, a macro like the
> following may work.
>
> #define CALC_PCT (num, denom) ((denom) == 0 ? 0.0 ? 100.0 * (num) / (denom))
> ...
> >   n_pct = CALC_PCT(n, total_count);
> >   rec_len_pct = CALC_PCT(rec_len, total_rec_len);
> >   fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len);
> >   tot_len_pct = CALC_PCT(tot_len, total_len);
>
> But it is not seem that different if we directly write out the detail.
>
> >   n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count);
> >   rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len);
> >   fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len);
> >   tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len);

I removed the XLOG_GET_STATS_PERCENTAGE macro.

Attaching v14 patch-set here with. It has bunch of other changes along
with the above:

1) Used STRICT for all the functions and introduced _till_end_of_wal
versions for pg_get_wal_records_info and pg_get_wal_stats.
2) Most of the code is reused between pg_walinspect and pg_waldump and
also within pg_walinspect.
3) Added read_local_xlog_page_no_wait without duplicating the code so
that the pg_walinspect functions don't wait even while finding the
first valid WAL record.
4) No function waits for future WAL lsn even to find the first valid WAL record.
5) Addressed the review comments raised upthread by Andres.

I hope this version makes the patch cleaner, please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Andres Freund

Дата:

24 марта 2022 г., 21:47:58

Hi,

On 2022-03-24 15:02:29 +0530, Bharath Rupireddy wrote:
> On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
> > This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> > module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> > a part of xlogreader, the xlogstats looks like a better place.
> 
> I'm not sure if it's worth adding new files xlogstats.h/.c just for 2
> structures, 1 macro, and 2 functions with no plan to add new stats
> structures or functions. Since xlogreader is the one that reads the
> WAL, and is being included by both backend and other modules (tools
> and extensions) IMO it's the right place. However, I can specify in
> xlogreader that if at all the new stats related structures or
> functions are going to be added, it's good to move them into a new
> header and .c file.

I don't like that location for XLogRecGetBlockRefInfo(). How about putting it
in xlogdesc.c - that kind of fits?

And what do you think about creating src/backend/access/rmgrdesc/stats.c for
XLogRecStoreStats()? It's not a perfect location, but not too bad either.

XLogRecGetLen() would be ok in xlogreader, but stats.c also would work?

Greetings,

Andres Freund

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

25 марта 2022 г., 09:41:47

On Fri, Mar 25, 2022 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-03-24 15:02:29 +0530, Bharath Rupireddy wrote:
> > On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
> > > This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> > > module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> > > a part of xlogreader, the xlogstats looks like a better place.
> >
> > I'm not sure if it's worth adding new files xlogstats.h/.c just for 2
> > structures, 1 macro, and 2 functions with no plan to add new stats
> > structures or functions. Since xlogreader is the one that reads the
> > WAL, and is being included by both backend and other modules (tools
> > and extensions) IMO it's the right place. However, I can specify in
> > xlogreader that if at all the new stats related structures or
> > functions are going to be added, it's good to move them into a new
> > header and .c file.
>
> I don't like that location for XLogRecGetBlockRefInfo(). How about putting it
> in xlogdesc.c - that kind of fits?

Done.

> And what do you think about creating src/backend/access/rmgrdesc/stats.c for
> XLogRecStoreStats()? It's not a perfect location, but not too bad either.
>
> XLogRecGetLen() would be ok in xlogreader, but stats.c also would work?

I've added a new xlogstats.c/.h (as suggested by Kyotaro-san as well)
file under src/backend/access/transam/. I don't think the new file
fits well under rmgrdesc.

Attaching v15 patch-set, please have a look at it.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

RKN Sai Krishna

Дата:

25 марта 2022 г., 18:07:31

Hi Bharath,

First look at the patch, bear with me if any of the following comments are repeated.

1. With pg_get_wal_record(lsn), say a WAL record start, end lsn range contains the specified LSN, wouldn't it be more meaningful to show the corresponding WAL record.

For example, upon providing '0/17335E7' as input, and I see get the WAL record ('0/1733618', '0/173409F') as output and not the one with start and end lsn as ('0/17335E0', '0/1733617').

With pg_walfile_name(lsn), we can find the WAL segment file name that contains the specified LSN.

2. I see the following output for pg_get_wal_record. Need to have a look at the spaces I suppose.

rkn=# select * from pg_get_wal_record('0/4041728');
start_lsn | end_lsn | prev_lsn | record_length |

record

-----------+-----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------
0/4041728 | 0/40421AF | 0/40416F0 | 2670 | \x6e0a0000d2020000f016040400000000000a0000fef802b400007f7f00000000408fe738a25500000300000000000000010000007f06000000
4000003b0a00000000000012000000100101007f7f7f7f0885e738a25500003000c815380a03000885e738a255000000007f7f7f7f7f7f0000000078674301296000003000f8150020042000000000709e1203909
dba01c89c8601609cd00030986008f8956804d202000000000000000000000000120006001f00030820ffff5f04000000000001400000010000000000000004000000000080bf0200030000000000000000006100
0000610000000000000000000000000000000000000000000000000000000000000000000000330100000000000000bc02000001000000010000000000803f00000000000000b0060000010000000000000017000

3. Should these functions be running in standby mode too? We do not allow WAL control functions to be executed during recovery right?

4. If the wal segment corresponding to the start lsn is removed, but there are WAL records which could be read in the specified input lsn range, would it be better to output the existing WAL records displaying a message that it is a partial list of WAL records and the WAL files corresponding to the rest are already removed, rather than erroring out saying "requested WAL segment has already been removed"?

5. Following are very minor comments in the code

Correct the function description by removing "return the LSN up to which the server has WAL" for IsFutureLSN
In GetXLogRecordInfo, good to have pfree in place for rec_desc, rec_blk_ref, data
In GetXLogRecordInfo, can avoid calling XLogRecGetInfo(record) multiple times by capturing in a variable
In GetWALDetailsGuts, setting end_lsn could be done in single if else and similarly we can club the if statements verifying if the start lsn is a future lsn.

Thanks,

RKN

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

26 марта 2022 г., 08:01:01

On Fri, Mar 25, 2022 at 8:37 PM RKN Sai Krishna
<rknsaiforpostgres@gmail.com> wrote:
>
> Hi Bharath,
>
> First look at the patch, bear with me if any of the following comments are repeated.

Thanks RKN, for playing around with the patches.

> 1. With pg_get_wal_record(lsn), say a WAL record start, end lsn range contains the specified LSN, wouldn't it be more
meaningfulto show the corresponding WAL record. 

In general, all the functions will first look for a first valid WAL
record from the given input lsn/start lsn(XLogFindNextRecord) and then
give info of all the valid records including the first valid WAL
record until either the given end lsn or till end of WAL depending on
the function used.

> For example, upon providing '0/17335E7' as input, and I see get the WAL record ('0/1733618', '0/173409F') as output
andnot the one with start and end lsn as ('0/17335E0', '0/1733617'). 

If  '0/17335E7' is an LSN containing a valid WAL record,
pg_get_wal_record gives the info of that, otherwise if there's any
next valid WAL record, it finds and gives that info. '0/17335E0' is
before '0/17335E7' the input lsn, so it doesn't show that record, but
the next valid record.

All the pg_walinspect functions don't look for the nearest valid WAL
record (could be previous to input lsn or next to input lsn), but they
look for the next valid WAL record. This is because the xlogreader
infra now has no API for backward iteration from a given LSN ( it has
XLogFindNextRecord and XLogReadRecord which scans the WAL in forward
direction). But, it's a good idea to have XLogFindPreviousRecord and
XLogReadPreviousRecord versions (as we have links for previous WAL
record in each WAL record) but that's a separate discussion.

> With pg_walfile_name(lsn), we can find the WAL segment file name that contains the specified LSN.

Yes.

> 2. I see the following output for pg_get_wal_record. Need to have a look at the spaces I suppose.

I believe this is something psql does for larger column outputs for
pretty-display. When used in a non-psql client, the column values are
returned properly. Nothing to do with the pg_walinspect patches here.

> 3. Should these functions be running in standby mode too? We do not allow WAL control functions to be executed during
recoveryright? 

There are functions that can be executable during recovery
pg_last_wal_receive_lsn, pg_last_wal_replay_lsn. The pg_walinspect
functions are useful even in recovery and I don't see a strong reason
to not support them. Hence, I'm right now supporting them.

> 4. If the wal segment corresponding to the start lsn is removed, but there are WAL records which could be read in the
specifiedinput lsn range, would it be better to output the existing WAL records displaying a message that it is a
partiallist of WAL records and the WAL files corresponding to the rest are already removed, rather than erroring out
saying"requested WAL segment has already been removed"? 

"requested WAL segment %s has already been removed" is a common error
across the xlogreader infra (see wal_segment_open) and I don't want to
invent a new behaviour. And all the segment_open callbacks report an
error when they are not finding the WAL file that they are looking
for.

> 5. Following are very minor comments in the code
>
> Correct the function description by removing "return the LSN up to which the server has WAL" for IsFutureLSN

That's fine, because it actually returns curr_lsn via the function
param curr_lsn. However, I modified the comment a bit.

> In GetXLogRecordInfo, good to have pfree in place for rec_desc, rec_blk_ref, data

No, we are just returning pointer to the string, not deep copying, see
CStringGetTextDatum. All the functions get executed within a
function's memory context and after handing off the results to the
client that gets deleted, deallocating all the memory.

> In GetXLogRecordInfo, can avoid calling XLogRecGetInfo(record) multiple times by capturing in a variable

XLogRecGetInfo is not a function, it's a macro, so that's fine.
#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info)

> In GetWALDetailsGuts, setting end_lsn could be done in single if else and similarly we can club the if statements
verifyingif the start lsn is a future lsn. 

The existing if conditions are:

    if (IsFutureLSN(start_lsn, &curr_lsn))
    if (!till_end_of_wal && end_lsn >= curr_lsn)
    if (till_end_of_wal)
    if (start_lsn >= end_lsn)

I clubbed them like this:
    if (!till_end_of_wal)
    if (IsFutureLSN(start_lsn, &curr_lsn))
    if (!till_end_of_wal && end_lsn >= curr_lsn)
    else if (till_end_of_wal)

Other if conditions are serving different purposes, so I'm leaving them as-is.

Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch,
others remain the same.

Regards,
Bharath Rupireddy.

On Wed, Apr 6, 2022 at 10:32 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2022-04-04 at 09:15 +0530, Bharath Rupireddy wrote:
> > My intention is to return the overall undecoded WAL record [5] i.e.
> > the data starting from XLogReadRecord's output [6] till length
> > XLogRecGetTotalLen(xlogreader);. Please see [7], where Andres agreed
> > to have this function, I also mentioned a possible use-case there.
>
> The current patch does not actually do this: it's returning a pointer
> into the DecodedXLogRecord struct, which doesn't have the raw bytes of
> the WAL record.
>
> To return the raw bytes of the record is not entirely trivial: it seems
> we have to look in the decoded record and either find a pointer into
> readBuf, or readRecordBuf, depending on whether the record crosses a
> boundary or not. If we find a good way to do this I'm fine keeping the
> function, but if not, we can leave it for v16.

With no immediate use of raw WAL data without a WAL record parsing
function, I'm dropping that function for now.

> > pg_get_wal_record_info returns the main data of the WAL record
> > (xl_heap_delete, xl_heap_insert, xl_heap_multi_insert, xl_heap_update
> > and so on).
>
> We also discussed just removing the main data from the output here.
> It's not terribly useful, and could be arbitrarily large. Similar to
> how we leave out the backup block data and images.

Done.

> > As identified in [1], SQL-version of stats function is way slower in
> > normal cases, hence it was agreed (by Andres, Kyotaro-san and myself)
> > to have a C-function for stats.a pointer into
>
> Now I agree. We should also have an equivalent of "pg_waldump --
> stats=record" though, too.

Added a parameter per_record (with default being false, emitting
per-rmgr stats) to pg_get_wal_stats and
pg_get_wal_stats_till_end_of_wal, when set returns per-record stats,
much like pg_waldump.

> > Yes, that's already part of the description column (much like
> > pg_waldump does) and the users can still do it with GROUP BY and
> > HAVING clauses, see [4].
>
> I still think an extra column for the results of rm_identify() would
> make sense. Not critical, but seems useful.

Added rm_identify as record_type column in pg_get_wal_record_info,
pg_get_wal_records_info, pg_get_wal_record_info_till_end_of_wal.
Removed the rm_identify from the description column as it's
unnecessary now here.

> > As mentioned in [1], read_local_xlog_page_no_wait required because
> > the
> > functions can still wait in read_local_xlog_page for WAL while
> > finding
> > the first valid record after the given input LSN (the use case is
> > simple - just provide input LSN closer to server's current flush LSN,
> > may be off by 3 or 4 bytes).
>
> Did you look into using XLogReadAhead() rather than XLogReadRecord()?

I don't think XLogReadAhead will help either, as it calls page_read
callback, XLogReadAhead->XLogDecodeNextRecord->ReadPageInternal->page_read->read_local_xlog_page
(which again waits for future WAL).

Per our internal discussion, I'm keeping the
read_local_xlog_page_no_wait as it offers a better solution without
much code duplication.

> The name pg_get_wal_records_info bothers me slightly, but I don't have
> a better suggestion.

IMO, pg_get_wal_records_info looks okay, hence didn't change it.

> One other thought: functions like pg_logical_emit_message() return an
> LSN, but if you feed that into pg_walinspect you get the *next* record.
> That makes sense because pg_logical_emit_message() returns the result
> of XLogInsertRecord(), which is the end of the last inserted record.
> But it can be slightly annoying/confusing. I don't have any particular
> suggestion, but maybe it's worth a mention in the docs or something?

Yes, all the pg_walinspect functions would find the next valid WAL
record  to the input/start LSN and start returning the details from
then.

IMO, the descriptions of these functions have already specified it:

pg_get_wal_record_info
      Gets WAL record information of a given LSN. If the given LSN isn't
      containing a valid WAL record, it gives the information of the next
      available valid WAL record. This function emits an error if a future (the

all other functions say this:
     Gets information/statistics of all the valid WAL records between/from

Attaching v17 patch-set with the above review comments addressed.
Please have a look at it.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

07 апреля 2022 г., 13:05:07

On Wed, Apr 6, 2022 at 2:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Attaching v17 patch-set with the above review comments addressed.
> Please have a look at it.

Had to rebase because of 5c279a6d350 (Custom WAL Resource Managers.).
Please find v18 patch-set.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Amit Kapila

Дата:

11 апреля 2022 г., 13:51:26

On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>

I am facing the below doc build failure on my machine due to this work:

./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml">
Tabs appear in SGML/XML files
make: *** [check-tabs] Error 1

The attached patch fixes this for me.

-- 
With Regards,
Amit Kapila.

Вложения

fix_tabs_1.patch

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Bharath Rupireddy

Дата:

11 апреля 2022 г., 16:03:06

On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
>
> I am facing the below doc build failure on my machine due to this work:
>
> ./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml">
> Tabs appear in SGML/XML files
> make: *** [check-tabs] Error 1
>
> The attached patch fixes this for me.

Thanks. It looks like there's a TAB in between. Your patch LGTM.

I'm wondering why this hasn't been caught in the build farm members
(or it may have been found but I'm missing to locate it.).

Can you please provide me with the doc build command to catch these
kinds of errors?

Regards,
Bharath Rupireddy.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От

Amit Kapila

Дата:

12 апреля 2022 г., 06:56:12

On Mon, Apr 11, 2022 at 6:33 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> >
> > I am facing the below doc build failure on my machine due to this work:
> >
> > ./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml">
> > Tabs appear in SGML/XML files
> > make: *** [check-tabs] Error 1
> >
> > The attached patch fixes this for me.
>
> Thanks. It looks like there's a TAB in between. Your patch LGTM.
>
> I'm wondering why this hasn't been caught in the build farm members
> (or it may have been found but I'm missing to locate it.).
>
> Can you please provide me with the doc build command to catch these
> kinds of errors?
>

Nothing special. In the doc/src/sgml, I did make clean followed by make check.


-- 
With Regards,
Amit Kapila.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: pg_walinspect - a new extension to get raw WAL data and WAL stats

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения