Обсуждение: pg_walinspect - a new extension to get raw WAL data and WAL stats

Поиск
Список
Период
Сортировка

pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
Hi,

While working on one of the internal features, we found that it is a
bit difficult to run pg_waldump for a normal user to know WAL info and
stats of a running postgres database instance in the cloud. Many a
times users or DBAs or developers would want to get and analyze
following:
1) raw WAL record associated with an LSN or raw WAL records between a
start LSN and end LSN for feeding to some other functionality
2) WAL statistics associated with an LSN or between start LSN and end
LSN for debugging or analytical purposes. The WAL stats are the number
of inserts, updates, deletes, index inserts, commits, checkpoints,
aborts, wal record sizes, FPI (Full Page Image) count etc. which are
basically everything that we get with pg_waldump --stats option plus
some other information as we may feel will be useful.

An available option is to use pg_waldump, a standalone program
emitting human readable WAL info into a standard output/file. This
works well when users have access to the system on which postgres is
running. But for a postgres database instance running in the cloud
environments, starting the pg_waldump, fetching and presenting its
output to the users in a structured way may be a bit hard to do.

How about we create a new extension, called pg_walinspect (synonymous
to pageinspect extension) with a bunch of SQL-callable functions that
get the raw WAL records or stats out of a running postgres database
instance in a more structured way that is easily consumable by all the
users or DBAs or developers? We can also provide these functionalities
into the core postgres (in xlogfuncs.c) instead of a new extension,
but we would like it to be pluggable so that the functions will be
used only if required.

[1] shows a rough sketch of the functions that the new pg_walinspect
extension can provide. These are not exhaustive; we can
add/remove/modify as we move further.

We would like to invite more thoughts from the hackers.

Credits: Thanks to Satya Narlapuram, Chen Liang (for some initial
work), Tianyu Zhang and Ashutosh Sharma (copied in cc) for internal
discussions.

[1]
a)  bytea pg_get_wal_record(pg_lsn lsn); and bytea
pg_get_wal_record(pg_lsn lsn, text wal_dir); - Returns a single row of
raw WAL record of bytea type. WAL data is read from pg_wal or
specified wal_dir directory.

b)  bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn); and
bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn, text
wal_dir); - Returns multiple rows of raw WAL records of bytea type,
one row per each WAL record. WAL data is read from pg_wal or specified
wal_dir directory.

CREATE TYPE walinspect_stats_type AS (stat1, stat2, stat3 …. statN);
c)  walinspect_stats_type  pg_get_wal_stats(pg_lsn lsn); and
walinspect_stats_type  pg_get_wal_stats(pg_lsn lsn, text wal_dir); -
Returns a single row of WAL record’s stats of walinspect_stats_type
type. WAL data is read from pg_wal or specified wal_dir directory.

d)  walinspect_stats_type[] pg_get_wal_stats(pg_lsn start_lsn, pg_lsn
end_lsn); and walinspect_stats_type[] pg_get_wal_stats(pg_lsn
start_lsn, pg_lsn end_lsn, text wal_dir); - Returns multiple rows of
WAL record stats of walinspect_stats_type type, one row per each WAL
record. WAL data is read from pg_wal or specified wal_dir directory.

e)  walinspect_stats_type  pg_get_wal_stats(bytea wal_record); -
Returns a single row of provided WAL record (wal_record) stats.

f)  walinspect_stats_type pg_get_wal_stats_aggr(pg_lsn start_lsn,
pg_lsn end_lsn); and walinspect_stats_type
pg_get_wal_stats_aggr(pg_lsn start_lsn, pg_lsn end_lsn, text wal_dir);
- Returns a single row of aggregated stats of all the WAL records
between start_lsn and end_lsn. WAL data is read from pg_wal or
specified wal_dir directory.

CREATE TYPE walinspect_lsn_range_type AS (pg_lsn start_lsn, pg_lsn end_lsn);
g)  walinspect_lsn_range_type   walinspect_get_lsn_range(text
wal_dir); - Returns a single row of start LSN and end LSN of the WAL
records available under pg_wal or specified wal_dir directory.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
"Bossart, Nathan"
Дата:
On 9/8/21, 6:49 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> How about we create a new extension, called pg_walinspect (synonymous
> to pageinspect extension) with a bunch of SQL-callable functions that
> get the raw WAL records or stats out of a running postgres database
> instance in a more structured way that is easily consumable by all the
> users or DBAs or developers? We can also provide these functionalities
> into the core postgres (in xlogfuncs.c) instead of a new extension,
> but we would like it to be pluggable so that the functions will be
> used only if required.

+1

Nathan


Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Michael Paquier
Дата:
On Thu, Sep 09, 2021 at 10:49:46PM +0000, Bossart, Nathan wrote:
> +1

A backend approach has the advantage that you can use the proper locks
to make sure that a segment is not recycled or removed by a concurrent
checkpoint, so that would be reliable.
--
Michael

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
"Drouvot, Bertrand"
Дата:
On 9/10/21 12:49 AM, Bossart, Nathan wrote:
> On 9/8/21, 6:49 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
>> How about we create a new extension, called pg_walinspect (synonymous
>> to pageinspect extension) with a bunch of SQL-callable functions that
>> get the raw WAL records or stats out of a running postgres database
>> instance in a more structured way that is easily consumable by all the
>> users or DBAs or developers? We can also provide these functionalities
>> into the core postgres (in xlogfuncs.c) instead of a new extension,
>> but we would like it to be pluggable so that the functions will be
>> used only if required.
> +1
>
> Nathan
>
+1

Bertrand




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Fri, Sep 10, 2021 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 09, 2021 at 10:49:46PM +0000, Bossart, Nathan wrote:
> > +1
>
> A backend approach has the advantage that you can use the proper locks
> to make sure that a segment is not recycled or removed by a concurrent
> checkpoint, so that would be reliable.

Thanks for sharing your thoughts. IMO, using locks for showing WAL
stats isn't a good way, because these new functions may block the
checkpointer from removing/recycling the WAL files.  We don't want to
do that. If at all, user has asked stats of  an LSN/range of LSNs if
it is/they are available in the pg_wal directory, we provide the info
otherwise we can throw warnings/errors. This behaviour is pretty much
in sycn with what pg_waldump does right now.

And, some users may not need these new functions at all, so in such
cases going with an extension way makes it more usable.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bruce Momjian
Дата:
On Wed, Sep  8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote:
> Hi,
> 
> While working on one of the internal features, we found that it is a
> bit difficult to run pg_waldump for a normal user to know WAL info and
> stats of a running postgres database instance in the cloud. Many a
> times users or DBAs or developers would want to get and analyze
> following:

Uh, are we going to implement everything that is only available at the
command line as an extension just for people who are using managed cloud
services where the command line is not available and the cloud provider
has not made that information accessible?  Seems like this might lead to
a lot of duplicated effort.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Chapman Flack
Дата:
On 10/05/21 18:07, Bruce Momjian wrote:
> Uh, are we going to implement everything that is only available at the
> command line as an extension just for people who are using managed cloud
> services

One extension that runs a curated menu of command-line tools for you
and returns their stdout?

Regards,
-Chap



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Jeremy Schneider
Дата:
On 10/5/21 15:07, Bruce Momjian wrote:
> On Wed, Sep  8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote:
>> While working on one of the internal features, we found that it is a
>> bit difficult to run pg_waldump for a normal user to know WAL info and
>> stats of a running postgres database instance in the cloud. Many a
>> times users or DBAs or developers would want to get and analyze
>> following:
> 
> Uh, are we going to implement everything that is only available at the
> command line as an extension just for people who are using managed cloud
> services where the command line is not available and the cloud provider
> has not made that information accessible?  Seems like this might lead to
> a lot of duplicated effort.

No. For most command line utilities, there's no reason to expose them in
SQL or they already are exposed in SQL. For example, everything in
pg_controldata is already available via SQL functions.

Specifically exposing pg_waldump functionality in SQL could speed up
finding bugs in the PG logical replication code. We found and fixed a
few over this past year, but there are more lurking out there.

Having the extension in core is actually the only way to avoid
duplicated effort, by having shared code which the pg_dump binary and
the extension both wrap or call.

-Jeremy

-- 
http://about.me/jeremy_schneider



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bruce Momjian
Дата:
On Tue, Oct  5, 2021 at 06:22:19PM -0400, Chapman Flack wrote:
> On 10/05/21 18:07, Bruce Momjian wrote:
> > Uh, are we going to implement everything that is only available at the
> > command line as an extension just for people who are using managed cloud
> > services
> 
> One extension that runs a curated menu of command-line tools for you
> and returns their stdout?

Yes, that would make sense, and something the cloud service providers
would write.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bruce Momjian
Дата:
On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
> On 10/5/21 15:07, Bruce Momjian wrote:
> > On Wed, Sep  8, 2021 at 07:18:08PM +0530, Bharath Rupireddy wrote:
> >> While working on one of the internal features, we found that it is a
> >> bit difficult to run pg_waldump for a normal user to know WAL info and
> >> stats of a running postgres database instance in the cloud. Many a
> >> times users or DBAs or developers would want to get and analyze
> >> following:
> > 
> > Uh, are we going to implement everything that is only available at the
> > command line as an extension just for people who are using managed cloud
> > services where the command line is not available and the cloud provider
> > has not made that information accessible?  Seems like this might lead to
> > a lot of duplicated effort.
> 
> No. For most command line utilities, there's no reason to expose them in
> SQL or they already are exposed in SQL. For example, everything in
> pg_controldata is already available via SQL functions.

That's a good example.

> Specifically exposing pg_waldump functionality in SQL could speed up
> finding bugs in the PG logical replication code. We found and fixed a
> few over this past year, but there are more lurking out there.

Uh, why is pg_waldump more important than other command line tool
information?

> Having the extension in core is actually the only way to avoid
> duplicated effort, by having shared code which the pg_dump binary and
> the extension both wrap or call.

Uh, aren't you duplicating code by having pg_waldump as a command-line
tool and an extension?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Jeremy Schneider
Дата:
On 10/5/21 17:43, Bruce Momjian wrote:
> On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
>> Specifically exposing pg_waldump functionality in SQL could speed up
>> finding bugs in the PG logical replication code. We found and fixed a
>> few over this past year, but there are more lurking out there.
> 
> Uh, why is pg_waldump more important than other command line tool
> information?

Going down the list of all other utilities in src/bin:

* pg_amcheck, pg_config, pg_controldata: already available in SQL
* psql, pgbench, pg_dump: already available as client-side apps
* initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl,
pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any
possible use case outside server OS access, most of these are too low
level and don't even make sense in SQL
* pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL,
don't feel any deep interest myself

Speaking selfishly, there are a few reasons I would be specifically
interested in pg_waldump (the only remaining one on the list).

.

First, to better support existing features around logical replication
and decoding.

In particular, it seems inconsistent to me that all the replication
management SQL functions take LSNs as arguments - and yet there's no
SQL-based way to find the LSNs that you are supposed to pass into these
functions.

https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION

Over the past few years I've been pulled in to help several large PG
users who ran into these bugs, and it's very painful - because the only
real remediation is to drop and recreate the replication slot, which
means either re-copying all the data to the downstream system or
figuring out a way to resync it. Some PG users have 3rd party tools like
HVR which can do row-by-row resync (IIUC), but no matter how you slice
it, we're talking about a lot of pain for people replicating large data
sets between multiple systems. In most cases, the only/best option even
with very large tables is to just make a fresh copy of the data - which
can translate to a business outage of hours or even days.

My favorite example is the SQL function pg_replication_slot_advance() -
this could really help PG users find less painful solutions to broken
decoding, however it's not really possible to /know/ an LSN to advance
to without inspecting WAL. ISTM there's a strong use case here for a SQL
interface on WAL inspection.

.

Second: debugging and troubleshooting logical replication and decoding bugs.

I helped track down a few logical replication bugs and get fixed into
code at postgresql.org this past year. (But I give credit to others who
are much better at C than I am, and who did a lot more work than I did
on these bugs!)

Logical decoding bugs are some of the hardest to fix - because all you
have is a WAL stream, but you don't know the SQL or workload patterns or
PG code paths which created that WAL stream.

Dumping the WAL - knowing which objects and which types of operations
are involved and stats like number of changes or number of
subtransactions - this helps identify which transaction and what SQL in
the application triggered the failure, which can help find short-term
workarounds. Businesses need those short-term workarounds so they can
keep going while we work on finding and fixing bugs, which can take some
time. This is another place where I think a SQL interface to WAL would
be helpful to PG users. Especially the ability to filter and trace a
single transaction through a large number of WAL files in the directory.

.

Third: statistics on WAL - especially full page writes. Giving users the
full power of SQL allows much more sophisticated analysis of the WAL
records. Personally, I'd probably find myself importing all the WAL
stats into the DB anyway because SQL is my preferred data manipulation
language.


>> Having the extension in core is actually the only way to avoid
>> duplicated effort, by having shared code which the pg_dump binary and
>> the extension both wrap or call.
> 
> Uh, aren't you duplicating code by having pg_waldump as a command-line
> tool and an extension?

Well this whole conversation is just theoretical anyway until the code
is shared.  :)  But if Bharath is writing functions to decode WAL, then
wouldn't we just have pg_waldump use these same functions in order to
avoid duplicating code?

Bharath - was some code already posted and I just missed it?

Looking at the proposed API from the initial email, I like that there's
both stats functionality and WAL record inspection functionality
(similar to pg_waldump). I like that there's the ability to pull the
records as raw bytea data, however I think we're also going to want an
ability in v1 of the patch to decode it (similar to pageinspect
heap_page_item_attrs, etc).

Another feature that might be interesting down the road would be the
ability to provide filtering of WAL records for security purposes. For
example, allowing a user to only dump raw WAL records for one particular
database, or maybe excluding WAL records that change system catalogs or
the like. But I probably wouldn't start here, personally.

Now then.... as Blaise Pascal said in 1657 (and as was also said by
Winston Churchill, Mark Twain, etc).... "I'm sorry I wrote you such a
long letter; I didn't have time to write a short one."

-Jeremy


-- 
http://about.me/jeremy_schneider



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bruce Momjian
Дата:
On Wed, Oct  6, 2021 at 09:56:34AM -0700, Jeremy Schneider wrote:
> On 10/5/21 17:43, Bruce Momjian wrote:
> > On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
> >> Specifically exposing pg_waldump functionality in SQL could speed up
> >> finding bugs in the PG logical replication code. We found and fixed a
> >> few over this past year, but there are more lurking out there.
> > 
> > Uh, why is pg_waldump more important than other command line tool
> > information?
> 
> Going down the list of all other utilities in src/bin:
> 
> * pg_amcheck, pg_config, pg_controldata: already available in SQL
> * psql, pgbench, pg_dump: already available as client-side apps
> * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl,
> pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any
> possible use case outside server OS access, most of these are too low
> level and don't even make sense in SQL
> * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL,
> don't feel any deep interest myself
> 
> Speaking selfishly, there are a few reasons I would be specifically
> interested in pg_waldump (the only remaining one on the list).

This is the analysis I was looking for to understand if copying the
features of command-line tools in extensions was a wise direction.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Alvaro Herrera
Дата:
On 2021-Oct-06, Jeremy Schneider wrote:

> Well this whole conversation is just theoretical anyway until the code
> is shared.  :)  But if Bharath is writing functions to decode WAL, then
> wouldn't we just have pg_waldump use these same functions in order to
> avoid duplicating code?

Actually, a lot of the code is already shared, since the rmgrdesc
routines are in src/backend.  Keep in mind that it was there before
pg_xlogdump existed, to support WAL_DEBUG.  When pg_xlogdump was added,
what we did was allow that backend-only code be compilable in a frontend
environment.  Also, we already have xlogreader.

So pg_waldump itself is mostly scaffolding to let the frontend
environment get argument values to pass to backend-enabled code.  The
only really interesting, novel thing is the --stats mode ... and I bet
you can write that with some SQL-level aggregation of the raw data, no
need for any C code.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Wed, Oct 6, 2021 at 10:26 PM Jeremy Schneider
<schneider@ardentperf.com> wrote:
>
> On 10/5/21 17:43, Bruce Momjian wrote:
> > On Tue, Oct  5, 2021 at 03:30:07PM -0700, Jeremy Schneider wrote:
> >> Specifically exposing pg_waldump functionality in SQL could speed up
> >> finding bugs in the PG logical replication code. We found and fixed a
> >> few over this past year, but there are more lurking out there.
> >
> > Uh, why is pg_waldump more important than other command line tool
> > information?
>
> Going down the list of all other utilities in src/bin:
>
> * pg_amcheck, pg_config, pg_controldata: already available in SQL
> * psql, pgbench, pg_dump: already available as client-side apps
> * initdb, pg_archivecleanup, pg_basebackup, pg_checksums, pg_ctl,
> pg_resetwal, pg_rewind, pg_upgrade, pg_verifybackup: can't think of any
> possible use case outside server OS access, most of these are too low
> level and don't even make sense in SQL
> * pg_test_fsync, pg_test_timing: marginally interesting ideas in SQL,
> don't feel any deep interest myself
>
> Speaking selfishly, there are a few reasons I would be specifically
> interested in pg_waldump (the only remaining one on the list).

Thanks Jeremy for the analysis.

> First, to better support existing features around logical replication
> and decoding.
>
> In particular, it seems inconsistent to me that all the replication
> management SQL functions take LSNs as arguments - and yet there's no
> SQL-based way to find the LSNs that you are supposed to pass into these
> functions.
>
> https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-REPLICATION
>
> Over the past few years I've been pulled in to help several large PG
> users who ran into these bugs, and it's very painful - because the only
> real remediation is to drop and recreate the replication slot, which
> means either re-copying all the data to the downstream system or
> figuring out a way to resync it. Some PG users have 3rd party tools like
> HVR which can do row-by-row resync (IIUC), but no matter how you slice
> it, we're talking about a lot of pain for people replicating large data
> sets between multiple systems. In most cases, the only/best option even
> with very large tables is to just make a fresh copy of the data - which
> can translate to a business outage of hours or even days.
>
> My favorite example is the SQL function pg_replication_slot_advance() -
> this could really help PG users find less painful solutions to broken
> decoding, however it's not really possible to /know/ an LSN to advance
> to without inspecting WAL. ISTM there's a strong use case here for a SQL
> interface on WAL inspection.
>
> Second: debugging and troubleshooting logical replication and decoding bugs.
>
> I helped track down a few logical replication bugs and get fixed into
> code at postgresql.org this past year. (But I give credit to others who
> are much better at C than I am, and who did a lot more work than I did
> on these bugs!)
>
> Logical decoding bugs are some of the hardest to fix - because all you
> have is a WAL stream, but you don't know the SQL or workload patterns or
> PG code paths which created that WAL stream.
>
> Dumping the WAL - knowing which objects and which types of operations
> are involved and stats like number of changes or number of
> subtransactions - this helps identify which transaction and what SQL in
> the application triggered the failure, which can help find short-term
> workarounds. Businesses need those short-term workarounds so they can
> keep going while we work on finding and fixing bugs, which can take some
> time. This is another place where I think a SQL interface to WAL would
> be helpful to PG users. Especially the ability to filter and trace a
> single transaction through a large number of WAL files in the directory.
>
> Third: statistics on WAL - especially full page writes. Giving users the
> full power of SQL allows much more sophisticated analysis of the WAL
> records. Personally, I'd probably find myself importing all the WAL
> stats into the DB anyway because SQL is my preferred data manipulation
> language.

Just to add to the above points, with the new extension pg_walinspect
we will have following advantages:
1) Usability - SQL callable functions will be easier to use for the
users/admins/developers.
2) Access Control - we can provide better access control for the WAL data/stats.
3) Emitting the actual WAL data(as bytea structure) and stats via SQL
callable functions will help to analyze and answer questions like how
much WAL data is being generated in the system, what kind of WAL data
it is, how many FPWs are happening and so on. Jermey has already given
more realistic use cases.
4) I came across this -  there's a similar capability in SQL server -
https://www.mssqltips.com/sqlservertip/3076/how-to-read-the-sql-server-database-transaction-log/

> >> Having the extension in core is actually the only way to avoid
> >> duplicated effort, by having shared code which the pg_dump binary and
> >> the extension both wrap or call.
> >
> > Uh, aren't you duplicating code by having pg_waldump as a command-line
> > tool and an extension?
>
> Well this whole conversation is just theoretical anyway until the code
> is shared.  :)  But if Bharath is writing functions to decode WAL, then
> wouldn't we just have pg_waldump use these same functions in order to
> avoid duplicating code?
>
> Bharath - was some code already posted and I just missed it?
>
> Looking at the proposed API from the initial email, I like that there's
> both stats functionality and WAL record inspection functionality
> (similar to pg_waldump). I like that there's the ability to pull the
> records as raw bytea data, however I think we're also going to want an
> ability in v1 of the patch to decode it (similar to pageinspect
> heap_page_item_attrs, etc).

I'm yet to start working on the patch. I will be doing it soon.

> Another feature that might be interesting down the road would be the
> ability to provide filtering of WAL records for security purposes. For
> example, allowing a user to only dump raw WAL records for one particular
> database, or maybe excluding WAL records that change system catalogs or
> the like. But I probably wouldn't start here, personally.

+1.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Oct 7, 2021 at 10:43 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Looking at the proposed API from the initial email, I like that there's
> > both stats functionality and WAL record inspection functionality
> > (similar to pg_waldump). I like that there's the ability to pull the
> > records as raw bytea data, however I think we're also going to want an
> > ability in v1 of the patch to decode it (similar to pageinspect
> > heap_page_item_attrs, etc).
>
> I'm yet to start working on the patch. I will be doing it soon.

Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the documentation part now, I will be doing it a bit later.

Please feel free to review and provide your thoughts.

Regards,
Bharath Rupireddy.
Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Nov 18, 2021 at 6:43 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Oct 7, 2021 at 10:43 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Looking at the proposed API from the initial email, I like that there's
> > > both stats functionality and WAL record inspection functionality
> > > (similar to pg_waldump). I like that there's the ability to pull the
> > > records as raw bytea data, however I think we're also going to want an
> > > ability in v1 of the patch to decode it (similar to pageinspect
> > > heap_page_item_attrs, etc).
> >
> > I'm yet to start working on the patch. I will be doing it soon.
>
> Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the documentation
partnow, I will be doing it a bit later.
 
>
> Please feel free to review and provide your thoughts.

The v1 patch set was failing to compile on Windows. Here's the v2
patch set fixing that.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Nov 25, 2021 at 3:49 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the
documentationpart now, I will be doing it a bit later.
 
> >
> > Please feel free to review and provide your thoughts.
>
> The v1 patch set was failing to compile on Windows. Here's the v2
> patch set fixing that.

I forgot to specify this: the v1 patch set was failing to compile on
Windows with errors shown at [1]. Thanks to Julien Rouhaud who
suggested to use PGDLLIMPORT in an off-list discussion.

[1] (Link target) ->
  pg_walinspect.obj : error LNK2001: unresolved external symbol
forkNames [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
  pg_walinspect.obj : error LNK2001: unresolved external symbol
pg_comp_crc32c [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
  pg_walinspect.obj : error LNK2001: unresolved external symbol
wal_segment_size [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
  pg_walinspect.obj : error LNK2001: unresolved external symbol
RmgrTable [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
  .\Release\pg_walinspect\pg_walinspect.dll : fatal error LNK1120: 4
unresolved externals [C:\Users\bhara\postgres\pg_walinspect.vcxproj]

    5 Error(s)

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Nov 25, 2021 at 5:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Nov 25, 2021 at 3:49 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > Thanks all. Here's the v1 patch set for the new extension pg_walinspect. Note that I didn't include the
documentationpart now, I will be doing it a bit later.
 
> > >
> > > Please feel free to review and provide your thoughts.
> >
> > The v1 patch set was failing to compile on Windows. Here's the v2
> > patch set fixing that.
>
> I forgot to specify this: the v1 patch set was failing to compile on
> Windows with errors shown at [1]. Thanks to Julien Rouhaud who
> suggested to use PGDLLIMPORT in an off-list discussion.
>
> [1] (Link target) ->
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> forkNames [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> pg_comp_crc32c [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> wal_segment_size [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   pg_walinspect.obj : error LNK2001: unresolved external symbol
> RmgrTable [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>   .\Release\pg_walinspect\pg_walinspect.dll : fatal error LNK1120: 4
> unresolved externals [C:\Users\bhara\postgres\pg_walinspect.vcxproj]
>
>     5 Error(s)

Here's the v3 patch-set with fixes for the compiler warnings reported
in the cf bot at
https://cirrus-ci.com/task/4979131497578496?logs=gcc_warning#L506.

Please review.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Greg Stark
Дата:
So I looked at this patch and I have the same basic question as Bruce.
Do we really want to expose every binary tool associated with Postgres
as an extension? Obviously this is tempting for cloud provider users
which is not an unreasonable argument. But it does have consequences.

1) Some things like pg_waldump are running code that is not normally
under user control. This could have security issues or reliability
issues.

On that front I'm especially concerned that pg_verify_raw_wal_record()
for example would let an attacker feed arbitrary hand crafted xlog
records into the parser which is not normally something a user can do.
If they feed it something it's not expecting it might be easy to cause
a crash and server restart.

There's also a bit of concern about data retention. Generally in
Postgres when rows are deleted there's very weak guarantees about the
data really being wiped. We definitely don't wipe it from disk of
course. And things like pageinspect could expose it long after it's
been deleted. But one might imagine after pageinspect shows it's gone
and/or after a vacuum full the data is actually purged. But then
something like pg_walinspect would make even that insufficient.

2) There's no documentation. I'm guessing you hesitated to write
documentation until the interface is settled but actually sometimes
writing documentation helps expose things in the interface that look
strange when you try to explain them.

3) And the interface does look a bit strange. Like what's the deal
with pg_get_wal_record_info_2() ? I gather it's just a SRF version of
pg_get_wal_record_info() but that's a strange name. And then what's
the point of pg_get_wal_record_info() at all? Why wouldn't the SRF be
sufficient even for the specific case of a single record?

And the stats functions seem a bit out of place to me. If the SRF
returned the data in the right format the user should be able to do
aggregate queries to generate these stats easily enough. If anything a
simple SQL function to do the aggregations could be provided.

Now this is starting to get into the realm of bikeshedding but... Some
of the code is taken straight from pg_waldump and does things like:

+ appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u",

But that's kind of out of place for an SQL interface. It makes it hard
to write queries since things like the relid, block number etc are in
the string. If I wanted to use these functions I would expect to be
doing something like "select all the decoded records pertaining to
block n".

All that said, I don't want to gatekeep based on this kind of
criticism. The existing code is based on pg_waldump and if we want an
extension to expose that then that's a reasonable place to start. We
can work on a better format for the data later it doesn't mean we
shouldn't start with something we have today.

4) This isn't really an issue with your patch at all but why on earth
do we have a bitvector for WAL compression methods?! Like, what does
it mean to have multiple compression methods set? That should just be
a separate field with values for each type of compression surely?

I suppose this raises the issue of what happens if someone fixes that.
They'll now have to update pg_waldump *and* pg_walinspect. I don't
think that would actually be a lot of work but it's definitely more
than just one. Also, perhaps they should be in the same contrib
directory so at least people won't forget there are two.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Greg Stark
Дата:
Additionally I've looked at the tests and I'm not sure but I don't
think this arrangement is going to work. I don't have the time to run
CLOBBER_CACHE and CLOBBER_CACHE_ALWAYS tests but I know they run
*really* slowly. So the test can't just do a CHECKPOINT and then trust
that the next few transactions will still be in the wal to decode
later. There could have been many more timed checkpoints in between.

I think the way to do it is to create either a backup label or a
replication slot. Then you can inspect the lsn of the label or slot
and decode all transactions between that point and the current point.

I also think you should try to have a wider set of wal records in that
range to test decoding records with and without full page writes, with
DDL records, etc.

I do like that the tests don't actually have the decoded record info
in the test though. But they can do a minimal effort to check that the
records they think they're testing are actually being tested. Insert
into a temporary table and then run a few queries with WHERE clauses
to test for a heap insert, btree insert test the right relid is
present, and test that a full page write is present (if full page
writes are enabled I guess). You don't need an exhaustive set of
checks because you're not testing that wal logging works properly,
just that the tests aren't accidentally passing because they're not
finding any interesting records.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Tue, Feb 1, 2022 at 3:10 AM Greg Stark <stark@mit.edu> wrote:
>
> So I looked at this patch and I have the same basic question as Bruce.

Thanks a lot for the comments.

> Do we really want to expose every binary tool associated with Postgres
> as an extension? Obviously this is tempting for cloud provider users
> which is not an unreasonable argument. But it does have consequences.

Perhaps not every tool needs to be exposed, but given the advantages
that pg_walinspect can provide it's a good candidate to have it as a
core extension. Some of the advantages are - debugging, WAL analysis,
feeding WAL stats and info to dashboards to show customers and answer
their queries, RCA etc., for educational purposes - one can understand
the WAL structure, stats, different record types etc. Another nice
thing is getting raw WAL data out of the running server (of course all
the users can't get it only the allowed ones, currently users with
pg_monitor role, if required we can change it to some other predefined
role). For instance, the raw WAL data can be fed to external page
repair tools to apply on a raw page (one can get this from pageinspect
extension).

> 1) Some things like pg_waldump are running code that is not normally
> under user control. This could have security issues or reliability
> issues.

I understand this and also I think the same concern applies to
pageinspect tool which exposes getting raw page data. The
pg_walinspect functions are currently accessible by the users with
pg_monitor role, if required we can change this to some other
predefined role.

> On that front I'm especially concerned that pg_verify_raw_wal_record()
> for example would let an attacker feed arbitrary hand crafted xlog
> records into the parser which is not normally something a user can do.
> If they feed it something it's not expecting it might be easy to cause
> a crash and server restart.

This function does nothing (no writes) to the server but just checking
the CRC of the WAL record. If at all one can make the server crash
with an input, then that should be a problem with the server code
which needs to be fixed. But IMO this function doesn't have a concern
as such.

> There's also a bit of concern about data retention. Generally in
> Postgres when rows are deleted there's very weak guarantees about the
> data really being wiped. We definitely don't wipe it from disk of
> course. And things like pageinspect could expose it long after it's
> been deleted. But one might imagine after pageinspect shows it's gone
> and/or after a vacuum full the data is actually purged. But then
> something like pg_walinspect would make even that insufficient.

The idea of pg_walinspect is to get the WAL info, data and stats out
of a running postgres server, if the WAL isn't available, the
functions would say that.

> 2) There's no documentation. I'm guessing you hesitated to write
> documentation until the interface is settled but actually sometimes
> writing documentation helps expose things in the interface that look
> strange when you try to explain them.

I will send out the new patch set with documentation soon.

> 3) And the interface does look a bit strange. Like what's the deal
> with pg_get_wal_record_info_2() ? I gather it's just a SRF version of
> pg_get_wal_record_info() but that's a strange name. And then what's
> the point of pg_get_wal_record_info() at all? Why wouldn't the SRF be
> sufficient even for the specific case of a single record?

I agree, pg_get_wal_record_info_2 is a poor naming.
pg_get_wal_record_info_2 takes range of LSN (start and end) to give
the wal info, whereas pg_get_wal_record_info just takes one LSN. Maybe
I will change pg_get_wal_record_info_2 to pg_get_wal_record_info_range
or pg_get_wal_records_info or someother namign is better? If the
suggestion is to overload pg_get_wal_record_info one with single LSN
and another with start and end LSNs, I'm okay with that too.
Otherwise, I can have pg_get_wal_record_info with start and end LSN
(end LSN default to NULL) and return setof record.

> And the stats functions seem a bit out of place to me. If the SRF
> returned the data in the right format the user should be able to do
> aggregate queries to generate these stats easily enough. If anything a
> simple SQL function to do the aggregations could be provided.
>
> Now this is starting to get into the realm of bikeshedding but... Some
> of the code is taken straight from pg_waldump and does things like:
>
> + appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u",
>
> But that's kind of out of place for an SQL interface. It makes it hard
> to write queries since things like the relid, block number etc are in
> the string. If I wanted to use these functions I would expect to be
> doing something like "select all the decoded records pertaining to
> block n".

I will think more about this and change it in the next version of the
patch set, perhaps I will add more columns to the functions.

> All that said, I don't want to gatekeep based on this kind of
> criticism. The existing code is based on pg_waldump and if we want an
> extension to expose that then that's a reasonable place to start. We
> can work on a better format for the data later it doesn't mean we
> shouldn't start with something we have today.

IMO, we can always extend the functions in future, once the
pg_walinspect extension gets in with minimum number of much-required
and basic functions.

> I suppose this raises the issue of what happens if someone fixes that.
> They'll now have to update pg_waldump *and* pg_walinspect. I don't
> think that would actually be a lot of work but it's definitely more
> than just one. Also, perhaps they should be in the same contrib
> directory so at least people won't forget there are two.

Currently, all the tools are placed in src/bin and extensions are in
contrib directory. I don't think we ever keep the extension in src/bin
or vice versa. Having said, this maybe we can add comments on having
to change/fix in both pg_waldump and pg_walinspect. We also have to
deal with this situation in some of the existing tools such as
pg_controldata.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Robert Haas
Дата:
On Mon, Jan 31, 2022 at 4:40 PM Greg Stark <stark@mit.edu> wrote:
> So I looked at this patch and I have the same basic question as Bruce.
> Do we really want to expose every binary tool associated with Postgres
> as an extension? Obviously this is tempting for cloud provider users
> which is not an unreasonable argument. But it does have consequences.
>
> 1) Some things like pg_waldump are running code that is not normally
> under user control. This could have security issues or reliability
> issues.

For what it's worth, I am generally in favor of having something like
this in PostgreSQL. I think it's wrong of us to continue assuming that
everyone has command-line access. Even when that's true, it's not
necessarily convenient. If you choose to use a relational database,
you may be the sort of person who likes SQL. And if you are, you may
want to have the database tell you what's going on via SQL rather than
command-line tools or operating system utilities. Imagine if we didn't
have pg_stat_activity and you had to get that information by running a
separate binary. Would anyone like that? Why is this case any
different?

A few years ago we exposed data from pg_control via SQL and similar
concerns were raised - but it turns out to be pretty useful. I don't
know why this shouldn't be equally useful. Sure, there's some
duplication in functionality, but it's not a huge maintenance burden
for the project, and people (including me) like having it available. I
think the same things will be true here.

If decoding WAL causes security problems, that's something we better
fix, because WAL is constantly decoded on standbys and via logical
decoding on systems all over the place. I agree that we can't let
users supply their own hand-crafted WAL records to be decoded without
causing more trouble than we can handle, but if it's not safe to
decode the WAL the system generated than we are in a lot of trouble
already.

I hasten to say that I'm not endorsing every detail or indeed any
detail of the proposed patch, and some of the concerns you mention
later sound well-founded to me. But I disagree with the idea that we
shouldn't have both a command-line utility that roots through files on
disk and an SQL interface that works with a running system.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Sun, Feb 6, 2022 at 9:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jan 31, 2022 at 4:40 PM Greg Stark <stark@mit.edu> wrote:
> > So I looked at this patch and I have the same basic question as Bruce.
> > Do we really want to expose every binary tool associated with Postgres
> > as an extension? Obviously this is tempting for cloud provider users
> > which is not an unreasonable argument. But it does have consequences.
> >
> > 1) Some things like pg_waldump are running code that is not normally
> > under user control. This could have security issues or reliability
> > issues.
>
> For what it's worth, I am generally in favor of having something like
> this in PostgreSQL. I think it's wrong of us to continue assuming that
> everyone has command-line access. Even when that's true, it's not
> necessarily convenient. If you choose to use a relational database,
> you may be the sort of person who likes SQL. And if you are, you may
> want to have the database tell you what's going on via SQL rather than
> command-line tools or operating system utilities. Imagine if we didn't
> have pg_stat_activity and you had to get that information by running a
> separate binary. Would anyone like that? Why is this case any
> different?
>
> A few years ago we exposed data from pg_control via SQL and similar
> concerns were raised - but it turns out to be pretty useful. I don't
> know why this shouldn't be equally useful. Sure, there's some
> duplication in functionality, but it's not a huge maintenance burden
> for the project, and people (including me) like having it available. I
> think the same things will be true here.
>
> If decoding WAL causes security problems, that's something we better
> fix, because WAL is constantly decoded on standbys and via logical
> decoding on systems all over the place. I agree that we can't let
> users supply their own hand-crafted WAL records to be decoded without
> causing more trouble than we can handle, but if it's not safe to
> decode the WAL the system generated than we are in a lot of trouble
> already.
>
> I hasten to say that I'm not endorsing every detail or indeed any
> detail of the proposed patch, and some of the concerns you mention
> later sound well-founded to me. But I disagree with the idea that we
> shouldn't have both a command-line utility that roots through files on
> disk and an SQL interface that works with a running system.

Thanks Robert for your comments.

> + appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u",
>
> But that's kind of out of place for an SQL interface. It makes it hard
> to write queries since things like the relid, block number etc are in
> the string. If I wanted to use these functions I would expect to be
> doing something like "select all the decoded records pertaining to
> block n".

Thanks Greg for your review of the patches. Since there can be
multiple blkref for WAL record type HEAP2 (for multi inserts
basically) [1], I couldn't find a better way to break it and represent
it as a non-text column. IMO this is simpler and users can easily find
out answers to "how many WAL records my relation generated between
lsn1 and lsn2 or how many WAL records of type Heap exist and so on?",
see [2]. I've also added a test case to just show this in 0002 patch.

Here's the v4 patch set that has the following changes along with
Greg's review comments addressed:

1) Added documentation as 0003 patch.
2) Removed CHECKPOINT commands from tests as it is unnecessary.
3) Added input validation code and tests.
4) A few more comments have been added.
5) Currently, only superusers can create the extension, but users with
the pg_monitor role can use the functions.
6) Test cases are basic yet they cover all the functions, error cases
with input validations, I don't think we need to add many more test
cases as suggested upthread, but I'm open to add a few more if I miss
any use-case.

Please review the v4 patch set further and let me know your thoughts.

[1]
rmgr: Heap2       len (rec/tot):     64/  8256, tx:          0, lsn:
0/014A9070, prev 0/014A8FF8, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0 FPW, blkref #1: rel
1663/12757/16384 blk 0
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB0C8, prev 0/014A9070, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 1
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB108, prev 0/014AB0C8, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 2
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB148, prev 0/014AB108, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 3
rmgr: Heap2       len (rec/tot):     59/    59, tx:          0, lsn:
0/014AB188, prev 0/014AB148, desc: VISIBLE cutoff xid 709 flags 0x01,
blkref #0: rel 1663/12757/16384 fork vm blk 0, blkref #1: rel
1663/12757/16384 blk 4

[2]
postgres=# select count(*) from pg_get_wal_records_info('0/13C0A98',
'0/0157A160') where block_ref like '%16384%' and rmgr like 'Heap';
 count
-------
 10100
(1 row)

postgres=# select count(*) from t1;
 count
-------
 10100
(1 row)

postgres=#

postgres=# select count(*) from pg_get_wal_records_info('0/13C0A98',
'0/0157A160') where block_ref like '%FPW%';
 count
-------
    78
(1 row)

postgres=#

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Peter Geoghegan
Дата:
On Sun, Feb 6, 2022 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> For what it's worth, I am generally in favor of having something like
> this in PostgreSQL. I think it's wrong of us to continue assuming that
> everyone has command-line access. Even when that's true, it's not
> necessarily convenient. If you choose to use a relational database,
> you may be the sort of person who likes SQL. And if you are, you may
> want to have the database tell you what's going on via SQL rather than
> command-line tools or operating system utilities. Imagine if we didn't
> have pg_stat_activity and you had to get that information by running a
> separate binary. Would anyone like that? Why is this case any
> different?

+1. An SQL interface is significantly easier to work with. Especially
because it can use the built-in LSN type, pg_lsn.

I don't find the slippery slope argument convincing. There aren't that
many other things that are like pg_waldump, but haven't already been
exposed via an SQL interface. Offhand, I can't think of any.

-- 
Peter Geoghegan



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Andrew Dunstan
Дата:
On 2/6/22 10:45, Robert Haas wrote:
> For what it's worth, I am generally in favor of having something like
> this in PostgreSQL. I think it's wrong of us to continue assuming that
> everyone has command-line access. Even when that's true, it's not
> necessarily convenient. If you choose to use a relational database,
> you may be the sort of person who likes SQL.



Almost completely off topic, but this reminded me of an incident about
30 years ago at my first gig as an SA/DBA. There was an application
programmer who insisted on loading a set of values from a text file into
a temp table (it was Ingres, anyone remember that?). Why? Because he
knew how to write "Select * from mytable order by mycol" but didn't know
how to drive the Unix sort utility at the command line. When I was
unable to restrain myself from smiling at this he got very angry and
yelled at me loudly.

So, yes, some people do like SQL and hate the command line.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Feb 10, 2022 at 9:55 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sun, Feb 6, 2022 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > For what it's worth, I am generally in favor of having something like
> > this in PostgreSQL. I think it's wrong of us to continue assuming that
> > everyone has command-line access. Even when that's true, it's not
> > necessarily convenient. If you choose to use a relational database,
> > you may be the sort of person who likes SQL. And if you are, you may
> > want to have the database tell you what's going on via SQL rather than
> > command-line tools or operating system utilities. Imagine if we didn't
> > have pg_stat_activity and you had to get that information by running a
> > separate binary. Would anyone like that? Why is this case any
> > different?
>
> +1. An SQL interface is significantly easier to work with. Especially
> because it can use the built-in LSN type, pg_lsn.
>
> I don't find the slippery slope argument convincing. There aren't that
> many other things that are like pg_waldump, but haven't already been
> exposed via an SQL interface. Offhand, I can't think of any.

On Sat, Feb 12, 2022 at 4:03 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>
> Almost completely off topic, but this reminded me of an incident about
> 30 years ago at my first gig as an SA/DBA. There was an application
> programmer who insisted on loading a set of values from a text file into
> a temp table (it was Ingres, anyone remember that?). Why? Because he
> knew how to write "Select * from mytable order by mycol" but didn't know
> how to drive the Unix sort utility at the command line. When I was
> unable to restrain myself from smiling at this he got very angry and
> yelled at me loudly.
>
> So, yes, some people do like SQL and hate the command line.

Thanks a lot  for the comments. I'm looking forward to the review of
the latest v4 patches posted at [1].

[1] https://www.postgresql.org/message-id/CALj2ACUS9%2B54QGPtUjk76dcYW-AMKp3hPe-U%2BpQo2-GpE4kjtA%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
Here are few comments:

+/*
+ * Verify the authenticity of the given raw WAL record.
+ */
+Datum
+pg_verify_raw_wal_record(PG_FUNCTION_ARGS)
+{


Do we really need this function? I see that whenever the record is
read, we verify it. So could there be a scenario where any of these
functions would return an invalid WAL record?

--

Should we add a function that returns the pointer to the first and
probably the last WAL record in the WAL segment? This would help users
to inspect the wal records in the entire wal segment if they wish to
do so.

--

+PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
+PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
+PG_FUNCTION_INFO_V1(pg_get_wal_records_info);

I think we should allow all these functions to be executed in wait and
*nowait* mode. If a user specifies nowait mode, the function should
return if no WAL data is present, rather than waiting for new WAL data
to become available, default behaviour could be anything you like.

--

+Datum
+pg_get_wal_records_info(PG_FUNCTION_ARGS)
+{
+#define PG_GET_WAL_RECORDS_INFO_COLS 10


We could probably have another variant of this function that would
work even if the end pointer is not specified, in which case the
default end pointer would be the last WAL record in the WAL segment.
Currently it mandates the use of an end pointer which slightly reduces
flexibility.

--

+
+/*
+ * Get the first valid raw WAL record lsn.
+ */
+Datum
+pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)


I think this function should return a pointer to the nearest valid WAL
record which can be the previous WAL record to the LSN entered by the
user or the next WAL record. If a user unknowingly enters an lsn that
does not exist then in such cases we should probably return the lsn of
the previous WAL record instead of hanging or waiting for the new WAL
record to arrive.

--

Another important point I would like to mention here is - have we made
an attempt to ensure that we try to share as much of code with
pg_waldump as possible so that if any changes happens in the
pg_waldump in future it gets applied here as well and additionally it
will also reduce the code duplication.

I haven't yet looked into the code in detail. I will have a look at it
asap. thanks.

--
With Regards,
Ashutosh Sharma.

On Sat, Feb 12, 2022 at 5:03 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Feb 10, 2022 at 9:55 PM Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Sun, Feb 6, 2022 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > For what it's worth, I am generally in favor of having something like
> > > this in PostgreSQL. I think it's wrong of us to continue assuming that
> > > everyone has command-line access. Even when that's true, it's not
> > > necessarily convenient. If you choose to use a relational database,
> > > you may be the sort of person who likes SQL. And if you are, you may
> > > want to have the database tell you what's going on via SQL rather than
> > > command-line tools or operating system utilities. Imagine if we didn't
> > > have pg_stat_activity and you had to get that information by running a
> > > separate binary. Would anyone like that? Why is this case any
> > > different?
> >
> > +1. An SQL interface is significantly easier to work with. Especially
> > because it can use the built-in LSN type, pg_lsn.
> >
> > I don't find the slippery slope argument convincing. There aren't that
> > many other things that are like pg_waldump, but haven't already been
> > exposed via an SQL interface. Offhand, I can't think of any.
>
> On Sat, Feb 12, 2022 at 4:03 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> >
> > Almost completely off topic, but this reminded me of an incident about
> > 30 years ago at my first gig as an SA/DBA. There was an application
> > programmer who insisted on loading a set of values from a text file into
> > a temp table (it was Ingres, anyone remember that?). Why? Because he
> > knew how to write "Select * from mytable order by mycol" but didn't know
> > how to drive the Unix sort utility at the command line. When I was
> > unable to restrain myself from smiling at this he got very angry and
> > yelled at me loudly.
> >
> > So, yes, some people do like SQL and hate the command line.
>
> Thanks a lot  for the comments. I'm looking forward to the review of
> the latest v4 patches posted at [1].
>
> [1] https://www.postgresql.org/message-id/CALj2ACUS9%2B54QGPtUjk76dcYW-AMKp3hPe-U%2BpQo2-GpE4kjtA%40mail.gmail.com
>
> Regards,
> Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Mon, Feb 14, 2022 at 8:32 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Here are few comments:

Thanks for reviewing the patches.

> +/*
> + * Verify the authenticity of the given raw WAL record.
> + */
> +Datum
> +pg_verify_raw_wal_record(PG_FUNCTION_ARGS)
> +{
>
>
> Do we really need this function? I see that whenever the record is
> read, we verify it. So could there be a scenario where any of these
> functions would return an invalid WAL record?

Yes, this function can be useful. Imagine a case where raw WAL records
are fetched from one server using pg_get_wal_record_info and sent over
the network to another server (for fixing some of the corrupted data
pages or for whatever reasons), using pg_verify_raw_wal_record one can
verify authenticity.

> Should we add a function that returns the pointer to the first and
> probably the last WAL record in the WAL segment? This would help users
> to inspect the wal records in the entire wal segment if they wish to
> do so.

Good point. One can do this already with pg_get_wal_records_info and
pg_walfile_name_offset. Usually, the LSN format itself can give an
idea about the WAL file it is in.

postgres=# select lsn, pg_walfile_name_offset(lsn) from
pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
limit 1;
    lsn    |    pg_walfile_name_offset
-----------+-------------------------------
 0/5000038 | (000000010000000000000005,56)
(1 row)

postgres=# select lsn, pg_walfile_name_offset(lsn) from
pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
limit 1;
    lsn    |       pg_walfile_name_offset
-----------+-------------------------------------
 0/5FFFFC0 | (000000010000000000000005,16777152)
(1 row)

Having said that, we can always add a function or a view (with the
above sort of queries) to pg_walinspect  - given an LSN can give the
valid start record in that wal file (by following previous lsn links)
and valid end record lsn. IMO, that's not required now, maybe later
once the initial version of pg_walinspect gets committed, as we
already have a way to achieve what we wanted here.

> +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
>
> I think we should allow all these functions to be executed in wait and
> *nowait* mode. If a user specifies nowait mode, the function should
> return if no WAL data is present, rather than waiting for new WAL data
> to become available, default behaviour could be anything you like.

Currently, pg_walinspect uses read_local_xlog_page which waits in the
while(1) loop if a future LSN is specified. As read_local_xlog_page is
an implementation of XLogPageReadCB, which doesn't have a wait/nowait
parameter, if we really need a wait/nowait mode behaviour, we need to
do extra things(either add a backend-level global wait variable, set
before XLogReadRecord, if set, read_local_xlog_page can just exit
without waiting and reset after the XLogReadRecord or add an extra
bool wait variable to XLogReaderState and use it in
read_local_xlog_page).

Another problem with the wait mode is - wait until when? Because we
don't want to wait forever by specifying a really really future LSN,
maybe you could think of adding a timeout (if the future LSN hasn't
generated the given timeout, then just return). As I said upthread, I
think all of these functions can be parked to future pg_walinspect
versions once it gets committed with most-useful functions as proposed
in the v4 patch set.

> +Datum
> +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> +{
> +#define PG_GET_WAL_RECORDS_INFO_COLS 10
>
>
> We could probably have another variant of this function that would
> work even if the end pointer is not specified, in which case the
> default end pointer would be the last WAL record in the WAL segment.
> Currently it mandates the use of an end pointer which slightly reduces
> flexibility.

Last WAL record in the WAL segment may not be of much use(one can
figure out the last valid WAL record in a wal file as mentioned
above), but the WAL records info till the latest current flush LSN of
the server would be a useful functionality. But that too, can be found
using something like "select lsn, prev_lsn, resource_manager from
pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"

> +
> +/*
> + * Get the first valid raw WAL record lsn.
> + */
> +Datum
> +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
>
>
> I think this function should return a pointer to the nearest valid WAL
> record which can be the previous WAL record to the LSN entered by the
> user or the next WAL record. If a user unknowingly enters an lsn that
> does not exist then in such cases we should probably return the lsn of
> the previous WAL record instead of hanging or waiting for the new WAL
> record to arrive.

Is it useful? If there's a strong reason, how about naming
pg_get_next_valid_wal_record_lsn returning the next valid wal record
LSN and pg_get_previous_valid_wal_record_lsn returning the previous
valid wal record LSN ? If you think having two functions is too much,
then, how about pg_get_first_valid_wal_record_lsn returning both the
next valid wal record LSN and its previous wal record LSN?

> Another important point I would like to mention here is - have we made
> an attempt to ensure that we try to share as much of code with
> pg_waldump as possible so that if any changes happens in the
> pg_waldump in future it gets applied here as well and additionally it
> will also reduce the code duplication.

I tried, please have a look at the patch. Also, I added a note at the
beginning of pg_walinspect and pg_waldump to consider fixing
issues/changing the code in both the places also.

> I haven't yet looked into the code in detail. I will have a look at it
> asap. thanks.

That will be great.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Robert Haas
Дата:
On Tue, Feb 15, 2022 at 2:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > +/*
> > + * Verify the authenticity of the given raw WAL record.
> > + */
> > +Datum
> > +pg_verify_raw_wal_record(PG_FUNCTION_ARGS)
> > +{
> >
> >
> > Do we really need this function? I see that whenever the record is
> > read, we verify it. So could there be a scenario where any of these
> > functions would return an invalid WAL record?
>
> Yes, this function can be useful. Imagine a case where raw WAL records
> are fetched from one server using pg_get_wal_record_info and sent over
> the network to another server (for fixing some of the corrupted data
> pages or for whatever reasons), using pg_verify_raw_wal_record one can
> verify authenticity.

As I also said before, and so did Greg, I think giving the user a way
to supply WAL records that we will then try to decode is never going
to be OK. It's going to be a recipe for security bugs and crash bugs,
and there's no compelling use case for it that I can see. I support
this patch set only to the extent that it decodes locally generated
WAL read directly from the WAL stream.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Wed, Feb 16, 2022 at 1:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Feb 15, 2022 at 2:31 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > +/*
> > > + * Verify the authenticity of the given raw WAL record.
> > > + */
> > > +Datum
> > > +pg_verify_raw_wal_record(PG_FUNCTION_ARGS)
> > > +{
> > >
> > >
> > > Do we really need this function? I see that whenever the record is
> > > read, we verify it. So could there be a scenario where any of these
> > > functions would return an invalid WAL record?
> >
> > Yes, this function can be useful. Imagine a case where raw WAL records
> > are fetched from one server using pg_get_wal_record_info and sent over
> > the network to another server (for fixing some of the corrupted data
> > pages or for whatever reasons), using pg_verify_raw_wal_record one can
> > verify authenticity.
>
> As I also said before, and so did Greg, I think giving the user a way
> to supply WAL records that we will then try to decode is never going
> to be OK. It's going to be a recipe for security bugs and crash bugs,
> and there's no compelling use case for it that I can see. I support
> this patch set only to the extent that it decodes locally generated
> WAL read directly from the WAL stream.

Agreed, I will remove pg_verify_raw_wal_record function in the next
version of the patch set. Thanks.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
On Wed, Feb 16, 2022 at 1:01 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Feb 14, 2022 at 8:32 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Here are few comments:
>
> Thanks for reviewing the patches.
>
> > +/*
> > + * Verify the authenticity of the given raw WAL record.
> > + */
> > +Datum
> > +pg_verify_raw_wal_record(PG_FUNCTION_ARGS)
> > +{
> >
> >
> > Do we really need this function? I see that whenever the record is
> > read, we verify it. So could there be a scenario where any of these
> > functions would return an invalid WAL record?
>
> Yes, this function can be useful. Imagine a case where raw WAL records
> are fetched from one server using pg_get_wal_record_info and sent over
> the network to another server (for fixing some of the corrupted data
> pages or for whatever reasons), using pg_verify_raw_wal_record one can
> verify authenticity.
>

I don't think that's the use case of this patch. Unless there is some
other valid reason, I would suggest you remove it.

> > Should we add a function that returns the pointer to the first and
> > probably the last WAL record in the WAL segment? This would help users
> > to inspect the wal records in the entire wal segment if they wish to
> > do so.
>
> Good point. One can do this already with pg_get_wal_records_info and
> pg_walfile_name_offset. Usually, the LSN format itself can give an
> idea about the WAL file it is in.
>
> postgres=# select lsn, pg_walfile_name_offset(lsn) from
> pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
> limit 1;
>     lsn    |    pg_walfile_name_offset
> -----------+-------------------------------
>  0/5000038 | (000000010000000000000005,56)
> (1 row)
>
> postgres=# select lsn, pg_walfile_name_offset(lsn) from
> pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
> limit 1;
>     lsn    |       pg_walfile_name_offset
> -----------+-------------------------------------
>  0/5FFFFC0 | (000000010000000000000005,16777152)
> (1 row)
>

The workaround you are suggesting is not very user friendly and FYKI
pg_wal_records_info simply hangs at times when we specify the higher
and lower limit of lsn in a wal file.

To make things easier for the end users I would suggest we add a
function that can return a valid first and last lsn in a walfile. The
output of this function can be used to inspect the wal records in the
entire wal file if they wish to do so and I am sure they will. So, it
should be something like this:

select first_valid_lsn, last_valid_lsn from
pg_get_first_last_valid_wal_record('wal-segment-name');

And above function can directly be used with pg_get_wal_records_info() like

select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment'));

I think this is a pretty basic ASK that we expect to be present in the
module like this.

> > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> >
> > I think we should allow all these functions to be executed in wait and
> > *nowait* mode. If a user specifies nowait mode, the function should
> > return if no WAL data is present, rather than waiting for new WAL data
> > to become available, default behaviour could be anything you like.
>
> Currently, pg_walinspect uses read_local_xlog_page which waits in the
> while(1) loop if a future LSN is specified. As read_local_xlog_page is
> an implementation of XLogPageReadCB, which doesn't have a wait/nowait
> parameter, if we really need a wait/nowait mode behaviour, we need to
> do extra things(either add a backend-level global wait variable, set
> before XLogReadRecord, if set, read_local_xlog_page can just exit
> without waiting and reset after the XLogReadRecord or add an extra
> bool wait variable to XLogReaderState and use it in
> read_local_xlog_page).
>

I am not asking to do any changes in the backend code. Please check -
how pg_waldump does this when a user requests to stop once the endptr
has reached. If not for all functions at least for a few functions we
can do this if it is doable.

>
> > +Datum
> > +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> > +{
> > +#define PG_GET_WAL_RECORDS_INFO_COLS 10
> >
> >
> > We could probably have another variant of this function that would
> > work even if the end pointer is not specified, in which case the
> > default end pointer would be the last WAL record in the WAL segment.
> > Currently it mandates the use of an end pointer which slightly reduces
> > flexibility.
>
> Last WAL record in the WAL segment may not be of much use(one can
> figure out the last valid WAL record in a wal file as mentioned
> above), but the WAL records info till the latest current flush LSN of
> the server would be a useful functionality. But that too, can be found
> using something like "select lsn, prev_lsn, resource_manager from
> pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"
>

What if a user wants to inspect all the valid wal records from a
startptr (startlsn) and he doesn't know the endptr? Why should he/she
be mandated to get the endptr and supply it to this function? I don't
think we should force users to do that. I think this is again a very
basic ASK that can be done in this version itself. It is not at all
any advanced thing that we can think of doing in the future.

> > +
> > +/*
> > + * Get the first valid raw WAL record lsn.
> > + */
> > +Datum
> > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
> >
> >
> > I think this function should return a pointer to the nearest valid WAL
> > record which can be the previous WAL record to the LSN entered by the
> > user or the next WAL record. If a user unknowingly enters an lsn that
> > does not exist then in such cases we should probably return the lsn of
> > the previous WAL record instead of hanging or waiting for the new WAL
> > record to arrive.
>
> Is it useful?

It is useful in the same way as returning the next valid wal pointer
is. Why should a user wait for the next valid wal pointer to be
available instead the function should identify the previous valid wal
record and return it and put an appropriate message to the user.

If there's a strong reason, how about naming
> pg_get_next_valid_wal_record_lsn returning the next valid wal record
> LSN and pg_get_previous_valid_wal_record_lsn returning the previous
> valid wal record LSN ? If you think having two functions is too much,
> then, how about pg_get_first_valid_wal_record_lsn returning both the
> next valid wal record LSN and its previous wal record LSN?
>

The latter one looks better.

--
With Regards,
Ashutosh Sharma.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> I don't think that's the use case of this patch. Unless there is some
> other valid reason, I would suggest you remove it.

Removed the function pg_verify_raw_wal_record. Robert and Greg also
voted for removal upthread.

> > > Should we add a function that returns the pointer to the first and
> > > probably the last WAL record in the WAL segment? This would help users
> > > to inspect the wal records in the entire wal segment if they wish to
> > > do so.
> >
> > Good point. One can do this already with pg_get_wal_records_info and
> > pg_walfile_name_offset. Usually, the LSN format itself can give an
> > idea about the WAL file it is in.
> >
> > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
> > limit 1;
> >     lsn    |    pg_walfile_name_offset
> > -----------+-------------------------------
> >  0/5000038 | (000000010000000000000005,56)
> > (1 row)
> >
> > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
> > limit 1;
> >     lsn    |       pg_walfile_name_offset
> > -----------+-------------------------------------
> >  0/5FFFFC0 | (000000010000000000000005,16777152)
> > (1 row)
> >
>
> The workaround you are suggesting is not very user friendly and FYKI
> pg_wal_records_info simply hangs at times when we specify the higher
> and lower limit of lsn in a wal file.
>
> To make things easier for the end users I would suggest we add a
> function that can return a valid first and last lsn in a walfile. The
> output of this function can be used to inspect the wal records in the
> entire wal file if they wish to do so and I am sure they will. So, it
> should be something like this:
>
> select first_valid_lsn, last_valid_lsn from
> pg_get_first_last_valid_wal_record('wal-segment-name');
>
> And above function can directly be used with pg_get_wal_records_info() like
>
> select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment'));
>
> I think this is a pretty basic ASK that we expect to be present in the
> module like this.

Added a new function that returns the first and last valid WAL record
LSN of a given WAL file.

> > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> > >
> > > I think we should allow all these functions to be executed in wait and
> > > *nowait* mode. If a user specifies nowait mode, the function should
> > > return if no WAL data is present, rather than waiting for new WAL data
> > > to become available, default behaviour could be anything you like.
> >
> > Currently, pg_walinspect uses read_local_xlog_page which waits in the
> > while(1) loop if a future LSN is specified. As read_local_xlog_page is
> > an implementation of XLogPageReadCB, which doesn't have a wait/nowait
> > parameter, if we really need a wait/nowait mode behaviour, we need to
> > do extra things(either add a backend-level global wait variable, set
> > before XLogReadRecord, if set, read_local_xlog_page can just exit
> > without waiting and reset after the XLogReadRecord or add an extra
> > bool wait variable to XLogReaderState and use it in
> > read_local_xlog_page).
> >
>
> I am not asking to do any changes in the backend code. Please check -
> how pg_waldump does this when a user requests to stop once the endptr
> has reached. If not for all functions at least for a few functions we
> can do this if it is doable.

I've added a new function read_local_xlog_page_2 (similar to
read_local_xlog_page but works in wait and no wait mode) and the
callers can specify whether to wait or not wait using private_data.
Actually, I wanted to use the private_data structure of
read_local_xlog_page but the logical decoding already has context as
private_data, that is why I had to have a new function. I know it
creates a bit of duplicate code, but its cleaner than using
backend-local variables or additional flags in XLogReaderState or
adding wait/no-wait boolean to page_read callback. Any other
suggestions are welcome here.

With this, I'm able to have wait/no wait versions for any functions.
But for now, I'm having wait/no wait for two functions
(pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
sense.

> > > +Datum
> > > +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> > > +{
> > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10
> > >
> > >
> > > We could probably have another variant of this function that would
> > > work even if the end pointer is not specified, in which case the
> > > default end pointer would be the last WAL record in the WAL segment.
> > > Currently it mandates the use of an end pointer which slightly reduces
> > > flexibility.
> >
> > Last WAL record in the WAL segment may not be of much use(one can
> > figure out the last valid WAL record in a wal file as mentioned
> > above), but the WAL records info till the latest current flush LSN of
> > the server would be a useful functionality. But that too, can be found
> > using something like "select lsn, prev_lsn, resource_manager from
> > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"
> >
>
> What if a user wants to inspect all the valid wal records from a
> startptr (startlsn) and he doesn't know the endptr? Why should he/she
> be mandated to get the endptr and supply it to this function? I don't
> think we should force users to do that. I think this is again a very
> basic ASK that can be done in this version itself. It is not at all
> any advanced thing that we can think of doing in the future.

Agreed. Added new functions that emits wal records info/stats till the
end of the WAL at the moment.

> > > +
> > > +/*
> > > + * Get the first valid raw WAL record lsn.
> > > + */
> > > +Datum
> > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
> > >
> > >
> > > I think this function should return a pointer to the nearest valid WAL
> > > record which can be the previous WAL record to the LSN entered by the
> > > user or the next WAL record. If a user unknowingly enters an lsn that
> > > does not exist then in such cases we should probably return the lsn of
> > > the previous WAL record instead of hanging or waiting for the new WAL
> > > record to arrive.
> >
> > Is it useful?
>
> It is useful in the same way as returning the next valid wal pointer
> is. Why should a user wait for the next valid wal pointer to be
> available instead the function should identify the previous valid wal
> record and return it and put an appropriate message to the user.
>
> If there's a strong reason, how about naming
> > pg_get_next_valid_wal_record_lsn returning the next valid wal record
> > LSN and pg_get_previous_valid_wal_record_lsn returning the previous
> > valid wal record LSN ? If you think having two functions is too much,
> > then, how about pg_get_first_valid_wal_record_lsn returning both the
> > next valid wal record LSN and its previous wal record LSN?
> >
>
> The latter one looks better.

Modified.

Attaching v5 patch set, please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
Some review comments on v5 patch (v5-0001-pg_walinspect.patch)

+--
+-- pg_get_wal_records_info()
+--
+CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
+    IN end_lsn pg_lsn,
+    IN wait_for_wal boolean DEFAULT false,
+    OUT lsn pg_lsn,

What does the wait_for_wal flag mean here when one has already
specified the start and end lsn? AFAIU, If a user has specified a
start and stop LSN, it means that the user knows the extent to which
he/she wants to display the WAL records in which case we need to stop
once the end lsn has reached . So what is the meaning of wait_for_wal
flag? Does it look sensible to have the wait_for_wal flag here? To me
it doesn't.

==

+--
+-- pg_get_wal_records_info_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
+    OUT lsn pg_lsn,
+    OUT prev_lsn pg_lsn,
+    OUT xid xid,

Why is this function required? Is pg_get_wal_records_info() alone not
enough? I think it is. See if we can make end_lsn optional in
pg_get_wal_records_info() and lets just have it alone. I think it can
do the job of pg_get_wal_records_info_till_end_of_wal function.

==

+--
+-- pg_get_wal_stats_till_end_of_wal()
+--
+CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
+    OUT resource_manager text,
+    OUT count int8,

Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?

==


+                       if (loc <= read_upto)
+                               break;
+
+                       /* Let's not wait for WAL to be available if
indicated */
+                       if (loc > read_upto &&
+                               state->private_data != NULL)
+                       {

Why loc > read_upto? The first if condition is (loc <= read_upto)
followed by the second if condition - (loc > read_upto). Is the first
if condition (loc <= read_upto) not enough to indicate that loc >
read_upto?

==

+#define IsEndOfWALReached(state) \
+               (state->private_data != NULL && \
+               (((ReadLocalXLOGPage2Private *)
xlogreader->private_data)->no_wait == true) && \
+               (((ReadLocalXLOGPage2Private *)
xlogreader->private_data)->reached_end_of_wal == true))


I think we should either use state or xlogreader. First line says
state->private_data and second line xlogreader->private_data.

==

+               (((ReadLocalXLOGPage2Private *)
xlogreader->private_data)->reached_end_of_wal == true))
+

There is a new patch coming to make the end of WAL messages less
scary. It introduces the EOW flag in xlogreaderstate maybe we can use
that instead of introducing new flags in private area to represent the
end of WAL.

==

+/*
+ * XLogReaderRoutine->page_read callback for reading local xlog files
+ *
+ * This function is same as read_local_xlog_page except that it works in both
+ * wait and no wait mode. The callers can specify about waiting in private_data
+ * of XLogReaderState.
+ */
+int
+read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
+                                          int reqLen, XLogRecPtr
targetRecPtr, char *cur_page)
+{
+       XLogRecPtr      read_upto,

Do we really need this function? Can't we make use of an existing WAL
reader function - read_local_xlog_page()?

--
With Regards,
Ashutosh Sharma.

On Fri, Feb 25, 2022 at 4:33 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > I don't think that's the use case of this patch. Unless there is some
> > other valid reason, I would suggest you remove it.
>
> Removed the function pg_verify_raw_wal_record. Robert and Greg also
> voted for removal upthread.
>
> > > > Should we add a function that returns the pointer to the first and
> > > > probably the last WAL record in the WAL segment? This would help users
> > > > to inspect the wal records in the entire wal segment if they wish to
> > > > do so.
> > >
> > > Good point. One can do this already with pg_get_wal_records_info and
> > > pg_walfile_name_offset. Usually, the LSN format itself can give an
> > > idea about the WAL file it is in.
> > >
> > > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc
> > > limit 1;
> > >     lsn    |    pg_walfile_name_offset
> > > -----------+-------------------------------
> > >  0/5000038 | (000000010000000000000005,56)
> > > (1 row)
> > >
> > > postgres=# select lsn, pg_walfile_name_offset(lsn) from
> > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc
> > > limit 1;
> > >     lsn    |       pg_walfile_name_offset
> > > -----------+-------------------------------------
> > >  0/5FFFFC0 | (000000010000000000000005,16777152)
> > > (1 row)
> > >
> >
> > The workaround you are suggesting is not very user friendly and FYKI
> > pg_wal_records_info simply hangs at times when we specify the higher
> > and lower limit of lsn in a wal file.
> >
> > To make things easier for the end users I would suggest we add a
> > function that can return a valid first and last lsn in a walfile. The
> > output of this function can be used to inspect the wal records in the
> > entire wal file if they wish to do so and I am sure they will. So, it
> > should be something like this:
> >
> > select first_valid_lsn, last_valid_lsn from
> > pg_get_first_last_valid_wal_record('wal-segment-name');
> >
> > And above function can directly be used with pg_get_wal_records_info() like
> >
> > select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment'));
> >
> > I think this is a pretty basic ASK that we expect to be present in the
> > module like this.
>
> Added a new function that returns the first and last valid WAL record
> LSN of a given WAL file.
>
> > > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> > > >
> > > > I think we should allow all these functions to be executed in wait and
> > > > *nowait* mode. If a user specifies nowait mode, the function should
> > > > return if no WAL data is present, rather than waiting for new WAL data
> > > > to become available, default behaviour could be anything you like.
> > >
> > > Currently, pg_walinspect uses read_local_xlog_page which waits in the
> > > while(1) loop if a future LSN is specified. As read_local_xlog_page is
> > > an implementation of XLogPageReadCB, which doesn't have a wait/nowait
> > > parameter, if we really need a wait/nowait mode behaviour, we need to
> > > do extra things(either add a backend-level global wait variable, set
> > > before XLogReadRecord, if set, read_local_xlog_page can just exit
> > > without waiting and reset after the XLogReadRecord or add an extra
> > > bool wait variable to XLogReaderState and use it in
> > > read_local_xlog_page).
> > >
> >
> > I am not asking to do any changes in the backend code. Please check -
> > how pg_waldump does this when a user requests to stop once the endptr
> > has reached. If not for all functions at least for a few functions we
> > can do this if it is doable.
>
> I've added a new function read_local_xlog_page_2 (similar to
> read_local_xlog_page but works in wait and no wait mode) and the
> callers can specify whether to wait or not wait using private_data.
> Actually, I wanted to use the private_data structure of
> read_local_xlog_page but the logical decoding already has context as
> private_data, that is why I had to have a new function. I know it
> creates a bit of duplicate code, but its cleaner than using
> backend-local variables or additional flags in XLogReaderState or
> adding wait/no-wait boolean to page_read callback. Any other
> suggestions are welcome here.
>
> With this, I'm able to have wait/no wait versions for any functions.
> But for now, I'm having wait/no wait for two functions
> (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
> sense.
>
> > > > +Datum
> > > > +pg_get_wal_records_info(PG_FUNCTION_ARGS)
> > > > +{
> > > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10
> > > >
> > > >
> > > > We could probably have another variant of this function that would
> > > > work even if the end pointer is not specified, in which case the
> > > > default end pointer would be the last WAL record in the WAL segment.
> > > > Currently it mandates the use of an end pointer which slightly reduces
> > > > flexibility.
> > >
> > > Last WAL record in the WAL segment may not be of much use(one can
> > > figure out the last valid WAL record in a wal file as mentioned
> > > above), but the WAL records info till the latest current flush LSN of
> > > the server would be a useful functionality. But that too, can be found
> > > using something like "select lsn, prev_lsn, resource_manager from
> > > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());"
> > >
> >
> > What if a user wants to inspect all the valid wal records from a
> > startptr (startlsn) and he doesn't know the endptr? Why should he/she
> > be mandated to get the endptr and supply it to this function? I don't
> > think we should force users to do that. I think this is again a very
> > basic ASK that can be done in this version itself. It is not at all
> > any advanced thing that we can think of doing in the future.
>
> Agreed. Added new functions that emits wal records info/stats till the
> end of the WAL at the moment.
>
> > > > +
> > > > +/*
> > > > + * Get the first valid raw WAL record lsn.
> > > > + */
> > > > +Datum
> > > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS)
> > > >
> > > >
> > > > I think this function should return a pointer to the nearest valid WAL
> > > > record which can be the previous WAL record to the LSN entered by the
> > > > user or the next WAL record. If a user unknowingly enters an lsn that
> > > > does not exist then in such cases we should probably return the lsn of
> > > > the previous WAL record instead of hanging or waiting for the new WAL
> > > > record to arrive.
> > >
> > > Is it useful?
> >
> > It is useful in the same way as returning the next valid wal pointer
> > is. Why should a user wait for the next valid wal pointer to be
> > available instead the function should identify the previous valid wal
> > record and return it and put an appropriate message to the user.
> >
> > If there's a strong reason, how about naming
> > > pg_get_next_valid_wal_record_lsn returning the next valid wal record
> > > LSN and pg_get_previous_valid_wal_record_lsn returning the previous
> > > valid wal record LSN ? If you think having two functions is too much,
> > > then, how about pg_get_first_valid_wal_record_lsn returning both the
> > > next valid wal record LSN and its previous wal record LSN?
> > >
> >
> > The latter one looks better.
>
> Modified.
>
> Attaching v5 patch set, please review it further.
>
> Regards,
> Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Some review comments on v5 patch (v5-0001-pg_walinspect.patch)

Thanks for reviewing.

> +--
> +-- pg_get_wal_records_info()
> +--
> +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> +    IN end_lsn pg_lsn,
> +    IN wait_for_wal boolean DEFAULT false,
> +    OUT lsn pg_lsn,
>
> What does the wait_for_wal flag mean here when one has already
> specified the start and end lsn? AFAIU, If a user has specified a
> start and stop LSN, it means that the user knows the extent to which
> he/she wants to display the WAL records in which case we need to stop
> once the end lsn has reached . So what is the meaning of wait_for_wal
> flag? Does it look sensible to have the wait_for_wal flag here? To me
> it doesn't.

Users can always specify a future end_lsn and set wait_for_wal to
true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
wait for the WAL. IMO, this is useful. If you remember you were okay
with wait/nowait versions for some of the functions upthread [1]. I'm
not going to retain this behaviour for both
pg_get_wal_records_info/pg_get_wal_stats as it is similar to
pg_waldump's --follow option.

> ==
>
> +--
> +-- pg_get_wal_records_info_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> +    OUT lsn pg_lsn,
> +    OUT prev_lsn pg_lsn,
> +    OUT xid xid,
>
> Why is this function required? Is pg_get_wal_records_info() alone not
> enough? I think it is. See if we can make end_lsn optional in
> pg_get_wal_records_info() and lets just have it alone. I think it can
> do the job of pg_get_wal_records_info_till_end_of_wal function.
>
> ==
>
> +--
> +-- pg_get_wal_stats_till_end_of_wal()
> +--
> +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> +    OUT resource_manager text,
> +    OUT count int8,
>
> Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?

I'm doing the following input validations for these functions to not
cause any issues with invalid LSN. If I were to have the default value
for end_lsn as 0/0, I can't perform input validations right? That is
the reason I'm having separate functions {pg_get_wal_records_info,
pg_get_wal_stats}_till_end_of_wal() versions.

    /* Validate input. */
    if (XLogRecPtrIsInvalid(start_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid WAL record start LSN")));

    if (XLogRecPtrIsInvalid(end_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid WAL record end LSN")));

    if (start_lsn >= end_lsn)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("WAL record start LSN must be less than end LSN")));

> ==
>
>
> +                       if (loc <= read_upto)
> +                               break;
> +
> +                       /* Let's not wait for WAL to be available if
> indicated */
> +                       if (loc > read_upto &&
> +                               state->private_data != NULL)
> +                       {
>
> Why loc > read_upto? The first if condition is (loc <= read_upto)
> followed by the second if condition - (loc > read_upto). Is the first
> if condition (loc <= read_upto) not enough to indicate that loc >
> read_upto?

Yeah, that's unnecessary, I improved the comment there and removed loc
> read_upto.

> ==
>
> +#define IsEndOfWALReached(state) \
> +               (state->private_data != NULL && \
> +               (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->no_wait == true) && \
> +               (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
>
> I think we should either use state or xlogreader. First line says
> state->private_data and second line xlogreader->private_data.

I've changed it to use state instead of xlogreader.

> ==
>
> +               (((ReadLocalXLOGPage2Private *)
> xlogreader->private_data)->reached_end_of_wal == true))
> +
>
> There is a new patch coming to make the end of WAL messages less
> scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> that instead of introducing new flags in private area to represent the
> end of WAL.

Yeah that would be great. But we never know which one gets committed
first. Until then it's not good to have dependencies on two "on-going"
patches. Later, we can change.

> ==
>
> +/*
> + * XLogReaderRoutine->page_read callback for reading local xlog files
> + *
> + * This function is same as read_local_xlog_page except that it works in both
> + * wait and no wait mode. The callers can specify about waiting in private_data
> + * of XLogReaderState.
> + */
> +int
> +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> +                                          int reqLen, XLogRecPtr
> targetRecPtr, char *cur_page)
> +{
> +       XLogRecPtr      read_upto,
>
> Do we really need this function? Can't we make use of an existing WAL
> reader function - read_local_xlog_page()?

I clearly explained the reasons upthread [2]. Please let me know if
you have more thoughts/doubts here, we can connect offlist.

Attaching v6 patch set with above review comments addressed. Please
review it further.

[1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
+PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
+PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
+PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
+PG_FUNCTION_INFO_V1(pg_get_wal_records_info);

I think we should allow all these functions to be executed in wait and
*nowait* mode. If a user specifies nowait mode, the function should
return if no WAL data is present, rather than waiting for new WAL data
to become available, default behaviour could be anything you like.

[2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
I've added a new function read_local_xlog_page_2 (similar to
read_local_xlog_page but works in wait and no wait mode) and the
callers can specify whether to wait or not wait using private_data.
Actually, I wanted to use the private_data structure of
read_local_xlog_page but the logical decoding already has context as
private_data, that is why I had to have a new function. I know it
creates a bit of duplicate code, but its cleaner than using
backend-local variables or additional flags in XLogReaderState or
adding wait/no-wait boolean to page_read callback. Any other
suggestions are welcome here.

With this, I'm able to have wait/no wait versions for any functions.
But for now, I'm having wait/no wait for two functions
(pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
sense.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
On Wed, Mar 2, 2022 at 10:37 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Some review comments on v5 patch (v5-0001-pg_walinspect.patch)
>
> Thanks for reviewing.
>
> > +--
> > +-- pg_get_wal_records_info()
> > +--
> > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> > +    IN end_lsn pg_lsn,
> > +    IN wait_for_wal boolean DEFAULT false,
> > +    OUT lsn pg_lsn,
> >
> > What does the wait_for_wal flag mean here when one has already
> > specified the start and end lsn? AFAIU, If a user has specified a
> > start and stop LSN, it means that the user knows the extent to which
> > he/she wants to display the WAL records in which case we need to stop
> > once the end lsn has reached . So what is the meaning of wait_for_wal
> > flag? Does it look sensible to have the wait_for_wal flag here? To me
> > it doesn't.
>
> Users can always specify a future end_lsn and set wait_for_wal to
> true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
> wait for the WAL. IMO, this is useful. If you remember you were okay
> with wait/nowait versions for some of the functions upthread [1]. I'm
> not going to retain this behaviour for both
> pg_get_wal_records_info/pg_get_wal_stats as it is similar to
> pg_waldump's --follow option.
>

It is not at all similar to pg_waldumps behaviour. Please check the
behaviour of pg_waldump properly. Does it wait for any wal records
when a user has specified a stop pointer? It doesn't and it shouldn't.
I mean does it even make sense to wait for the WAL when a stop pointer
is specified? And it's quite understandable that if a user has asked
pg_walinspect to stop at a certain point, it must. Also, What if there
are already WAL records after the stop pointer, in that case does it
even make sense to have a wait flag. WHat would be the meaning of the
wait flag in that case?

Further, have you checked wait_for_wal flag behaviour, is it even working?

> >
> > +--
> > +-- pg_get_wal_records_info_till_end_of_wal()
> > +--
> > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> > +    OUT lsn pg_lsn,
> > +    OUT prev_lsn pg_lsn,
> > +    OUT xid xid,
> >
> > Why is this function required? Is pg_get_wal_records_info() alone not
> > enough? I think it is. See if we can make end_lsn optional in
> > pg_get_wal_records_info() and lets just have it alone. I think it can
> > do the job of pg_get_wal_records_info_till_end_of_wal function.
> >
> > ==
> >
> > +--
> > +-- pg_get_wal_stats_till_end_of_wal()
> > +--
> > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> > +    OUT resource_manager text,
> > +    OUT count int8,
> >
> > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?
>
> I'm doing the following input validations for these functions to not
> cause any issues with invalid LSN. If I were to have the default value
> for end_lsn as 0/0, I can't perform input validations right? That is
> the reason I'm having separate functions {pg_get_wal_records_info,
> pg_get_wal_stats}_till_end_of_wal() versions.
>

You can do it. Please check pg_waldump to understand how it is done
there. You cannot have multiple functions doing different things when
one single function can do all the job.

> > ==
> >
> >
> > +                       if (loc <= read_upto)
> > +                               break;
> > +
> > +                       /* Let's not wait for WAL to be available if
> > indicated */
> > +                       if (loc > read_upto &&
> > +                               state->private_data != NULL)
> > +                       {
> >
> > Why loc > read_upto? The first if condition is (loc <= read_upto)
> > followed by the second if condition - (loc > read_upto). Is the first
> > if condition (loc <= read_upto) not enough to indicate that loc >
> > read_upto?
>
> Yeah, that's unnecessary, I improved the comment there and removed loc
> > read_upto.
>
> > ==
> >
> > +#define IsEndOfWALReached(state) \
> > +               (state->private_data != NULL && \
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->no_wait == true) && \
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->reached_end_of_wal == true))
> >
> > I think we should either use state or xlogreader. First line says
> > state->private_data and second line xlogreader->private_data.
>
> I've changed it to use state instead of xlogreader.
>
> > ==
> >
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->reached_end_of_wal == true))
> > +
> >
> > There is a new patch coming to make the end of WAL messages less
> > scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> > that instead of introducing new flags in private area to represent the
> > end of WAL.
>
> Yeah that would be great. But we never know which one gets committed
> first. Until then it's not good to have dependencies on two "on-going"
> patches. Later, we can change.
>
> > ==
> >
> > +/*
> > + * XLogReaderRoutine->page_read callback for reading local xlog files
> > + *
> > + * This function is same as read_local_xlog_page except that it works in both
> > + * wait and no wait mode. The callers can specify about waiting in private_data
> > + * of XLogReaderState.
> > + */
> > +int
> > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> > +                                          int reqLen, XLogRecPtr
> > targetRecPtr, char *cur_page)
> > +{
> > +       XLogRecPtr      read_upto,
> >
> > Do we really need this function? Can't we make use of an existing WAL
> > reader function - read_local_xlog_page()?
>
> I clearly explained the reasons upthread [2]. Please let me know if
> you have more thoughts/doubts here, we can connect offlist.
>
> Attaching v6 patch set with above review comments addressed. Please
> review it further.
>
> [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
> +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
>
> I think we should allow all these functions to be executed in wait and
> *nowait* mode. If a user specifies nowait mode, the function should
> return if no WAL data is present, rather than waiting for new WAL data
> to become available, default behaviour could be anything you like.
>
> [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
> I've added a new function read_local_xlog_page_2 (similar to
> read_local_xlog_page but works in wait and no wait mode) and the
> callers can specify whether to wait or not wait using private_data.
> Actually, I wanted to use the private_data structure of
> read_local_xlog_page but the logical decoding already has context as
> private_data, that is why I had to have a new function. I know it
> creates a bit of duplicate code, but its cleaner than using
> backend-local variables or additional flags in XLogReaderState or
> adding wait/no-wait boolean to page_read callback. Any other
> suggestions are welcome here.
>
> With this, I'm able to have wait/no wait versions for any functions.
> But for now, I'm having wait/no wait for two functions
> (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
> sense.
>
> Regards,
> Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
Hi.

+#ifdef FRONTEND
+/*
+ * Functions that are currently not needed in the backend, but are better
+ * implemented inside xlogreader.c because of the internal facilities available
+ * here.
+ */
+
 #endif                            /* FRONTEND */

Why didn't you remove the emptied #ifdef section?

+int
+read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
+                       int reqLen, XLogRecPtr targetRecPtr, char *cur_page)

The difference with the original function is this function has one
additional if-block amid. I think we can insert the code directly in
the original function.

+            /*
+             * We are trying to read future WAL. Let's not wait for WAL to be
+             * available if indicated.
+             */
+            if (state->private_data != NULL)

However, in the first place it seems to me there's not need for the
function to take care of no_wait affairs.

If, for expample, pg_get_wal_record_info() with no_wait = true, it is
enough that the function identifies the bleeding edge of WAL then loop
until the LSN.  So I think no need for the new function, nor for any
modification on the origical function.

The changes will reduce the footprint of the patch largely, I think.

At Wed, 2 Mar 2022 22:37:43 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Some review comments on v5 patch (v5-0001-pg_walinspect.patch)
> 
> Thanks for reviewing.
> 
> > +--
> > +-- pg_get_wal_records_info()
> > +--
> > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> > +    IN end_lsn pg_lsn,
> > +    IN wait_for_wal boolean DEFAULT false,
> > +    OUT lsn pg_lsn,
> >
> > What does the wait_for_wal flag mean here when one has already
> > specified the start and end lsn? AFAIU, If a user has specified a
> > start and stop LSN, it means that the user knows the extent to which
> > he/she wants to display the WAL records in which case we need to stop
> > once the end lsn has reached . So what is the meaning of wait_for_wal
> > flag? Does it look sensible to have the wait_for_wal flag here? To me
> > it doesn't.
> 
> Users can always specify a future end_lsn and set wait_for_wal to
> true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
> wait for the WAL. IMO, this is useful. If you remember you were okay
> with wait/nowait versions for some of the functions upthread [1]. I'm
> not going to retain this behaviour for both
> pg_get_wal_records_info/pg_get_wal_stats as it is similar to
> pg_waldump's --follow option.

I agree to this for now. However, I prefer that NULL or invalid
end_lsn is equivalent to pg_current_wal_lsn().

> > ==
> >
> > +--
> > +-- pg_get_wal_records_info_till_end_of_wal()
> > +--
> > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> > +    OUT lsn pg_lsn,
> > +    OUT prev_lsn pg_lsn,
> > +    OUT xid xid,
> >
> > Why is this function required? Is pg_get_wal_records_info() alone not
> > enough? I think it is. See if we can make end_lsn optional in
> > pg_get_wal_records_info() and lets just have it alone. I think it can
> > do the job of pg_get_wal_records_info_till_end_of_wal function.

I rather agree to Ashutosh. This feature can be covered by
pg_get_wal_records(start_lsn, NULL, false).

> > ==
> >
> > +--
> > +-- pg_get_wal_stats_till_end_of_wal()
> > +--
> > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> > +    OUT resource_manager text,
> > +    OUT count int8,
> >
> > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?
> 
> I'm doing the following input validations for these functions to not
> cause any issues with invalid LSN. If I were to have the default value
> for end_lsn as 0/0, I can't perform input validations right? That is
> the reason I'm having separate functions {pg_get_wal_records_info,
> pg_get_wal_stats}_till_end_of_wal() versions.
> 
>     /* Validate input. */
>     if (XLogRecPtrIsInvalid(start_lsn))
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("invalid WAL record start LSN")));
> 
>     if (XLogRecPtrIsInvalid(end_lsn))
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("invalid WAL record end LSN")));
> 
>     if (start_lsn >= end_lsn)
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("WAL record start LSN must be less than end LSN")));

I don't think that validations are worth doing at least for the first
two, as far as that value has a special meaning. I see it useful if
pg_get_wal_records_info() means dump the all available records for the
moment, or records of the last segment, page or something.

> > ==
> >
> >
> > +                       if (loc <= read_upto)
> > +                               break;
> > +
> > +                       /* Let's not wait for WAL to be available if
> > indicated */
> > +                       if (loc > read_upto &&
> > +                               state->private_data != NULL)
> > +                       {
> >
> > Why loc > read_upto? The first if condition is (loc <= read_upto)
> > followed by the second if condition - (loc > read_upto). Is the first
> > if condition (loc <= read_upto) not enough to indicate that loc >
> > read_upto?
> 
> Yeah, that's unnecessary, I improved the comment there and removed loc
> > read_upto.
> 
> > ==
> >
> > +#define IsEndOfWALReached(state) \
> > +               (state->private_data != NULL && \
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->no_wait == true) && \
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->reached_end_of_wal == true))
> >
> > I think we should either use state or xlogreader. First line says
> > state->private_data and second line xlogreader->private_data.
> 
> I've changed it to use state instead of xlogreader.
> 
> > ==
> >
> > +               (((ReadLocalXLOGPage2Private *)
> > xlogreader->private_data)->reached_end_of_wal == true))
> > +
> >
> > There is a new patch coming to make the end of WAL messages less
> > scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> > that instead of introducing new flags in private area to represent the
> > end of WAL.
> 
> Yeah that would be great. But we never know which one gets committed
> first. Until then it's not good to have dependencies on two "on-going"
> patches. Later, we can change.
> 
> > ==
> >
> > +/*
> > + * XLogReaderRoutine->page_read callback for reading local xlog files
> > + *
> > + * This function is same as read_local_xlog_page except that it works in both
> > + * wait and no wait mode. The callers can specify about waiting in private_data
> > + * of XLogReaderState.
> > + */
> > +int
> > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> > +                                          int reqLen, XLogRecPtr
> > targetRecPtr, char *cur_page)
> > +{
> > +       XLogRecPtr      read_upto,
> >
> > Do we really need this function? Can't we make use of an existing WAL
> > reader function - read_local_xlog_page()?
> 
> I clearly explained the reasons upthread [2]. Please let me know if
> you have more thoughts/doubts here, we can connect offlist.

*I* also think the function is not needed, as explained above.  Why do
we need that function while we know how far we can read WAL records
*before* calling the function?

> Attaching v6 patch set with above review comments addressed. Please
> review it further.
> 
> [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
> +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> 
> I think we should allow all these functions to be executed in wait and
> *nowait* mode. If a user specifies nowait mode, the function should
> return if no WAL data is present, rather than waiting for new WAL data
> to become available, default behaviour could be anything you like.
> 
> [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
> I've added a new function read_local_xlog_page_2 (similar to
> read_local_xlog_page but works in wait and no wait mode) and the
> callers can specify whether to wait or not wait using private_data.
> Actually, I wanted to use the private_data structure of
> read_local_xlog_page but the logical decoding already has context as
> private_data, that is why I had to have a new function. I know it
> creates a bit of duplicate code, but its cleaner than using
> backend-local variables or additional flags in XLogReaderState or
> adding wait/no-wait boolean to page_read callback. Any other
> suggestions are welcome here.
> 
> With this, I'm able to have wait/no wait versions for any functions.
> But for now, I'm having wait/no wait for two functions
> (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
> sense.
> 
> Regards,
> Bharath Rupireddy.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Nitin Jadhav
Дата:
Here are a few comments.

1)
> > > ==
> > >
> > > +--
> > > +-- pg_get_wal_records_info_till_end_of_wal()
> > > +--
> > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> > > +    OUT lsn pg_lsn,
> > > +    OUT prev_lsn pg_lsn,
> > > +    OUT xid xid,
> > >
> > > Why is this function required? Is pg_get_wal_records_info() alone not
> > > enough? I think it is. See if we can make end_lsn optional in
> > > pg_get_wal_records_info() and lets just have it alone. I think it can
> > > do the job of pg_get_wal_records_info_till_end_of_wal function.
>
> I rather agree to Ashutosh. This feature can be covered by
> pg_get_wal_records(start_lsn, NULL, false).
> I don't think that validations are worth doing at least for the first
> two, as far as that value has a special meaning. I see it useful if
> pg_get_wal_records_info() means dump the all available records for the
> moment, or records of the last segment, page or something.
> *I* also think the function is not needed, as explained above.  Why do
> we need that function while we know how far we can read WAL records
> *before* calling the function?

I agree with this. The function prototype comes first and the
validation can be done accordingly. I feel we can even support
'pg_get_wal_record_info' with the same name. All 3 function's
objectives are the same. So it is better to use the same name
(pg_wal_record_info) with different prototypes.

2) The function 'pg_get_first_valid_wal_record_lsn' looks redundant as
we are getting the same information from the function
'pg_get_first_and_last_valid_wal_record_lsn'. With this function, we
can easily fetch the first lsn. So IMO we should remove
'pg_get_first_valid_wal_record_lsn'.

3) The word 'get' should be removed from the function name(*_get_*) as
all the functions of the extension are used only to get the
information. It will also sync with xlogfuncs's naming conventions
like pg_current_wal_lsn, pg_walfile_name, etc.

4) The function names can be modified with lesser words by retaining
the existing meaning.
:s/pg_get_raw_wal_record/pg_wal_raw_record
:s/pg_get_first_valid_wal_record_lsn/pg_wal_first_lsn
:s/pg_get_first_and_last_valid_wal_record_lsn/pg_wal_first_and_last_lsn
:s/pg_get_wal_record_info/pg_wal_record_info
:s/pg_get_wal_stats/pg_wal_stats

5) Even 'pg_get_wal_stats' and 'pg_get_wal_stats_till_end_of_wal' can
be clubbed as one function.

The above comments are trying to simplify the extension APIs and to
make it easy for the user to understand and use it.

Thanks & Regards,
Nitin Jadhav




On Thu, Mar 3, 2022 at 8:20 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Hi.
>
> +#ifdef FRONTEND
> +/*
> + * Functions that are currently not needed in the backend, but are better
> + * implemented inside xlogreader.c because of the internal facilities available
> + * here.
> + */
> +
>  #endif                                                 /* FRONTEND */
>
> Why didn't you remove the emptied #ifdef section?
>
> +int
> +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> +                                          int reqLen, XLogRecPtr targetRecPtr, char *cur_page)
>
> The difference with the original function is this function has one
> additional if-block amid. I think we can insert the code directly in
> the original function.
>
> +                       /*
> +                        * We are trying to read future WAL. Let's not wait for WAL to be
> +                        * available if indicated.
> +                        */
> +                       if (state->private_data != NULL)
>
> However, in the first place it seems to me there's not need for the
> function to take care of no_wait affairs.
>
> If, for expample, pg_get_wal_record_info() with no_wait = true, it is
> enough that the function identifies the bleeding edge of WAL then loop
> until the LSN.  So I think no need for the new function, nor for any
> modification on the origical function.
>
> The changes will reduce the footprint of the patch largely, I think.
>
> At Wed, 2 Mar 2022 22:37:43 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > >
> > > Some review comments on v5 patch (v5-0001-pg_walinspect.patch)
> >
> > Thanks for reviewing.
> >
> > > +--
> > > +-- pg_get_wal_records_info()
> > > +--
> > > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> > > +    IN end_lsn pg_lsn,
> > > +    IN wait_for_wal boolean DEFAULT false,
> > > +    OUT lsn pg_lsn,
> > >
> > > What does the wait_for_wal flag mean here when one has already
> > > specified the start and end lsn? AFAIU, If a user has specified a
> > > start and stop LSN, it means that the user knows the extent to which
> > > he/she wants to display the WAL records in which case we need to stop
> > > once the end lsn has reached . So what is the meaning of wait_for_wal
> > > flag? Does it look sensible to have the wait_for_wal flag here? To me
> > > it doesn't.
> >
> > Users can always specify a future end_lsn and set wait_for_wal to
> > true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
> > wait for the WAL. IMO, this is useful. If you remember you were okay
> > with wait/nowait versions for some of the functions upthread [1]. I'm
> > not going to retain this behaviour for both
> > pg_get_wal_records_info/pg_get_wal_stats as it is similar to
> > pg_waldump's --follow option.
>
> I agree to this for now. However, I prefer that NULL or invalid
> end_lsn is equivalent to pg_current_wal_lsn().
>
> > > ==
> > >
> > > +--
> > > +-- pg_get_wal_records_info_till_end_of_wal()
> > > +--
> > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> > > +    OUT lsn pg_lsn,
> > > +    OUT prev_lsn pg_lsn,
> > > +    OUT xid xid,
> > >
> > > Why is this function required? Is pg_get_wal_records_info() alone not
> > > enough? I think it is. See if we can make end_lsn optional in
> > > pg_get_wal_records_info() and lets just have it alone. I think it can
> > > do the job of pg_get_wal_records_info_till_end_of_wal function.
>
> I rather agree to Ashutosh. This feature can be covered by
> pg_get_wal_records(start_lsn, NULL, false).
>
> > > ==
> > >
> > > +--
> > > +-- pg_get_wal_stats_till_end_of_wal()
> > > +--
> > > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> > > +    OUT resource_manager text,
> > > +    OUT count int8,
> > >
> > > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?
> >
> > I'm doing the following input validations for these functions to not
> > cause any issues with invalid LSN. If I were to have the default value
> > for end_lsn as 0/0, I can't perform input validations right? That is
> > the reason I'm having separate functions {pg_get_wal_records_info,
> > pg_get_wal_stats}_till_end_of_wal() versions.
> >
> >     /* Validate input. */
> >     if (XLogRecPtrIsInvalid(start_lsn))
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("invalid WAL record start LSN")));
> >
> >     if (XLogRecPtrIsInvalid(end_lsn))
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("invalid WAL record end LSN")));
> >
> >     if (start_lsn >= end_lsn)
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("WAL record start LSN must be less than end LSN")));
>
> I don't think that validations are worth doing at least for the first
> two, as far as that value has a special meaning. I see it useful if
> pg_get_wal_records_info() means dump the all available records for the
> moment, or records of the last segment, page or something.
>
> > > ==
> > >
> > >
> > > +                       if (loc <= read_upto)
> > > +                               break;
> > > +
> > > +                       /* Let's not wait for WAL to be available if
> > > indicated */
> > > +                       if (loc > read_upto &&
> > > +                               state->private_data != NULL)
> > > +                       {
> > >
> > > Why loc > read_upto? The first if condition is (loc <= read_upto)
> > > followed by the second if condition - (loc > read_upto). Is the first
> > > if condition (loc <= read_upto) not enough to indicate that loc >
> > > read_upto?
> >
> > Yeah, that's unnecessary, I improved the comment there and removed loc
> > > read_upto.
> >
> > > ==
> > >
> > > +#define IsEndOfWALReached(state) \
> > > +               (state->private_data != NULL && \
> > > +               (((ReadLocalXLOGPage2Private *)
> > > xlogreader->private_data)->no_wait == true) && \
> > > +               (((ReadLocalXLOGPage2Private *)
> > > xlogreader->private_data)->reached_end_of_wal == true))
> > >
> > > I think we should either use state or xlogreader. First line says
> > > state->private_data and second line xlogreader->private_data.
> >
> > I've changed it to use state instead of xlogreader.
> >
> > > ==
> > >
> > > +               (((ReadLocalXLOGPage2Private *)
> > > xlogreader->private_data)->reached_end_of_wal == true))
> > > +
> > >
> > > There is a new patch coming to make the end of WAL messages less
> > > scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> > > that instead of introducing new flags in private area to represent the
> > > end of WAL.
> >
> > Yeah that would be great. But we never know which one gets committed
> > first. Until then it's not good to have dependencies on two "on-going"
> > patches. Later, we can change.
> >
> > > ==
> > >
> > > +/*
> > > + * XLogReaderRoutine->page_read callback for reading local xlog files
> > > + *
> > > + * This function is same as read_local_xlog_page except that it works in both
> > > + * wait and no wait mode. The callers can specify about waiting in private_data
> > > + * of XLogReaderState.
> > > + */
> > > +int
> > > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> > > +                                          int reqLen, XLogRecPtr
> > > targetRecPtr, char *cur_page)
> > > +{
> > > +       XLogRecPtr      read_upto,
> > >
> > > Do we really need this function? Can't we make use of an existing WAL
> > > reader function - read_local_xlog_page()?
> >
> > I clearly explained the reasons upthread [2]. Please let me know if
> > you have more thoughts/doubts here, we can connect offlist.
>
> *I* also think the function is not needed, as explained above.  Why do
> we need that function while we know how far we can read WAL records
> *before* calling the function?
>
> > Attaching v6 patch set with above review comments addressed. Please
> > review it further.
> >
> > [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
> > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> >
> > I think we should allow all these functions to be executed in wait and
> > *nowait* mode. If a user specifies nowait mode, the function should
> > return if no WAL data is present, rather than waiting for new WAL data
> > to become available, default behaviour could be anything you like.
> >
> > [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
> > I've added a new function read_local_xlog_page_2 (similar to
> > read_local_xlog_page but works in wait and no wait mode) and the
> > callers can specify whether to wait or not wait using private_data.
> > Actually, I wanted to use the private_data structure of
> > read_local_xlog_page but the logical decoding already has context as
> > private_data, that is why I had to have a new function. I know it
> > creates a bit of duplicate code, but its cleaner than using
> > backend-local variables or additional flags in XLogReaderState or
> > adding wait/no-wait boolean to page_read callback. Any other
> > suggestions are welcome here.
> >
> > With this, I'm able to have wait/no wait versions for any functions.
> > But for now, I'm having wait/no wait for two functions
> > (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
> > sense.
> >
> > Regards,
> > Bharath Rupireddy.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
I think we should also see if we can allow end users to input timeline
information with the pg_walinspect functions. This will help the end
users to get information about WAL records from previous timeline
which can be helpful in case of restored servers.

--
With Regards,
Ashutosh Sharma.

On Thu, Mar 3, 2022 at 8:20 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Hi.
>
> +#ifdef FRONTEND
> +/*
> + * Functions that are currently not needed in the backend, but are better
> + * implemented inside xlogreader.c because of the internal facilities available
> + * here.
> + */
> +
>  #endif                                                 /* FRONTEND */
>
> Why didn't you remove the emptied #ifdef section?
>
> +int
> +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> +                                          int reqLen, XLogRecPtr targetRecPtr, char *cur_page)
>
> The difference with the original function is this function has one
> additional if-block amid. I think we can insert the code directly in
> the original function.
>
> +                       /*
> +                        * We are trying to read future WAL. Let's not wait for WAL to be
> +                        * available if indicated.
> +                        */
> +                       if (state->private_data != NULL)
>
> However, in the first place it seems to me there's not need for the
> function to take care of no_wait affairs.
>
> If, for expample, pg_get_wal_record_info() with no_wait = true, it is
> enough that the function identifies the bleeding edge of WAL then loop
> until the LSN.  So I think no need for the new function, nor for any
> modification on the origical function.
>
> The changes will reduce the footprint of the patch largely, I think.
>
> At Wed, 2 Mar 2022 22:37:43 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Wed, Mar 2, 2022 at 8:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > >
> > > Some review comments on v5 patch (v5-0001-pg_walinspect.patch)
> >
> > Thanks for reviewing.
> >
> > > +--
> > > +-- pg_get_wal_records_info()
> > > +--
> > > +CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
> > > +    IN end_lsn pg_lsn,
> > > +    IN wait_for_wal boolean DEFAULT false,
> > > +    OUT lsn pg_lsn,
> > >
> > > What does the wait_for_wal flag mean here when one has already
> > > specified the start and end lsn? AFAIU, If a user has specified a
> > > start and stop LSN, it means that the user knows the extent to which
> > > he/she wants to display the WAL records in which case we need to stop
> > > once the end lsn has reached . So what is the meaning of wait_for_wal
> > > flag? Does it look sensible to have the wait_for_wal flag here? To me
> > > it doesn't.
> >
> > Users can always specify a future end_lsn and set wait_for_wal to
> > true, then the pg_get_wal_records_info/pg_get_wal_stats functions can
> > wait for the WAL. IMO, this is useful. If you remember you were okay
> > with wait/nowait versions for some of the functions upthread [1]. I'm
> > not going to retain this behaviour for both
> > pg_get_wal_records_info/pg_get_wal_stats as it is similar to
> > pg_waldump's --follow option.
>
> I agree to this for now. However, I prefer that NULL or invalid
> end_lsn is equivalent to pg_current_wal_lsn().
>
> > > ==
> > >
> > > +--
> > > +-- pg_get_wal_records_info_till_end_of_wal()
> > > +--
> > > +CREATE FUNCTION pg_get_wal_records_info_till_end_of_wal(IN start_lsn pg_lsn,
> > > +    OUT lsn pg_lsn,
> > > +    OUT prev_lsn pg_lsn,
> > > +    OUT xid xid,
> > >
> > > Why is this function required? Is pg_get_wal_records_info() alone not
> > > enough? I think it is. See if we can make end_lsn optional in
> > > pg_get_wal_records_info() and lets just have it alone. I think it can
> > > do the job of pg_get_wal_records_info_till_end_of_wal function.
>
> I rather agree to Ashutosh. This feature can be covered by
> pg_get_wal_records(start_lsn, NULL, false).
>
> > > ==
> > >
> > > +--
> > > +-- pg_get_wal_stats_till_end_of_wal()
> > > +--
> > > +CREATE FUNCTION pg_get_wal_stats_till_end_of_wal(IN start_lsn pg_lsn,
> > > +    OUT resource_manager text,
> > > +    OUT count int8,
> > >
> > > Above comment applies to this function as well. Isn't pg_get_wal_stats() enough?
> >
> > I'm doing the following input validations for these functions to not
> > cause any issues with invalid LSN. If I were to have the default value
> > for end_lsn as 0/0, I can't perform input validations right? That is
> > the reason I'm having separate functions {pg_get_wal_records_info,
> > pg_get_wal_stats}_till_end_of_wal() versions.
> >
> >     /* Validate input. */
> >     if (XLogRecPtrIsInvalid(start_lsn))
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("invalid WAL record start LSN")));
> >
> >     if (XLogRecPtrIsInvalid(end_lsn))
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("invalid WAL record end LSN")));
> >
> >     if (start_lsn >= end_lsn)
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("WAL record start LSN must be less than end LSN")));
>
> I don't think that validations are worth doing at least for the first
> two, as far as that value has a special meaning. I see it useful if
> pg_get_wal_records_info() means dump the all available records for the
> moment, or records of the last segment, page or something.
>
> > > ==
> > >
> > >
> > > +                       if (loc <= read_upto)
> > > +                               break;
> > > +
> > > +                       /* Let's not wait for WAL to be available if
> > > indicated */
> > > +                       if (loc > read_upto &&
> > > +                               state->private_data != NULL)
> > > +                       {
> > >
> > > Why loc > read_upto? The first if condition is (loc <= read_upto)
> > > followed by the second if condition - (loc > read_upto). Is the first
> > > if condition (loc <= read_upto) not enough to indicate that loc >
> > > read_upto?
> >
> > Yeah, that's unnecessary, I improved the comment there and removed loc
> > > read_upto.
> >
> > > ==
> > >
> > > +#define IsEndOfWALReached(state) \
> > > +               (state->private_data != NULL && \
> > > +               (((ReadLocalXLOGPage2Private *)
> > > xlogreader->private_data)->no_wait == true) && \
> > > +               (((ReadLocalXLOGPage2Private *)
> > > xlogreader->private_data)->reached_end_of_wal == true))
> > >
> > > I think we should either use state or xlogreader. First line says
> > > state->private_data and second line xlogreader->private_data.
> >
> > I've changed it to use state instead of xlogreader.
> >
> > > ==
> > >
> > > +               (((ReadLocalXLOGPage2Private *)
> > > xlogreader->private_data)->reached_end_of_wal == true))
> > > +
> > >
> > > There is a new patch coming to make the end of WAL messages less
> > > scary. It introduces the EOW flag in xlogreaderstate maybe we can use
> > > that instead of introducing new flags in private area to represent the
> > > end of WAL.
> >
> > Yeah that would be great. But we never know which one gets committed
> > first. Until then it's not good to have dependencies on two "on-going"
> > patches. Later, we can change.
> >
> > > ==
> > >
> > > +/*
> > > + * XLogReaderRoutine->page_read callback for reading local xlog files
> > > + *
> > > + * This function is same as read_local_xlog_page except that it works in both
> > > + * wait and no wait mode. The callers can specify about waiting in private_data
> > > + * of XLogReaderState.
> > > + */
> > > +int
> > > +read_local_xlog_page_2(XLogReaderState *state, XLogRecPtr targetPagePtr,
> > > +                                          int reqLen, XLogRecPtr
> > > targetRecPtr, char *cur_page)
> > > +{
> > > +       XLogRecPtr      read_upto,
> > >
> > > Do we really need this function? Can't we make use of an existing WAL
> > > reader function - read_local_xlog_page()?
> >
> > I clearly explained the reasons upthread [2]. Please let me know if
> > you have more thoughts/doubts here, we can connect offlist.
>
> *I* also think the function is not needed, as explained above.  Why do
> we need that function while we know how far we can read WAL records
> *before* calling the function?
>
> > Attaching v6 patch set with above review comments addressed. Please
> > review it further.
> >
> > [1] https://www.postgresql.org/message-id/CAE9k0P%3D9SReU_613TXytZmpwL3ZRpnC5zrf96UoNCATKpK-UxQ%40mail.gmail.com
> > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record);
> > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn);
> > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record);
> > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info);
> > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info);
> >
> > I think we should allow all these functions to be executed in wait and
> > *nowait* mode. If a user specifies nowait mode, the function should
> > return if no WAL data is present, rather than waiting for new WAL data
> > to become available, default behaviour could be anything you like.
> >
> > [2] https://www.postgresql.org/message-id/CALj2ACUtqWX95uAj2VNJED0PnixEeQ%3D0MEzpouLi%2Bzd_iTugRA%40mail.gmail.com
> > I've added a new function read_local_xlog_page_2 (similar to
> > read_local_xlog_page but works in wait and no wait mode) and the
> > callers can specify whether to wait or not wait using private_data.
> > Actually, I wanted to use the private_data structure of
> > read_local_xlog_page but the logical decoding already has context as
> > private_data, that is why I had to have a new function. I know it
> > creates a bit of duplicate code, but its cleaner than using
> > backend-local variables or additional flags in XLogReaderState or
> > adding wait/no-wait boolean to page_read callback. Any other
> > suggestions are welcome here.
> >
> > With this, I'm able to have wait/no wait versions for any functions.
> > But for now, I'm having wait/no wait for two functions
> > (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more
> > sense.
> >
> > Regards,
> > Bharath Rupireddy.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Robert Haas
Дата:
On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Added a new function that returns the first and last valid WAL record
> LSN of a given WAL file.

Sounds like fuzzy thinking to me. WAL records can cross file
boundaries, and forgetting about that leads to all sorts of problems.
Just give people one function that decodes a range of LSNs and call it
good. Why do you need anything else? If people want to get the first
record that begins in a segment or the first record any portion of
which is in a particular segment or the last record that begins in a
segment or the last record that ends in a segment or any other such
thing, they can use a WHERE clause for that... and if you think they
can't, then that should be good cause to rethink the return value of
the one-and-only SRF that I think you need here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Mar 3, 2022 at 10:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Added a new function that returns the first and last valid WAL record
> > LSN of a given WAL file.
>
> Sounds like fuzzy thinking to me. WAL records can cross file
> boundaries, and forgetting about that leads to all sorts of problems.
> Just give people one function that decodes a range of LSNs and call it
> good. Why do you need anything else? If people want to get the first
> record that begins in a segment or the first record any portion of
> which is in a particular segment or the last record that begins in a
> segment or the last record that ends in a segment or any other such
> thing, they can use a WHERE clause for that... and if you think they
> can't, then that should be good cause to rethink the return value of
> the one-and-only SRF that I think you need here.

Thanks Robert.

Thanks to others for your review comments.

Here's the v7 patch set. These patches are based on the motive that
"keep it simple and short yet effective and useful". With that in
mind, I have not implemented the wait mode for any of the functions
(as it doesn't look an effective use-case and requires adding a new
page_read callback, instead throw error if future LSN is specified),
also these functions will give WAL information on the current server's
timeline. Having said that, I'm open to adding new functions in future
once this initial version gets in, if there's a requirement and users
ask for the new functions.

Please review the v7 patch set and provide your thoughts.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
Thanks Bharath for working on all my review comments. I took a quick
look at the new version of the patch (v7-pg_walinspect.patch) and this
version looks a lot better. I'll do some detailed review later (maybe
next week or so) and let you know my further comments, if any.

--
With Regards,
Ashutosh Sharma.

On Fri, Mar 4, 2022 at 3:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 3, 2022 at 10:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Added a new function that returns the first and last valid WAL record
> > > LSN of a given WAL file.
> >
> > Sounds like fuzzy thinking to me. WAL records can cross file
> > boundaries, and forgetting about that leads to all sorts of problems.
> > Just give people one function that decodes a range of LSNs and call it
> > good. Why do you need anything else? If people want to get the first
> > record that begins in a segment or the first record any portion of
> > which is in a particular segment or the last record that begins in a
> > segment or the last record that ends in a segment or any other such
> > thing, they can use a WHERE clause for that... and if you think they
> > can't, then that should be good cause to rethink the return value of
> > the one-and-only SRF that I think you need here.
>
> Thanks Robert.
>
> Thanks to others for your review comments.
>
> Here's the v7 patch set. These patches are based on the motive that
> "keep it simple and short yet effective and useful". With that in
> mind, I have not implemented the wait mode for any of the functions
> (as it doesn't look an effective use-case and requires adding a new
> page_read callback, instead throw error if future LSN is specified),
> also these functions will give WAL information on the current server's
> timeline. Having said that, I'm open to adding new functions in future
> once this initial version gets in, if there's a requirement and users
> ask for the new functions.
>
> Please review the v7 patch set and provide your thoughts.
>
> Regards,
> Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Jeff Davis
Дата:
On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> 
> Attaching v6 patch set with above review comments addressed. Please
> review it further.

* Don't issue WARNINGs or other messages for ordinary situations, like
when pg_get_wal_records_info() hits the end of WAL.

* It feels like the APIs that allow waiting for the end of WAL are
slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting 
behavior? Try to make the API more orthogonal, where a few basic
functions can be combined to give you everything you need, rather than
specifying extra parameters and issuing WARNINGs. I 

* In the docs, include some example output. I don't see any output in
the tests, which makes sense because it's mostly non-deterministic, but
it would be helpful to see sample output of at least
pg_get_wal_records_info().

* Is pg_get_wal_stats() even necessary, or can you get the same
information with a query over pg_get_wal_records_info()? For instance,
if you want to group by transaction ID rather than rmgr, then
pg_get_wal_stats() is useless.

* Would be nice to have a pg_wal_file_is_valid() or similar, which
would test that it exists, and the header matches the filename (e.g. if
it was recycled but not used, that would count as invalid). I think
pg_get_first_valid_wal_record_lsn() would make some cases look invalid
even if the file is valid -- for example, if a wal record spans many
wal segments, the segments might look invalid because they contain no
complete records, but the file itself is still valid and contains valid
wal data.

* Is there a reason you didn't include the timeline ID in
pg_get_wal_records_info()?

* Can we mark this extension 'trusted'? I'm not 100% clear on the
standards for that marker, but it seems reasonable for a database owner
with the right privileges might want to install it.

* pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
function should require pg_read_server_files? Or at least
pg_read_all_data?

Regards,
    Jeff Davis





Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> >
> > Attaching v6 patch set with above review comments addressed. Please
> > review it further.

Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
upthread [1] which is having more simple-yet-useful-and-effective
functions.

> * Don't issue WARNINGs or other messages for ordinary situations, like
> when pg_get_wal_records_info() hits the end of WAL.

v7 patch-set [1] has no warnings, but the functions will error out if
future LSN is specified.

> * It feels like the APIs that allow waiting for the end of WAL are
> slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> behavior? Try to make the API more orthogonal, where a few basic
> functions can be combined to give you everything you need, rather than
> specifying extra parameters and issuing WARNINGs. I

v7 patch-set [1] onwards waiting mode has been removed for all of the
functions, again to keep things simple-yet-useful-and-effective.
However, we can always add new pg_walinspect functions that wait for
future WAL in the next versions once basic stuff gets committed and if
many users ask for it.

> * In the docs, include some example output. I don't see any output in
> the tests, which makes sense because it's mostly non-deterministic, but
> it would be helpful to see sample output of at least
> pg_get_wal_records_info().

+1. Added for pg_get_wal_records_info and pg_get_wal_stats.

> * Is pg_get_wal_stats() even necessary, or can you get the same
> information with a query over pg_get_wal_records_info()? For instance,
> if you want to group by transaction ID rather than rmgr, then
> pg_get_wal_stats() is useless.

Yes, you are right pg_get_wal_stats provides WAL stats per resource
manager which is similar to pg_waldump with --start, --end and --stats
option. It provides more information than pg_get_wal_records_info and
is a good way of getting stats than adding more columns to
pg_get_wal_records_info, calculating percentage in sql and having
group by clause. IMO, pg_get_wal_stats is more readable and useful.

> * Would be nice to have a pg_wal_file_is_valid() or similar, which
> would test that it exists, and the header matches the filename (e.g. if
> it was recycled but not used, that would count as invalid). I think
> pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> even if the file is valid -- for example, if a wal record spans many
> wal segments, the segments might look invalid because they contain no
> complete records, but the file itself is still valid and contains valid
> wal data.

Actually I haven't tried testing a single WAL record spanning many WAL
files yet(I'm happy to try it if someone suggests such a use-case). In
that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
have a problem because it just gives the next valid LSN and it's
previous LSN using existing WAL reader API XLogFindNextRecord(). It
opens up the WAL file segments using (some dots to connect -
page_read/read_local_xlog_page, WALRead,
segment_open/wal_segment_open). Thoughts?

I don't think it's necessary to have a function pg_wal_file_is_valid()
that given a WAL file name as input checks whether a WAL file exists
or not, probably not in the core (xlogfuncs.c) too. These kinds of
functions can open up challenges in terms of user input validation and
may cause unnecessary problems, please see some related discussion
[2].

> * Is there a reason you didn't include the timeline ID in
> pg_get_wal_records_info()?

I'm right now allowing the functions to read WAL from the current
server's timeline which I have mentioned in the docs. The server's
current timeline is available via pg_control_checkpoint()'s
timeline_id. So, having timeline_id as a column doesn't make sense.
Again this is to keep things simple-yet-useful-and-effective. However,
we can add new pg_walinspect functions to read WAL from historic as
well as current timelines in the next versions once basic stuff gets
committed and if many users ask for it.

+ <para>
+  All the functions of this module will provide the WAL information using the
+  current server's timeline ID.
+ </para>

> * Can we mark this extension 'trusted'? I'm not 100% clear on the
> standards for that marker, but it seems reasonable for a database owner
> with the right privileges might want to install it.

'trusted' extensions concept is added by commit 50fc694 [3]. Since
pg_walinspect deals with WAL, we strictly want to control who creates
and can execute functions exposed by it, so I don't know if 'trusted'
is a good idea here. Also, pageinspect isn't a 'trusted' extension.

> * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> function should require pg_read_server_files? Or at least
> pg_read_all_data?

pg_read_all_data may not be the right choice, but pg_read_server_files
is. However, does it sound good if some functions are allowed to be
executed by users with a pg_monitor role and others
pg_get_raw_wal_record by users with pg_read_server_files? Since the
extension itself can be created by superusers, isn't the
pg_get_raw_wal_record sort of safe with pg_mointor itself?

 If hackers don't agree, I'm happy to grant execution on
pg_get_raw_wal_record() to the pg_read_server_files role.

Attaching the v8 patch-set resolving above comments and some tests for
checking function permissions. Please review it further.

[1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
[3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Jan 29 18:42:43 2020 -0500

    Invent "trusted" extensions, and remove the pg_pltemplate catalog.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Robert Haas
Дата:
On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
> * Can we mark this extension 'trusted'? I'm not 100% clear on the
> standards for that marker, but it seems reasonable for a database owner
> with the right privileges might want to install it.

I'm not clear on the standard either, exactly, but might not that
allow the database owner to get a peek at what's happening in other
databases?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Stephen Frost
Дата:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Mar 10, 2022 at 3:22 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
>
> I'm not clear on the standard either, exactly, but might not that
> allow the database owner to get a peek at what's happening in other
> databases?

The standard is basically that all of the functions it brings are
written to enforce the PG privilege system and you aren't able to use
the extension to bypass those privileges.  In some cases that means that
the C-language functions installed have if(!superuser) ereport() calls
throughout them- that's a fine answer, but it's perhaps not very helpful
to mark those as trusted.  In other cases, the C-language functions
installed don't directly provide access to data, such as the PostGIS
functions.

I've not looked back on this thread, but I'd expect pg_walinspect to
need those superuser checks and with those it *could* be marked as
trusted, but that again brings into question how useful it is to mark it
thusly.

In an ideal world, we might have a pg_readwal predefined role which
allows a role which was GRANT'd that role to be able to read WAL
traffic, and then the pg_walinspect extension could check that the
calling role has that predefined role, and other functions and
extensions could also check that rather than any existing superuser
checks.  A cloud provider or such could then include in their setup of a
new instance something like:

GRANT pg_readwal TO admin_user WITH ADMIN OPTION;

Presuming that there isn't anything that ends up in the WAL that's an
issue for the admin_user to have access to.

I certainly don't think we should allow either database owners or
regular users on a system the ability to access the WAL traffic of the
entire system.  More forcefully- we should *not* be throwing more access
rights towards $owners in general and should be thinking about how we
can allow admins, providers, whomever, the ability to control what
rights users are given.  If they're all lumped under 'owner' then
there's no way for people to provide granular access to just those
things they wish and intend to.

Thanks,

Stephen

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
At Thu, 10 Mar 2022 22:15:42 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> > >
> > > Attaching v6 patch set with above review comments addressed. Please
> > > review it further.
> 
> Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
> upthread [1] which is having more simple-yet-useful-and-effective
> functions.
> 
> > * Don't issue WARNINGs or other messages for ordinary situations, like
> > when pg_get_wal_records_info() hits the end of WAL.
> 
> v7 patch-set [1] has no warnings, but the functions will error out if
> future LSN is specified.
> 
> > * It feels like the APIs that allow waiting for the end of WAL are
> > slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> > behavior? Try to make the API more orthogonal, where a few basic
> > functions can be combined to give you everything you need, rather than
> > specifying extra parameters and issuing WARNINGs. I
> 
> v7 patch-set [1] onwards waiting mode has been removed for all of the
> functions, again to keep things simple-yet-useful-and-effective.
> However, we can always add new pg_walinspect functions that wait for
> future WAL in the next versions once basic stuff gets committed and if
> many users ask for it.
> 
> > * In the docs, include some example output. I don't see any output in
> > the tests, which makes sense because it's mostly non-deterministic, but
> > it would be helpful to see sample output of at least
> > pg_get_wal_records_info().
> 
> +1. Added for pg_get_wal_records_info and pg_get_wal_stats.
> 
> > * Is pg_get_wal_stats() even necessary, or can you get the same
> > information with a query over pg_get_wal_records_info()? For instance,
> > if you want to group by transaction ID rather than rmgr, then
> > pg_get_wal_stats() is useless.
> 
> Yes, you are right pg_get_wal_stats provides WAL stats per resource
> manager which is similar to pg_waldump with --start, --end and --stats
> option. It provides more information than pg_get_wal_records_info and
> is a good way of getting stats than adding more columns to
> pg_get_wal_records_info, calculating percentage in sql and having
> group by clause. IMO, pg_get_wal_stats is more readable and useful.
> 
> > * Would be nice to have a pg_wal_file_is_valid() or similar, which
> > would test that it exists, and the header matches the filename (e.g. if
> > it was recycled but not used, that would count as invalid). I think
> > pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> > even if the file is valid -- for example, if a wal record spans many
> > wal segments, the segments might look invalid because they contain no
> > complete records, but the file itself is still valid and contains valid
> > wal data.
> 
> Actually I haven't tried testing a single WAL record spanning many WAL
> files yet(I'm happy to try it if someone suggests such a use-case). In
> that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
> have a problem because it just gives the next valid LSN and it's
> previous LSN using existing WAL reader API XLogFindNextRecord(). It
> opens up the WAL file segments using (some dots to connect -
> page_read/read_local_xlog_page, WALRead,
> segment_open/wal_segment_open). Thoughts?
> 
> I don't think it's necessary to have a function pg_wal_file_is_valid()
> that given a WAL file name as input checks whether a WAL file exists
> or not, probably not in the core (xlogfuncs.c) too. These kinds of
> functions can open up challenges in terms of user input validation and
> may cause unnecessary problems, please see some related discussion
> [2].
> 
> > * Is there a reason you didn't include the timeline ID in
> > pg_get_wal_records_info()?
> 
> I'm right now allowing the functions to read WAL from the current
> server's timeline which I have mentioned in the docs. The server's
> current timeline is available via pg_control_checkpoint()'s
> timeline_id. So, having timeline_id as a column doesn't make sense.
> Again this is to keep things simple-yet-useful-and-effective. However,
> we can add new pg_walinspect functions to read WAL from historic as
> well as current timelines in the next versions once basic stuff gets
> committed and if many users ask for it.
> 
> + <para>
> +  All the functions of this module will provide the WAL information using the
> +  current server's timeline ID.
> + </para>
> 
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
> 
> 'trusted' extensions concept is added by commit 50fc694 [3]. Since
> pg_walinspect deals with WAL, we strictly want to control who creates
> and can execute functions exposed by it, so I don't know if 'trusted'
> is a good idea here. Also, pageinspect isn't a 'trusted' extension.
> 
> > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > function should require pg_read_server_files? Or at least
> > pg_read_all_data?
> 
> pg_read_all_data may not be the right choice, but pg_read_server_files
> is. However, does it sound good if some functions are allowed to be
> executed by users with a pg_monitor role and others
> pg_get_raw_wal_record by users with pg_read_server_files? Since the
> extension itself can be created by superusers, isn't the
> pg_get_raw_wal_record sort of safe with pg_mointor itself?
> 
>  If hackers don't agree, I'm happy to grant execution on
> pg_get_raw_wal_record() to the pg_read_server_files role.
> 
> Attaching the v8 patch-set resolving above comments and some tests for
> checking function permissions. Please review it further.
> 
> [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
> [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Wed Jan 29 18:42:43 2020 -0500
> 
>     Invent "trusted" extensions, and remove the pg_pltemplate catalog.

I played with this a bit, and would like to share some thoughts on it.

It seems to me too rigorous that pg_get_wal_records_info/stats()
reject future LSNs as end-LSN and I think WARNING or INFO and stop at
the real end-of-WAL is more kind to users.  I think the same with the
restriction that start and end LSN are required to be different.

The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
functions, they tell me the beginning of the current insertion point
in error message.  On the other hand they don't accept the same
value as end-LSN.  I think it is right that they tell the current
insertion point and they should take the end-LSN as the LSN to stop
reading.

I think pg_get_wal_stats() is worth to have but I think it should be
implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
about FPI since pg_waldump doesn't but it is internally collected (of
course!) and easily revealed. If we do that, the
pg_get_wal_records_stats() would be reduced to the following SQL
statement

SELECT resource_manager resmgr,
       count(*) AS N,
       (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
       sum(total_length) AS "combined size",
       (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
       sum(fpi_len) AS fpilen,
       (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
       FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
          GROUP by resource_manager
       WINDOW tot AS ()
       ORDER BY "combined size" desc;

The only difference with pg_waldump is the statement above doesn't
show lines for the resource managers that don't contained in the
result of pg_get_wal_records_info(). But I don't think that matters.


Sometimes the field description has very long (28kb long) content. It
makes the result output almost unreadable and I had a bit hard time
struggling with the output full of '-'s.  I would like have a default
limit on the length of such fields that can be long but I'm not sure
we want that.


The difference between pg_get_wal_record_info and _records_ other than
the number of argument is the former accepts incorrect LSNs.

The following works,
 pg_get_wal_record_info('0/1000000');
 pg_get_wal_records_info('0/1000000');

but this doesn't
 pg_get_wal_records_info('0/1000000', '0/1000000');
> ERROR:  WAL start LSN must be less than end LSN

But the following works
 pg_get_wal_records_info('0/1000000', '0/1000029');
> 0/1000028 | 0/0      |   0

So I think we can consolidate the two functions as:

- pg_get_wal_records_info('0/1000000');

  (current behavior) find the first record and show all records
  thereafter.

- pg_get_wal_records_info('0/1000000', '0/1000000');

  finds the first record since the start lsn and show it.

- pg_get_wal_records_info('0/1000000', '0/1000030');

  finds the first record since the start lsn then show records up to
  the end-lsn.


And about pg_get_raw_wal_record(). I don't see any use-case of the
function alone on SQL interface.  Even if we need to inspect broken
WAL files, it needs profound knowledge of WAL format and tools that
doesn't work on SQL interface.

However like pageinspect, if we separate the WAL-record fetching and
parsing it could be thought as useful.

pg_get_wal_records_info woule be like:

SELECT * FROM pg_walinspect_parse(raw)
 FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));

And pg_get_wal_stats woule be like:

SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
 FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));

Regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
Sorry, some minor non-syntactical corrections.

At Fri, 11 Mar 2022 11:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> I played with this a bit, and would like to share some thoughts on it.
> 
> It seems to me too rigorous that pg_get_wal_records_info/stats()
> reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> the real end-of-WAL is more kind to users.  I think the same with the
> restriction that start and end LSN are required to be different.
> 
> The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
> functions, they tell me the beginning of the current insertion point
> in error message.  On the other hand they don't accept the same
> value as end-LSN.  I think it is right that they tell the current
> insertion point and they should take the end-LSN as the LSN to stop
> reading.
> 
> I think pg_get_wal_stats() is worth to have but I think it should be
> implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
> about FPI since pg_waldump doesn't but it is internally collected (of
> course!) and easily revealed. If we do that, the
> pg_get_wal_records_stats() would be reduced to the following SQL
> statement
> 
> SELECT resource_manager resmgr,
>        count(*) AS N,
>        (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
>        sum(total_length) AS "combined size",
>        (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
>        sum(fpi_len) AS fpilen,
>        (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
>        FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
>           GROUP by resource_manager
>        WINDOW tot AS ()
>        ORDER BY "combined size" desc;
> 
> The only difference with pg_waldump is the statement above doesn't
> show lines for the resource managers that don't contained in the
> result of pg_get_wal_records_info(). But I don't think that matters.
> 
> 
> Sometimes the field description has very long (28kb long) content. It
> makes the result output almost unreadable and I had a bit hard time
> struggling with the output full of '-'s.  I would like have a default
> limit on the length of such fields that can be long but I'm not sure
> we want that.
> 
> 
- The difference between pg_get_wal_record_info and _records_ other than
- the number of argument is the former accepts incorrect LSNs.

The discussion is somewhat confused after some twists and turns..  It
should be something like the following.

pg_get_wal_record_info and pg_get_wal_records_info are almost same
since the latter can show a single record.  However it is a bit
annoying to do that. Since, other than it doens't accept same LSNs for
start and end, it doesn't show a record when there' no record in the
specfied LSN range.  But I don't think there's no usefulness of the
behavior.

The following works,
 pg_get_wal_record_info('0/1000000');
 pg_get_wal_records_info('0/1000000');

but this doesn't
 pg_get_wal_records_info('0/1000000', '0/1000000');
> ERROR:  WAL start LSN must be less than end LSN

And the following shows no records.
 pg_get_wal_records_info('0/1000000', '0/1000001');
 pg_get_wal_records_info('0/1000000', '0/1000028');

But the following works
  pg_get_wal_records_info('0/1000000', '0/1000029');
> 0/1000028 | 0/0      |   0



> So I think we can consolidate the two functions as:
> 
> - pg_get_wal_records_info('0/1000000');
> 
>   (current behavior) find the first record and show all records
>   thereafter.
> 
> - pg_get_wal_records_info('0/1000000', '0/1000000');
> 
>   finds the first record since the start lsn and show it.
> 
> - pg_get_wal_records_info('0/1000000', '0/1000030');
> 
>   finds the first record since the start lsn then show records up to
>   the end-lsn.
> 
> 
> And about pg_get_raw_wal_record(). I don't see any use-case of the
> function alone on SQL interface.  Even if we need to inspect broken
> WAL files, it needs profound knowledge of WAL format and tools that
> doesn't work on SQL interface.
> 
> However like pageinspect, if we separate the WAL-record fetching and
> parsing it could be thought as useful.
> 
> pg_get_wal_records_info woule be like:
> 
> SELECT * FROM pg_walinspect_parse(raw)
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));
> 
> And pg_get_wal_stats woule be like:
> 
> SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));


Regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Sorry, some minor non-syntactical corrections.
>
> At Fri, 11 Mar 2022 11:38:22 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > I played with this a bit, and would like to share some thoughts on it.
> >
> > It seems to me too rigorous that pg_get_wal_records_info/stats()
> > reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> > the real end-of-WAL is more kind to users.  I think the same with the
> > restriction that start and end LSN are required to be different.
> >
> > The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
> > functions, they tell me the beginning of the current insertion point
> > in error message.  On the other hand they don't accept the same
> > value as end-LSN.  I think it is right that they tell the current
> > insertion point and they should take the end-LSN as the LSN to stop
> > reading.
> >
> > I think pg_get_wal_stats() is worth to have but I think it should be
> > implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
> > about FPI since pg_waldump doesn't but it is internally collected (of
> > course!) and easily revealed. If we do that, the
> > pg_get_wal_records_stats() would be reduced to the following SQL
> > statement
> >
> > SELECT resource_manager resmgr,
> >        count(*) AS N,
> >          (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
> >          sum(total_length) AS "combined size",
> >          (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
> >          sum(fpi_len) AS fpilen,
> >          (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
> >          FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
> >          GROUP by resource_manager
> >          WINDOW tot AS ()
> >          ORDER BY "combined size" desc;
> >
> > The only difference with pg_waldump is the statement above doesn't
> > show lines for the resource managers that don't contained in the
> > result of pg_get_wal_records_info(). But I don't think that matters.
> >
> >
> > Sometimes the field description has very long (28kb long) content. It
> > makes the result output almost unreadable and I had a bit hard time
> > struggling with the output full of '-'s.  I would like have a default
> > limit on the length of such fields that can be long but I'm not sure
> > we want that.
> >
> >
> - The difference between pg_get_wal_record_info and _records_ other than
> - the number of argument is the former accepts incorrect LSNs.
>
> The discussion is somewhat confused after some twists and turns..  It
> should be something like the following.
>
> pg_get_wal_record_info and pg_get_wal_records_info are almost same
> since the latter can show a single record.  However it is a bit
> annoying to do that. Since, other than it doens't accept same LSNs for
> start and end, it doesn't show a record when there' no record in the
> specfied LSN range.  But I don't think there's no usefulness of the
> behavior.
>
> The following works,
>  pg_get_wal_record_info('0/1000000');

This does work but it doesn't show any WARNING message for the start
pointer adjustment. I think it should.

>  pg_get_wal_records_info('0/1000000');
>

I think this is fine. It should be working because the user hasn't
specified the end pointer so we assume the default end pointer is
end-of-WAL.

> but this doesn't
>  pg_get_wal_records_info('0/1000000', '0/1000000');
> > ERROR:  WAL start LSN must be less than end LSN
>

I think this behaviour is fine. We cannot have the same start and end
lsn pointers.

> And the following shows no records.
>  pg_get_wal_records_info('0/1000000', '0/1000001');
>  pg_get_wal_records_info('0/1000000', '0/1000028');
>

I think we should be erroring out here saying - couldn't find any
valid WAL record between given start and end lsn because there exists
no valid wal records between the specified start and end lsn pointers.

--
With Regards,
Ashutosh Sharma.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> - The difference between pg_get_wal_record_info and _records_ other than
> - the number of argument is the former accepts incorrect LSNs.
>
> The discussion is somewhat confused after some twists and turns..  It
> should be something like the following.
>
> pg_get_wal_record_info and pg_get_wal_records_info are almost same
> since the latter can show a single record.  However it is a bit
> annoying to do that. Since, other than it doens't accept same LSNs for
> start and end, it doesn't show a record when there' no record in the
> specfied LSN range.  But I don't think there's no usefulness of the
> behavior.
>

So, do you want the pg_get_wal_record_info function to be removed as
we can use pg_get_wal_records_info() to achieve what it does?

--
With Regards,
Ashutosh Sharma.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Fri, Mar 11, 2022 at 8:08 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> > Attaching the v8 patch-set resolving above comments and some tests for
> > checking function permissions. Please review it further.
>
> I played with this a bit, and would like to share some thoughts on it.

Thanks a lot Kyotaro-san for reviewing.

> It seems to me too rigorous that pg_get_wal_records_info/stats()
> reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> the real end-of-WAL is more kind to users.  I think the same with the
> restriction that start and end LSN are required to be different.

Throwing error on future LSNs is the same behaviour for all of the
pg_walinspect function input LSNs. IMO it is a cleaner thing to do
rather than confuse the users with different behaviours for each
function. The principle is this - pg_walinspect functions can't show
future WAL info. Having said that, I agree to make it a WARNING
instead of ERROR, for the simple reason that ERROR aborts the txn and
the applications can retry without aborting the txn. For instance,
pg_terminate_backend emits a WARNING if the PID isn't a postgres
process id.

PS: WARNING may not be a better idea than ERROR if we turn
pg_get_wal_stats a SQL function, see my response below.

> The definition of end-lsn is fuzzy here.  If I fed a future LSN to the
> functions, they tell me the beginning of the current insertion point
> in error message.  On the other hand they don't accept the same
> value as end-LSN.  I think it is right that they tell the current
> insertion point and they should take the end-LSN as the LSN to stop
> reading.

The future LSN is determined by this:

if (!RecoveryInProgress())
available_lsn = GetFlushRecPtr(NULL);
else
available_lsn = GetXLogReplayRecPtr(NULL);

GetFlushRecPtr returns last byte + 1 flushed meaning this is the end
LSN currently known in the server, but it is not the start LSN of the
last WAL record in the server. Same goes with GetXLogReplayRecPtr
which gives lastReplayedEndRecPtr end+1 position. I picked
GetFlushRecPtr and GetXLogReplayRecPtr to determine the future WAL LSN
because this is how read_local_xlog_page determines to read WAL upto
and goes for wait mode, but I wanted to avoid the wait mode completely
for all the pg_walinspect functions (to keep things simple for now),
hence doing the similar checks within the input validation code and
emitting warning.

And you are right when we emit something like below, users tend to use
0/15B6D68 (from the DETAIL message) as the end LSN. I don't want to
ignore this DETAIL message altogether as it gives an idea where the
server is. How about rephrasing the DETAIL message a bit, something
like "Database system flushed the WAL up to WAL LSN %X/% X.'' or some
other better phrasing?

WARNING:  WAL start LSN cannot be a future WAL LSN
DETAIL:  Last known WAL LSN on the database system is 0/15B6D68.

If the users aren't sure about what's the end record LSN, they can
just use pg_get_wal_records_info and pg_get_wal_stats without end LSN:
select * from pg_get_wal_records_info('0/15B6D68');
select * from pg_get_wal_stats('0/15B6D68');

> I think pg_get_wal_stats() is worth to have but I think it should be
> implemented in SQL.  Currently pg_get_wal_records_info() doesn't tell
> about FPI since pg_waldump doesn't but it is internally collected (of
> course!) and easily revealed. If we do that, the
> pg_get_wal_records_stats() would be reduced to the following SQL
> statement
>
> SELECT resource_manager resmgr,
>        count(*) AS N,
>            (count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS "%N",
>            sum(total_length) AS "combined size",
>            (sum(total_length) * 100 / sum(sum(total_length)) OVER tot)::numeric(5,2) AS "%combined size",
>            sum(fpi_len) AS fpilen,
>            (sum(fpi_len) * 100 / sum(sum(fpi_len)) OVER tot)::numeric(5,2) AS "%fpilen"
>            FROM pg_get_wal_records_info('0/1000000', '0/175DD7f')
>            GROUP by resource_manager
>            WINDOW tot AS ()
>            ORDER BY "combined size" desc;
>
> The only difference with pg_waldump is the statement above doesn't
> show lines for the resource managers that don't contained in the
> result of pg_get_wal_records_info(). But I don't think that matters.

Yeah, this is better. One problem with the above is when
pg_get_wal_records_info emits a warning for future LSN. But this
shouldn't stop us doing it via SQL. Instead I would let all the
pg_walinspect functions emit errors as opposed to WARNING. Thoughts?

postgres=# SELECT resource_manager, count(*) AS count,
(count(*) * 100 / sum(count(*)) OVER tot)::numeric(5,2) AS count_percentage,
sum(total_length) AS combined_size,
(sum(total_length) * 100 / sum(sum(total_length)) OVER
tot)::numeric(5,2) AS combined_size_percentage
FROM pg_get_wal_records_info('0/10A3E50', '0/25B6F00')
GROUP BY resource_manager
WINDOW tot AS ()
ORDER BY combined_size desc;
WARNING:  WAL end LSN cannot be a future WAL LSN
DETAIL:  Last known WAL LSN on the database system is 0/15CAA70.
 resource_manager | count | count_percentage | combined_size |
combined_size_percentage
------------------+-------+------------------+---------------+--------------------------
                  |     1 |           100.00 |               |
(1 row)

> Sometimes the field description has very long (28kb long) content. It
> makes the result output almost unreadable and I had a bit hard time
> struggling with the output full of '-'s.  I would like have a default
> limit on the length of such fields that can be long but I'm not sure
> we want that.

Yeah, it's a text column, let's leave it as-is, if required users can
always ignore the description columns.

> And about pg_get_raw_wal_record(). I don't see any use-case of the
> function alone on SQL interface.  Even if we need to inspect broken
> WAL files, it needs profound knowledge of WAL format and tools that
> doesn't work on SQL interface.
>However like pageinspect, if we separate the WAL-record fetching and
> parsing it could be thought as useful.
> SELECT * FROM pg_walinspect_parse(raw)
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn));
>
> And pg_get_wal_stats woule be like:
>
> SELECT * FROM pg_walinpect_stat(pg_walinspect_parse(raw))
>  FROM (SELECT * FROM pg_walinspect_get_raw(start_lsn, end_lsn)));

Imagine pg_get_raw_wal_record function feeding raw WAL record to an
external tool/extension that understands the WAL. Apart from this, I
don't have a concrete reason either. I'm open to removing this
function as well and adding it along with the raw WAL parsing function
in future.

I haven't thought about the raw WAL parsing functions for now. In
fact, there are many functions we can add to pg_walinspect - functions
with wait mode for future WAL, WAL parsing, function to return all the
WAL record info/stats given a WAL file name, functions to return WAL
info/stats from historic timelines as well, function to see if the
given WAL file is valid and so on. We can park these functions for
future versions of pg_walinspect once the extension itself with basic
yet-useful-and-effective functions gets in. I will make a note of
these functions and will work in future based on how pg_walinspect
gets received by the users and community out there.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
Some comments on pg_walinspect-docc.patch this time:

+   <varlistentry>
+    <term>
+     <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn,
prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length
OUT int4, total_length OUT int4, description OUT text, block_ref OUT
text, data OUT bytea, data_len OUT int4)</function>
+    </term>

You may shorten this by mentioning just the function input parameters
and specify "returns record" like shown below. So no need to specify
all the OUT params.

pg_get_wal_record_info(in_lsn pg_lsn) returns record.

Please check the documentation for other functions for reference.

==

+    <term>
+     <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn
pg_lsn DEFAULT NULL, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid,
resource_manager OUT text, length OUT int4, total_length OUT int4,
description OUT text, block_ref OUT text, data OUT bytea, data_len OUT
int4)</function>
+    </term>

Same comment applies here as well. In the return type you can just
mention - "returns setof record" like shown below:

pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof records.

You may also check for such optimizations at other places. I might
have missed some.

==

+<screen>
+postgres=# select prev_lsn, xid, resource_manager, length,
total_length, block_ref from pg_get_wal_records_info('0/158A7F0',
'0/1591400');
+ prev_lsn  | xid | resource_manager | length | total_length |
                       block_ref

+-----------+-----+------------------+--------+--------------+--------------------------------------------------------------------------
+ 0/158A7B8 | 735 | Heap             |     54 |         7838 | blkref
#0: rel 1663/5/2619 blk 18 (FPW); hole: offset: 88, length: 408
+ 0/158A7F0 | 735 | Btree            |     53 |         8133 | blkref
#0: rel 1663/5/2696 blk 1 (FPW); hole: offset: 1632, length: 112
+ 0/158C6A8 | 735 | Heap             |     53 |          873 | blkref
#0: rel 1663/5/1259 blk 0 (FPW); hole: offset: 212, length: 7372

Instead of specifying column names in the targetlist I think it's
better to use "*" so that it will display all the output columns. Also
you may shorten the gap between start and end lsn to reduce the output
size.

==

Any reason for not specifying author name in the .sgml file. Do you
want me to add my name to the author? :)

  <para>
   Ashutosh Sharma <email>ashu.coek88@gmail.com</email>
  </para>
 </sect2>

--
With Regards,
Ashutosh Sharma.

On Thu, Mar 10, 2022 at 10:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote:
> > >
> > > Attaching v6 patch set with above review comments addressed. Please
> > > review it further.
>
> Thanks Jeff for reviewing it. I've posted the latest v7 patch-set
> upthread [1] which is having more simple-yet-useful-and-effective
> functions.
>
> > * Don't issue WARNINGs or other messages for ordinary situations, like
> > when pg_get_wal_records_info() hits the end of WAL.
>
> v7 patch-set [1] has no warnings, but the functions will error out if
> future LSN is specified.
>
> > * It feels like the APIs that allow waiting for the end of WAL are
> > slightly off. Can't you just do pg_get_wal_records_info(start_lsn,
> > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting
> > behavior? Try to make the API more orthogonal, where a few basic
> > functions can be combined to give you everything you need, rather than
> > specifying extra parameters and issuing WARNINGs. I
>
> v7 patch-set [1] onwards waiting mode has been removed for all of the
> functions, again to keep things simple-yet-useful-and-effective.
> However, we can always add new pg_walinspect functions that wait for
> future WAL in the next versions once basic stuff gets committed and if
> many users ask for it.
>
> > * In the docs, include some example output. I don't see any output in
> > the tests, which makes sense because it's mostly non-deterministic, but
> > it would be helpful to see sample output of at least
> > pg_get_wal_records_info().
>
> +1. Added for pg_get_wal_records_info and pg_get_wal_stats.
>
> > * Is pg_get_wal_stats() even necessary, or can you get the same
> > information with a query over pg_get_wal_records_info()? For instance,
> > if you want to group by transaction ID rather than rmgr, then
> > pg_get_wal_stats() is useless.
>
> Yes, you are right pg_get_wal_stats provides WAL stats per resource
> manager which is similar to pg_waldump with --start, --end and --stats
> option. It provides more information than pg_get_wal_records_info and
> is a good way of getting stats than adding more columns to
> pg_get_wal_records_info, calculating percentage in sql and having
> group by clause. IMO, pg_get_wal_stats is more readable and useful.
>
> > * Would be nice to have a pg_wal_file_is_valid() or similar, which
> > would test that it exists, and the header matches the filename (e.g. if
> > it was recycled but not used, that would count as invalid). I think
> > pg_get_first_valid_wal_record_lsn() would make some cases look invalid
> > even if the file is valid -- for example, if a wal record spans many
> > wal segments, the segments might look invalid because they contain no
> > complete records, but the file itself is still valid and contains valid
> > wal data.
>
> Actually I haven't tried testing a single WAL record spanning many WAL
> files yet(I'm happy to try it if someone suggests such a use-case). In
> that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't
> have a problem because it just gives the next valid LSN and it's
> previous LSN using existing WAL reader API XLogFindNextRecord(). It
> opens up the WAL file segments using (some dots to connect -
> page_read/read_local_xlog_page, WALRead,
> segment_open/wal_segment_open). Thoughts?
>
> I don't think it's necessary to have a function pg_wal_file_is_valid()
> that given a WAL file name as input checks whether a WAL file exists
> or not, probably not in the core (xlogfuncs.c) too. These kinds of
> functions can open up challenges in terms of user input validation and
> may cause unnecessary problems, please see some related discussion
> [2].
>
> > * Is there a reason you didn't include the timeline ID in
> > pg_get_wal_records_info()?
>
> I'm right now allowing the functions to read WAL from the current
> server's timeline which I have mentioned in the docs. The server's
> current timeline is available via pg_control_checkpoint()'s
> timeline_id. So, having timeline_id as a column doesn't make sense.
> Again this is to keep things simple-yet-useful-and-effective. However,
> we can add new pg_walinspect functions to read WAL from historic as
> well as current timelines in the next versions once basic stuff gets
> committed and if many users ask for it.
>
> + <para>
> +  All the functions of this module will provide the WAL information using the
> +  current server's timeline ID.
> + </para>
>
> > * Can we mark this extension 'trusted'? I'm not 100% clear on the
> > standards for that marker, but it seems reasonable for a database owner
> > with the right privileges might want to install it.
>
> 'trusted' extensions concept is added by commit 50fc694 [3]. Since
> pg_walinspect deals with WAL, we strictly want to control who creates
> and can execute functions exposed by it, so I don't know if 'trusted'
> is a good idea here. Also, pageinspect isn't a 'trusted' extension.
>
> > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > function should require pg_read_server_files? Or at least
> > pg_read_all_data?
>
> pg_read_all_data may not be the right choice, but pg_read_server_files
> is. However, does it sound good if some functions are allowed to be
> executed by users with a pg_monitor role and others
> pg_get_raw_wal_record by users with pg_read_server_files? Since the
> extension itself can be created by superusers, isn't the
> pg_get_raw_wal_record sort of safe with pg_mointor itself?
>
>  If hackers don't agree, I'm happy to grant execution on
> pg_get_raw_wal_record() to the pg_read_server_files role.
>
> Attaching the v8 patch-set resolving above comments and some tests for
> checking function permissions. Please review it further.
>
> [1] https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com
> [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Wed Jan 29 18:42:43 2020 -0500
>
>     Invent "trusted" extensions, and remove the pg_pltemplate catalog.
>
> Regards,
> Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Fri, Mar 11, 2022 at 8:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> >
> - The difference between pg_get_wal_record_info and _records_ other than
> - the number of argument is the former accepts incorrect LSNs.
>
> The discussion is somewhat confused after some twists and turns..  It
> should be something like the following.
>
> pg_get_wal_record_info and pg_get_wal_records_info are almost same
> since the latter can show a single record.  However it is a bit
> annoying to do that. Since, other than it doens't accept same LSNs for
> start and end, it doesn't show a record when there' no record in the
> specfied LSN range.  But I don't think there's no usefulness of the
> behavior.

I would like to reassert the usability of pg_get_wal_record_info and
pg_get_wal_records_info:

pg_get_wal_record_info(lsn):
if lsn is invalid i.e. '0/0' - throws an error
if lsn is future lsn - throws an error
if lsn looks okay, it figures out the next available valid WAL record
and returns info about that

pg_get_wal_records_info(start_lsn, end_lsn default null) -> if start
and end lsns are provided  no end_lsn would give the WAL records info
till the end of WAL,
if start_lsn is invalid i.e. '0/0' - throws an error
if start_lsn is future lsn - throws an error
if end_lsn isn't provided by the user - calculates the end_lsn as
server's current flush lsn
if end_lsn is provided by the user - throws an error if it's future LSN
if start_lsn and end_lsn look okay, it returns info about all WAL
records from the next available valid WAL record of start_lsn until
end_lsn

So, both pg_get_wal_record_info and pg_get_wal_records_info are necessary IMHO.

Coming to the behaviour when input lsn is '0/1000000', it's an issue
with XLogSegmentOffset(lsn, wal_segment_size) != 0 check, which I will
fix in the next version.

    if (*first_record != lsn && XLogSegmentOffset(lsn, wal_segment_size) != 0)
        ereport(WARNING,
                (errmsg_plural("first record is after %X/%X, at %X/%X,
skipping over %u byte",
                               "first record is after %X/%X, at %X/%X,
skipping over %u bytes",
                               (*first_record - lsn),
                               LSN_FORMAT_ARGS(lsn),
                               LSN_FORMAT_ARGS(*first_record),
                               (uint32) (*first_record - lsn))));

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Robert Haas
Дата:
On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> It seems to me too rigorous that pg_get_wal_records_info/stats()
> reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> the real end-of-WAL is more kind to users.  I think the same with the
> restriction that start and end LSN are required to be different.

In his review just yesterday, Jeff suggested this: "Don't issue
WARNINGs or other messages for ordinary situations, like when
pg_get_wal_records_info() hits the end of WAL." I think he's entirely
right, and I don't think any patch that does otherwise should get
committed. It is worth remembering that the results of queries are
often examined by something other than a human being sitting at a psql
terminal. Any tool that uses this is going to want to understand what
happened from the result set, not by parsing strings that may show up
inside warning messages.

I think that the right answer here is to have a function that returns
one row per record parsed, and each row should also include the start
and end LSN of the record. If for some reason the WAL records return
start after the specified start LSN (e.g. because we skip over a page
header) or end before the specified end LSN (e.g. because we reach
end-of-WAL) the user can figure it out from looking at the LSNs in the
output rows and comparing them to the LSNs provided as input.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Jeff Davis
Дата:
On Thu, 2022-03-10 at 15:54 -0500, Stephen Frost wrote:
> The standard is basically that all of the functions it brings are
> written to enforce the PG privilege system and you aren't able to use
> the extension to bypass those privileges.  In some cases that means
> that

Every extension should follow that standard, right? If it doesn't (e.g.
creating dangerous functions and granting them to public), then even
superuser should not install it.

> the C-language functions installed have if(!superuser) ereport()
> calls

I'm curious why not rely on the grant system where possible? I thought
we were trying to get away from explicit superuser checks.

> I've not looked back on this thread, but I'd expect pg_walinspect to
> need those superuser checks and with those it *could* be marked as
> trusted, but that again brings into question how useful it is to mark
> it
> thusly.

As long as any functions are safely accessible to public or a
predefined role, there is some utility for the 'trusted' marker.

As this patch is currently written, pg_monitor has access these
functions, though I don't think that's the right privilege level at
least for pg_get_raw_wal_record().

> I certainly don't think we should allow either database owners or
> regular users on a system the ability to access the WAL traffic of
> the
> entire system.

Agreed. That was not what I intended by asking if it should be marked
'trusted'. The marker only allows the non-superuser to run the CREATE
EXTENSION command; it's up to the extension script to decide whether
any non-superusers can do anything at all with the extension.

>   More forcefully- we should *not* be throwing more access
> rights towards $owners in general and should be thinking about how we
> can allow admins, providers, whomever, the ability to control what
> rights users are given.  If they're all lumped under 'owner' then
> there's no way for people to provide granular access to just those
> things they wish and intend to.

Not sure I understand, but that sounds like a larger discussion.

Regards,
    Jeff Davis





Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Fri, Mar 11, 2022 at 7:53 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Some comments on pg_walinspect-docc.patch this time:
>
> +   <varlistentry>
> +    <term>
> +     <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn,
> prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length
> OUT int4, total_length OUT int4, description OUT text, block_ref OUT
> text, data OUT bytea, data_len OUT int4)</function>
> +    </term>
>
> You may shorten this by mentioning just the function input parameters
> and specify "returns record" like shown below. So no need to specify
> all the OUT params.
>
> pg_get_wal_record_info(in_lsn pg_lsn) returns record.
>
> Please check the documentation for other functions for reference.
>
> ==
>
> +    <term>
> +     <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn
> pg_lsn DEFAULT NULL, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid,
> resource_manager OUT text, length OUT int4, total_length OUT int4,
> description OUT text, block_ref OUT text, data OUT bytea, data_len OUT
> int4)</function>
> +    </term>
>
> Same comment applies here as well. In the return type you can just
> mention - "returns setof record" like shown below:
>
> pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof records.
>
> You may also check for such optimizations at other places. I might
> have missed some.

I used the way verify_heapam shows the columns as it looks good IMO
and we can't show sample outputs for all of the functions in the
documentation.

> ==
>
> +<screen>
> +postgres=# select prev_lsn, xid, resource_manager, length,
> total_length, block_ref from pg_get_wal_records_info('0/158A7F0',
> '0/1591400');
> + prev_lsn  | xid | resource_manager | length | total_length |
>                        block_ref
>
+-----------+-----+------------------+--------+--------------+--------------------------------------------------------------------------
> + 0/158A7B8 | 735 | Heap             |     54 |         7838 | blkref
> #0: rel 1663/5/2619 blk 18 (FPW); hole: offset: 88, length: 408
> + 0/158A7F0 | 735 | Btree            |     53 |         8133 | blkref
> #0: rel 1663/5/2696 blk 1 (FPW); hole: offset: 1632, length: 112
> + 0/158C6A8 | 735 | Heap             |     53 |          873 | blkref
> #0: rel 1663/5/1259 blk 0 (FPW); hole: offset: 212, length: 7372
>
> Instead of specifying column names in the targetlist I think it's
> better to use "*" so that it will display all the output columns. Also
> you may shorten the gap between start and end lsn to reduce the output
> size.

All columns are giving huge output, especially because of data and
description columns hence I'm not showing them in the sample output.

> ==
>
> Any reason for not specifying author name in the .sgml file. Do you
> want me to add my name to the author? :)

Haha. Thanks. I will add in the v9 patch set which I will post in a while.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Sat, Mar 12, 2022 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > It seems to me too rigorous that pg_get_wal_records_info/stats()
> > reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> > the real end-of-WAL is more kind to users.  I think the same with the
> > restriction that start and end LSN are required to be different.
>
> In his review just yesterday, Jeff suggested this: "Don't issue
> WARNINGs or other messages for ordinary situations, like when
> pg_get_wal_records_info() hits the end of WAL." I think he's entirely
> right, and I don't think any patch that does otherwise should get
> committed. It is worth remembering that the results of queries are
> often examined by something other than a human being sitting at a psql
> terminal. Any tool that uses this is going to want to understand what
> happened from the result set, not by parsing strings that may show up
> inside warning messages.
>
> I think that the right answer here is to have a function that returns
> one row per record parsed, and each row should also include the start
> and end LSN of the record. If for some reason the WAL records return
> start after the specified start LSN (e.g. because we skip over a page
> header) or end before the specified end LSN (e.g. because we reach
> end-of-WAL) the user can figure it out from looking at the LSNs in the
> output rows and comparing them to the LSNs provided as input.

Thanks Robert. I've removed the WARNING part and added end_lsn as suggested.

Thanks Kyotaro-san, Ashutosh and Jeff for your review. I tried to
address your review comments, if not all, but many.

Here's v9 patch-set please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
At Fri, 11 Mar 2022 15:39:13 -0500, Robert Haas <robertmhaas@gmail.com> wrote in 
> On Thu, Mar 10, 2022 at 9:38 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > It seems to me too rigorous that pg_get_wal_records_info/stats()
> > reject future LSNs as end-LSN and I think WARNING or INFO and stop at
> > the real end-of-WAL is more kind to users.  I think the same with the
> > restriction that start and end LSN are required to be different.
> 
> In his review just yesterday, Jeff suggested this: "Don't issue
> WARNINGs or other messages for ordinary situations, like when
> pg_get_wal_records_info() hits the end of WAL." I think he's entirely
> right, and I don't think any patch that does otherwise should get

It depends on what we think is the "ordinary" here.  If we don't
expect that specified LSN range is not filled-out, the case above is
ordinary and no need for any WARNING nor INFO.  I'm fine with that
definition here.

> committed. It is worth remembering that the results of queries are
> often examined by something other than a human being sitting at a psql
> terminal. Any tool that uses this is going to want to understand what
> happened from the result set, not by parsing strings that may show up
> inside warning messages.

Right. I don't think it is right that WARNING is required to evaluate
the result. And I think that the WARNING like 'reached end-of-wal
before end LSN' is a kind that is not required in evaluation of the
result. Since each WAL row contains at least start LSN.

> I think that the right answer here is to have a function that returns
> one row per record parsed, and each row should also include the start
> and end LSN of the record. If for some reason the WAL records return
> start after the specified start LSN (e.g. because we skip over a page
> header) or end before the specified end LSN (e.g. because we reach
> end-of-WAL) the user can figure it out from looking at the LSNs in the
> output rows and comparing them to the LSNs provided as input.

I agree with you here.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Stephen Frost
Дата:
Greetings,

* Jeff Davis (pgsql@j-davis.com) wrote:
> On Thu, 2022-03-10 at 15:54 -0500, Stephen Frost wrote:
> > The standard is basically that all of the functions it brings are
> > written to enforce the PG privilege system and you aren't able to use
> > the extension to bypass those privileges.  In some cases that means
> > that
>
> Every extension should follow that standard, right? If it doesn't (e.g.
> creating dangerous functions and granting them to public), then even
> superuser should not install it.

Every extension that's intended to be installed on a multi-user PG
system where the admin cares about security in the database, sure.  I
disagree that this applies universally to every system or every
extension.  Those are standards that modules we distribute in contrib
should meet but I don't know that we necessarily have to have them for,
say, modules in test.

> > the C-language functions installed have if(!superuser) ereport()
> > calls
>
> I'm curious why not rely on the grant system where possible? I thought
> we were trying to get away from explicit superuser checks.

We don't yet have capabilities for everything.  I agree that we should
be getting away from explicit superuser checks and explained below how
we might be able to in this case.

> > I've not looked back on this thread, but I'd expect pg_walinspect to
> > need those superuser checks and with those it *could* be marked as
> > trusted, but that again brings into question how useful it is to mark
> > it
> > thusly.
>
> As long as any functions are safely accessible to public or a
> predefined role, there is some utility for the 'trusted' marker.

I'm not sure that I agree, though I'm also not sure that it's a useful
thing to debate.  Still, if all of the functions in a particular
extension have explicit if (!superuser) ereport() checks in them, then
while installing it is 'safe' and it could be marked as 'trusted'
there's very little point in doing so as the only person who can get
anything useful from those functions is a superuser, and a superuser can
install non-trusted extensions anyway.  How is it useful to mark such an
extension as 'trusted'?

> As this patch is currently written, pg_monitor has access these
> functions, though I don't think that's the right privilege level at
> least for pg_get_raw_wal_record().

Yeah, I agree that pg_monitor isn't the right thing for such a function
to be checking.

> > I certainly don't think we should allow either database owners or
> > regular users on a system the ability to access the WAL traffic of
> > the
> > entire system.
>
> Agreed. That was not what I intended by asking if it should be marked
> 'trusted'. The marker only allows the non-superuser to run the CREATE
> EXTENSION command; it's up to the extension script to decide whether
> any non-superusers can do anything at all with the extension.

Sure.

> >   More forcefully- we should *not* be throwing more access
> > rights towards $owners in general and should be thinking about how we
> > can allow admins, providers, whomever, the ability to control what
> > rights users are given.  If they're all lumped under 'owner' then
> > there's no way for people to provide granular access to just those
> > things they wish and intend to.
>
> Not sure I understand, but that sounds like a larger discussion.

The point I was trying to make is that it's better to move in the
direction of things like pg_read_all_data rather than just declaring
that the owner of a database can read all of the tables in that
database, as an example.  Maybe we want to implicitly have such
privilege for the owner of the database too, but we should first make it
something that's able to be GRANT'd out to non-owners so that it's not
required that all of those privileges be given out together at once.

Note that to be considered an 'owner' of an object you have to be a
member of the role that owns the object, which means that said role is
necessarily able to also become the owning role too.  Lumping lots of
privileges together- all the rights that being an 'owner' of the object
conveys, plus the ability to also become that role directly and do
things as that role, works actively against the general idea of 'least
privilege'.

Thanks,

Stephen

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> > As this patch is currently written, pg_monitor has access these
> > functions, though I don't think that's the right privilege level at
> > least for pg_get_raw_wal_record().
>
> Yeah, I agree that pg_monitor isn't the right thing for such a function
> to be checking.

On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> function should require pg_read_server_files? Or at least
> pg_read_all_data?

The v9 patch set posted at [1] grants execution on
pg_get_raw_wal_record() to the pg_monitor role.

pg_read_all_data may not be the right choice, but pg_read_server_files
is as these functions do read the WAL files on the server. If okay,
I'm happy to grant execution on pg_get_raw_wal_record() to the
pg_read_server_files role.

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> >
> > > As this patch is currently written, pg_monitor has access these
> > > functions, though I don't think that's the right privilege level at
> > > least for pg_get_raw_wal_record().
> >
> > Yeah, I agree that pg_monitor isn't the right thing for such a function
> > to be checking.
>
> On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > function should require pg_read_server_files? Or at least
> > pg_read_all_data?
>
> The v9 patch set posted at [1] grants execution on
> pg_get_raw_wal_record() to the pg_monitor role.
>
> pg_read_all_data may not be the right choice, but pg_read_server_files
> is as these functions do read the WAL files on the server. If okay,
> I'm happy to grant execution on pg_get_raw_wal_record() to the
> pg_read_server_files role.
>
> Thoughts?
>
> [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com

Attaching v10 patch set which allows pg_get_raw_wal_record to be
executed by either superuser or users with pg_read_server_files role,
no other change from v9 patch set.

Please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Stephen Frost
Дата:
Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> > >
> > > > As this patch is currently written, pg_monitor has access these
> > > > functions, though I don't think that's the right privilege level at
> > > > least for pg_get_raw_wal_record().
> > >
> > > Yeah, I agree that pg_monitor isn't the right thing for such a function
> > > to be checking.
> >
> > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > >
> > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > > function should require pg_read_server_files? Or at least
> > > pg_read_all_data?
> >
> > The v9 patch set posted at [1] grants execution on
> > pg_get_raw_wal_record() to the pg_monitor role.
> >
> > pg_read_all_data may not be the right choice, but pg_read_server_files
> > is as these functions do read the WAL files on the server. If okay,
> > I'm happy to grant execution on pg_get_raw_wal_record() to the
> > pg_read_server_files role.
> >
> > Thoughts?
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com
>
> Attaching v10 patch set which allows pg_get_raw_wal_record to be
> executed by either superuser or users with pg_read_server_files role,
> no other change from v9 patch set.

In a quick look, that seems reasonable to me.  If folks want to give out
access to this function individually they're also able to do so, which
is good.  Doesn't seem worthwhile to introduce a new predefined role for
this one function.

Thanks,

Stephen

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Ashutosh Sharma
Дата:
I can see that the pg_get_wal_records_info function shows the details
of the WAL record whose existence is beyond the user specified
stop/end lsn pointer. See below:

ashu@postgres=# select * from pg_get_wal_records_info('0/01000028',
'0/01000029');
-[ RECORD 1
]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
start_lsn        | 0/1000028
end_lsn          | 0/100009F
prev_lsn         | 0/0
xid              | 0
resource_manager | XLOG
record_length    | 114
fpi_length       | 0
description      | CHECKPOINT_SHUTDOWN redo 0/1000028; tli 1; prev tli
1; fpw true; xid 0:3; oid 10000; multi 1; offset 0; oldest xid 3 in DB
1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0;
oldest running xid 0; shutdown
block_ref        |
data_length      | 88
data             |

\x28000001000000000100000001000000010000000000000003000000000000001027000001000000000000000300000001000000010000000100000072550000a5c4316200000000000000000000000000000000ff7f0000

In this case, the end lsn pointer specified by the user is
'0/01000029'. There is only one WAL record which starts before this
specified end lsn pointer whose start pointer is at 01000028, but that
WAL record ends at 0/100009F which is way beyond the specified end
lsn. So, how come we are able to display the complete WAL record info?
AFAIU, end lsn is the lsn pointer where you need to stop reading the
WAL data. If that is true, then there exists no valid WAL record
between the start and end lsn in this particular case.

--
With Regards,
Ashutosh Sharma.

On Wed, Mar 16, 2022 at 7:56 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > On Tue, Mar 15, 2022 at 7:21 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Mon, Mar 14, 2022 at 8:25 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > >
> > > > > As this patch is currently written, pg_monitor has access these
> > > > > functions, though I don't think that's the right privilege level at
> > > > > least for pg_get_raw_wal_record().
> > > >
> > > > Yeah, I agree that pg_monitor isn't the right thing for such a function
> > > > to be checking.
> > >
> > > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > > >
> > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that
> > > > function should require pg_read_server_files? Or at least
> > > > pg_read_all_data?
> > >
> > > The v9 patch set posted at [1] grants execution on
> > > pg_get_raw_wal_record() to the pg_monitor role.
> > >
> > > pg_read_all_data may not be the right choice, but pg_read_server_files
> > > is as these functions do read the WAL files on the server. If okay,
> > > I'm happy to grant execution on pg_get_raw_wal_record() to the
> > > pg_read_server_files role.
> > >
> > > Thoughts?
> > >
> > > [1] https://www.postgresql.org/message-id/CALj2ACVRH-z8mZLyFkpLvY4qRhxQCqU_BLkFTtwt%2BTPZNhfEVg%40mail.gmail.com
> >
> > Attaching v10 patch set which allows pg_get_raw_wal_record to be
> > executed by either superuser or users with pg_read_server_files role,
> > no other change from v9 patch set.
>
> In a quick look, that seems reasonable to me.  If folks want to give out
> access to this function individually they're also able to do so, which
> is good.  Doesn't seem worthwhile to introduce a new predefined role for
> this one function.
>
> Thanks,
>
> Stephen



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
At Wed, 16 Mar 2022 20:49:12 +0530, Ashutosh Sharma <ashu.coek88@gmail.com> wrote in 
> I can see that the pg_get_wal_records_info function shows the details
> of the WAL record whose existence is beyond the user specified
> stop/end lsn pointer. See below:
> 
> ashu@postgres=# select * from pg_get_wal_records_info('0/01000028',
> '0/01000029');
> -[ RECORD 1
]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> start_lsn        | 0/1000028
> end_lsn          | 0/100009F
> prev_lsn         | 0/0
...
> record_length    | 114
...
> In this case, the end lsn pointer specified by the user is
> '0/01000029'. There is only one WAL record which starts before this
> specified end lsn pointer whose start pointer is at 01000028, but that
> WAL record ends at 0/100009F which is way beyond the specified end
> lsn. So, how come we are able to display the complete WAL record info?
> AFAIU, end lsn is the lsn pointer where you need to stop reading the
> WAL data. If that is true, then there exists no valid WAL record
> between the start and end lsn in this particular case.

You're right considering how pg_waldump behaves. pg_waldump works
almost the way as you described above.  The record above actually ends
at 1000099 and pg_waldump shows that record by specifying -s 0/1000028
-e 0/100009a, but not for -e 0/1000099.

# I personally think the current behavior is fine, though..


It still suggests unspecifiable end-LSN..

> select * from pg_get_wal_records_info('4/4B28EB68', '4/4C000060');
> ERROR:  cannot accept future end LSN
> DETAIL:  Last known WAL LSN on the database system is 4/4C000060.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Wed, Mar 16, 2022 at 8:49 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> I can see that the pg_get_wal_records_info function shows the details
> of the WAL record whose existence is beyond the user specified
> stop/end lsn pointer. See below:
>
> ashu@postgres=# select * from pg_get_wal_records_info('0/01000028',
> '0/01000029');
> -[ RECORD 1
]----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> start_lsn        | 0/1000028
> end_lsn          | 0/100009F
> prev_lsn         | 0/0
> xid              | 0
> resource_manager | XLOG
> record_length    | 114
> fpi_length       | 0
> description      | CHECKPOINT_SHUTDOWN redo 0/1000028; tli 1; prev tli
> 1; fpw true; xid 0:3; oid 10000; multi 1; offset 0; oldest xid 3 in DB
> 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0;
> oldest running xid 0; shutdown
> block_ref        |
> data_length      | 88
> data             |
>
\x28000001000000000100000001000000010000000000000003000000000000001027000001000000000000000300000001000000010000000100000072550000a5c4316200000000000000000000000000000000ff7f0000
>
> In this case, the end lsn pointer specified by the user is
> '0/01000029'. There is only one WAL record which starts before this
> specified end lsn pointer whose start pointer is at 01000028, but that
> WAL record ends at 0/100009F which is way beyond the specified end
> lsn. So, how come we are able to display the complete WAL record info?
> AFAIU, end lsn is the lsn pointer where you need to stop reading the
> WAL data. If that is true, then there exists no valid WAL record
> between the start and end lsn in this particular case.

Thanks Ashutosh, it's an edge case and I don't think we would want to
show a WAL record that ends at LSN after the user specified end-lsn
which doesn't look good. I fixed it in the v11 patch set. Now, the
pg_get_wal_records_info will show records only upto user specified
end_lsn, it doesn't show the last record which starts at LSN < end_lsn
but ends at LSN > end_lsn, see [1].

Please review the v11 patch set further.

[1]
postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/01000029');
 start_lsn | end_lsn | prev_lsn
-----------+---------+----------
(0 rows)

postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/100009F');
 start_lsn |  end_lsn  | prev_lsn
-----------+-----------+----------
 0/1000028 | 0/100009F | 0/0
(1 row)

postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/10000A0');
 start_lsn |  end_lsn  | prev_lsn
-----------+-----------+----------
 0/1000028 | 0/100009F | 0/0
(1 row)

postgres=# select start_lsn, end_lsn, prev_lsn from
pg_get_wal_records_info('0/01000028', '0/0100009E');
 start_lsn | end_lsn | prev_lsn
-----------+---------+----------
(0 rows)

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Mar 17, 2022 at 10:48 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> It still suggests unspecifiable end-LSN..
>
> > select * from pg_get_wal_records_info('4/4B28EB68', '4/4C000060');
> > ERROR:  cannot accept future end LSN
> > DETAIL:  Last known WAL LSN on the database system is 4/4C000060.

Thanks Kyotaro-san. We can change the detail message to show (current
flush lsn/last replayed lsn - 1), that's what I've done in v11 posted
upthread at [1]. The problem is that all the pg_walinspect functions
would wait for the first valid record in read_local_xlog_page() via
InitXLogReaderState()->XLogFindNextRecord(), see[2].

We have two things to do:
1) Just document the behaviour "pg_walinspect functions will wait for
the first valid WAL record if there is none found after the specified
input LSN/start LSN.". This seems easier but some may see it as a
problem.
2) Have read_local_xlog_page_2 which doesn't wait for future WAL LSN
unlike read_local_xlog_page and like pg_waldump's WALDumpReadPage. It
requires a new function read_local_xlog_page_2 that almost looks like
read_local_xlog_page except wait (pg_usleep) loop, we can avoid code
duplication by moving the read_local_xlog_page code to a static
function read_local_xlog_page_guts(existing params, bool wait):

read_local_xlog_page(params)
 read_local_xlog_page_guts(existing params, false);

read_local_xlog_page_2(params)
  read_local_xlog_page_guts(existing params, true);

read_local_xlog_page_guts:
  if (wait) wait for future wal; ---> existing pg_usleep code in
read_local_xlog_page.
  else return;

I'm fine either way, please let me know your thoughts on this?

[1] https://www.postgresql.org/message-id/CALj2ACU8XjbYbMwh5x6hEUJdpRoG9%3DPO52_tuOSf1%3DMO7WtsmQ%40mail.gmail.com
[2]
postgres=# select pg_current_wal_flush_lsn();
 pg_current_wal_flush_lsn
--------------------------
 0/1624430
(1 row)

postgres=# select * from pg_get_wal_record_info('0/1624430');
ERROR:  cannot accept future input LSN
DETAIL:  Last known WAL LSN on the database system is 0/162442F.
postgres=# select * from pg_get_wal_record_info('0/162442f');   --->
waits for the first valid record in read_local_xlog_page.

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Nitin Jadhav
Дата:
Hi Bharath,

Due to recent commits on master, the pg_walinpect module is not
compiling. Kindly update the patch.

pg_walinspect.c: In function ‘GetXLogRecordInfo’:
pg_walinspect.c:362:39: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘max_block_id’
  362 |  for (block_id = 0; block_id <= record->max_block_id; block_id++)
      |                                       ^~
pg_walinspect.c:382:29: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  382 |    uint8  bimg_info = record->blocks[block_id].bimg_info;
      |                             ^~
pg_walinspect.c:385:21: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  385 |    fpi_len += record->blocks[block_id].bimg_len;
      |                     ^~
pg_walinspect.c:402:16: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  402 |          record->blocks[block_id].hole_offset,
      |                ^~
pg_walinspect.c:403:16: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  403 |          record->blocks[block_id].hole_length,
      |                ^~
pg_walinspect.c:405:16: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  405 |          record->blocks[block_id].hole_length -
      |                ^~
pg_walinspect.c:406:16: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  406 |          record->blocks[block_id].bimg_len,
      |                ^~
pg_walinspect.c:414:16: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  414 |          record->blocks[block_id].hole_offset,
      |                ^~
pg_walinspect.c:415:16: error: ‘XLogReaderState’ {aka ‘struct
XLogReaderState’} has no member named ‘blocks’
  415 |          record->blocks[block_id].hole_length);
      |                ^~
make: *** [../../src/Makefile.global:941: pg_walinspect.o] Error 1

Thanks & Regards,
Nitin Jadhav

On Thu, Mar 17, 2022 at 1:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 10:48 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > It still suggests unspecifiable end-LSN..
> >
> > > select * from pg_get_wal_records_info('4/4B28EB68', '4/4C000060');
> > > ERROR:  cannot accept future end LSN
> > > DETAIL:  Last known WAL LSN on the database system is 4/4C000060.
>
> Thanks Kyotaro-san. We can change the detail message to show (current
> flush lsn/last replayed lsn - 1), that's what I've done in v11 posted
> upthread at [1]. The problem is that all the pg_walinspect functions
> would wait for the first valid record in read_local_xlog_page() via
> InitXLogReaderState()->XLogFindNextRecord(), see[2].
>
> We have two things to do:
> 1) Just document the behaviour "pg_walinspect functions will wait for
> the first valid WAL record if there is none found after the specified
> input LSN/start LSN.". This seems easier but some may see it as a
> problem.
> 2) Have read_local_xlog_page_2 which doesn't wait for future WAL LSN
> unlike read_local_xlog_page and like pg_waldump's WALDumpReadPage. It
> requires a new function read_local_xlog_page_2 that almost looks like
> read_local_xlog_page except wait (pg_usleep) loop, we can avoid code
> duplication by moving the read_local_xlog_page code to a static
> function read_local_xlog_page_guts(existing params, bool wait):
>
> read_local_xlog_page(params)
>  read_local_xlog_page_guts(existing params, false);
>
> read_local_xlog_page_2(params)
>   read_local_xlog_page_guts(existing params, true);
>
> read_local_xlog_page_guts:
>   if (wait) wait for future wal; ---> existing pg_usleep code in
> read_local_xlog_page.
>   else return;
>
> I'm fine either way, please let me know your thoughts on this?
>
> [1] https://www.postgresql.org/message-id/CALj2ACU8XjbYbMwh5x6hEUJdpRoG9%3DPO52_tuOSf1%3DMO7WtsmQ%40mail.gmail.com
> [2]
> postgres=# select pg_current_wal_flush_lsn();
>  pg_current_wal_flush_lsn
> --------------------------
>  0/1624430
> (1 row)
>
> postgres=# select * from pg_get_wal_record_info('0/1624430');
> ERROR:  cannot accept future input LSN
> DETAIL:  Last known WAL LSN on the database system is 0/162442F.
> postgres=# select * from pg_get_wal_record_info('0/162442f');   --->
> waits for the first valid record in read_local_xlog_page.
>
> Regards,
> Bharath Rupireddy.
>
>



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Andres Freund
Дата:
Hi,

First look at this patch, so I might be repeating stuff already commented on /
discussed.

On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote:
> +--
> +-- pg_get_raw_wal_record()
> +--
> +CREATE FUNCTION pg_get_raw_wal_record(IN in_lsn pg_lsn,
> +    OUT start_lsn pg_lsn,
> +    OUT end_lsn pg_lsn,
> +    OUT prev_lsn pg_lsn,
> +    OUT record_length int4,
> +    OUT record bytea
> +)
> +AS 'MODULE_PATHNAME', 'pg_get_raw_wal_record'
> +LANGUAGE C CALLED ON NULL INPUT PARALLEL SAFE;

What is raw about the function?

Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a
NULL lsn, does it?  Also, that's the default, why is it restated here?


> +REVOKE EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) FROM PUBLIC;
> +GRANT EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) TO pg_read_server_files;
> +
> +--
> +-- pg_get_wal_record_info()
> +--
> +CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
> +    OUT start_lsn pg_lsn,
> +    OUT end_lsn pg_lsn,
> +    OUT prev_lsn pg_lsn,
> +    OUT xid xid,
> +    OUT resource_manager text,
> +    OUT record_length int4,
> +    OUT fpi_length int4,
> +    OUT description text,
> +    OUT block_ref text,
> +    OUT data_length int4,
> +    OUT data bytea
> +)
> +AS 'MODULE_PATHNAME', 'pg_get_wal_record_info'
> +LANGUAGE C CALLED ON NULL INPUT PARALLEL SAFE;
> +
> +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC;
> +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor;

I don't think it's appropriate for pg_monitor to see all the data in the WAL.

> +--
> +-- pg_get_wal_stats()
> +--
> +CREATE FUNCTION pg_get_wal_stats(IN start_lsn pg_lsn,
> +    IN end_lsn pg_lsn DEFAULT NULL,
> +    OUT resource_manager text,
> +    OUT count int8,
> +    OUT count_percentage float4,
> +    OUT record_length int8,
> +    OUT record_length_percentage float4,
> +    OUT fpi_length int8,
> +    OUT fpi_length_percentage float4
> +    )
> +RETURNS SETOF record AS $$
> +SELECT resource_manager,
> +    count(*) AS cnt,
> +    CASE WHEN count(*) > 0 THEN (count(*) * 100 / sum(count(*)) OVER total)::numeric(5,2) ELSE 0 END AS "count_%",
> +    sum(record_length) AS trecl,
> +    CASE WHEN sum(record_length) > 0 THEN (sum(record_length) * 100 / sum(sum(record_length)) OVER
total)::numeric(5,2)ELSE 0 END AS "trecl_%",
 
> +    sum(fpi_length) AS tfpil,
> +    CASE WHEN sum(fpi_length) > 0 THEN (sum(fpi_length) * 100 / sum(sum(fpi_length)) OVER total)::numeric(5,2) ELSE
0END AS "tfpil_%"
 
> +FROM pg_get_wal_records_info(start_lsn, end_lsn)
> +GROUP BY resource_manager
> +WINDOW total AS ();
> +$$ LANGUAGE SQL CALLED ON NULL INPUT PARALLEL SAFE;

This seems like an exceedingly expensive way to compute this. Not just because
of doing the grouping, window etc, but also because it's serializing the
"data" field from pg_get_wal_records_info() just to never use it. With any
appreciatable amount of data the return value pg_get_wal_records_info() will
be serialized into a on-disk tuplestore.

This is probably close to an order of magnitude slower than pg_waldump
--stats. Which imo renders this largely useless.

The column names don't seem great either. "tfpil"?


> +/*
> + * Module load callback.
> + */
> +void
> +_PG_init(void)
> +{
> +    /* Define custom GUCs and install hooks here, if any. */
> +
> +    /*
> +     * Have EmitWarningsOnPlaceholders("pg_walinspect"); if custom GUCs are
> +     * defined.
> +     */
> +}
> +
> +/*
> + * Module unload callback.
> + */
> +void
> +_PG_fini(void)
> +{
> +    /* Uninstall hooks, if any. */
> +}

Why have this stuff if it's not used?


> +/*
> + * Validate given LSN and return the LSN up to which the server has WAL.
> + */
> +static XLogRecPtr
> +ValidateInputLSN(XLogRecPtr lsn)
> +{
> +    XLogRecPtr curr_lsn;
> +
> +    /* Validate input WAL LSN. */
> +    if (XLogRecPtrIsInvalid(lsn))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid WAL LSN")));
> +
> +    /*
> +     * We determine the current LSN of the server similar to how page_read
> +     * callback read_local_xlog_page does.
> +     */
> +    if (!RecoveryInProgress())
> +        curr_lsn = GetFlushRecPtr(NULL);
> +    else
> +        curr_lsn = GetXLogReplayRecPtr(NULL);
> +
> +    Assert(!XLogRecPtrIsInvalid(curr_lsn));
> +
> +    if (lsn >= curr_lsn)
> +    {
> +        /*
> +          * GetFlushRecPtr or GetXLogReplayRecPtr gives "end+1" LSN of the last
> +         * record flushed or replayed respectively. But let's use the LSN up
> +         * to "end" in user facing message.
> +          */
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("cannot accept future input LSN"),
> +                 errdetail("Last known WAL LSN on the database system is %X/%X.",
> +                           LSN_FORMAT_ARGS(curr_lsn - 1))));
> +    }

> +    return curr_lsn;
> +}
> +
> +/*
> + * Validate given start LSN and end LSN, return the new end LSN in case user
> + * hasn't specified one.
> + */
> +static XLogRecPtr
> +ValidateStartAndEndLSNs(XLogRecPtr start_lsn, XLogRecPtr end_lsn)
> +{
> +    XLogRecPtr curr_lsn;
> +
> +    /* Validate WAL start LSN. */
> +    if (XLogRecPtrIsInvalid(start_lsn))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid WAL start LSN")));
> +
> +    if (!RecoveryInProgress())
> +        curr_lsn = GetFlushRecPtr(NULL);
> +    else
> +        curr_lsn = GetXLogReplayRecPtr(NULL);
> +
> +    if (start_lsn >= curr_lsn)
> +    {
> +        /*
> +          * GetFlushRecPtr or GetXLogReplayRecPtr gives "end+1" LSN of the last
> +         * record flushed or replayed respectively. But let's use the LSN up
> +         * to "end" in user facing message.
> +          */
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("cannot accept future start LSN"),
> +                 errdetail("Last known WAL LSN on the database system is %X/%X.",
> +                           LSN_FORMAT_ARGS(curr_lsn - 1))));
> +    }

> +    /*
> +     * If end_lsn is specified, let's ensure that it's not a future LSN i.e.
> +     * something the database system doesn't know about.
> +     */
> +    if (!XLogRecPtrIsInvalid(end_lsn) &&
> +        (end_lsn >= curr_lsn))
> +    {
> +        /*
> +          * GetFlushRecPtr or GetXLogReplayRecPtr gives "end+1" LSN of the last
> +         * record flushed or replayed respectively. But let's use the LSN up
> +         * to "end" in user facing message.
> +          */
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("cannot accept future end LSN"),
> +                 errdetail("Last known WAL LSN on the database system is %X/%X.",
> +                           LSN_FORMAT_ARGS(curr_lsn - 1))));
> +    }
> +
> +    /*
> +     * When end_lsn is not specified let's read up to the last WAL position
> +     * known to be on the server.
> +     */
> +    if (XLogRecPtrIsInvalid(end_lsn))
> +        end_lsn = curr_lsn;
> +
> +    Assert(!XLogRecPtrIsInvalid(end_lsn));
> +
> +    if (start_lsn >= end_lsn)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("WAL start LSN must be less than end LSN")));
> +
> +    return end_lsn;
> +}

These two functions are largely redundant, that doesn't seem great.


> +Datum
> +pg_get_raw_wal_record(PG_FUNCTION_ARGS)
> +{
> +#define PG_GET_RAW_WAL_RECORD_COLS 5
> +    XLogRecPtr    lsn;
> +    XLogRecord *record;
> +    XLogRecPtr    first_record;
> +    XLogReaderState *xlogreader;
> +    bytea    *raw_record;
> +    uint32    rec_len;
> +    char    *raw_record_data;
> +    TupleDesc    tupdesc;
> +    Datum    result;
> +    HeapTuple    tuple;
> +    Datum    values[PG_GET_RAW_WAL_RECORD_COLS];
> +    bool    nulls[PG_GET_RAW_WAL_RECORD_COLS];
> +    int    i = 0;
> +
> +    lsn = PG_GETARG_LSN(0);
> +
> +    (void) ValidateInputLSN(lsn);
> +
> +    /* Build a tuple descriptor for our result type. */
> +    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
> +        elog(ERROR, "return type must be a row type");
> +
> +    xlogreader = InitXLogReaderState(lsn, &first_record);
> +
> +    Assert(xlogreader);
> +
> +    record = ReadNextXLogRecord(xlogreader, first_record);
> +
> +    rec_len = XLogRecGetTotalLen(xlogreader);
> +
> +    Assert(rec_len > 0);
> +

Most of this has another copy in pg_get_wal_record_info(). Can more of this be
deduplicated?


> +/*
> + * Get WAL record info.
> + */
> +static void
> +GetXLogRecordInfo(XLogReaderState *record, XLogRecPtr lsn,
> +                  Datum *values, bool *nulls, uint32 ncols)
> +{
> +    const char *id;
> +    const RmgrData *desc;
> +    uint32    fpi_len = 0;
> +    RelFileNode rnode;
> +    ForkNumber    forknum;
> +    BlockNumber blk;
> +    int    block_id;
> +    StringInfoData rec_desc;
> +    StringInfoData rec_blk_ref;
> +    StringInfoData temp;
> +    bytea    *data;
> +    char    *main_data;
> +    uint32    main_data_len;
> +    int    i = 0;
> +
> +    desc = &RmgrTable[XLogRecGetRmid(record)];
> +    initStringInfo(&rec_desc);
> +    id = desc->rm_identify(XLogRecGetInfo(record));
> +
> +    if (id == NULL)
> +        appendStringInfo(&rec_desc, "UNKNOWN (%x) ", XLogRecGetInfo(record) & ~XLR_INFO_MASK);
> +    else
> +        appendStringInfo(&rec_desc, "%s ", id);
> +
> +    initStringInfo(&temp);
> +    desc->rm_desc(&temp, record);
> +    appendStringInfo(&rec_desc, "%s", temp.data);
> +    pfree(temp.data);
> +    initStringInfo(&rec_blk_ref);

This seems unnecessarily wasteful. You serialize into one stringinfo, just to
then copy that stringinfo into another stringinfo. Just to then allocate yet
another stringinfo.


> +    /* Block references (detailed format). */

This comment seems copied from pg_waldump, but doesn't make sense here,
because there's no short format.


> +    for (block_id = 0; block_id <= record->max_block_id; block_id++)
> +    {
> +        if (!XLogRecHasBlockRef(record, block_id))
> +            continue;
> +
> +        XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blk);
> +
> +        if (forknum != MAIN_FORKNUM)
> +            appendStringInfo(&rec_blk_ref,
> +                            "blkref #%u: rel %u/%u/%u fork %s blk %u",
> +                            block_id, rnode.spcNode, rnode.dbNode,
> +                            rnode.relNode, get_forkname(forknum), blk);
> +        else
> +            appendStringInfo(&rec_blk_ref,
> +                            "blkref #%u: rel %u/%u/%u blk %u",
> +                            block_id, rnode.spcNode, rnode.dbNode,
> +                            rnode.relNode, blk);
> +
> +        if (XLogRecHasBlockImage(record, block_id))
> +        {
> +            uint8        bimg_info = record->blocks[block_id].bimg_info;
> +
> +            /* Calculate the amount of FPI data in the record. */
> +            fpi_len += record->blocks[block_id].bimg_len;
> +
> +            if (BKPIMAGE_COMPRESSED(bimg_info))
> +            {
> +                const char *method;
> +
> +                if ((bimg_info & BKPIMAGE_COMPRESS_PGLZ) != 0)
> +                    method = "pglz";
> +                else if ((bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0)
> +                    method = "lz4";
> +                else
> +                    method = "unknown";
> +
> +                appendStringInfo(&rec_blk_ref, " (FPW%s); hole: offset: %u, length: %u, "
> +                                 "compression saved: %u, method: %s",
> +                                 XLogRecBlockImageApply(record, block_id) ?
> +                                 "" : " for WAL verification",
> +                                 record->blocks[block_id].hole_offset,
> +                                 record->blocks[block_id].hole_length,
> +                                 BLCKSZ -
> +                                 record->blocks[block_id].hole_length -
> +                                 record->blocks[block_id].bimg_len,
> +                                 method);
> +            }
> +            else
> +            {
> +                appendStringInfo(&rec_blk_ref, " (FPW%s); hole: offset: %u, length: %u",
> +                                 XLogRecBlockImageApply(record, block_id) ?
> +                                 "" : " for WAL verification",
> +                                 record->blocks[block_id].hole_offset,
> +                                 record->blocks[block_id].hole_length);
> +            }
> +        }
> +    }

To me duplicating this much code from waldump seems like a bad idea from a
maintainability POV.



> +/*
> + * Get info and data of all WAL records between start LSN and end LSN.
> + */
> +static void
> +GetWALRecordsInfoInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
> +                          XLogRecPtr end_lsn)
> +{
> +#define PG_GET_WAL_RECORDS_INFO_COLS 11
> +    XLogRecPtr    first_record;
> +    XLogReaderState *xlogreader;
> +    ReturnSetInfo *rsinfo;
> +    TupleDesc    tupdesc;
> +    Tuplestorestate *tupstore;
> +    MemoryContext per_query_ctx;
> +    MemoryContext oldcontext;
> +    Datum    values[PG_GET_WAL_RECORDS_INFO_COLS];
> +    bool    nulls[PG_GET_WAL_RECORDS_INFO_COLS];
> +
> +    rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +
> +    /* Check to see if caller supports us returning a tuplestore. */
> +    if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                 errmsg("set-valued function called in context that cannot accept a set")));
> +    if (!(rsinfo->allowedModes & SFRM_Materialize))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                 errmsg("materialize mode required, but it is not allowed in this context")));
> +
> +    /* Build a tuple descriptor for our result type. */
> +    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
> +        elog(ERROR, "return type must be a row type");
> +
> +    /* Build tuplestore to hold the result rows. */
> +    per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
> +    oldcontext = MemoryContextSwitchTo(per_query_ctx);
> +    tupstore = tuplestore_begin_heap(true, false, work_mem);
> +    rsinfo->returnMode = SFRM_Materialize;
> +    rsinfo->setResult = tupstore;
> +    rsinfo->setDesc = tupdesc;

This should likely use the infrastructure introduced in 5b81703787bfc1e6072c8e37125eba0c5598b807.


> +    for (;;)
> +    {
> +        (void) ReadNextXLogRecord(xlogreader, first_record);
> +
> +        /*
> +         * Let's not show the record info if it is spanning more than the
> +         * end_lsn. EndRecPtr is "end+1" of the last read record, hence
> +         * use "end" here.
> +         */
> +        if ((xlogreader->EndRecPtr - 1) <= end_lsn)
> +        {
> +            GetXLogRecordInfo(xlogreader, xlogreader->currRecPtr, values, nulls,
> +                               PG_GET_WAL_RECORDS_INFO_COLS);
> +
> +            tuplestore_putvalues(tupstore, tupdesc, values, nulls);
> +        }
> +
> +        /* Exit loop if read up to end_lsn. */
> +        if (xlogreader->EndRecPtr >= end_lsn)
> +            break;

Seems weird to have both of these conditions separately.


Greetings,

Andres Freund



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Fri, Mar 18, 2022 at 8:07 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> Hi Bharath,
>
> Due to recent commits on master, the pg_walinpect module is not
> compiling. Kindly update the patch.

Thanks Nitin. Here's an updated v12 patch-set. I will respond to
Andres comments in a while.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Sat, Mar 19, 2022 at 5:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> First look at this patch, so I might be repeating stuff already commented on /
> discussed.

Thanks for taking a look at the patch.

> On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote:
> > +--
> > +-- pg_get_raw_wal_record()
>
> What is raw about the function?

It right now gives data starting from the output of XLogReadRecord
upto XLogRecGetTotalLen(xlogreader); length. Given that XLogReadRecord
returns a pointer to the decoded record's header, I'm not sure it's
the right choice. Actually, this function's intention(not an immediate
use-case though), is to feed the WAL record to another function and
then, say, repair a corrupted page given a base data page.

As I said upthread, I'm open to removing this function for now, when a
realistic need comes we can add it back. It also raised some concerns
around the security and permissions. Thoughts?

> Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a
> NULL lsn, does it?  Also, that's the default, why is it restated here?

pg_get_wal_records_info needed that option (if end_lsn being the
default, providing WAL info upto the end of WAL). Also, we can emit
better error message ("invalid WAL start LSN") instead of generic one.
I wanted to keep error message and code same across all the functions
hence CALLED ON NULL INPUT option for pg_get_raw_wal_record.

> > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC;
> > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor;
>
> I don't think it's appropriate for pg_monitor to see all the data in the WAL.

How about pg_read_server_files or some other?

> > +-- pg_get_wal_stats()
>
> This seems like an exceedingly expensive way to compute this. Not just because
> of doing the grouping, window etc, but also because it's serializing the
> "data" field from pg_get_wal_records_info() just to never use it. With any
> appreciatable amount of data the return value pg_get_wal_records_info() will
> be serialized into a on-disk tuplestore.
>
> This is probably close to an order of magnitude slower than pg_waldump
> --stats. Which imo renders this largely useless.

Yeah that's true. Do you suggest having pg_get_wal_stats() a
c-function like in v8 patch [1]?

SEe some numbers at [2] with pg_get_wal_stats using
pg_get_wal_records_info and pg_get_wal_records_info as a direct
c-function like in v8 patch [1]. A direct c-function always fares
better (84 msec vs 1400msec).

> The column names don't seem great either. "tfpil"?

"total fpi length" - tfpil wanted to keep it short - it's just an
internal column name isn't it? The actual column name the user sees is
fpi_length.

> > +void
> > +_PG_init(void)
>
> > +void
> > +_PG_fini(void)
>
> Why have this stuff if it's not used?

I kept it as a placeholder for future code additions, for instance
test_decoding.c and ssl_passphrase_func.c has empty _PG_init(),
_PG_fini(). If okay, I can mention there like "placeholder for now",
otherwise I can remove it.

> > +static XLogRecPtr
> > +ValidateInputLSN(XLogRecPtr lsn)
>
> > +static XLogRecPtr
> > +ValidateStartAndEndLSNs(XLogRecPtr start_lsn, XLogRecPtr end_lsn)
> > +{
>
> These two functions are largely redundant, that doesn't seem great.

I will modify it in the next version.

> > +Datum
> > +pg_get_raw_wal_record(PG_FUNCTION_ARGS)
>
> Most of this has another copy in pg_get_wal_record_info(). Can more of this be
> deduplicated?

I will do, if we decide on whether or not to have the
pg_get_raw_wal_record function at all? Please see my comments above.

> > +     initStringInfo(&temp);
> > +     desc->rm_desc(&temp, record);
> > +     appendStringInfo(&rec_desc, "%s", temp.data);
> > +     pfree(temp.data);
> > +     initStringInfo(&rec_blk_ref);
>
> This seems unnecessarily wasteful. You serialize into one stringinfo, just to
> then copy that stringinfo into another stringinfo. Just to then allocate yet
> another stringinfo.

Yeah, I will remove it. Looks like all the rm_desc callbacks append to
the passed-in buffer and not reset it, so we should be good.

> > +     /* Block references (detailed format). */
>
> This comment seems copied from pg_waldump, but doesn't make sense here,
> because there's no short format.

Yes, I will remove it.

> > +     for (block_id = 0; block_id <= record->max_block_id; block_id++)
> > +     {
>
> To me duplicating this much code from waldump seems like a bad idea from a
> maintainability POV.

Even if we were to put the above code from pg_walinspect and
pg_waldump into, say, walutils.c or some other existing file, there we
had to make if (pg_walinspect) appendStringInfo else if (pg_waldump)
printf() sort of thing, isn't it clumsy? Also, unnecessary if
conditions need to be executed for every record. For maintainability,
I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
things in both places (of course this might sound dumbest way of doing
it, IMHO, it's sensible, given the if(pg_walinspect)-else
if(pg_waldump) sorts of code that we need to do in the common
functions). Thoughts?

> > +/*
> > + * Get info and data of all WAL records between start LSN and end LSN.
> > + */
> > +static void
> > +GetWALRecordsInfoInternal(FunctionCallInfo fcinfo, XLogRecPtr start_lsn,
>
> This should likely use the infrastructure introduced in 5b81703787bfc1e6072c8e37125eba0c5598b807.

Yes, I will change it.

> > +     for (;;)
> > +     {
> > +             (void) ReadNextXLogRecord(xlogreader, first_record);
> > +
> > +             /*
> > +              * Let's not show the record info if it is spanning more than the
> > +              * end_lsn. EndRecPtr is "end+1" of the last read record, hence
> > +              * use "end" here.
> > +              */
> > +             if ((xlogreader->EndRecPtr - 1) <= end_lsn)
> > +             {
> > +                     GetXLogRecordInfo(xlogreader, xlogreader->currRecPtr, values, nulls,
> > +                                                       PG_GET_WAL_RECORDS_INFO_COLS);
> > +
> > +                     tuplestore_putvalues(tupstore, tupdesc, values, nulls);
> > +             }
> > +
> > +             /* Exit loop if read up to end_lsn. */
> > +             if (xlogreader->EndRecPtr >= end_lsn)
> > +                     break;
>
> Seems weird to have both of these conditions separately.

Yeah. It is to handle some edge cases to print the WAL  upto end_lsn
and avoid waiting in read_local_xlog_page. I will change it.

Actually, there's an open point as specified in [3]. Any thoughts on it?

[1] https://www.postgresql.org/message-id/CALj2ACWhcbW_s4BXLyCpLWcCppZN9ncTXHbn4dv1F0Vpe0kxqA%40mail.gmail.com
[2] with pg_get_wal_stats using pg_get_wal_stats:
Time: 1394.919 ms (00:01.395)
Time: 1403.199 ms (00:01.403)
Time: 1408.138 ms (00:01.408)
Time: 1397.670 ms (00:01.398)

with pg_get_wal_stats as a c-function like in v8 patch [1]:
Time: 84.319 ms
Time: 84.303 ms
Time: 84.208 ms
Time: 84.452 ms

use case:
create extension pg_walinspect;

create table foo(col int);
insert into foo select * from generate_series(1, 100000);
update foo set col = col*2+1;
delete from foo;

\timing on
select * from pg_get_wal_stats('0/01000028');
\timing off

output:
postgres=# select * from pg_get_wal_stats('0/01000028');
 resource_manager | count  | count_percentage | record_length |
record_length_percentage | fpi_length | fpi_length_percentage

------------------+--------+------------------+---------------+--------------------------+------------+-----------------------
 Storage          |     13 |                0 |           546 |
                0 |          0 |                     0
 CLOG             |      1 |                0 |            30 |
                0 |          0 |                     0
 Database         |      2 |                0 |            84 |
                0 |          0 |                     0
 Btree            |  13078 |              3.1 |       1486990 |
             4.97 |     461512 |                 23.13
 Heap             | 404835 |            95.84 |      26354653 |
            88.17 |     456576 |                 22.88
 Transaction      |    721 |             0.17 |        178933 |
              0.6 |          0 |                     0
 Heap2            |   3056 |             0.72 |       1131836 |
             3.79 |     376932 |                 18.89
 Standby          |    397 |             0.09 |         23226 |
             0.08 |          0 |                     0
 XLOG             |    316 |             0.07 |        716027 |
              2.4 |     700164 |                 35.09
(9 rows)

[3] https://www.postgresql.org/message-id/CALj2ACVBST5Us6-eDz4q_Gem3rUHSC7AYNOB7tmp9Yqq6PHsXw%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Andres Freund
Дата:
Hi,

On 2022-03-22 21:57:51 +0530, Bharath Rupireddy wrote:
> On Sat, Mar 19, 2022 at 5:18 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-03-17 13:25:35 +0530, Bharath Rupireddy wrote:
> > > +--
> > > +-- pg_get_raw_wal_record()
> >
> > What is raw about the function?
> 
> It right now gives data starting from the output of XLogReadRecord
> upto XLogRecGetTotalLen(xlogreader); length. Given that XLogReadRecord
> returns a pointer to the decoded record's header, I'm not sure it's
> the right choice. Actually, this function's intention(not an immediate
> use-case though), is to feed the WAL record to another function and
> then, say, repair a corrupted page given a base data page.
> 
> As I said upthread, I'm open to removing this function for now, when a
> realistic need comes we can add it back. It also raised some concerns
> around the security and permissions. Thoughts?

I'm ok with having it with appropriate permissions, I just don't like the
name.


> > Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a
> > NULL lsn, does it?  Also, that's the default, why is it restated here?
> 
> pg_get_wal_records_info needed that option (if end_lsn being the
> default, providing WAL info upto the end of WAL). Also, we can emit
> better error message ("invalid WAL start LSN") instead of generic one.
> I wanted to keep error message and code same across all the functions
> hence CALLED ON NULL INPUT option for pg_get_raw_wal_record.

I think it should be strict if it behaves strict. I fail to see what
consistency in error messages is worth here. And I'd probably just create two
different functions for begin and begin & end LSN and mark those as strict as
well.


> > > +REVOKE EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) FROM PUBLIC;
> > > +GRANT EXECUTE ON FUNCTION pg_get_wal_record_info(pg_lsn) TO pg_monitor;
> >
> > I don't think it's appropriate for pg_monitor to see all the data in the WAL.
> 
> How about pg_read_server_files or some other?

That seems more appropriate.


> > > +-- pg_get_wal_stats()
> >
> > This seems like an exceedingly expensive way to compute this. Not just because
> > of doing the grouping, window etc, but also because it's serializing the
> > "data" field from pg_get_wal_records_info() just to never use it. With any
> > appreciatable amount of data the return value pg_get_wal_records_info() will
> > be serialized into a on-disk tuplestore.
> >
> > This is probably close to an order of magnitude slower than pg_waldump
> > --stats. Which imo renders this largely useless.
> 
> Yeah that's true. Do you suggest having pg_get_wal_stats() a
> c-function like in v8 patch [1]?

Yes.


> SEe some numbers at [2] with pg_get_wal_stats using
> pg_get_wal_records_info and pg_get_wal_records_info as a direct
> c-function like in v8 patch [1]. A direct c-function always fares
> better (84 msec vs 1400msec).

That indeed makes the view as is pretty much useless. And it'd probably be
worse in a workload with longer records / many FPIs.


> > > +void
> > > +_PG_init(void)
> >
> > > +void
> > > +_PG_fini(void)
> >
> > Why have this stuff if it's not used?
> 
> I kept it as a placeholder for future code additions, for instance
> test_decoding.c and ssl_passphrase_func.c has empty _PG_init(),
> _PG_fini(). If okay, I can mention there like "placeholder for now",
> otherwise I can remove it.

That's not comparable, the test_decoding case has it as a placeholder because
it serves as a template to create further output plugins. Something not the
case here. So please remove.


> > > +     for (block_id = 0; block_id <= record->max_block_id; block_id++)
> > > +     {
> >
> > To me duplicating this much code from waldump seems like a bad idea from a
> > maintainability POV.
> 
> Even if we were to put the above code from pg_walinspect and
> pg_waldump into, say, walutils.c or some other existing file, there we
> had to make if (pg_walinspect) appendStringInfo else if (pg_waldump)
> printf() sort of thing, isn't it clumsy?

Why is that needed? Just use the stringinfo in both places? You're outputting
the exact same thing in both places right now. There's already a stringinfo in
XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogddump was
written), so you could just convert at least the relevant printfs in
XLogDumpDisplayRecord().


> Also, unnecessary if
> conditions need to be executed for every record. For maintainability,
> I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
> things in both places (of course this might sound dumbest way of doing
> it, IMHO, it's sensible, given the if(pg_walinspect)-else
> if(pg_waldump) sorts of code that we need to do in the common
> functions). Thoughts?

IMO we shouldn't merge this with as much duplication as there is right now,
the notes don't change that it's a PITA to maintain.


> Yeah. It is to handle some edge cases to print the WAL  upto end_lsn
> and avoid waiting in read_local_xlog_page. I will change it.
> 
> Actually, there's an open point as specified in [3]. Any thoughts on it?

Seems more user-friendly to wait - it's otherwise hard for a user to know what
LSN to put in.

Greetings,

Andres Freund



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
At Tue, 22 Mar 2022 11:00:06 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2022-03-22 21:57:51 +0530, Bharath Rupireddy wrote:
> > > This is probably close to an order of magnitude slower than pg_waldump
> > > --stats. Which imo renders this largely useless.
> > 
> > Yeah that's true. Do you suggest having pg_get_wal_stats() a
> > c-function like in v8 patch [1]?
> 
> Yes.
>
> > SEe some numbers at [2] with pg_get_wal_stats using
> > pg_get_wal_records_info and pg_get_wal_records_info as a direct
> > c-function like in v8 patch [1]. A direct c-function always fares
> > better (84 msec vs 1400msec).
> 
> That indeed makes the view as is pretty much useless. And it'd probably be
> worse in a workload with longer records / many FPIs.

FWIW agreed. The SQL version is too slow..


> > > > +     for (block_id = 0; block_id <= record->max_block_id; block_id++)
> > > > +     {
> > >
> > > To me duplicating this much code from waldump seems like a bad idea from a
> > > maintainability POV.
> > 
> > Even if we were to put the above code from pg_walinspect and
> > pg_waldump into, say, walutils.c or some other existing file, there we
> > had to make if (pg_walinspect) appendStringInfo else if (pg_waldump)
> > printf() sort of thing, isn't it clumsy?
> 
> Why is that needed? Just use the stringinfo in both places? You're outputting
> the exact same thing in both places right now. There's already a stringinfo in
> XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogddump was
> written), so you could just convert at least the relevant printfs in
> XLogDumpDisplayRecord().

> > Also, unnecessary if
> > conditions need to be executed for every record. For maintainability,
> > I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
> > things in both places (of course this might sound dumbest way of doing
> > it, IMHO, it's sensible, given the if(pg_walinspect)-else
> > if(pg_waldump) sorts of code that we need to do in the common
> > functions). Thoughts?
> 
> IMO we shouldn't merge this with as much duplication as there is right now,
> the notes don't change that it's a PITA to maintain.

The two places emit different outputs but the only difference is the
delimiter between two blockrefs. (By the way, the current code forgets
to insert a delimiter there).  So even if the function took "bool
is_waldump", it is used only when appending a line delimiter.  It
would be nicer if the "bool is_waldump" were "char *delimiter".
Others might think differently, though..

So, the function looks like this.

StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter,
                            uint32 &fpi_len);


> > Yeah. It is to handle some edge cases to print the WAL  upto end_lsn
> > and avoid waiting in read_local_xlog_page. I will change it.
> > 
> > Actually, there's an open point as specified in [3]. Any thoughts on it?
> 
> Seems more user-friendly to wait - it's otherwise hard for a user to know what
> LSN to put in.

I'm not sure it is user-friendly that the function "freeze"s for a
reason uncertain to the user..  Even if any results are accumulated
before waiting, all of them vanishes by entering Ctrl-C to release the
"freeze".

About the usefulness of the waiting behavior, it depends on what we
think the function's major use cases are.  Robert (AFAIU) thinks it as
a simple WAL dumper that is intended to use in some automated
mechanism.  The start/end LSNs simply identifies the records to emit.
No warning/errors and no waits except for apparently invalid inputs.

I thought it as a means by which to manually inspect wal on SQL
interface but don't have a strong opinion on the waiting behavior.
(Because I can avoid that by giving a valid LSN pair to the function
if I don't want it to "freeze".)


Anyway, the opinions here on the interface are described as follows.

A. as a diag interface for human use.

  1. If the whole region is filled with records, return them all.
  2. If start-LSN is too past, starts from the first available record.

  3-1. If start-LSN is in futnre, wait for the record to come.
  4-1. If end-LSN is in future, waits for new records.
  5-1. If end-LSN is too past, error out?

B. as a simple WAL dumper

  1. If the whole region is filled with records, return them all.
  2. If start-LSN is too past, starts from the first available record.

  3-2. If start-LSN is in futnre, returns nothig.
  4-2. If end-LSN is in future, ends with the last available record.
  5-2. If end-LSN is too past, returns northing.

1 and 2 are uncontroversial.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
At Wed, 23 Mar 2022 11:51:25 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> The two places emit different outputs but the only difference is the
> delimiter between two blockrefs. (By the way, the current code forgets
> to insert a delimiter there).  So even if the function took "bool
> is_waldump", it is used only when appending a line delimiter.  It
> would be nicer if the "bool is_waldump" were "char *delimiter".
> Others might think differently, though..
> 
> So, the function looks like this.
> 
> StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter,
>                             uint32 &fpi_len);

By the way, xlog_block_info@xlogrecovery.c has the subset of the
function.  So the function can be shared with the callers of
xlog_block_info but I'm not sure it is not too-much...

StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter,
                               bool fpw_detail, uint32 &fpi_len);

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Tue, Mar 22, 2022 at 11:30 PM Andres Freund <andres@anarazel.de> wrote:
>
> > > Why "CALLED ON NULL INPUT"? It doesn't make sense to call the function with a
> > > NULL lsn, does it?  Also, that's the default, why is it restated here?
> >
> > pg_get_wal_records_info needed that option (if end_lsn being the
> > default, providing WAL info upto the end of WAL). Also, we can emit
> > better error message ("invalid WAL start LSN") instead of generic one.
> > I wanted to keep error message and code same across all the functions
> > hence CALLED ON NULL INPUT option for pg_get_raw_wal_record.
>
> I think it should be strict if it behaves strict. I fail to see what
> consistency in error messages is worth here. And I'd probably just create two
> different functions for begin and begin & end LSN and mark those as strict as
> well.

I'm okay with changing them to be STRICT. Right now, the behaviour of
pg_get_wal_records_info is this:
CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
    IN end_lsn pg_lsn DEFAULT NULL,

select pg_get_wal_records_info(start_lsn, end_lsn);
if start_lsn is future, then errors out
if end_lsn is future, then errors out
otherwise, returns WAL records info between start_lsn and end_lsn

select pg_get_wal_records_info(start_lsn);
if start_lsn is future, then errors out
sets end_lsn = current server lsn and returns WAL records info between
start_lsn and end_lsn

Same is true for pg_get_wal_stats.

Getting WAL records info provided start_lsn until end-of-WAL is a
basic ask and a good function to have. Now, if I were to make
pg_get_wal_records_info STRICT, then I would need to have another
function like pg_get_wal_records_info_till_end_of_wal/pg_get_wal_stats_till_end_of_wal
much like ones in few of my initial patches upthread.

Is it okay to have these functions pg_get_wal_records_info(start_lsn,
end_lsn)/pg_get_wal_stats(start_lsn, end_lsn) and
pg_get_wal_records_info_till_end_of_wal(start_lsn)/pg_get_wal_stats_till_end_of_wal(start_lsn)?
This way, it will be more clear to the user actually than to stuff
more than one behaviour in a single function with default values.

Please let me know your thoughts.

> > Yeah. It is to handle some edge cases to print the WAL  upto end_lsn
> > and avoid waiting in read_local_xlog_page. I will change it.
> >
> > Actually, there's an open point as specified in [3]. Any thoughts on it?
>
> Seems more user-friendly to wait - it's otherwise hard for a user to know what
> LSN to put in.

I agree with Kyotaro-san that the wait behavior isn't a good choice,
because CTRL+C would not emit the accumulated info/stats unlike
pg_waldump. Also, with wait behaviour it's easy for a user to trick
the server with an unreasonably futuristic WAL LSN, say F/FFFFFFFF.
Also, if we use pg_walinspect functions, say, within a WAL monitoring
app, the wait behaviour isn't good there as it might look like the
functions hanging the app. We might think about adding a timeout for
waiting, but that doesn't seem an elegant way. Users/Apps can easily
figure out the LSNs to get WAL info/stats - either they can use
pg_current_wal_XXXX or by looking at the control file or server logs
or pg_stat_replication, what not. LSNs are everywhere within the
postgres eco-system.

Instead, the functions simply can figure out what's current server LSN
at-the-moment and choose to error out if any of the provided input LSN
is beyond that as it's being done currently. This looks simpler and
user-friendly.

On Wed, Mar 23, 2022 at 8:27 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 23 Mar 2022 11:51:25 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > The two places emit different outputs but the only difference is the
> > delimiter between two blockrefs. (By the way, the current code forgets
> > to insert a delimiter there).  So even if the function took "bool
> > is_waldump", it is used only when appending a line delimiter.  It
> > would be nicer if the "bool is_waldump" were "char *delimiter".
> > Others might think differently, though..
> >
> > So, the function looks like this.
> >
> > StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter,
> >                                                       uint32 &fpi_len);
>
> By the way, xlog_block_info@xlogrecovery.c has the subset of the
> function.  So the function can be shared with the callers of
> xlog_block_info but I'm not sure it is not too-much...
>
> StringInfo XLogBlockRefInfos(XLogReaderState *record, char *delimiter,
>                                                         bool fpw_detail, uint32 &fpi_len);
>

Yes, putting them in a common function is a good idea. I'm thinking
something like below.
StringInfo
XLogBlockRefInfos(XLogReaderState *record, char *delimiter,
                  uint32 *fpi_len, bool detailed_format)

I will try to put the common functions in xlogreader.h/.c, so that
both pg_waldump and pg_walinspect can make use of it. Thoughts?

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Tue, Mar 22, 2022 at 11:30 PM Andres Freund <andres@anarazel.de> wrote:
>
> > > To me duplicating this much code from waldump seems like a bad idea from a
> > > maintainability POV.
> >
> > Even if we were to put the above code from pg_walinspect and
> > pg_waldump into, say, walutils.c or some other existing file, there we
> > had to make if (pg_walinspect) appendStringInfo else if (pg_waldump)
> > printf() sort of thing, isn't it clumsy?
>
> Why is that needed? Just use the stringinfo in both places? You're outputting
> the exact same thing in both places right now. There's already a stringinfo in
> XLogDumpDisplayRecord() these days (there wasn't back when pg_xlogddump was
> written), so you could just convert at least the relevant printfs in
> XLogDumpDisplayRecord().
>
> > Also, unnecessary if
> > conditions need to be executed for every record. For maintainability,
> > I added a note in pg_walinspect.c and pg_waldump.c to consider fixing
> > things in both places (of course this might sound dumbest way of doing
> > it, IMHO, it's sensible, given the if(pg_walinspect)-else
> > if(pg_waldump) sorts of code that we need to do in the common
> > functions). Thoughts?
>
> IMO we shouldn't merge this with as much duplication as there is right now,
> the notes don't change that it's a PITA to maintain.

Here's a refactoring patch that basically moves the pg_waldump's
functions and stats structures to xlogreader.h/.c so that the
pg_walinspect can reuse them. If it looks okay, I will send the
pg_walinspect patches based on it.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Kyotaro Horiguchi
Дата:
At Wed, 23 Mar 2022 21:36:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Here's a refactoring patch that basically moves the pg_waldump's
> functions and stats structures to xlogreader.h/.c so that the
> pg_walinspect can reuse them. If it looks okay, I will send the
> pg_walinspect patches based on it.


+void
+XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter,
+                       uint32 *fpi_len, bool detailed_format,
+                       StringInfo buf)
...
+        if (detailed_format && delimiter)
+            appendStringInfoChar(buf, '\n');

It is odd that the variable "delimiter" is used as a bool in the
function, though it is a "char *", which I meant that it is used as
delimiter string (assuming that you might want to insert ", " between
two blkref descriptions).


+get_forkname(ForkNumber num)

forkNames[] is public and used in reinit.c.  I think we don't need
this function.


+#define MAX_XLINFO_TYPES 16
...
+    XLogRecStats    rmgr_stats[RM_NEXT_ID];
+    XLogRecStats    record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
+} XLogStats;
+

This doesn't seem to be a part of xlogreader.  Couldn't we add a new
module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
a part of xlogreader, the xlogstats looks like a better place.


+#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \
+                                  tot_len_pct, total_count, \

It doesn't need to be a macro. However in the first place I don't
think it is useful to have.  Rather it may be harmful since it doesn't
reduce complexity much but instead just hides details.  If we want to
avoid tedious repetitions of the same statements, a macro like the
following may work.

#define CALC_PCT (num, denom) ((denom) == 0 ? 0.0 ? 100.0 * (num) / (denom))
...
>   n_pct = CALC_PCT(n, total_count);
>   rec_len_pct = CALC_PCT(rec_len, total_rec_len);
>   fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len);
>   tot_len_pct = CALC_PCT(tot_len, total_len);

But it is not seem that different if we directly write out the detail.

>   n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count);
>   rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len);
>   fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len);
>   tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len);

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> +void
> +XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter,
> +                                          uint32 *fpi_len, bool detailed_format,
> +                                          StringInfo buf)
> ...
> +               if (detailed_format && delimiter)
> +                       appendStringInfoChar(buf, '\n');
>
> It is odd that the variable "delimiter" is used as a bool in the
> function, though it is a "char *", which I meant that it is used as
> delimiter string (assuming that you might want to insert ", " between
> two blkref descriptions).

I'm passing NULL if the delimiter isn't required (for pg_walinspect)
and I wanted to check if it's passed, so I was using the delimiter in
the condition. However, I now changed it to delimiter != NULL.

> +get_forkname(ForkNumber num)
>
> forkNames[] is public and used in reinit.c.  I think we don't need
> this function.

Yes. I removed it.

> +#define MAX_XLINFO_TYPES 16
> ...
> +       XLogRecStats    rmgr_stats[RM_NEXT_ID];
> +       XLogRecStats    record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
> +} XLogStats;
> +
>
> This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> a part of xlogreader, the xlogstats looks like a better place.

I'm not sure if it's worth adding new files xlogstats.h/.c just for 2
structures, 1 macro, and 2 functions with no plan to add new stats
structures or functions. Since xlogreader is the one that reads the
WAL, and is being included by both backend and other modules (tools
and extensions) IMO it's the right place. However, I can specify in
xlogreader that if at all the new stats related structures or
functions are going to be added, it's good to move them into a new
header and .c file.

Thoughts?

> +#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \
> +                                                                 tot_len_pct, total_count, \
>
> It doesn't need to be a macro. However in the first place I don't
> think it is useful to have.  Rather it may be harmful since it doesn't
> reduce complexity much but instead just hides details.  If we want to
> avoid tedious repetitions of the same statements, a macro like the
> following may work.
>
> #define CALC_PCT (num, denom) ((denom) == 0 ? 0.0 ? 100.0 * (num) / (denom))
> ...
> >   n_pct = CALC_PCT(n, total_count);
> >   rec_len_pct = CALC_PCT(rec_len, total_rec_len);
> >   fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len);
> >   tot_len_pct = CALC_PCT(tot_len, total_len);
>
> But it is not seem that different if we directly write out the detail.
>
> >   n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count);
> >   rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len);
> >   fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len);
> >   tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len);

I removed the XLOG_GET_STATS_PERCENTAGE macro.

Attaching v14 patch-set here with. It has bunch of other changes along
with the above:

1) Used STRICT for all the functions and introduced _till_end_of_wal
versions for pg_get_wal_records_info and pg_get_wal_stats.
2) Most of the code is reused between pg_walinspect and pg_waldump and
also within pg_walinspect.
3) Added read_local_xlog_page_no_wait without duplicating the code so
that the pg_walinspect functions don't wait even while finding the
first valid WAL record.
4) No function waits for future WAL lsn even to find the first valid WAL record.
5) Addressed the review comments raised upthread by Andres.

I hope this version makes the patch cleaner, please review it further.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Andres Freund
Дата:
Hi,

On 2022-03-24 15:02:29 +0530, Bharath Rupireddy wrote:
> On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
> > This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> > module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> > a part of xlogreader, the xlogstats looks like a better place.
> 
> I'm not sure if it's worth adding new files xlogstats.h/.c just for 2
> structures, 1 macro, and 2 functions with no plan to add new stats
> structures or functions. Since xlogreader is the one that reads the
> WAL, and is being included by both backend and other modules (tools
> and extensions) IMO it's the right place. However, I can specify in
> xlogreader that if at all the new stats related structures or
> functions are going to be added, it's good to move them into a new
> header and .c file.

I don't like that location for XLogRecGetBlockRefInfo(). How about putting it
in xlogdesc.c - that kind of fits?

And what do you think about creating src/backend/access/rmgrdesc/stats.c for
XLogRecStoreStats()? It's not a perfect location, but not too bad either.

XLogRecGetLen() would be ok in xlogreader, but stats.c also would work?

Greetings,

Andres Freund



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Fri, Mar 25, 2022 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-03-24 15:02:29 +0530, Bharath Rupireddy wrote:
> > On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
> > > This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> > > module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> > > a part of xlogreader, the xlogstats looks like a better place.
> >
> > I'm not sure if it's worth adding new files xlogstats.h/.c just for 2
> > structures, 1 macro, and 2 functions with no plan to add new stats
> > structures or functions. Since xlogreader is the one that reads the
> > WAL, and is being included by both backend and other modules (tools
> > and extensions) IMO it's the right place. However, I can specify in
> > xlogreader that if at all the new stats related structures or
> > functions are going to be added, it's good to move them into a new
> > header and .c file.
>
> I don't like that location for XLogRecGetBlockRefInfo(). How about putting it
> in xlogdesc.c - that kind of fits?

Done.

> And what do you think about creating src/backend/access/rmgrdesc/stats.c for
> XLogRecStoreStats()? It's not a perfect location, but not too bad either.
>
> XLogRecGetLen() would be ok in xlogreader, but stats.c also would work?

I've added a new xlogstats.c/.h (as suggested by Kyotaro-san as well)
file under src/backend/access/transam/. I don't think the new file
fits well under rmgrdesc.

Attaching v15 patch-set, please have a look at it.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
RKN Sai Krishna
Дата:
Hi Bharath,

First look at the patch, bear with me if any of the following comments are repeated.
1. With pg_get_wal_record(lsn), say a WAL record start, end lsn range contains the specified LSN, wouldn't it be more meaningful to show the corresponding WAL record. 
For example, upon providing '0/17335E7' as input, and I see get the WAL record ('0/1733618', '0/173409F') as output and not the one with start and end lsn as ('0/17335E0', '0/1733617').

With pg_walfile_name(lsn), we can find the WAL segment file name that contains the specified LSN.

2. I see the following output for pg_get_wal_record. Need to have a look at the spaces I suppose.
rkn=# select * from pg_get_wal_record('0/4041728');
 start_lsn |  end_lsn  | prev_lsn  | record_length |















                 record















-----------+-----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 0/4041728 | 0/40421AF | 0/40416F0 |          2670 | \x6e0a0000d2020000f016040400000000000a0000fef802b400007f7f00000000408fe738a25500000300000000000000010000007f06000000
4000003b0a00000000000012000000100101007f7f7f7f0885e738a25500003000c815380a03000885e738a255000000007f7f7f7f7f7f0000000078674301296000003000f8150020042000000000709e1203909
dba01c89c8601609cd00030986008f8956804d202000000000000000000000000120006001f00030820ffff5f04000000000001400000010000000000000004000000000080bf0200030000000000000000006100
0000610000000000000000000000000000000000000000000000000000000000000000000000330100000000000000bc02000001000000010000000000803f00000000000000b0060000010000000000000017000

3. Should these functions be running in standby mode too? We do not allow WAL control functions to be executed during recovery right?

4. If the wal segment corresponding to the start lsn is removed, but there are WAL records which could be read in the specified input lsn range, would it be better to output the existing WAL records displaying a message that it is a partial list of WAL records and the WAL files corresponding to the rest are already removed, rather than erroring out saying "requested WAL segment has already been removed"?

5. Following are very minor comments in the code
  • Correct the function description by removing "return the LSN up to which the server has WAL" for IsFutureLSN
  • In GetXLogRecordInfo, good to have pfree in place for rec_desc, rec_blk_ref, data
  • In GetXLogRecordInfo, can avoid calling XLogRecGetInfo(record) multiple times by capturing in a variable
  • In GetWALDetailsGuts, setting end_lsn could be done in single if else and similarly we can club the if statements verifying if the start lsn is a future lsn.
Thanks,
RKN

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Fri, Mar 25, 2022 at 8:37 PM RKN Sai Krishna
<rknsaiforpostgres@gmail.com> wrote:
>
> Hi Bharath,
>
> First look at the patch, bear with me if any of the following comments are repeated.

Thanks RKN, for playing around with the patches.

> 1. With pg_get_wal_record(lsn), say a WAL record start, end lsn range contains the specified LSN, wouldn't it be more
meaningfulto show the corresponding WAL record. 

In general, all the functions will first look for a first valid WAL
record from the given input lsn/start lsn(XLogFindNextRecord) and then
give info of all the valid records including the first valid WAL
record until either the given end lsn or till end of WAL depending on
the function used.

> For example, upon providing '0/17335E7' as input, and I see get the WAL record ('0/1733618', '0/173409F') as output
andnot the one with start and end lsn as ('0/17335E0', '0/1733617'). 

If  '0/17335E7' is an LSN containing a valid WAL record,
pg_get_wal_record gives the info of that, otherwise if there's any
next valid WAL record, it finds and gives that info. '0/17335E0' is
before '0/17335E7' the input lsn, so it doesn't show that record, but
the next valid record.

All the pg_walinspect functions don't look for the nearest valid WAL
record (could be previous to input lsn or next to input lsn), but they
look for the next valid WAL record. This is because the xlogreader
infra now has no API for backward iteration from a given LSN ( it has
XLogFindNextRecord and XLogReadRecord which scans the WAL in forward
direction). But, it's a good idea to have XLogFindPreviousRecord and
XLogReadPreviousRecord versions (as we have links for previous WAL
record in each WAL record) but that's a separate discussion.

> With pg_walfile_name(lsn), we can find the WAL segment file name that contains the specified LSN.

Yes.

> 2. I see the following output for pg_get_wal_record. Need to have a look at the spaces I suppose.

I believe this is something psql does for larger column outputs for
pretty-display. When used in a non-psql client, the column values are
returned properly. Nothing to do with the pg_walinspect patches here.

> 3. Should these functions be running in standby mode too? We do not allow WAL control functions to be executed during
recoveryright? 

There are functions that can be executable during recovery
pg_last_wal_receive_lsn, pg_last_wal_replay_lsn. The pg_walinspect
functions are useful even in recovery and I don't see a strong reason
to not support them. Hence, I'm right now supporting them.

> 4. If the wal segment corresponding to the start lsn is removed, but there are WAL records which could be read in the
specifiedinput lsn range, would it be better to output the existing WAL records displaying a message that it is a
partiallist of WAL records and the WAL files corresponding to the rest are already removed, rather than erroring out
saying"requested WAL segment has already been removed"? 

"requested WAL segment %s has already been removed" is a common error
across the xlogreader infra (see wal_segment_open) and I don't want to
invent a new behaviour. And all the segment_open callbacks report an
error when they are not finding the WAL file that they are looking
for.

> 5. Following are very minor comments in the code
>
> Correct the function description by removing "return the LSN up to which the server has WAL" for IsFutureLSN

That's fine, because it actually returns curr_lsn via the function
param curr_lsn. However, I modified the comment a bit.

> In GetXLogRecordInfo, good to have pfree in place for rec_desc, rec_blk_ref, data

No, we are just returning pointer to the string, not deep copying, see
CStringGetTextDatum. All the functions get executed within a
function's memory context and after handing off the results to the
client that gets deleted, deallocating all the memory.

> In GetXLogRecordInfo, can avoid calling XLogRecGetInfo(record) multiple times by capturing in a variable

XLogRecGetInfo is not a function, it's a macro, so that's fine.
#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info)

> In GetWALDetailsGuts, setting end_lsn could be done in single if else and similarly we can club the if statements
verifyingif the start lsn is a future lsn. 

The existing if conditions are:

    if (IsFutureLSN(start_lsn, &curr_lsn))
    if (!till_end_of_wal && end_lsn >= curr_lsn)
    if (till_end_of_wal)
    if (start_lsn >= end_lsn)

I clubbed them like this:
    if (!till_end_of_wal)
    if (IsFutureLSN(start_lsn, &curr_lsn))
    if (!till_end_of_wal && end_lsn >= curr_lsn)
    else if (till_end_of_wal)

Other if conditions are serving different purposes, so I'm leaving them as-is.

Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch,
others remain the same.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Jeff Davis
Дата:
On Sat, 2022-03-26 at 10:31 +0530, Bharath Rupireddy wrote:
> Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch,
> others remain the same.

I looked more closely at this patch.

* It seems that pg_get_wal_record() is not returning the correct raw
data for the record. I tested with pg_logical_emit_message, and the
message isn't there. pg_get_wal_record_info() uses XLogRecordGetData(),
which seems closer to what I expect.

* I'm a little unclear on the purpose of pg_get_wal_record(). What does
it offer that the other functions don't?

* I don't think we need the stats at all. We can run GROUP BY queries
on the results of pg_get_wal_records_info().

* Include the xlinfo portion of the wal record in addition to the rmgr,
like pg_waldump --stats=record shows. That way we can GROUP BY that as
well.

* I don't think we need the changes to xlogutils.c. You calculate the
end pointer based on the flush pointer, anyway, so we should never need
to wait (and when I take it out, the tests still pass).


I think we can radically simplify it without losing functionality,
unless I'm missing something.

1. Eliminate pg_get_wal_record(),
pg_get_wal_records_info_till_end_of_wal(), pg_get_wal_stats(),
pg_get_wal_stats_till_end_of_wal().

2. Rename pg_get_wal_record_info -> pg_get_wal_record

3. Rename pg_get_wal_records_info -> pg_get_wal_records

4. For pg_get_wal_records, if end_lsn is NULL, read until the end of
WAL.

5. For pg_get_wal_record and pg_get_wal_records, also return the xlinfo
using rm_identify() if available.

6. Remove changes to xlogutils.

7. Remove the refactor to pull the stats out to a separate file,
because stats aren't needed. 

8. With only two functions in the API, it may even make sense to just
make it a part of postgres rather than a separate module.

However, I'm a little behind on this discussion thread, so perhaps I'm
missing some important context. I'll try to catch up soon.

Regards,
    Jeff Davis






Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Andres Freund
Дата:
Hi,

On 2022-04-01 16:35:38 -0700, Jeff Davis wrote:
> * I don't think we need the stats at all. We can run GROUP BY queries
> on the results of pg_get_wal_records_info().

It's well over an order of magnitude slower. And I don't see how that can be
avoided. That makes it practically useless.

See numbers at the bottom of
https://postgr.es/m/CALj2ACUvU2fGLw7keEpxZhGAoMQ9vrCPX-13hexQPoR%2BQRbuOw%40mail.gmail.com

Greetings,

Andres Freund



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Sat, Apr 2, 2022 at 5:05 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Sat, 2022-03-26 at 10:31 +0530, Bharath Rupireddy wrote:
> > Attaching v16 patch-set, only change in v16-0002-pg_walinspect.patch,
> > others remain the same.
>
> I looked more closely at this patch.

Thanks Jeff for reviewing this.

> * It seems that pg_get_wal_record() is not returning the correct raw
> data for the record. I tested with pg_logical_emit_message, and the
> message isn't there. pg_get_wal_record_info() uses XLogRecordGetData(),
> which seems closer to what I expect.
>
> * I'm a little unclear on the purpose of pg_get_wal_record(). What does
> it offer that the other functions don't?

My intention is to return the overall undecoded WAL record [5] i.e.
the data starting from XLogReadRecord's output [6] till length
XLogRecGetTotalLen(xlogreader);. Please see [7], where Andres agreed
to have this function, I also mentioned a possible use-case there.

pg_get_wal_record_info returns the main data of the WAL record
(xl_heap_delete, xl_heap_insert, xl_heap_multi_insert, xl_heap_update
and so on).

> * I don't think we need the stats at all. We can run GROUP BY queries
> on the results of pg_get_wal_records_info().

As identified in [1], SQL-version of stats function is way slower in
normal cases, hence it was agreed (by Andres, Kyotaro-san and myself)
to have a C-function for stats.

> * Include the xlinfo portion of the wal record in addition to the rmgr,
> like pg_waldump --stats=record shows. That way we can GROUP BY that as
> well.
>
> 5. For pg_get_wal_record and pg_get_wal_records, also return the xlinfo
> using rm_identify() if available.

Yes, that's already part of the description column (much like
pg_waldump does) and the users can still do it with GROUP BY and
HAVING clauses, see [4].

> * I don't think we need the changes to xlogutils.c. You calculate the
> end pointer based on the flush pointer, anyway, so we should never need
> to wait (and when I take it out, the tests still pass).
>
> 6. Remove changes to xlogutils.

As mentioned in [1], read_local_xlog_page_no_wait required because the
functions can still wait in read_local_xlog_page for WAL while finding
the first valid record after the given input LSN (the use case is
simple - just provide input LSN closer to server's current flush LSN,
may be off by 3 or 4 bytes).

Also, I tried to keep the changes minimal with the
read_local_xlog_page_guts static function. IMO, that shouldn't be a
problem.

> I think we can radically simplify it without losing functionality,
> unless I'm missing something.
>
> 1. Eliminate pg_get_wal_record(),
> pg_get_wal_records_info_till_end_of_wal(), pg_get_wal_stats(),
> pg_get_wal_stats_till_end_of_wal().
>
> 4. For pg_get_wal_records, if end_lsn is NULL, read until the end of
> WAL.

It's pretty much clear to the users with till_end_of_wal functions
instead of cooking many things into the same functions with default
values for input LSNs as NULL which also requires the functions to be
"CALLED ON NULL INPUT" types which isn't good. This was also suggested
by Andres, see [2], and I agree with it.

> 2. Rename pg_get_wal_record_info -> pg_get_wal_record
>
> 3. Rename pg_get_wal_records_info -> pg_get_wal_records

As these functions aren't returning the WAL record data, but info
about it (decoded data), I would like to retain the function names
as-is.

> 8. With only two functions in the API, it may even make sense to just
> make it a part of postgres rather than a separate module.

As said above, I would like to have till_end_of_wal versions. Firstly,
pg_walinspect functions may not be needed by everyone, the extension
provides a way for the users to install if required. Also, many
hackers have suggested new functions [3], but, right now the idea is
to get pg_walinspect onboard with simple-yet-useful functions and then
think of extending it with new functions later.

[1] https://www.postgresql.org/message-id/CALj2ACUvU2fGLw7keEpxZhGAoMQ9vrCPX-13hexQPoR%2BQRbuOw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20220322180006.hgbsoldgbljyrcm7%40alap3.anarazel.de
[3] There are many functions we can add to pg_walinspect - functions
with wait mode for future WAL, WAL parsing, function to return all the
WAL record info/stats given a WAL file name, functions to return WAL
info/stats from historic timelines as well, function to see if the
given WAL file is valid and so on.
[4]
postgres=# select count(resource_manager), description, from
pg_get_wal_records_info('0/14E0568', '0/14F2568') group by description
having description like '%INSERT_LEAF%';
 count |     description
-------+---------------------
     7 | INSERT_LEAF off 108
     1 | INSERT_LEAF off 111
     1 | INSERT_LEAF off 135
     1 | INSERT_LEAF off 142
     3 | INSERT_LEAF off 143
     1 | INSERT_LEAF off 144
     1 | INSERT_LEAF off 145
     1 | INSERT_LEAF off 146
     1 | INSERT_LEAF off 274
     1 | INSERT_LEAF off 405
(10 rows)

[5]
/*
 * The overall layout of an XLOG record is:
 * Fixed-size header (XLogRecord struct)
 * XLogRecordBlockHeader struct
 * XLogRecordBlockHeader struct
 * ...
 * XLogRecordDataHeader[Short|Long] struct
 * block data
 * block data
 * ...
 * main data

[6]
XLogRecord *
XLogReadRecord(XLogReaderState *state, char **errormsg)
{
    decoded = XLogNextRecord(state, errormsg);
    if (decoded)
    {
        /*
         * This function returns a pointer to the record's header, not the
         * actual decoded record.  The caller will access the decoded record
         * through the XLogRecGetXXX() macros, which reach the decoded
         * recorded as xlogreader->record.
         */
        Assert(state->record == decoded);
        return &decoded->header;
    }

[7] https://www.postgresql.org/message-id/20220322180006.hgbsoldgbljyrcm7%40alap3.anarazel.de

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Jeff Davis
Дата:
On Mon, 2022-04-04 at 09:15 +0530, Bharath Rupireddy wrote:
> My intention is to return the overall undecoded WAL record [5] i.e.
> the data starting from XLogReadRecord's output [6] till length
> XLogRecGetTotalLen(xlogreader);. Please see [7], where Andres agreed
> to have this function, I also mentioned a possible use-case there.

The current patch does not actually do this: it's returning a pointer
into the DecodedXLogRecord struct, which doesn't have the raw bytes of
the WAL record.

To return the raw bytes of the record is not entirely trivial: it seems
we have to look in the decoded record and either find a pointer into
readBuf, or readRecordBuf, depending on whether the record crosses a
boundary or not. If we find a good way to do this I'm fine keeping the
function, but if not, we can leave it for v16.

> pg_get_wal_record_info returns the main data of the WAL record
> (xl_heap_delete, xl_heap_insert, xl_heap_multi_insert, xl_heap_update
> and so on).

We also discussed just removing the main data from the output here.
It's not terribly useful, and could be arbitrarily large. Similar to
how we leave out the backup block data and images.

> As identified in [1], SQL-version of stats function is way slower in
> normal cases, hence it was agreed (by Andres, Kyotaro-san and myself)
> to have a C-function for stats.a pointer into 

Now I agree. We should also have an equivalent of "pg_waldump --
stats=record" though, too.

> Yes, that's already part of the description column (much like
> pg_waldump does) and the users can still do it with GROUP BY and
> HAVING clauses, see [4].

I still think an extra column for the results of rm_identify() would
make sense. Not critical, but seems useful.

> As mentioned in [1], read_local_xlog_page_no_wait required because
> the
> functions can still wait in read_local_xlog_page for WAL while
> finding
> the first valid record after the given input LSN (the use case is
> simple - just provide input LSN closer to server's current flush LSN,
> may be off by 3 or 4 bytes).

Did you look into using XLogReadAhead() rather than XLogReadRecord()?

> It's pretty much clear to the users with till_end_of_wal functions
> instead of cooking many things into the same functions with default
> values for input LSNs as NULL which also requires the functions to be
> "CALLED ON NULL INPUT" types which isn't good. This was also
> suggested
> by Andres, see [2], and I agree with it.

OK, it's a matter of taste I suppose. I don't have a strong opinion.

> > 2. Rename pg_get_wal_record_info -> pg_get_wal_record
> > 
> > 3. Rename pg_get_wal_records_info -> pg_get_wal_records
> 
> As these functions aren't returning the WAL record data, but info
> about it (decoded data), I would like to retain the function names
> as-is.

The name pg_get_wal_records_info bothers me slightly, but I don't have
a better suggestion.


One other thought: functions like pg_logical_emit_message() return an
LSN, but if you feed that into pg_walinspect you get the *next* record.
That makes sense because pg_logical_emit_message() returns the result
of XLogInsertRecord(), which is the end of the last inserted record.
But it can be slightly annoying/confusing. I don't have any particular
suggestion, but maybe it's worth a mention in the docs or something?

Regards,
    Jeff Davis





Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Wed, Apr 6, 2022 at 10:32 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2022-04-04 at 09:15 +0530, Bharath Rupireddy wrote:
> > My intention is to return the overall undecoded WAL record [5] i.e.
> > the data starting from XLogReadRecord's output [6] till length
> > XLogRecGetTotalLen(xlogreader);. Please see [7], where Andres agreed
> > to have this function, I also mentioned a possible use-case there.
>
> The current patch does not actually do this: it's returning a pointer
> into the DecodedXLogRecord struct, which doesn't have the raw bytes of
> the WAL record.
>
> To return the raw bytes of the record is not entirely trivial: it seems
> we have to look in the decoded record and either find a pointer into
> readBuf, or readRecordBuf, depending on whether the record crosses a
> boundary or not. If we find a good way to do this I'm fine keeping the
> function, but if not, we can leave it for v16.

With no immediate use of raw WAL data without a WAL record parsing
function, I'm dropping that function for now.

> > pg_get_wal_record_info returns the main data of the WAL record
> > (xl_heap_delete, xl_heap_insert, xl_heap_multi_insert, xl_heap_update
> > and so on).
>
> We also discussed just removing the main data from the output here.
> It's not terribly useful, and could be arbitrarily large. Similar to
> how we leave out the backup block data and images.

Done.

> > As identified in [1], SQL-version of stats function is way slower in
> > normal cases, hence it was agreed (by Andres, Kyotaro-san and myself)
> > to have a C-function for stats.a pointer into
>
> Now I agree. We should also have an equivalent of "pg_waldump --
> stats=record" though, too.

Added a parameter per_record (with default being false, emitting
per-rmgr stats) to pg_get_wal_stats and
pg_get_wal_stats_till_end_of_wal, when set returns per-record stats,
much like pg_waldump.

> > Yes, that's already part of the description column (much like
> > pg_waldump does) and the users can still do it with GROUP BY and
> > HAVING clauses, see [4].
>
> I still think an extra column for the results of rm_identify() would
> make sense. Not critical, but seems useful.

Added rm_identify as record_type column in pg_get_wal_record_info,
pg_get_wal_records_info, pg_get_wal_record_info_till_end_of_wal.
Removed the rm_identify from the description column as it's
unnecessary now here.

> > As mentioned in [1], read_local_xlog_page_no_wait required because
> > the
> > functions can still wait in read_local_xlog_page for WAL while
> > finding
> > the first valid record after the given input LSN (the use case is
> > simple - just provide input LSN closer to server's current flush LSN,
> > may be off by 3 or 4 bytes).
>
> Did you look into using XLogReadAhead() rather than XLogReadRecord()?

I don't think XLogReadAhead will help either, as it calls page_read
callback, XLogReadAhead->XLogDecodeNextRecord->ReadPageInternal->page_read->read_local_xlog_page
(which again waits for future WAL).

Per our internal discussion, I'm keeping the
read_local_xlog_page_no_wait as it offers a better solution without
much code duplication.

> The name pg_get_wal_records_info bothers me slightly, but I don't have
> a better suggestion.

IMO, pg_get_wal_records_info looks okay, hence didn't change it.

> One other thought: functions like pg_logical_emit_message() return an
> LSN, but if you feed that into pg_walinspect you get the *next* record.
> That makes sense because pg_logical_emit_message() returns the result
> of XLogInsertRecord(), which is the end of the last inserted record.
> But it can be slightly annoying/confusing. I don't have any particular
> suggestion, but maybe it's worth a mention in the docs or something?

Yes, all the pg_walinspect functions would find the next valid WAL
record  to the input/start LSN and start returning the details from
then.

IMO, the descriptions of these functions have already specified it:

pg_get_wal_record_info
      Gets WAL record information of a given LSN. If the given LSN isn't
      containing a valid WAL record, it gives the information of the next
      available valid WAL record. This function emits an error if a future (the

all other functions say this:
     Gets information/statistics of all the valid WAL records between/from

Attaching v17 patch-set with the above review comments addressed.
Please have a look at it.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Wed, Apr 6, 2022 at 2:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Attaching v17 patch-set with the above review comments addressed.
> Please have a look at it.

Had to rebase because of 5c279a6d350 (Custom WAL Resource Managers.).
Please find v18 patch-set.

Regards,
Bharath Rupireddy.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Amit Kapila
Дата:
On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>

I am facing the below doc build failure on my machine due to this work:

./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml">
Tabs appear in SGML/XML files
make: *** [check-tabs] Error 1

The attached patch fixes this for me.

-- 
With Regards,
Amit Kapila.

Вложения

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Bharath Rupireddy
Дата:
On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
>
> I am facing the below doc build failure on my machine due to this work:
>
> ./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml">
> Tabs appear in SGML/XML files
> make: *** [check-tabs] Error 1
>
> The attached patch fixes this for me.

Thanks. It looks like there's a TAB in between. Your patch LGTM.

I'm wondering why this hasn't been caught in the build farm members
(or it may have been found but I'm missing to locate it.).

Can you please provide me with the doc build command to catch these
kinds of errors?

Regards,
Bharath Rupireddy.



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

От
Amit Kapila
Дата:
On Mon, Apr 11, 2022 at 6:33 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> >
> > I am facing the below doc build failure on my machine due to this work:
> >
> > ./filelist.sgml:<!ENTITY pgwalinspect SYSTEM "pgwalinspect.sgml">
> > Tabs appear in SGML/XML files
> > make: *** [check-tabs] Error 1
> >
> > The attached patch fixes this for me.
>
> Thanks. It looks like there's a TAB in between. Your patch LGTM.
>
> I'm wondering why this hasn't been caught in the build farm members
> (or it may have been found but I'm missing to locate it.).
>
> Can you please provide me with the doc build command to catch these
> kinds of errors?
>

Nothing special. In the doc/src/sgml, I did make clean followed by make check.


-- 
With Regards,
Amit Kapila.