Обсуждение: Allow replication roles to use file access functions

Поиск
Список
Период
Сортировка

Allow replication roles to use file access functions

От
Michael Paquier
Дата:
Hi all,

As of now, file access functions in genfile.c can only be used by
superusers. This proposal is to relax those functions so as
replication users can use them as well. Here are the functions aimed
by this patch:
- pg_stat_file
- pg_read_binary_file
- pg_read_file
- pg_ls_dir
The main argument for this change is that pg_rewind makes use of those
functions, forcing users to use a superuser role when rewinding a
node. And with this patch, we could allow replication roles to do the
same. Another argument in favor of this change is to allow replication
users to dump directly the contents of PGDATA via SQL, though I don't
believe that there are many people doing so these days.

Also, replication roles can already have an access to the contents of
PGDATA by taking a base backup for example, so this change looks
logical to me, even if we filter out some files in a base backup,
though I could not find any arguments to not let a replication user
have a look at them via those functions. A patch is attached, I am
adding it as well to the next CF.
Regards,
--
Michael

Вложения

Re: Allow replication roles to use file access functions

От
Fujii Masao
Дата:
On Thu, Aug 27, 2015 at 10:50 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Hi all,
>
> As of now, file access functions in genfile.c can only be used by
> superusers. This proposal is to relax those functions so as
> replication users can use them as well. Here are the functions aimed
> by this patch:
> - pg_stat_file
> - pg_read_binary_file
> - pg_read_file
> - pg_ls_dir
> The main argument for this change is that pg_rewind makes use of those
> functions, forcing users to use a superuser role when rewinding a
> node. And with this patch, we could allow replication roles to do the
> same. Another argument in favor of this change is to allow replication
> users to dump directly the contents of PGDATA via SQL, though I don't
> believe that there are many people doing so these days.
>
> Also, replication roles can already have an access to the contents of
> PGDATA by taking a base backup for example, so this change looks
> logical to me, even if we filter out some files in a base backup,
> though I could not find any arguments to not let a replication user
> have a look at them via those functions. A patch is attached, I am
> adding it as well to the next CF.

+1

Did you confirm that replication user can complete pg_rewind
after this patch is applied?

Regards,

-- 
Fujii Masao



Re: Allow replication roles to use file access functions

От
Alvaro Herrera
Дата:
Michael Paquier wrote:
> Hi all,
> 
> As of now, file access functions in genfile.c can only be used by
> superusers. This proposal is to relax those functions so as
> replication users can use them as well. Here are the functions aimed
> by this patch:
> - pg_stat_file
> - pg_read_binary_file
> - pg_read_file
> - pg_ls_dir
> The main argument for this change is that pg_rewind makes use of those
> functions, forcing users to use a superuser role when rewinding a
> node.

Can this piggyback on Stephen Frost's proposed patch for reserved roles et al?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Allow replication roles to use file access functions

От
Michael Paquier
Дата:
On Wed, Sep 2, 2015 at 11:32 PM, Fujii Masao wrote:
> Did you confirm that replication user can complete pg_rewind
> after this patch is applied?

Yes.
-- 
Michael



Re: Allow replication roles to use file access functions

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>> Hi all,
>>
>> As of now, file access functions in genfile.c can only be used by
>> superusers. This proposal is to relax those functions so as
>> replication users can use them as well. Here are the functions aimed
>> by this patch:
>> - pg_stat_file
>> - pg_read_binary_file
>> - pg_read_file
>> - pg_ls_dir
>> The main argument for this change is that pg_rewind makes use of those
>> functions, forcing users to use a superuser role when rewinding a
>> node.
>
> Can this piggyback on Stephen Frost's proposed patch for reserved roles et al?

(Adding Stephen in the loop)
I thought that this was rather independent on Stephen's default role
patch because this is not based on a new role permission type.
Stephen, do you think so, or should we merge the effort with your
things?
-- 
Michael



Re: Allow replication roles to use file access functions

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Michael Paquier wrote:
>>> The main argument for this change is that pg_rewind makes use of those
>>> functions, forcing users to use a superuser role when rewinding a
>>> node.

>> Can this piggyback on Stephen Frost's proposed patch for reserved roles et al?

> (Adding Stephen in the loop)
> I thought that this was rather independent on Stephen's default role
> patch because this is not based on a new role permission type.
> Stephen, do you think so, or should we merge the effort with your
> things?

Just on general principles, this seems like a pretty horrid idea.
To me replication privilege means the ability to transfer data out of
the master, not to cause arbitrary state changes on the master.
Therefore, "allow replication users to run pg_rewind" seems less like
a feature and more like a security bug.  Am I missing something?

Another problem with it is that access to the filesystem is about halfway
to being superuser anyway, even if it's only read-only access.  It would
certainly let you read things that one would not expect a replication
user to be able to read (like files unrelated to Postgres).

I don't entirely understand why pg_rewind should be invoking any of these
functions to begin with.  Isn't it working directly with the disk data,
with both postmasters shut down?
        regards, tom lane



Re: Allow replication roles to use file access functions

От
Andres Freund
Дата:
On 2015-09-02 19:48:15 -0400, Tom Lane wrote:
> Just on general principles, this seems like a pretty horrid idea.
> To me replication privilege means the ability to transfer data out of
> the master, not to cause arbitrary state changes on the master.

It's not about the permission to trigger pg_rewind on the master - it's
about being able to run pg_rewind (as the necessary OS user) on the
*standby* when the connection to the primary has only replication rather
than superuser privs.

> Another problem with it is that access to the filesystem is about halfway
> to being superuser anyway, even if it's only read-only access.  It would
> certainly let you read things that one would not expect a replication
> user to be able to read (like files unrelated to Postgres).

Doesn't pg_read_file et al insist that the path is below the data
directorY?

> I don't entirely understand why pg_rewind should be invoking any of these
> functions to begin with.  Isn't it working directly with the disk data,
> with both postmasters shut down?

There's two modes: If both data directories are on the same server both
need to be shut down. If, as it's pretty commonly the case, the "source"
db is on another system the source system must be started (as a
primary). In the second mode all the files that have changed are copied
over a libpq connection.

Andres



Re: Allow replication roles to use file access functions

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 8:59 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-09-02 19:48:15 -0400, Tom Lane wrote:
>> Just on general principles, this seems like a pretty horrid idea.
>> To me replication privilege means the ability to transfer data out of
>> the master, not to cause arbitrary state changes on the master.
>
> It's not about the permission to trigger pg_rewind on the master - it's
> about being able to run pg_rewind (as the necessary OS user) on the
> *standby* when the connection to the primary has only replication rather
> than superuser privs.

Yeah, I got poked by this limitation of pg_rewind some time ago
internally actually, folks willing to be able to manage their cluster
only with a replication role, and they were not really willing to have
a superuser for such operations being used across the network.
-- 
Michael



Re: Allow replication roles to use file access functions

От
Stephen Frost
Дата:
Micahel,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Michael Paquier wrote:
> >> As of now, file access functions in genfile.c can only be used by
> >> superusers. This proposal is to relax those functions so as
> >> replication users can use them as well. Here are the functions aimed
> >> by this patch:
> >> - pg_stat_file
> >> - pg_read_binary_file
> >> - pg_read_file
> >> - pg_ls_dir
> >> The main argument for this change is that pg_rewind makes use of those
> >> functions, forcing users to use a superuser role when rewinding a
> >> node.
> >
> > Can this piggyback on Stephen Frost's proposed patch for reserved roles et al?
>
> (Adding Stephen in the loop)

Thanks!

> I thought that this was rather independent on Stephen's default role
> patch because this is not based on a new role permission type.
> Stephen, do you think so, or should we merge the effort with your
> things?

It's not directly related to the default roles as it's adding
permissions to an existing role attribute control, but it's certainly
covering related territory and I'd like to make sure we're careful with
any increase in permissions for existing role attributes.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >> Michael Paquier wrote:
> >>> The main argument for this change is that pg_rewind makes use of those
> >>> functions, forcing users to use a superuser role when rewinding a
> >>> node.
>
> >> Can this piggyback on Stephen Frost's proposed patch for reserved roles et al?
>
> > (Adding Stephen in the loop)
> > I thought that this was rather independent on Stephen's default role
> > patch because this is not based on a new role permission type.
> > Stephen, do you think so, or should we merge the effort with your
> > things?
>
> Just on general principles, this seems like a pretty horrid idea.

Agreed- it's far broader than what is necessary, as I understand it.

> To me replication privilege means the ability to transfer data out of
> the master, not to cause arbitrary state changes on the master.

Agreed, but as Andres points out, this doesn't allow state changes on
the master.

> Therefore, "allow replication users to run pg_rewind" seems less like
> a feature and more like a security bug.  Am I missing something?

I believe Andres clarified this- this is about remastering an old master
to follow a *new* master, and at this point, the old master needs
information from the new master to reset the old master system back to a
point where the old master can turn into a replica of the new master.

> Another problem with it is that access to the filesystem is about halfway
> to being superuser anyway, even if it's only read-only access.  It would
> certainly let you read things that one would not expect a replication
> user to be able to read (like files unrelated to Postgres).

The replication role already has read-only access to everything
(nearly?) in the PGDATA directory.  The specific issue in this case, I
believe, is if the functions mentioned provide access beyond what the
replication role already has access to.

> I don't entirely understand why pg_rewind should be invoking any of these
> functions to begin with.  Isn't it working directly with the disk data,
> with both postmasters shut down?

Certain information is needed from the new master to rewind the old
master.  What I would hope is that we'd provide a way for *just* that
information to be made available to the replication role rather than
completely general access through the functions mentioned.

* Andres Freund (andres@anarazel.de) wrote:
> On 2015-09-02 19:48:15 -0400, Tom Lane wrote:
> > Just on general principles, this seems like a pretty horrid idea.
> > To me replication privilege means the ability to transfer data out of
> > the master, not to cause arbitrary state changes on the master.
>
> It's not about the permission to trigger pg_rewind on the master - it's
> about being able to run pg_rewind (as the necessary OS user) on the
> *standby* when the connection to the primary has only replication rather
> than superuser privs.

Understood.

> > Another problem with it is that access to the filesystem is about halfway
> > to being superuser anyway, even if it's only read-only access.  It would
> > certainly let you read things that one would not expect a replication
> > user to be able to read (like files unrelated to Postgres).
>
> Doesn't pg_read_file et al insist that the path is below the data
> directorY?

I don't believe that's the case, actually..  I might be wrong though,
I'm not looking at the code right now.  This is definitely a big part of
the question, but I'd like to ask- what, exactly, does pg_rewind
actually need access to?  Is it only the WAL, or are heap and WAL files
needed?  How much of that access could be reasonably needed as part of
the replication protocol for other purposes?

Consider the discussion about delta backups, et al, using LSNs.  Perhaps
the replication protocol should be extended to allow access to arbitrary
WAL, querying what WAL is available, and access to the heap files,
perhaps even specific pages in the heap files and relation forks,
instead of giving pg_rewind access to these extremely general
nearly-OS-user-level functions.

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Sep 3, 2015 at 8:59 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2015-09-02 19:48:15 -0400, Tom Lane wrote:
> >> Just on general principles, this seems like a pretty horrid idea.
> >> To me replication privilege means the ability to transfer data out of
> >> the master, not to cause arbitrary state changes on the master.
> >
> > It's not about the permission to trigger pg_rewind on the master - it's
> > about being able to run pg_rewind (as the necessary OS user) on the
> > *standby* when the connection to the primary has only replication rather
> > than superuser privs.
>
> Yeah, I got poked by this limitation of pg_rewind some time ago
> internally actually, folks willing to be able to manage their cluster
> only with a replication role, and they were not really willing to have
> a superuser for such operations being used across the network.

What I believe Tom is, rightly, worried about is exactly this case.
That is, users believe that the replication role is less powerful and
less dangerous than the superuser role and we should be careful to
maintain that.  If we're unable or unwilling to do so, we should
eliminate the replication role as being redundant to superuser.

I don't believe that's the case here, but I do view giving access to the
replication role for these functions as going farther than we need to,
or should, go.

Further, for my two cents, I view any access to the system by a
replication role to a non-replication-protocol communication with the
server as being inappropriate.  To that point, I'd actually remove the
ability for the replication roles to call pg_start/stop_backup as normal
SQL calls since that implies the replication user has connected to a
normal backend instead of talking the replication protocol.  I'm not
sure if we can really put that cat back in the bag, but I'd be a lot
happier if we could.  Allowing access to these functions outside of the
replication protocol doubles-down on that bad idea, in my view.
Thanks!
    Stephen

Re: Allow replication roles to use file access functions

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 10:05 AM, Stephen Frost wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera wrote:
> The replication role already has read-only access to everything
> (nearly?) in the PGDATA directory.  The specific issue in this case, I
> believe, is if the functions mentioned provide access beyond what the
> replication role already has access to.

It seems to me that this difference of files is symbolized by what is
filtered out in a base backup, like postmaster.opts or postmaster.pid.
The matter here is: are we going to have in the future files that we
do not want to include in a base backup that are superuser-only? Today
I cannot recall we have any of that, we may have in the future though.

>> I don't entirely understand why pg_rewind should be invoking any of these
>> functions to begin with.  Isn't it working directly with the disk data,
>> with both postmasters shut down?
>
> Certain information is needed from the new master to rewind the old
> master.  What I would hope is that we'd provide a way for *just* that
> information to be made available to the replication role rather than
> completely general access through the functions mentioned.

The rewound node does not only need the diffs of blocks from a source,
but also some more information like clog. There are actually three
different approaches to solve that:
1) Use a differential backup to me, or the possibility to fetch a set
of data block diffs from a source node using an LSN and then re-apply
them on the target node. The major disadvantage of this approach is
actually performance: we would need to scan the source node
completely, so that's slow. And pg_rewind is fast because it only
scans the WAL records of the node to be rewound from the last
checkpoint before WAL forked.
2) Request data blocks from the source node using the replication
protocol, that's what pg_read_binary_file actually does, we just don't
have the logic on replication protocol side, though we could.
3) Create a new set of functions similar to the existing file access
functions that are usable by the replication user, except that they
refuse to return file entries that match the existing filters in
basebackup.c. That is doable, with a routine in basebackup.c that
decides if a given file string can be read or not. Base backups could
use this routine as well.
I just came up with 3), which looks rather appealing to me.

>> > Another problem with it is that access to the filesystem is about halfway
>> > to being superuser anyway, even if it's only read-only access.  It would
>> > certainly let you read things that one would not expect a replication
>> > user to be able to read (like files unrelated to Postgres).
>>
>> Doesn't pg_read_file et al insist that the path is below the data
>> directorY?
>
> I don't believe that's the case, actually..  I might be wrong though,
> I'm not looking at the code right now.

The use of PGDATA is enforced
=# select pg_read_file('/etc/passwd');
ERROR:  42501: absolute path not allowed
LOCATION:  convert_and_check_filename, genfile.c:73
If PGDATA includes soft links it is possible to point to those places
though, but that's not something anybody sane would do, still it can
be done and I've seen worse:
$ cd $PGDATA && ln -s /etc etc
$ psql -c 'select pg_read_file('etc/passwd')'
# Adult content, not an ERROR
From this perspective allowing a replication user to have access to
those functions is dangerous, a superuser being the equivalent of an
all-mightly god on a server, still that's not the case of the
replication user.

> This is definitely a big part of
> the question, but I'd like to ask- what, exactly, does pg_rewind
> actually need access to? Is it only the WAL, or are heap and WAL files
> needed?

Not only, +clog, configuration files, etc.

> Consider the discussion about delta backups, et al, using LSNs.  Perhaps
> the replication protocol should be extended to allow access to arbitrary
> WAL, querying what WAL is available, and access to the heap files,
> perhaps even specific pages in the heap files and relation forks,
> instead of giving pg_rewind access to these extremely general
> nearly-OS-user-level functions.

The problem when using differential backups in this case is
performance as mentioned above. We would need to scan the whole target
cluster, which may take time, the current approach of pg_rewind only
needs to scan WAL records to find the list of blocks modified, and
directly requests them from the source. I would expect pg_rewind to be
as quick as possible.

> Further, for my two cents, I view any access to the system by a
> replication role to a non-replication-protocol communication with the
> server as being inappropriate.  To that point, I'd actually remove the
> ability for the replication roles to call pg_start/stop_backup as normal
> SQL calls since that implies the replication user has connected to a
> normal backend instead of talking the replication protocol.  I'm not
> sure if we can really put that cat back in the bag, but I'd be a lot
> happier if we could.  Allowing access to these functions outside of the
> replication protocol doubles-down on that bad idea, in my view.

That would break many existing applications, but I definitely see the
rightful concern here: basically replication roles should just use the
replication protocol and have no access to directly to SQL.
-- 
Michael



Re: Allow replication roles to use file access functions

От
Stephen Frost
Дата:
* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Sep 3, 2015 at 10:05 AM, Stephen Frost wrote:
> > * Michael Paquier (michael.paquier@gmail.com) wrote:
> >> On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera wrote:
> > The replication role already has read-only access to everything
> > (nearly?) in the PGDATA directory.  The specific issue in this case, I
> > believe, is if the functions mentioned provide access beyond what the
> > replication role already has access to.
>
> It seems to me that this difference of files is symbolized by what is
> filtered out in a base backup, like postmaster.opts or postmaster.pid.

Neither of those strike me as particularly sensitive information..

> The matter here is: are we going to have in the future files that we
> do not want to include in a base backup that are superuser-only? Today
> I cannot recall we have any of that, we may have in the future though.

This also seems unliekly.

> >> I don't entirely understand why pg_rewind should be invoking any of these
> >> functions to begin with.  Isn't it working directly with the disk data,
> >> with both postmasters shut down?
> >
> > Certain information is needed from the new master to rewind the old
> > master.  What I would hope is that we'd provide a way for *just* that
> > information to be made available to the replication role rather than
> > completely general access through the functions mentioned.
>
> The rewound node does not only need the diffs of blocks from a source,
> but also some more information like clog. There are actually three
> different approaches to solve that:

Ok, so CLOG information is also required, but that's not a particularly
large issue and the replication role has access to those files already,
no?  It's just inconvenient to access them using the current protocol,
if that's all you're interested in.

> 1) Use a differential backup to me, or the possibility to fetch a set
> of data block diffs from a source node using an LSN and then re-apply
> them on the target node. The major disadvantage of this approach is
> actually performance: we would need to scan the source node
> completely, so that's slow. And pg_rewind is fast because it only
> scans the WAL records of the node to be rewound from the last
> checkpoint before WAL forked.

I don't follow this performance concern at all.  Why can't pg_rewind
look through the WAL and find what it needs, and then request exactly
that information?  Clearly, we would need to add to the existing
replication protocol, but I don't see any reason to not consider that a
perfectly reasonable approach for this.

> 2) Request data blocks from the source node using the replication
> protocol, that's what pg_read_binary_file actually does, we just don't
> have the logic on replication protocol side, though we could.

Right, we would need to modify the replication protocol to allow such
requests, but that's not particularly difficult.

> 3) Create a new set of functions similar to the existing file access
> functions that are usable by the replication user, except that they
> refuse to return file entries that match the existing filters in
> basebackup.c. That is doable, with a routine in basebackup.c that
> decides if a given file string can be read or not. Base backups could
> use this routine as well.

I don't particularly like this approach as it implies SQL access for the
replication role.

> >> > Another problem with it is that access to the filesystem is about halfway
> >> > to being superuser anyway, even if it's only read-only access.  It would
> >> > certainly let you read things that one would not expect a replication
> >> > user to be able to read (like files unrelated to Postgres).
> >>
> >> Doesn't pg_read_file et al insist that the path is below the data
> >> directorY?
> >
> > I don't believe that's the case, actually..  I might be wrong though,
> > I'm not looking at the code right now.
>
> The use of PGDATA is enforced

We know that there are a lot of potential risks around enforcing this,
or similar, requirements.  Right now, we don't have to worry about any
of those corner cases, but we would if we used this approach.  If,
instead, the replication protocol is modified to accept requests for the
specific information (CLOG xxxx), we could be sure to only ever return
exactly that information from the current cluster, or throw an error.

> > This is definitely a big part of
> > the question, but I'd like to ask- what, exactly, does pg_rewind
> > actually need access to? Is it only the WAL, or are heap and WAL files
> > needed?
>
> Not only, +clog, configuration files, etc.

Configuration files?  Perhaps you could elaborate?

> > Consider the discussion about delta backups, et al, using LSNs.  Perhaps
> > the replication protocol should be extended to allow access to arbitrary
> > WAL, querying what WAL is available, and access to the heap files,
> > perhaps even specific pages in the heap files and relation forks,
> > instead of giving pg_rewind access to these extremely general
> > nearly-OS-user-level functions.
>
> The problem when using differential backups in this case is
> performance as mentioned above. We would need to scan the whole target
> cluster, which may take time, the current approach of pg_rewind only
> needs to scan WAL records to find the list of blocks modified, and
> directly requests them from the source. I would expect pg_rewind to be
> as quick as possible.

I don't follow why the current approach of pg_rewind would have to
change.  All I'm suggesting is that we have a different way, one which
is much more restricted, for pg_rewind to request exactly the
information it needs for performant operation.

> > Further, for my two cents, I view any access to the system by a
> > replication role to a non-replication-protocol communication with the
> > server as being inappropriate.  To that point, I'd actually remove the
> > ability for the replication roles to call pg_start/stop_backup as normal
> > SQL calls since that implies the replication user has connected to a
> > normal backend instead of talking the replication protocol.  I'm not
> > sure if we can really put that cat back in the bag, but I'd be a lot
> > happier if we could.  Allowing access to these functions outside of the
> > replication protocol doubles-down on that bad idea, in my view.
>
> That would break many existing applications, but I definitely see the
> rightful concern here: basically replication roles should just use the
> replication protocol and have no access to directly to SQL.

Indeed.
Thanks!
    Stephen

Re: Allow replication roles to use file access functions

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 11:20 AM, Stephen Frost wrote:
> * Michael Paquier wrote:
>> 1) Use a differential backup to me, or the possibility to fetch a set
>> of data block diffs from a source node using an LSN and then re-apply
>> them on the target node. The major disadvantage of this approach is
>> actually performance: we would need to scan the source node
>> completely, so that's slow. And pg_rewind is fast because it only
>> scans the WAL records of the node to be rewound from the last
>> checkpoint before WAL forked.
>
> I don't follow this performance concern at all.  Why can't pg_rewind
> look through the WAL and find what it needs, and then request exactly
> that information?  Clearly, we would need to add to the existing
> replication protocol, but I don't see any reason to not consider that a
> perfectly reasonable approach for this.

See below, visibly I misunderstood what you meant.

>> 2) Request data blocks from the source node using the replication
>> protocol, that's what pg_read_binary_file actually does, we just don't
>> have the logic on replication protocol side, though we could.
>
> Right, we would need to modify the replication protocol to allow such
> requests, but that's not particularly difficult.

Check.

>> 3) Create a new set of functions similar to the existing file access
>> functions that are usable by the replication user, except that they
>> refuse to return file entries that match the existing filters in
>> basebackup.c. That is doable, with a routine in basebackup.c that
>> decides if a given file string can be read or not. Base backups could
>> use this routine as well.
>
> I don't particularly like this approach as it implies SQL access for the
> replication role.

Check.

>> > This is definitely a big part of
>> > the question, but I'd like to ask- what, exactly, does pg_rewind
>> > actually need access to? Is it only the WAL, or are heap and WAL files
>> > needed?
>>
>> Not only, +clog, configuration files, etc.
>
> Configuration files?  Perhaps you could elaborate?

Sure. Sorry for being unclear. It copies everything that is not a
relation file, a kind of base backup without the relation files then.

>> > Consider the discussion about delta backups, et al, using LSNs.  Perhaps
>> > the replication protocol should be extended to allow access to arbitrary
>> > WAL, querying what WAL is available, and access to the heap files,
>> > perhaps even specific pages in the heap files and relation forks,
>> > instead of giving pg_rewind access to these extremely general
>> > nearly-OS-user-level functions.
>>
>> The problem when using differential backups in this case is
>> performance as mentioned above. We would need to scan the whole target
>> cluster, which may take time, the current approach of pg_rewind only
>> needs to scan WAL records to find the list of blocks modified, and
>> directly requests them from the source. I would expect pg_rewind to be
>> as quick as possible.
>
> I don't follow why the current approach of pg_rewind would have to
> change.  All I'm suggesting is that we have a different way, one which
> is much more restricted, for pg_rewind to request exactly the
> information it needs for efficient operation.

Ah, OK. I thought that you were referring to a protocol where caller
sends a single LSN from which it gets a differential backup that needs
to scan all the relation files of the source cluster to get the data
blocks with an LSN newer than the one sent, and then sends them back
to the caller.

I guess that what you are suggesting instead is an approach where
caller sends something like that through the replication protocol with
a relation OID and a block list:
BLOCK_DIFF relation_oid BLOCK_LIST m,n,[o, ...]
Which is close to what pg_read_binary_file does now for a superuser.
We would need as well to extend BASE_BACKUP so as it does not include
relation files though for this use case.

Regards,
-- 
Michael



Re: Allow replication roles to use file access functions

От
Stephen Frost
Дата:
* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Sep 3, 2015 at 11:20 AM, Stephen Frost wrote:
> >> Not only, +clog, configuration files, etc.
> >
> > Configuration files?  Perhaps you could elaborate?
>
> Sure. Sorry for being unclear. It copies everything that is not a
> relation file, a kind of base backup without the relation files then.

How does that work on systems where the configuration files aren't
stored under PGDATA (Debian and derivatives, at least)?  I guess I don't
quite see why it's necessary for pg_rewind to copy the configuration
files in the first place, it doesn't have the same role as
pg_basebackup, at least as I understand it.

> >> The problem when using differential backups in this case is
> >> performance as mentioned above. We would need to scan the whole target
> >> cluster, which may take time, the current approach of pg_rewind only
> >> needs to scan WAL records to find the list of blocks modified, and
> >> directly requests them from the source. I would expect pg_rewind to be
> >> as quick as possible.
> >
> > I don't follow why the current approach of pg_rewind would have to
> > change.  All I'm suggesting is that we have a different way, one which
> > is much more restricted, for pg_rewind to request exactly the
> > information it needs for efficient operation.
>
> Ah, OK. I thought that you were referring to a protocol where caller
> sends a single LSN from which it gets a differential backup that needs
> to scan all the relation files of the source cluster to get the data
> blocks with an LSN newer than the one sent, and then sends them back
> to the caller.

No, apologies, I was simply pointing out that we might want this kind of
a capability at the protocol level to support other replication protocol
clients.

> I guess that what you are suggesting instead is an approach where
> caller sends something like that through the replication protocol with
> a relation OID and a block list:
> BLOCK_DIFF relation_oid BLOCK_LIST m,n,[o, ...]

Right, something along those lines is what I had been thinking.  We
would probably need to provide independent commands for the different
file types, with the parameters expressed in terms appropriate for each
kind of file (block numbers for heap, XIDs for WAL and CLOG?).
Essentially, whatever API would be both simple for pg_rewind and general
enough to be useful for other clients in the future.  At least, I
imagine that pg_rewind would be a bit simpler if it could communicate
with the backend in the 'language of PG' rather than having to specify
file names and paths.

Other clients that might find such an interface useful are incremental
pg_basebackup or possibly parallel pg_basebackup.

> Which is close to what pg_read_binary_file does now for a superuser.

I really don't see them as being all that close.  Further, I worry a bit
that users would abuse the replication role to grant access to these
functions for non-superusers to be able to access non-PG files (but ones
which happen to be under PGDATA, or through a symlink are somewhere
else..).

> We would need as well to extend BASE_BACKUP so as it does not include
> relation files though for this use case.

... huh?  I'm not following this comment at all.  We might need to
provide explicit start/stop backup commands and/or extend BASE_BACKUP
for things like parallel pg_basebackup, but I'm not following why we
would need to change it for pg_rewind.  Further BASE_BACKUP clearly does
include relation files today..

Thanks!

Stephen

Re: Allow replication roles to use file access functions

От
Michael Paquier
Дата:
On Thu, Sep 3, 2015 at 9:53 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Thu, Sep 3, 2015 at 11:20 AM, Stephen Frost wrote:
>> >> Not only, +clog, configuration files, etc.
>> >
>> > Configuration files?  Perhaps you could elaborate?
>>
>> Sure. Sorry for being unclear. It copies everything that is not a
>> relation file, a kind of base backup without the relation files then.
>
> How does that work on systems where the configuration files aren't
> stored under PGDATA (Debian and derivatives, at least)?

When a file is out of PGDATA, it is not fetched. Symlinks in PGDATA
have their contents fetched as well if I recall correctly.

> I guess I don't
> quite see why it's necessary for pg_rewind to copy the configuration
> files in the first place, it doesn't have the same role as
> pg_basebackup, at least as I understand it.

Of course, that's not mandatory to fetch them. It is as well not worth
the complication to apply a filter to not fetch a portion of the
files, and I think that's why Heikki took the approach to fetch
everything in PGDATA (except relation files) because that was just
more simple to implement as such for little gain.

>> I guess that what you are suggesting instead is an approach where
>> caller sends something like that through the replication protocol with
>> a relation OID and a block list:
>> BLOCK_DIFF relation_oid BLOCK_LIST m,n,[o, ...]
>
> Right, something along those lines is what I had been thinking.  We
> would probably need to provide independent commands for the different
> file types, with the parameters expressed in terms appropriate for each
> kind of file (block numbers for heap, XIDs for WAL and CLOG?).
> Essentially, whatever API would be both simple for pg_rewind and general
> enough to be useful for other clients in the future.  At least, I
> imagine that pg_rewind would be a bit simpler if it could communicate
> with the backend in the 'language of PG' rather than having to specify
> file names and paths.

I guess that makes sense if we want to remove the superuser-only
barrier, still this would require to invent new commands each time a
new file type is added as this new type of file may be needed as well
in the rewound node (imagine a pg_clog2 for example). I would rather
take the other approach by applying an exclude list in an existing
command, and not an include list or a new set of commands.

> Other clients that might find such an interface useful are incremental
> pg_basebackup or possibly parallel pg_basebackup.

Possible.

>> We would need as well to extend BASE_BACKUP so as it does not include
>> relation files though for this use case.
>
> ... huh?  I'm not following this comment at all.  We might need to
> provide explicit start/stop backup commands and/or extend BASE_BACKUP
> for things like parallel pg_basebackup, but I'm not following why we
> would need to change it for pg_rewind.  Further BASE_BACKUP clearly does
> include relation files today..

The whole point of pg_rewind is not to have to fetch relation files,
so my idea would be basically to extend a bit BASE_BACKUP so as it
accepts a set of regex expressions aimed at filtering files to not
include in a base backup as requested by the client.

Still, for now it seems that this patch is taking the wrong approach
and that the general consensus would be to use the replication
protocol instead, so I am marking this patch as returned with
feedback.
Thanks!
-- 
Michael



Re: Allow replication roles to use file access functions

От
Heikki Linnakangas
Дата:
On 09/04/2015 08:14 AM, Michael Paquier wrote:
> On Thu, Sep 3, 2015 at 9:53 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> * Michael Paquier (michael.paquier@gmail.com) wrote:
>>> On Thu, Sep 3, 2015 at 11:20 AM, Stephen Frost wrote:
>>>>> Not only, +clog, configuration files, etc.
>>>>
>>>> Configuration files?  Perhaps you could elaborate?
>>>
>>> Sure. Sorry for being unclear. It copies everything that is not a
>>> relation file, a kind of base backup without the relation files then.
>>
>> How does that work on systems where the configuration files aren't
>> stored under PGDATA (Debian and derivatives, at least)?
>
> When a file is out of PGDATA, it is not fetched. Symlinks in PGDATA
> have their contents fetched as well if I recall correctly.
>
>> I guess I don't
>> quite see why it's necessary for pg_rewind to copy the configuration
>> files in the first place, it doesn't have the same role as
>> pg_basebackup, at least as I understand it.
>
> Of course, that's not mandatory to fetch them. It is as well not worth
> the complication to apply a filter to not fetch a portion of the
> files, and I think that's why Heikki took the approach to fetch
> everything in PGDATA (except relation files) because that was just
> more simple to implement as such for little gain.

It's also simpler to explain and reason about. The current behaviour of 
pg_rewind is that the end-result is basically the same as completely 
copying over the data directory. If we start to add smarts on what to 
copy and what not, it gets a lot more complicated. Which configuration 
files to copy, and which not? (Think postgresql.auto.conf...)

If you want to preserve a file, you can copy it elsewhere first, and 
copy it back after running pg_rewind. Just as you would with "cp" or 
"rsync" (well, with rsync I guess you could pass a command-line switch 
to ignore some files). That might not be perfect, but it's a problem 
you'll have to deal with if you're not using pg_rewind anyway.

- Heikki




Re: Allow replication roles to use file access functions

От
Stephen Frost
Дата:
Heikki,

* Heikki Linnakangas (hlinnaka@iki.fi) wrote:
> On 09/04/2015 08:14 AM, Michael Paquier wrote:
> >Of course, that's not mandatory to fetch them. It is as well not worth
> >the complication to apply a filter to not fetch a portion of the
> >files, and I think that's why Heikki took the approach to fetch
> >everything in PGDATA (except relation files) because that was just
> >more simple to implement as such for little gain.
>
> It's also simpler to explain and reason about. The current behaviour
> of pg_rewind is that the end-result is basically the same as
> completely copying over the data directory. If we start to add
> smarts on what to copy and what not, it gets a lot more complicated.
> Which configuration files to copy, and which not? (Think
> postgresql.auto.conf...)

I've always felt we had a well defined set of things which PG is allowed
and expected to modify vs. what it shouldn't be messing with.  Namely,
config files (pg_hba.conf, postgresql.conf, etc) are not things which
are modified by PostgreSQL, and is why they often live outside of
$PGDATA, vs. clog, xlog, the heap, postgresql.auto.conf, etc, which are
all very clearly under PG's control.

> If you want to preserve a file, you can copy it elsewhere first, and
> copy it back after running pg_rewind. Just as you would with "cp" or
> "rsync" (well, with rsync I guess you could pass a command-line
> switch to ignore some files). That might not be perfect, but it's a
> problem you'll have to deal with if you're not using pg_rewind
> anyway.

That's only true if the configs exist in $PGDATA, which they often
don't.  What about things like pg_log?  Does pg_rewind copy every log
file from the new-master back to the old-master?  That strikes me as
useless at best and a terrible idea at worst since it's likely to blow
away old log files.

I had expected pg_rewind to concern itself with exactly what is under
PG's perview to mess with.  That includes postgresql.auto.conf, but not
pg_log or postgresql.conf.

Perhaps it's too late to change that but I'm not thrilled with it.

As for this discussion, if we're going to make pg_rewind a magic rsync,
I'd still prefer for it to work through the replication protocol instead
of directly giving users who have the 'replication' attribute access to
call the SQL-level functions for opening/reading files on the
filesystem.  If that's objectionable, then I'd suggest we come up with a
new/different way of giving access to those functions instead and tell
users to use that for their pg_rewind user, but I do think we'll need
replication protocol capabilities along those lines since it might very
well be simpler to work with (what happens if there's a >1G file?) and
we might want that for parallel pg_basebackup anyway.

One thought that I just had would be to have a default 'pg_rewind' role
which has exactly the access needed for pg_rewind to do its job, which
would simplify things for our users, I'd think.

I'll add that to the proposed set of roles in the default roles patch.

Thanks!

Stephen

Re: Allow replication roles to use file access functions

От
Artur Zakirov
Дата:
On 03.09.2015 05:40, Michael Paquier wrote:
>
> Ah, OK. I thought that you were referring to a protocol where caller
> sends a single LSN from which it gets a differential backup that needs
> to scan all the relation files of the source cluster to get the data
> blocks with an LSN newer than the one sent, and then sends them back
> to the caller.
>
> I guess that what you are suggesting instead is an approach where
> caller sends something like that through the replication protocol with
> a relation OID and a block list:
> BLOCK_DIFF relation_oid BLOCK_LIST m,n,[o, ...]
> Which is close to what pg_read_binary_file does now for a superuser.
> We would need as well to extend BASE_BACKUP so as it does not include
> relation files though for this use case.
>
> Regards,
>

Hi,

we need to run pg_rewind without using a superuser role too. What if the 
new parameter EXCLUDE_DATA_FILES will be added to the BASE_BACKUP 
command? This parameter will force the BASE_BACKUP command to not 
include relation files.

And pg_rewind can execute, for example, the following command:

BASE_BACKUP LABEL 'pg_rewind base backup' WAL EXCLUDE_DATA_FILES

This command will be executed if --source-server parameter is defined. 
Are there any pitfalls in this condition?

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company