Re: A proposal for shared memory based backup infrastructure

From mahendrakar s
Subject Re: A proposal for shared memory based backup infrastructure
Date
Msg-id CABkiuWp0Qg5ectA91ubS=gTX_L8wMBbpFjtEfPZQnVswK6121w@mail.gmail.com
In reply to Re: A proposal for shared memory based backup infrastructure  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: A proposal for shared memory based backup infrastructure  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: A proposal for shared memory based backup infrastructure  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi Bharath,

There might be a security concern if a backup started by one user can be stopped by another user.
The user who stops the backup would receive the backup_label or tablespace_map file contents that belong to the other user's backup.
Isn't this a concern for non-exclusive backups?

I think there should be role-based control for backup-related activity, which can prevent other unprivileged users from stopping the backup.
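
As a rough illustration of that idea, the sketch below (Python, not PostgreSQL code) lets only the role that started a backup, or a role on an explicit allow-list, stop it and receive its backup_label contents. BackupState, BackupPermissionError, stop_backup and the privileged_roles parameter are all hypothetical names invented for this example; nothing here reflects an existing PostgreSQL API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupState:
    """Hypothetical per-backup record kept in a shared registry."""
    label: str         # key identifying the backup
    owner: str         # role that started the backup
    backup_label: str  # contents handed back when the backup is stopped


class BackupPermissionError(Exception):
    """Raised when a role may not stop someone else's backup."""


def stop_backup(registry, label, role, privileged_roles=frozenset()):
    """Stop the backup `label` on behalf of `role`.

    Only the owner, or a role listed in `privileged_roles` (an assumed
    allow-list), may stop the backup and see its backup_label contents.
    """
    state = registry[label]  # KeyError if no such backup is running
    if role != state.owner and role not in privileged_roles:
        raise BackupPermissionError(
            f"role {role!r} may not stop backup {label!r} "
            f"started by {state.owner!r}"
        )
    del registry[label]
    return state.backup_label
```

With such a check, a different unprivileged role gets an error instead of the other user's backup_label contents, and the backup stays registered for its owner.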

Thoughts?

Thanks,
Mahendrakar.


On Mon, 25 Jul 2022 at 12:00, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Mon, Jul 25, 2022 at 10:03 AM mahendrakar s
<mahendrakarforpg@gmail.com> wrote:
>
> Hi Bharath,

Thanks Mahendrakar for taking a look at the design.

> "Typically, step (3) takes a good amount of time in production
> environments with terabytes or petabytes scale of data and keeping the
> session alive from step (1) to (4) has overhead and it wastes the
> resources.  And the session can get closed for various reasons - idle
> in session timeout, tcp/ip keepalive timeout, network problems etc.
> All of these can render the backup useless."
>
> >> this could be a common scenario and needs to be addressed.

Hm. Additionally, the problem of keeping the session that starts the
backup open until the entire data directory is backed up becomes more
worrisome if we were to run backups for a huge number of servers at
scale - the entity (control plane or whatever) that is responsible
for taking backups across a huge fleet of postgres production servers
will waste a tremendous amount of resources, and it's a problem for
that entity to keep the backup sessions active until the actual backup
is finished.

> "What if the backup started by a session can also be closed by another
> session? This seems to be achievable, if we can place the
> backup_label, tablespace_map and other required session/backend level
> contents in shared memory with the key as backup_label name. It's a
> long way to go."
>
> >>   I think storing metadata about backup of a session in shared memory may not work as it gets purged when the database goes for restart. We might require a separate catalogue table to handle the backup session.

Right now, the non-exclusive backup (and we no longer have exclusive
backups as of postgres 15) will anyway become useless if postgres
restarts, because no running backup state (backup_label,
tablespace_map contents) is persisted.

Following are a few more thoughts on the shared memory based backups
as proposed in this thread:

1) What is the maximum number of backups we want to allow? Right now,
there's no limit; I believe max_connections number of concurrent
backups can be taken - we have XLogCtlInsert->runningBackups but no
limit. If we were to use shared memory to track the backup state, we
might or might not have to decide on a max backup limit so as not to
preallocate and consume shared memory unnecessarily; otherwise, we
could use something like a dynamic shared memory hash table for
storing backup state.

2) How to deal with backups that are started but that no one comes
back to stop? Basically, when do we declare that a backup is dead or
expired? Perhaps we can have a max time limit: if no stop backup is
issued for a backup within that time, it is marked as dead or
expired.

We may or may not want to think about the above points for now, until
the idea in general is shown to have some benefits over the current
backup infrastructure.

Regards,
Bharath Rupireddy.
