Re: refactoring basebackup.c

Поиск

Список

Период

Сортировка

От	Sumanta Mukherjee
Тема	Re: refactoring basebackup.c
Дата	12 мая 2020 г. 05:26:40
Msg-id	CAMSJAiradWb8ncm+83FngNA59Ss601cN5Me7AwVZeRFVWOqN9A@mail.gmail.com обсуждение исходный текст
Ответ на	Re: refactoring basebackup.c (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi Robert,

Please see my comments inline below.

On Tue, May 12, 2020 at 12:33 AM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah, that needs to be tested. Right now the chunk size is 32kB but it
might be a good idea to go larger. Another thing is that right now the
chunk size is tied to the protocol message size, and I'm not sure
whether the size that's optimal for disk reads is also optimal for
protocol messages.

One needs a balance between the number of packets to be sent across the network and so if the size

of reading from the disk and the network packet size could be unified then it might provide a better optimization.

I don't think it's a particularly bad thing that we include a small
amount of progress for sending an empty file, a directory, or a
symlink. That could make the results more meaningful if you have a
database with lots of empty relations in it. However, I agree that the
effect of compression shouldn't be included. To get there, I think we
need to redesign the wire protocol. Right now, the server has no way
of letting the client know how many uncompressed bytes it's sent, and
the client has no way of figuring it out without uncompressing, which
seems like something we want to avoid.

I agree here too, except that if we have too many tar files one might cringe

but sending the xtra amt from these tar files looks okay to me.

There are some other problems with the current wire protocol, too:

1. The syntax for the BASE_BACKUP command is large and unwieldy. We
really ought to adopt an extensible options syntax, like COPY, VACUUM,
EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
with bespoke lexer and parser support.

2. The client is sent a list of tablespaces and is supposed to use
that to expect an equal number of archives, computing the name for
each one on the client side from the tablespace info. However, I think
we should be able to support modes like "put all the tablespaces in a
single archive" or "send a separate archive for every 256GB" or "ship
it all to the cloud and don't send me any archives". To get there, I
think we should have the server send the archive name to the clients,
and the client should just keep receiving the next archive until it's
told that there are no more. Then if there's one archive or ten
archives or no archives, the client doesn't have to care. It just
receives what the server sends until it hears that there are no more.
It also doesn't know how the server is naming the archives; the server
can, for example, adjust the archive names based on which compression
format is being chosen, without knowledge of the server's naming
conventions needing to exist on the client side.

One thing to remember here could be that an optimization would need to be made between the number of options

we provide and people coming back and saying which combinations do not work

For example, if a user script has "put all the tablespaces in a single archive" and later on somebody makes some

script changes to break it down at "256 GB" and there is a conflict then which one takes precedence needs to be chosen.

When the number of options like this become very large this could lead to some complications.

I think we should keep support for the current BASE_BACKUP command but
either add a new variant using an extensible options, or else invent a
whole new command with a different name (BACKUP, SEND_BACKUP,
whatever) that takes extensible options. This command should send back
all the archives and the backup manifest using a single COPY stream
rather than multiple COPY streams. Within the COPY stream, we'll
invent a sub-protocol, e.g. based on the first letter of the message,
e.g.:

t = Tablespace boundary. No further message payload. Indicates, for
progress reporting purposes, that we are advancing to the next
tablespace.
f = Filename. The remainder of the message payload is the name of the
next file that will be transferred.
d = Data. The next four bytes contain the number of uncompressed bytes
covered by this message, for progress reporting purposes. The rest of
the message is payload, possibly compressed. Could be empty, if the
data is being shipped elsewhere, and these messages are only being
sent to update the client's notion of progress.

Here I support this.

I thought about that a bit, too. There might be some way to unify that
by having some common context object that's defined by basebackup.c
and all archivers get it, so that they have some commonly-desired
details without needing bespoke code, but I'm not sure at this point
whether that will actually produce a nicer result. Even if we don't
have it initially, it seems like it wouldn't be very hard to add it
later, so I'm not too stressed about it.

--Sumanta Mukherjee

EnterpriseDB: http://www.enterprisedb.com

The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: refactoring basebackup.c