Re: PATCH: Exclude unlogged tables from base backups

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: PATCH: Exclude unlogged tables from base backups
Дата
Msg-id CAD21AoD-qfQrbqQK7_dorpf1ev_yaju6msiE1o9G4CYfGMB08Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: Exclude unlogged tables from base backups  (David Steele <david@pgmasters.net>)
Ответы Re: PATCH: Exclude unlogged tables from base backups
Список pgsql-hackers
On Thu, Jan 25, 2018 at 3:25 AM, David Steele <david@pgmasters.net> wrote:
> Hi Masahiko,
>
> Thanks for the review!
>
> On 1/22/18 3:14 AM, Masahiko Sawada wrote:
>> On Thu, Dec 14, 2017 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> We would also have a problem if the missing file caused something in
>>> recovery to croak on the grounds that the file was expected to be
>>> there, but I don't think anything works that way; I think we just
>>> assume missing files are an expected failure mode and silently do
>>> nothing if asked to remove them.
>>
>> I also couldn't see a problem in this approach.
>>
>> Here is the first review comments.
>>
>> +       unloggedDelim = strrchr(path, '/');
>>
>> I think it doesn't work fine on windows. How about using
>> last_dir_separator() instead?
>
> I think this function is OK on Windows -- we use it quite a bit.
> However, last_dir_separator() is clearer so I have changed it.

Thank you for updating this. I was concerned about a separator
character '/' might not work fine on windows.

>
>> ----
>> + * Find all unlogged relations in the specified directory and return
>> their OIDs.
>>
>> What the ResetUnloggedrelationsHash() actually returns is a hash
>> table. The comment of this function seems not appropriate.
>
> Fixed.
>
>> +               /* Part of path that contains the parent directory. */
>> +               int parentPathLen = unloggedDelim - path;
>> +
>> +               /*
>> +                * Build the unlogged relation hash if the parent path is either
>> +                * $PGDATA/base or a tablespace version path.
>> +                */
>> +               if (strncmp(path, "./base", parentPathLen) == 0 ||
>> +                       (parentPathLen >=
>> (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
>> +                        strncmp(unloggedDelim -
>> (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
>> +                                        TABLESPACE_VERSION_DIRECTORY,
>> +
>> sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
>> +                       unloggedHash = ResetUnloggedRelationsHash(path);
>> +       }
>>
>> How about using get_parent_directory() to get parent directory name?
>
> get_parent_directory() munges the string that is passed to it which I
> was trying to avoid (we'd need a copy) - and I don't think it makes the
> rest of the logic any simpler without constructing yet another string to
> hold the tablespace path.

Agreed.

>
> I know performance isn't the most important thing here, so if the
> argument is for clarity perhaps it makes sense. Otherwise I don't know
> if it's worth it.
>
>> Also, I think it's better to destroy the unloggedHash after use.
>
> Whoops! Fixed.
>
>> +               /* Exclude all forks for unlogged tables except the
>> init fork. */
>> +               if (unloggedHash && ResetUnloggedRelationsMatch(
>> +                               unloggedHash, de->d_name) == unloggedOther)
>> +               {
>> +                       elog(DEBUG2, "unlogged relation file \"%s\"
>> excluded from backup",
>> +                                de->d_name);
>> +                       continue;
>> +               }
>>
>> I think it's better to log this debug message at DEBUG2 level for
>> consistency with other messages.
>
> I think you mean DEBUG1?  It's already at DEBUG2.

Oops, yes I meant DEBUG1.

>
> I considered using DEBUG1 but decided against it.  The other exclusions
> will produce a limited amount of output because there are only a few of
> them.  In the case of unlogged tables there could be any number of
> exclusions and I thought that was too noisy for DEBUG1.

IMO it's okay to output many unlogged tables for a debug purpose but I
see your point.

>
>> +       ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
>> +               'unlogged imain fork not in tablespace backup');
>>
>> s/imain/main/
>
> Fixed.
>
>> If a new unlogged relation is created after constructed the
>> unloggedHash before sending file, we cannot exclude such relation. It
>> would not be problem if the taking backup is not long because the new
>> unlogged relation unlikely becomes so large. However, if takeing a
>> backup takes a long time, we could include large main fork in the
>> backup.
>
> This is a good point.  It's per database directory which makes it a
> little better, but maybe not by much.
>
> Three options here:
>
> 1) Leave it as is knowing that unlogged relations created during the
> backup may be copied and document it that way.
>
> 2) Construct a list for SendDir() to work against so the gap between
> creating that and creating the unlogged hash is as small as possible.
> The downside here is that the list may be very large and take up a lot
> of memory.
>
> 3) Check each file that looks like a relation in the loop to see if it
> has an init fork.  This might affect performance since an
> opendir/readdir loop would be required for every relation.
>
> Personally, I'm in favor of #1, at least for the time being.  I've
> updated the docs as indicated in case you and Adam agree.
>

See below comment.

On Thu, Jan 25, 2018 at 6:23 AM, David Steele <david@pgmasters.net> wrote:
> On 1/24/18 4:02 PM, Adam Brightwell wrote:
> Actually, I was talking to Stephen about this it seems like #3 would be
> more practical if we just stat'd the init fork for each relation file
> found.  I doubt the stat would add a lot of overhead and we can track
> each unlogged relation in a hash table to reduce overhead even more.
>

Can the readdir handle files that are added during the loop? I think
that we still cannot exclude a new unlogged relation if the relation
is added after we execute readdir first time. To completely eliminate
it we need a sort of lock that prevents to create new unlogged
relation from current backends. Or we need to do readdir loop multiple
times to see if no new relations were added during sending files.

If you're updating the patch to implement #3, this patch should be
marked as "Waiting on Author". After updated I'll review it again.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: Further cleanup of pg_dump/pg_restore item selection code
Следующее
От: "Tsunakawa, Takayuki"
Дата:
Сообщение: Temporary tables prevent autovacuum, leading to XID wraparound