pg_resetwal regression: could not upgrade after 1d863c2504

From Hayato Kuroda (Fujitsu)
Subject pg_resetwal regression: could not upgrade after 1d863c2504
Date
Msg-id TYAPR01MB58664AD301F511B1EA5B72B4F5C0A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Replies Re: pg_resetwal regression: could not upgrade after 1d863c2504
List pgsql-hackers
Dear hackers,
(CC: Peter Eisentraut - committer of the problematic commit)

While developing a pg_upgrade patch, I found a possible regression in pg_resetwal.
It may have been introduced by 1d863c2504.

Is this really a regression, or am I missing something?

# Phenomenon

pg_resetwal fails when the data directory is given as a relative path. It worked
at 7273945, but fails at 1d863.


At 1d863:

```
$ pg_resetwal -n data_N1/
pg_resetwal: error: could not read permissions of directory "data_N1/": No such file or directory
```

At 7273945:

```
$ pg_resetwal -n data_N1/
Current pg_control values:

pg_control version number:            1300
Catalog version number:               202309251
...
```

# Environment

The attached script was executed on RHEL 7.9 with gcc 8.3.1.
I used the meson build system with the following options:

meson setup -Dcassert=true -Ddebug=true -Dc_args="-ggdb -O0 -g3 -fno-omit-frame-pointer"

# My analysis

I found that the following part of GetDataDirectoryCreatePerm() returns false,
which causes the error.

```
    /*
     * If an error occurs getting the mode then return false.  The caller is
     * responsible for generating an error, if appropriate, indicating that we
     * were unable to access the data directory.
     */
    if (stat(dataDir, &statBuf) == -1)
        return false;
```

Also, I found that DataDir in main() holds a relative path.
Because of that, a later stat() may be unable to find the given location, because
the process has already changed its working directory into that directory.

```
(gdb) break chdir
Breakpoint 1 at 0x4016f0
(gdb) run -n data_N1

...
Breakpoint 1, 0x00007ffff78e1390 in chdir () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64
(gdb) print DataDir
$1 = 0x7fffffffe25c "data_N1"
(gdb) frame 1
#1  0x00000000004028d7 in main (argc=3, argv=0x7fffffffdf58) at ../postgres/src/bin/pg_resetwal/pg_resetwal.c:348
348             if (chdir(DataDir) < 0)
(gdb) print DataDir
$2 = 0x7fffffffe25c "data_N1"
```

# How to fix

One possible approach is to call chdir() several times. Please see the attached patch.
(I'm not sure whether the commit should be reverted.)

# Appendix - How did I find it?

Originally, I found the issue when the attached script was executed.
It creates two clusters and runs pg_upgrade, which failed with the following output.
(I also attached the whole output; please see result_*.out.)

```
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok
pg_resetwal: error: could not read permissions of directory "data_N1": No such file or directory
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

