Обсуждение: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

Поиск
Список
Период
Сортировка

Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:
Posgres13_beta1, is consistently writing to the logs, "could not rename temporary statistics file".
When analyzing the source that writes the log, I simplified the part that writes the logs a little.

1. I changed from if else if to if
2. For the user, better to have more errors recorded, which can help in discovering the problem
3. Errors are independent of each other
4. If I can't release tmpfile, there's no way to delete it (unlink)
5. If I can rename, there is no need to delete it (unlink) tmpfile.

Attached is the patch that proposes these changes.

Now, the problem has not been solved.
1. statfile, is it really closed or does it not exist in the directory?
     There is no way to rename a file, which is open and in use.
2. Why delete (pgstat_stat_filename), if permanent is true:
const char * statfile = permanent? PGSTAT_STAT_PERMANENT_FILENAME: pgstat_stat_filename;
statfile is PGSTAT_STAT_PERMANENT_FILENAME and not pgstat_stat_filename

regards,
Ranier Vilela
Вложения

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:
Em sex., 12 de jun. de 2020 às 15:15, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Posgres13_beta1, is consistently writing to the logs, "could not rename temporary statistics file".
When analyzing the source that writes the log, I simplified the part that writes the logs a little.

1. I changed from if else if to if
2. For the user, better to have more errors recorded, which can help in discovering the problem
3. Errors are independent of each other
4. If I can't release tmpfile, there's no way to delete it (unlink)
5. If I can rename, there is no need to delete it (unlink) tmpfile.
Fix for case 5.
Вложения

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:
posix rename, "renames a file, moving it between directories if required".

pgrename, win32 port uses MoveFileEx, to support rename files at Windows side,
but, actually don't allow "renames a file, moving it between directories if required".

To match the same characteristics as posix rename, we need to add a flag to MoveFileEx (MOVEFILE_COPY_ALLOWED)
Which allows, if necessary, to move between volumes, drives and directories.

Such a resource seems to decrease the chances of occurring, permission denied, when renaming the temporary statistics file.
Вложения

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Michael Paquier
Дата:
On Fri, Jun 12, 2020 at 03:15:52PM -0300, Ranier Vilela wrote:
> Posgres13_beta1, is consistently writing to the logs, "could not rename
> temporary statistics file".
> When analyzing the source that writes the log, I simplified the part that
> writes the logs a little.

FWIW, I have been running a server on Windows for some time with
pgbench running in the background, combined with some starts and stops
but I cannot see that.  This log entry uses LOG, which is the level we
use for the TAP tests and please note that there are four buildfarm
animals for Windows able to run the TAP tests and they don't seem to
show that problem either: drongo, fairywen, jacana and bowerbird.  I
may be missing something of course.
--
Michael

Вложения

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Justin Pryzby
Дата:
On Fri, Jun 12, 2020 at 03:15:52PM -0300, Ranier Vilela wrote:
> Posgres13_beta1, is consistently writing to the logs, "could not rename
> temporary statistics file".
> When analyzing the source that writes the log, I simplified the part that
> writes the logs a little.

What windows version and compiler ?

Please show the full CSV log for this event, and not an excerpt.
Preferably with several lines of "context" for the stats process PID, with
log_min_messages=debug or debug2 and log_error_verbosity=verbose, so that you
get the file location where it's erroring, if you don't already know that.

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

> 1. I changed from if else if to if
> 2. For the user, better to have more errors recorded, which can help in
> discovering the problem
> 3. Errors are independent of each other
> 4. If I can't release tmpfile, there's no way to delete it (unlink)
> 5. If I can rename, there is no need to delete it (unlink) tmpfile.
> 
> Attached is the patch that proposes these changes.
> Now, the problem has not been solved.

It sounds like you haven't yet found the problem, right ?  These are all
unrelated changes which are confusing the problem report and discussion.
And introducing behavior regressions, like renaming files with write errors on
top of known good files.

I think you'll want to 1) identify where the problem is occuring, and attach a
debugger there.

2) figure out when the problem was introduced.  If this problem doesn't happen
under v12:

git log --cherry-pick -p origin/REL_12_STABLE...origin/REL_13_STABLE -- src/backend/postmaster/pgstat.c
or just:
git log -p origin/REL_12_STABLE.. src/backend/postmaster/pgstat.c

You could try git-bisecting between v12..v13, but there's only 30 commits which
touched pgstat.c (assuming that's where the ERROR is being thrown).

Do you have a special value of stats_temp_directory?
Or a symlink or junction at pg_stat_tmp ?

> 1. statfile, is it really closed or does it not exist in the directory?
>      There is no way to rename a file, which is open and in use.
> 2. Why delete (pgstat_stat_filename), if permanent is true:
> const char * statfile = permanent? PGSTAT_STAT_PERMANENT_FILENAME:
> pgstat_stat_filename;
> statfile is PGSTAT_STAT_PERMANENT_FILENAME and not pgstat_stat_filename

You can find answers to a lot of questions in the git history.  In this case,
70d756970 and 187492b6c.

-- 
Justin



Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:

Em dom., 14 de jun. de 2020 às 23:08, Michael Paquier <michael@paquier.xyz> escreveu:
On Fri, Jun 12, 2020 at 03:15:52PM -0300, Ranier Vilela wrote:
> Posgres13_beta1, is consistently writing to the logs, "could not rename
> temporary statistics file".
> When analyzing the source that writes the log, I simplified the part that
> writes the logs a little.

FWIW, I have been running a server on Windows for some time with
pgbench running in the background, combined with some starts and stops
but I cannot see that.  This log entry uses LOG, which is the level we
use for the TAP tests and please note that there are four buildfarm
animals for Windows able to run the TAP tests and they don't seem to
show that problem either: drongo, fairywen, jacana and bowerbird.  I
may be missing something of course
Hi Michael, thsnks for answer.
Yeah, something is wrong with Postgres and Windows 10 (not server) with msvc 2019 (64 bits)
II already reported on another thread, that vcregress is failing with (float8 and partitionprune) and now these messages are showing up.
None buildfarm animal, have that combination, but as Postgres officially supports it ..

regards,
Ranier Vilela

Livre de vírus. www.avast.com.

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:
Em dom., 14 de jun. de 2020 às 23:53, Justin Pryzby <pryzby@telsasoft.com> escreveu:
On Fri, Jun 12, 2020 at 03:15:52PM -0300, Ranier Vilela wrote:
> Posgres13_beta1, is consistently writing to the logs, "could not rename
> temporary statistics file".
> When analyzing the source that writes the log, I simplified the part that
> writes the logs a little.

What windows version and compiler ?
Windows 10 (2004, msvc 2019 (64 bits)
None configuration, only git clone and build.bat
 

Please show the full CSV log for this event, and not an excerpt.
Preferably with several lines of "context" for the stats process PID, with
log_min_messages=debug or debug2 and log_error_verbosity=verbose, so that you
get the file location where it's erroring, if you don't already know that.

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

Ok, I will provide.

 
> 1. I changed from if else if to if
> 2. For the user, better to have more errors recorded, which can help in
> discovering the problem
> 3. Errors are independent of each other
> 4. If I can't release tmpfile, there's no way to delete it (unlink)
> 5. If I can rename, there is no need to delete it (unlink) tmpfile.
>
> Attached is the patch that proposes these changes.
> Now, the problem has not been solved.

It sounds like you haven't yet found the problem, right ?  These are all
unrelated changes which are confusing the problem report and discussion.
And introducing behavior regressions, like renaming files with write errors on
top of known good files.
It is certainly on pgstat.c
Yes, I have not yet discovered the real cause.
But, while checking the code, I thought I could improve error checking, and to avoid creating a new thread about it, I took advantage of that thread. It would certainly be better to separate, but this list is busy, I tried not to create any more confusion.
If I can't record, I can't close and I can't rename, seeing this in the logs, certainly, helps to solve the problem, not confuse.
In addition, the flow was cleaner, with fewer instructions and in this specific case, it continues with the same expected behavior.
Only the rename message is showing, so the expected behavior is the same. 


I think you'll want to 1) identify where the problem is occuring, and attach a
debugger there.
I will try.
 

2) figure out when the problem was introduced.  If this problem doesn't happen
under v12:
I don't think so, but v12 I'm using the official distributed version.
 

git log --cherry-pick -p origin/REL_12_STABLE...origin/REL_13_STABLE -- src/backend/postmaster/pgstat.c
or just:
git log -p origin/REL_12_STABLE.. src/backend/postmaster/pgstat.c

You could try git-bisecting between v12..v13, but there's only 30 commits which
touched pgstat.c (assuming that's where the ERROR is being thrown)
Thanks for the hit.
 
.

Do you have a special value of stats_temp_directory?
Not.
 
Or a symlink or junction at pg_stat_tmp ?
Not. Only C driver (one volume)
 
You saw the patch regarding the flag, suggested.


> 1. statfile, is it really closed or does it not exist in the directory?
>      There is no way to rename a file, which is open and in use.
> 2. Why delete (pgstat_stat_filename), if permanent is true:
> const char * statfile = permanent? PGSTAT_STAT_PERMANENT_FILENAME:
> pgstat_stat_filename;
> statfile is PGSTAT_STAT_PERMANENT_FILENAME and not pgstat_stat_filename

You can find answers to a lot of questions in the git history.  In this case,
70d756970 and 187492b6c.
Ok,.

Thanks for answer Justin.

regards,
Ranier Vilela

Livre de vírus. www.avast.com.

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:
Attached a log.

I hacked dirmod.c (pgrename), to print GetLastError();

MoveFIleEx from: pg_stat_tmp/global.tmp
MoveFIleEx to: pg_stat_tmp/global.stat
MoveFIleEx win32 error code 5

regards,
Ranier Vilela
Вложения

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:
I can confirm that the problem is in pgrename (dirmod.c),
something is not OK, with MoveFileEx, even with the (MOVEFILE_REPLACE_EXISTING) flag.

Replacing MoveFileEx, with
unlink (to);
rename (from, to);

#if defined (WIN32) &&! defined (__ CYGWIN__)
unlink(to);
while (rename (from, to)! = 0)
#else
while (rename (from, to) <0)
#endif

The log messages have disappeared.

I suspect that if the target (to) file exists, MoveFileEx, it is failing to rename, even with the flag enabled.

Windows have the native rename function (https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/rename-wrename?view=vs-2019)
However, it fails if the target (to) file exists.

Question, is it acceptable delete the target file, if it exists, to guarantee the rename?

regards,
Ranier Vilela

Вложения

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Michael Paquier
Дата:
On Mon, Jun 15, 2020 at 09:49:31AM -0300, Ranier Vilela wrote:
> II already reported on another thread, that vcregress is failing with
> (float8 and partitionprune) and now these messages are showing up.
> None buildfarm animal, have that combination, but as Postgres officially
> supports it ..

Well, it happens that I do have a VM with MSVC 2019 64-bit and Windows
10 (17763.rs5_release.180914-1434 coming directly from Microsoft's
website).  Testing with it, I am not seeing any issues related to
temporary file renames, though getting a disk full I could find some
errors in writing and/or closing temporary statistics file because of
ENOSPC but that's not a surprise in this situation, and regression
tests just work fine.  Honestly, I see nothing to act on based on the
information provided in this thread.
--
Michael

Вложения

Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Justin Pryzby
Дата:
On Mon, Jun 15, 2020 at 11:49:33PM -0300, Ranier Vilela wrote:
> I can confirm that the problem is in pgrename (dirmod.c),
> something is not OK, with MoveFileEx, even with the
> (MOVEFILE_REPLACE_EXISTING) flag.
> 
> Replacing MoveFileEx, with
> unlink (to);
> rename (from, to);
> 
> #if defined (WIN32) &&! defined (__ CYGWIN__)
> unlink(to);
> while (rename (from, to)! = 0)
> #else
> while (rename (from, to) <0)
> #endif
> 
> The log messages have disappeared.
> 
> I suspect that if the target (to) file exists, MoveFileEx, it is failing to
> rename, even with the flag enabled.
> 
> Windows have the native rename function (
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/rename-wrename?view=vs-2019
> )
> However, it fails if the target (to) file exists.
> 
> Question, is it acceptable delete the target file, if it exists, to
> guarantee the rename?

I don't think so - the idea behind writing $f.tmp and renaming it to $f is that
it's *atomic*.

I found an earlier thread addressing the same problem - we probably both
should've found this earlier.
https://commitfest.postgresql.org/27/2230/

https://www.postgresql.org/message-id/flat/CAPpHfds9trA6ipezK3BsuuOSQwEmESiqj8pkOxACFJpoLpcoNw%40mail.gmail.com#9b04576b717175e9dbf03cc991977d3f

That thread goes back to 2017, so I don't think this is a new problem in v13.
I'm not sure why you wouldn't also see the same behavior in v12.

There's a couple patches in that thread, and the latest patch was rejected.

Maybe you'd want to test them out and provide feedback.

BTW, the first patch does:

!               if (filePresent)
!               {
!                       if (ReplaceFile(to, from, NULL, REPLACEFILE_IGNORE_MERGE_ERRORS, 0, 0))
!                               break;
!               }
!               else
!               {
!                       if (MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
!                               break;
!               }

Since it's racy to first check if the file exists, I would've thought we should
instead do:

    ret = ReplaceFile()
    if ret == OK:
        break
    else if ret == FileDoesntExist:
        if MoveFileEx():
            break

Or, should we just try to create a file first, to allow ReplaceFile() to always
work ?

-- 
Justin



Re: Postgresql13_beta1 (could not rename temporary statistics file)Windows 64bits

От
Ranier Vilela
Дата:
Em ter., 16 de jun. de 2020 às 01:10, Justin Pryzby <pryzby@telsasoft.com> escreveu:
On Mon, Jun 15, 2020 at 11:49:33PM -0300, Ranier Vilela wrote:
> I can confirm that the problem is in pgrename (dirmod.c),
> something is not OK, with MoveFileEx, even with the
> (MOVEFILE_REPLACE_EXISTING) flag.
>
> Replacing MoveFileEx, with
> unlink (to);
> rename (from, to);
>
> #if defined (WIN32) &&! defined (__ CYGWIN__)
> unlink(to);
> while (rename (from, to)! = 0)
> #else
> while (rename (from, to) <0)
> #endif
>
> The log messages have disappeared.
>
> I suspect that if the target (to) file exists, MoveFileEx, it is failing to
> rename, even with the flag enabled.
>
> Windows have the native rename function (
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/rename-wrename?view=vs-2019
> )
> However, it fails if the target (to) file exists.
>
> Question, is it acceptable delete the target file, if it exists, to
> guarantee the rename?

I don't think so - the idea behind writing $f.tmp and renaming it to $f is that
it's *atomic*.
"atomic" here means, rename operation only, see.
1. create $f.tmp
2. write
3. close
4. rename to $f
 

I found an earlier thread addressing the same problem - we probably both
should've found this earlier.
https://commitfest.postgresql.org/27/2230/
https://www.postgresql.org/message-id/flat/CAPpHfds9trA6ipezK3BsuuOSQwEmESiqj8pkOxACFJpoLpcoNw%40mail.gmail.com#9b04576b717175e9dbf03cc991977d3f
Very interesting.
That link, gave me a light.
Users report this same situation in production environments with a high load.
Here I have only one notebook, with 8gb ram, windows 10 64 bits (2004), msvc 2019 (64 bits).
Without any load, just starting and stopping the server, the messages appear in the log.
 


That thread goes back to 2017, so I don't think this is a new problem in v13.
I'm not sure why you wouldn't also see the same behavior in v12.
Windows Server 2003 (32 bits) with Postgres 9.6 (32 bits), none log (could rename).`
Windows Server 2016 (64 bits) with Postgres 12 (64 bits), have log (cound rename).

Yes V12 can confirm, too.

File postgresql-Mon.log:
2019-10-14 11:56:26 GMT [4844]: [1-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
File postgresql-Sat.log:
2019-09-28 10:15:52 GMT [4804]: [2-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
2019-10-05 12:01:23 GMT [4792]: [1-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
2019-10-05 23:55:31 GMT [4792]: [2-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
File postgresql-Sun.log:
2019-10-06 07:43:36 GMT [4792]: [3-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
File postgresql-Tue.log:
2019-10-01 07:31:46 GMT [4804]: [3-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
2019-10-22 14:44:25 GMT [3868]: [1-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
File postgresql-Wed.log:
2019-10-02 22:20:52 GMT [4212]: [1-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
2019-10-09 11:05:02 GMT [3712]: [1-1] user=,db=,app=,client= LOG:  could not rename temporary statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat": Permission denied
 

There's a couple patches in that thread, and the latest patch was rejected.

Maybe you'd want to test them out and provide feedback.
I will try.


BTW, the first patch does:

!               if (filePresent)
!               {
!                       if (ReplaceFile(to, from, NULL, REPLACEFILE_IGNORE_MERGE_ERRORS, 0, 0))
!                               break;
!               }
!               else
!               {
!                       if (MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
!                               break;
!               }

Since it's racy to first check if the file exists, I would've thought we should
instead do:

        ret = ReplaceFile()
        if ret == OK:
                break
        else if ret == FileDoesntExist:
                if MoveFileEx():
                        break

Or, should we just try to create a file first, to allow ReplaceFile() to always
work ?
This seemingly, it works too.
+ while (rename(from, to) != 0 && !MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING | MOVEFILE_COPY_ALLOWED))

Attached a patch.

regards,
Ranier Vilela
Вложения