Re: Remove_temp_files_after_crash and significant recovery/startup time

From: Euler Taveira
Subject: Re: Remove_temp_files_after_crash and significant recovery/startup time
Msg-id: 184df50e-f87f-4427-9ea4-431f4c752b40@www.fastmail.com
In response to: Remove_temp_files_after_crash and significant recovery/startup time ("McCoy, Shawn" <shamccoy@amazon.com>)
List: pgsql-hackers
On Fri, Sep 10, 2021, at 5:58 PM, McCoy, Shawn wrote:

I noticed that the new parameter remove_temp_files_after_crash is currently set to a default value of "true" in the version 14 release. It seems this was discussed in this thread [1], and it doesn't look to me like there's been a lot of stress testing of this feature.

 

In our fleet there have been cases where we have seen hundreds of thousands of temp files generated. I found a case where we helped a customer that had a little over 2.2 million temp files. Single-threaded cleanup of these takes a significant amount of time and delays recovery. In RDS, we mitigated this by moving the pgsql_tmp directory aside, starting the engine, and then separately removing the old temp files.
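
The move-aside mitigation described above could be sketched roughly as follows. This is only an illustration, not how RDS implements it: the function names and the aside directory are hypothetical, the server is assumed to be stopped while the rename happens, and the aside location is assumed to be on the same filesystem so that the rename is a cheap metadata operation.

```python
import os
import shutil
import time


def move_temp_dir_aside(tmp_dir, aside_parent):
    """Rename tmp_dir into aside_parent; return the new path, or None if absent.

    Hypothetical helper: os.rename only works within a single filesystem,
    so aside_parent is assumed to live on the same one as tmp_dir.
    """
    if not os.path.isdir(tmp_dir):
        return None
    os.makedirs(aside_parent, exist_ok=True)
    target = os.path.join(aside_parent, "pgsql_tmp.%d" % time.time_ns())
    os.rename(tmp_dir, target)
    return target


def remove_aside_dir(aside_dir):
    """Delete the set-aside directory; return the number of files it held."""
    count = sum(len(files) for _, _, files in os.walk(aside_dir))
    shutil.rmtree(aside_dir)
    return count
```

The idea is that only the rename sits on the startup path; the slow deletion of the old files can run later, after the engine is already accepting connections.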

2.2 million temporary files? I'm wondering under what circumstances your system
generates that many temporary files. Low work_mem and thousands of connections?
Low work_mem and a huge analytic query? When I designed this feature I thought
about some extreme cases; that's why this behavior is controlled by a GUC. We
can probably add a sentence that explains the recovery delay caused by tens
of thousands of temporary files.
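
To judge whether an instance is anywhere near the extreme case, one could count the temporary files currently on disk. A minimal sketch, assuming the usual layout where temporary files are named with a pgsql_tmp prefix and live in base/pgsql_tmp under PGDATA, plus a pgsql_tmp directory under each tablespace in pg_tblspc; the function names are hypothetical:

```python
import glob
import os


def temp_dirs(pgdata):
    """Return the existing temp-file directories under pgdata (assumed layout)."""
    dirs = [os.path.join(pgdata, "base", "pgsql_tmp")]
    # Per-tablespace temp directories: pg_tblspc/<oid>/PG_<version>/pgsql_tmp
    dirs += glob.glob(os.path.join(pgdata, "pg_tblspc", "*", "PG_*", "pgsql_tmp"))
    return [d for d in dirs if os.path.isdir(d)]


def count_temp_files(pgdata):
    """Count directory entries whose names start with 'pgsql_tmp'."""
    total = 0
    for d in temp_dirs(pgdata):
        with os.scandir(d) as entries:
            total += sum(1 for e in entries if e.name.startswith("pgsql_tmp"))
    return total
```

If the count regularly reaches the hundreds of thousands, setting remove_temp_files_after_crash to off and cleaning up out of band may be the safer choice for that instance.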


After noticing the current plans to default this GUC to "on" in v14, just thought I'd raise the question of whether this should get a little more discussion or testing with higher numbers of temp files?

 

A backend crash is per se a rare condition (at least it should be). A crash
while having millions of temporary files in your PGDATA is an even rarer
condition. I have seen several cases related to this issue, and none of them
generated millions of temporary files (at most a thousand files). IMO the
benefits outweigh the issues, as I explained in [1]. Service continuity (for
the vast majority of cases) justifies turning it on by default.

If your Postgres instance is generating millions of temporary files, it seems
your setup needs some tuning.


 

--
Euler Taveira
