Обсуждение: temporary data after diskspace error

Поиск
Список
Период
Сортировка

temporary data after diskspace error

От
Willy-Bas Loos
Дата:
Hi,

We have a server with postgresql 9.4.12 on ubuntu.
There has been a sudden rise in the amount of disk space used by postgresql, causing a diskspace error:

2020-01-22 17:24:37 CET db: ip: us: PANIC:  could not write to file "pg_xlog/xlogtemp.23346": No space left on device
2020-01-22 17:24:37 CET db: ip: us: LOG:  WAL writer process (PID 23346) was terminated by signal 6: Aborted
2020-01-22 17:24:37 CET db: ip: us: LOG:  terminating any other active server processes

The disk was at roughly 75% before and something or someone added  150 GB to the database, bringing the disk space usage to 100%.
The query that got the initial error was creating a rather large table, but it is not confirmed that this is the only source of the large-ish data amount. But it is possible.

Now i can see in pg_stat_database and postgresql/9.4/main/base/pgsql_tmp that there is 90GB of temporary files in the database.

Could the amount of temp files be caused by the unfinished query? I'm not sure how strong Signal 6 is exactly.

And also: How can i make postgres clean up the files?
Can it be done without restarting the cluster?
Will restarting it help?


--
Willy-Bas Loos

Re: temporary data after diskspace error

От
Ron
Дата:
On 1/27/20 10:05 AM, Willy-Bas Loos wrote:
> Hi,
>
> We have a server with postgresql 9.4.12 on ubuntu.
> There has been a sudden rise in the amount of disk space used by 
> postgresql, causing a diskspace error:
>
> 2020-01-22 17:24:37 CET db: ip: us: PANIC:  could not write to file 
> "pg_xlog/xlogtemp.23346": No space left on device
> 2020-01-22 17:24:37 CET db: ip: us: LOG:  WAL writer process (PID 23346) 
> was terminated by signal 6: Aborted
> 2020-01-22 17:24:37 CET db: ip: us: LOG:  terminating any other active 
> server processes
>
> The disk was at roughly 75% before and something or someone added  150 GB 
> to the database, bringing the disk space usage to 100%.
> The query that got the initial error was creating a rather large table, 
> but it is not confirmed that this is the only source of the large-ish data 
> amount. But it is possible.
>
> Now i can see in pg_stat_database and postgresql/9.4/main/base/pgsql_tmp 
> that there is 90GB of temporary files in the database.
>
> Could the amount of temp files be caused by the unfinished query? I'm not 
> sure how strong Signal 6 is exactly.
>
> And also: How can i make postgres clean up the files?
> Can it be done without restarting the cluster?
> Will restarting it help?

Restarting postgresql clears out pgsql_tmp.  "pg_ctl restart -D ..." should 
be all you need.

-- 
Angular momentum makes the world go 'round.



Re: temporary data after diskspace error

От
Tom Lane
Дата:
Willy-Bas Loos <willybas@gmail.com> writes:
> And also: How can i make postgres clean up the files?
> Can it be done without restarting the cluster?
> Will restarting it help?

A restart will clean out temp files.  I don't think there's any
terribly safe way to do it without that.  You could manually
remove such files that haven't been accessed recently, but the
risk of human error is high.

            regards, tom lane



Re: temporary data after diskspace error

От
Michael Lewis
Дата:
In terms of preventing this happening again, you might consider setting some reasonable temp_file_limit as default. Alternatively or additionally, you can set up another volume and direct temp files to use that to avoid the server crashing if excessive temp files are used in a moment's time. Then only that one query fails that used too much space.

It always bears mentioning that 9.4 is very old and upgrading is recommended to at least v11 that has been out for more than a year.

"PostgreSQL 9.4 will stop receiving fixes on February 13, 2020, which is the next planned cumulative update release."

Re: temporary data after diskspace error

От
Willy-Bas Loos
Дата:
Ok, thanks everyone!
Will there be a lot of downtime to delete those 90GB of temp files?

Will postgres just delete those files without processing them or should I brace for some downtime?


Op ma 27 jan. 2020 17:15 schreef Tom Lane <tgl@sss.pgh.pa.us>:
Willy-Bas Loos <willybas@gmail.com> writes:
> And also: How can i make postgres clean up the files?
> Can it be done without restarting the cluster?
> Will restarting it help?

A restart will clean out temp files.  I don't think there's any
terribly safe way to do it without that.  You could manually
remove such files that haven't been accessed recently, but the
risk of human error is high.

                        regards, tom lane

Re: temporary data after diskspace error

От
Tom Lane
Дата:
Willy-Bas Loos <willybas@gmail.com> writes:
> Will there be a lot of downtime to delete those 90GB of temp files?
> Will postgres just delete those files without processing them or should I
> brace for some downtime?

It's just a directory scan and an unlink() for each file that has
the right filename pattern to be a temp file.  If you've got a
reasonable filesystem I wouldn't expect it to take long.
Maybe a minute? (but don't quote me on that)

            regards, tom lane



Re: temporary data after diskspace error

От
Willy-Bas Loos
Дата:
I did the restart and it took seconds.
This was on a SSD.
BTW on ubuntu and debian i never use pg_ctl directly, postgresql-common has a very nice CLI for this.
For the restart i used: sudo pg_ctlcluster --force 9.4 main restart

Thanks for all the good advice!

On Mon, Jan 27, 2020 at 10:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Willy-Bas Loos <willybas@gmail.com> writes:
> Will there be a lot of downtime to delete those 90GB of temp files?
> Will postgres just delete those files without processing them or should I
> brace for some downtime?

It's just a directory scan and an unlink() for each file that has
the right filename pattern to be a temp file.  If you've got a
reasonable filesystem I wouldn't expect it to take long.
Maybe a minute? (but don't quote me on that)

                        regards, tom lane


--
Willy-Bas Loos