Re: Modification of data in base folder and very large tables

From: Jeff Janes
Subject: Re: Modification of data in base folder and very large tables
Date:
Msg-id: CAMkU=1xLS4nMcddSSLZYF8xEA-0AasZTcXYBn+315KrQTP6Rbw@mail.gmail.com
In reply to: Modification of data in base folder and very large tables (Ogden Brash <info@litika.com>)
List: pgsql-performance
On Wed, Oct 9, 2019 at 4:33 AM Ogden Brash <info@litika.com> wrote:
> # lsof -p 6600 | wc -l;
> 840
>
> # lsof -p 6601 | wc -l;
> 906
>
> Is that normal? That there are so many open file pointers? ~900 open file pointers for each of the processes?

I don't think PostgreSQL makes any effort to conserve file handles until it starts reaching the max.  So any file that has ever been opened will remain open, unless it was somehow invalidated (e.g. the file needs to be deleted).  If those processes were previously loading smaller tables before they got bogged down in the huge ones, a large number of handles would not be unexpected.
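
If you want a quick sanity check on that, you can compare a backend's descriptor count against the limit directly. A rough sketch, assuming Linux's /proc and the PIDs from your lsof output:

# Each backend will happily cache descriptors up to max_files_per_process (default 1000):
psql -c "SHOW max_files_per_process;"

# Count the descriptors a given backend currently holds:
ls /proc/6600/fd | wc -l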
 
 
> The next thing I did was go look at the actual data files, to see how many there are. In my case they are in postgresql/11/main/base/24576 and there are 2076 files there. That made sense. However, I found that when I list them by modification date I see something interesting:

> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.7
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.8
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.9
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.10
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.11
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.12
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.13
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.14
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.16
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.15
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.17
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.18
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.19
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.21
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.22
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.23
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.24
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.25
> -rw------- 1 postgres postgres 1073741824 Oct  8 13:05 27083.26
> -rw------- 1 postgres postgres   19062784 Oct  8 13:05 27082_fsm
> -rw------- 1 postgres postgres  544489472 Oct  8 13:05 27077.34
> -rw------- 1 postgres postgres  169705472 Oct  8 13:05 27082.72
> -rw------- 1 postgres postgres  978321408 Oct  8 13:05 27083.27
> -rw------- 1 postgres postgres  342925312 Oct  8 13:05 27076.88

> If you notice, the file size is capped at 1 GB, and as the giant table has grown it has added more files in this directory. However, the mysterious thing to me is that it keeps modifying those files constantly - even the ones that are completely full. So for the two large tables it has been restoring all week, the modification time for the ever-growing list of files is being updated constantly.
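
For what it's worth, the 1 GB cap is just PostgreSQL's default segment size, so the number of numbered segments should track the relation's on-disk size. A quick cross-check, with placeholder database and table names:

# Expect roughly (size / 1 GB) numbered segment files for the relation:
psql -d yourdb -c "SELECT pg_size_pretty(pg_relation_size('your_big_table'));"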

The bgwriter, the checkpointer, and autovacuum, plus any backends that decide they need a clean page from the buffer cache, can all touch those files.  They might touch them in ways that are not IO intensive, but which still cause the modification time to get updated.  In my hands, once all dirty buffers for a given file have been flushed and all the contents in the file have been vacuumed, its mtime stops changing just due to a COPY IN which is directed at later files.
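
pg_stat_bgwriter gives a rough breakdown of which of those is responsible for the buffer writes since the stats were last reset (autovacuum's writes show up under buffers_backend, I believe). A minimal check, with column names as of v11:

psql -c "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_clean, buffers_backend, stats_reset FROM pg_stat_bgwriter;"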

It is also squishy what it even means to modify a file.  I think filesystems have heuristics to avoid "gratuitous" updates to mtime, which make it hard to reckon with.
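
If it helps, you can pin down which relation a given segment belongs to, and watch its timestamps at full precision rather than relying on ls. A sketch, assuming a stock Debian-style data directory for the path, with a filenode number taken from your listing and a placeholder database name:

# Map the filenode back to a relation name (run this in the database with OID 24576):
psql -d yourdb -c "SELECT relname FROM pg_class WHERE relfilenode = 27083;"

# stat reports sub-second mtime/ctime, which makes it easier to tell whether a
# 'full' segment is really still being written to:
stat /var/lib/postgresql/11/main/base/24576/27083.7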
 

> Could it be that that's why I am seeing a slowdown over the course of the week - that as the number of files for the table has grown, the system is spending more and more time seeking around the disk to touch all those files for some reason?

I don't think lsof or mtime are effective ways to research this.  How about running strace -ttt -T -y on those processes?
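
To be concrete, something like the following attaches to one of the restore backends (PID taken from your earlier lsof output) and records every syscall with wall-clock timestamps, per-call duration, and the file path behind each descriptor; these are standard strace flags, and only the PID and output path are placeholders:

# -ttt: microsecond timestamps, -T: time spent in each call, -y: resolve fd -> path
strace -ttt -T -y -p 6601 -o /tmp/backend_6601.trace

# Or let strace aggregate it: run for a minute, then Ctrl-C for a per-syscall summary
strace -c -p 6601

That should show pretty quickly whether the time is going into lseek/read/write on those segment files or somewhere else entirely.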
 
Cheers,

Jeff
