On Wed, 2022-02-09 at 17:29 +0900, Abhishek Bhola wrote:
> I have a table with monthly partitions, starting from 2011.
> Each partition has about 2-3 billion rows and is about 40TB of data in total.
> I am running a select query to \copy the data into csv files for each month.
>
> I see there is an AUTOVACUUM job started on all the partitions that have been
> read (on which select has run). I am assuming the reason why Postgres is
> autovacuuming these large partitions is this:
>
> db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid) as age,
> c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <> 2147483647 ORDER BY 3 DESC LIMIT 20;
> table_name | type | age | relfrozenxid
> ---------------------------+------+-----------+--------------
> hist_omx_orderaudit201506 | r | 201463508 | 319891
> hist_omx_orderaudit201509 | r | 201418497 | 364902
> hist_omx_orderaudit201205 | r | 201142042 | 641357
>
> autovacuum_freeze_max_age
> ---------------------------
> 200000000
> (1 row)
>
> Am I wrong in my understanding why this autovacuum is getting triggered?
>
> However, my main question is that why do the data files in the tablespace, corresponding
> to these partition tables (that have been read or selected recently), or the ones
> autovacuumed recently have a latest timestamp?
>
> There was no insert or update on these partitions, so why do I see an updated timestamp?
> Does autovacuum change something on the datafiles?
Yes, autovacuum will freeze old rows in the table, which modifies the data files.
To keep this to a minimum, you can run VACUUM (FREEZE) on a partition as soon
as you know that its data won't get modified any more.
Still, I wouldn't be sure that the file modification timestamp won't change
any more after that. But that should be pretty irrelevant; at any rate, no
substantial data modifications will occur after that.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com