Hey Jeff,
yes, we now relaxed the idle in transaction setting to 15 mins.
i was hesitant to increase the settings as it blocked auto vaccum. We use hot_standby_feedback = true also as we split reads/writes and allow long running queries on read replicas, this too affects auto vaccum.
so overall, all the options i set to ensure auto vaccum gets triggered get impacted by increased idle in tx and hot_standby_feedback = true, both of which seem to be necessary for the setup now.
we have been trying to work with sharding using (mutli coordinator FDW) on our own (and have been successful although have hiccups), using directory based sharding in pg10. (if we cannot handle growth, all goes to mongo for its automatic sharding and failover)
I have to admit we can do better here though. we need to rebalance the data in the shards when we come close to 90% disk. those are long delete/upsert queries. We have very aggressive autovaccum to ensure we do not have a lot of stale stats.
I have plans to rearchitect the whole setup with pg11 where we plan to introduce time based sharding and then table partitioning in each shard further by time and also use Materialized views, for day old data with pre aggregated fields on each shard so that explain does not have to work too hard :)
i guess, i am diverting the query, but just saying :)
Thanks for suggestions and help Jeff. Appreciate it.