Moving to -hackers, hopefully it doesn't confuse the list scripts too much.
On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> I see the error showing up every night on 2 different servers. But it's a
> bit of a heisenbug because If I go there now it won't be reproducible.
Do you have query logging enabled? If not, could you consider it on at least
one of those servers? I'm interested to know what ELSE is running at the time
that query failed.
Perhaps you could enable query logging JUST for the interval of time that the
server usually errors? The CSV logs can be imported to postgres for analysis.
You might do something like:
SELECT left(message,99), COUNT(1), max(session_id)
FROM postgres_log
WHERE log_time BETWEEN .. AND ..
GROUP BY 1 ORDER BY 2;
And just maybe there'd be a query there that only runs once per day, which would
allow reproducing the error at will. Or a utility command, like vacuum.
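For the import itself, here is a minimal sketch modelled on the example in the PostgreSQL logging documentation. It assumes the csvlog column layout of PostgreSQL 11; other releases have a different column list, so check the docs for your version. The file path is a placeholder.

```sql
-- Table matching the csvlog columns (layout assumed: PostgreSQL 11).
CREATE TABLE postgres_log
(
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text,
  PRIMARY KEY (session_id, session_line_num)
);

-- Load one rotated CSV log file; the path below is a placeholder.
COPY postgres_log FROM '/path/to/logfile.csv' WITH csv;
```

The primary key keeps the same file from being loaded twice by accident. COPY reads the file on the server, so psql's \copy may be more convenient if the logs have been moved to another machine.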
I think ideally you'd set:
log_statement = all
log_min_messages = info
log_destination = 'stderr,csvlog'
# stderr isn't important for this purpose, but I keep it set to capture crash messages, too
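To turn this on only around the time the error usually shows up, one option is to change the settings from SQL and reload, rather than editing postgresql.conf by hand. This is a sketch, assuming a superuser session and that logging_collector is already on (csvlog needs it, and changing it requires a restart):

```sql
-- Before the window: enable full statement logging and reload the config.
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_messages = 'info';
SELECT pg_reload_conf();

-- After the window: drop the overrides and reload again.
ALTER SYSTEM RESET log_statement;
ALTER SYSTEM RESET log_min_messages;
SELECT pg_reload_conf();
```

ALTER SYSTEM writes to postgresql.auto.conf, which overrides postgresql.conf, so the RESET step matters. Both halves could be run from cron via psql around the nightly window.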
You should set these to something that works well at your site:
log_rotation_age = '2min'
log_rotation_size = '32MB'
I would normally set these, and I don't see any reason why you wouldn't set
them too:
log_checkpoints = on
log_lock_waits = on
log_min_error_statement = notice
log_temp_files = 0
log_min_duration_statement = '9sec'
log_autovacuum_min_duration = '999sec'
And I would set these too but maybe you'd prefer to do something else:
log_directory = '/var/log/postgresql'
log_file_mode = 0640
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
Justin