Обсуждение: Any plans to expose publicly (via stored proc) relation_needs_vacanalyze, or some flavour of do_autovacuum?

Поиск
Список
Период
Сортировка

Any plans to expose publicly (via stored proc) relation_needs_vacanalyze, or some flavour of do_autovacuum?

От
Vlad Romascanu
Дата:
Imagine the following sequence of events:
1. a "writer" session begins a transaction, growing the number of live
tuples in several tables (e.g. via COPY) from mere tens (or hundreds)
to tens of thousands of tuples, then COMMITs
2. one or more "reader" sessions perform a SELECT ... JOIN on the very
tables grown by the "writer" session

Currently, unless the autovacuum daemon happens to run after the
"writer" session COMMITs and complete before the "reader" session(s)
SELECT, then the "reader" session(s) may complete execution in minutes
instead of milliseconds because of an incorrect execution plan based
on stale statistics.  Have seen this happen in reality.

In my specific case, all write operations are serialized via an
application-level mutex, i.e. there will only be one "writer" session
and multiple "reader" sessions at any given time.  In such a setup,
the most obvious workaround is to explicitly ANALYZE the affected
tables, prior to COMMITting, in the "writer" session.  New data would
thus be committed along with up-to-date statistics.

However, let's tweak the above scenario and assume that, later on, the
"writer" session only INSERTs one row in a table which now counts
several million tuples.-- indiscriminately running ANALYZE after such
a measly INSERTion is overkill.

The logical, amended solution would then be to have the "writer"
session perform, after INSERTion but before COMMITTing, the same
calculation that the autovacuum daemon currently performs inside
relation_needs_vacanalyze, based on the same configuration parameters
that the autovacuum daemon uses, before deciding whether to explicitly
ANALYZE or not the affected application tables.  Ideally one would not
want to duplicate the relation_needs_vacanalyze logic (not to mention
having to guess the value of last_anl_tuples, which is not exposed via
any storedproc -- one would have to assume that it is more or less
equivalent to pg_class.reltuples.)

So my question is: does it sound reasonable to (and/or are there any
existing plans to) expose either relation_needs_vacanalyze, or a
per-table flavour of do_autoanalyze, to the public via a stored proc
for those in my situation or who need more granular control over
autovacuuming than the autovacuum daemon does?

Thank you,
Vlad.

Vlad Romascanu <vromascanu@accurev.com> writes:
> The logical, amended solution would then be to have the "writer"
> session perform, after INSERTion but before COMMITTing, the same
> calculation that the autovacuum daemon currently performs inside
> relation_needs_vacanalyze, based on the same configuration parameters
> that the autovacuum daemon uses, before deciding whether to explicitly
> ANALYZE or not the affected application tables.  Ideally one would not
> want to duplicate the relation_needs_vacanalyze logic (not to mention
> having to guess the value of last_anl_tuples, which is not exposed via
> any storedproc -- one would have to assume that it is more or less
> equivalent to pg_class.reltuples.)

I don't know if you actually looked at that code, but it's driven off of
statistics counters that are only updated at commit; and furthermore are
maintained in a different process altogether.  So what you have in mind
isn't going to work ...

            regards, tom lane