Re: PROC_IN_ANALYZE stillborn 13 years ago

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: PROC_IN_ANALYZE stillborn 13 years ago
Дата
Msg-id 20200807213727.qjfvltalwt63xbib@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: PROC_IN_ANALYZE stillborn 13 years ago  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

On 2020-08-06 18:02:26 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > In fact using conceptually like a new snapshot for each sample tuple
> > actually seems like it'd be somewhat of an improvement over using a
> > single snapshot.
> 
> Dunno, that feels like a fairly bad idea to me.  It seems like it would
> overemphasize the behavior of whatever queries happened to be running
> concurrently with the ANALYZE.  I do follow the argument that using a
> single snapshot for the whole ANALYZE overemphasizes a single instant
> in time, but I don't think that leads to the conclusion that we shouldn't
> use a snapshot at all.

I didn't actually want to suggest that we should take a separate
snapshot for every sampled row - that'd be excessively costly. What I
wanted to say was that I don't think that I don't see a clear accuraccy
benefit. E.g. not seeing any of the values inserted more recently will
under-emphasize those in the histogram.

What precisely do you mean with "overemphasize" above? I mean those will
e the rows most likely to live after the analyze is done, so including
them doesn't seem like a bad thing to me?


> Another angle that would be worth considering, aside from the issue
> of whether the sample used for pg_statistic becomes more or less
> representative, is what impact all this would have on the tuple count
> estimates that go to the stats collector and pg_class.reltuples.
> Right now, we don't have a great story at all on how the stats collector's
> count is affected by combining VACUUM/ANALYZE table-wide counts with
> the incremental deltas reported by transactions happening concurrently
> with VACUUM/ANALYZE.  Would changing this behavior make that better,
> or worse, or about the same?

Hm. Vacuum already counts rows that are inserted concurrently with the
vacuum scan, if it encounters them. Analyze doesn't. Seems like we'd at
least be wrong in a more consistent manner than before...

IIUC both analyze and vacuum will overwrite concurrent changes to
n_live_tuples. So taking concurrently committed changes into account
seems like it'd be the right thing?

We probably could make this more accurate by accounting separately for
"recently inserted and committed" rows, and taking the difference of
n_live_tuples before/after into account.  But I'm a bit doubtful that
it's worth it?

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: PROC_IN_ANALYZE stillborn 13 years ago
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Should the nbtree page split REDO routine's locking work more like the locking on the primary?