Re: New IndexAM API controlling index vacuum strategies
От | Peter Geoghegan |
---|---|
Тема | Re: New IndexAM API controlling index vacuum strategies |
Дата | |
Msg-id | CAH2-WznWWQwgydT37X=c2xe1QBV5k53HVuTfLfxhbopAAA_Kgg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: New IndexAM API controlling index vacuum strategies (Masahiko Sawada <sawada.mshk@gmail.com>) |
Ответы |
Re: New IndexAM API controlling index vacuum strategies
(Peter Geoghegan <pg@bowt.ie>)
|
Список | pgsql-hackers |
On Mon, Dec 28, 2020 at 10:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > The second test checking if maintenane_work_mem space is running out > also makes sense to me. Perhaps another idea would be to compare the > number of collected garbage tuple to the total number of heap tuples > so that we do lazy_vacuum_heap() only when we’re likely to reclaim a > certain amount of garbage in the table. Right. Or to consider if this is an anti-wraparound VACUUM might be nice -- maybe we should skip index vacuuming + lazy_vacuum_heap() if and only if we're under pressure to advance datfrozenxid for the whole DB, and really need to hurry up. (I think that we could both probably think of way too many ideas like this one.) > That's a very good result in terms of skipping lazy_vacuum_heap(). How > much the table and indexes bloated? Also, I'm curious about that which > tests in choose_vacuum_strategy() turned vacuum_heap on: 130 test or > test if maintenance_work_mem space is running out? And what was the > impact on clearing all-visible bits? The pgbench_accounts heap table and 3 out of 4 of its indexes (i.e. all indexes except "abalance_ruin") had zero growth. They did not even become larger by 1 block. As I often say when talking about work in this area, this is not a quantitative difference -- it's a qualitative difference. (If they grew even a tiny amount, say by only 1 block, further growth is likely to follow.) The "abalance_ruin" index was smaller with the patch. Its size started off at 253,779 blocks with both the patch and master branch (which is very small, because of B-Tree deduplication). By the end of 2 pairs of runs for the patch (2 3 hour runs) the size grew to 502,016 blocks. But with the master branch it grew to 540,090 blocks. (For reference, the primary key on pgbench_accounts started out at 822,573 blocks.) My guess is that this would compare favorably with "magic VACUUM" [1] (I refer to a thought experiment that is useful for understanding the principles behind bottom-up index deletion). The fact that "abalance_ruin" becomes bloated probably doesn't have that much to do with MVCC versioning. In other words, I suspect that the index wouldn't be that smaller in a traditional two-phase locking database system with the same workload. Words like "bloat" and "fragmentation" have always been overloaded/ambiguous in highly confusing ways, which is why I find it useful to compare a real world workload/benchmark to some kind of theoretical ideal behavior. This test wasn't particularly sympathetic to the patch because most of the indexes (all but the PK) were useless -- they did not get used by query plans. So the final size of "abalance_ruin" (or any other index) isn't even the truly important thing IMV (the benchmark doesn't advertise the truly important thing for me). The truly important thing is that the worst case number of versions *per logical row* is tightly controlled. It doesn't necessarily matter all that much if 30% of an index's tuples are garbage, as long as the garbage tuples are evenly spread across all logical rows in the table (in practice it's pretty unlikely that that would actually happen, but it's possible in theory, and if it did happen it really wouldn't be so bad). [1] https://postgr.es/m/CAH2-Wz=rPkB5vXS7AZ+v8t3VE75v_dKGro+w4nWd64E9yiCLEQ@mail.gmail.com -- Peter Geoghegan
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Masahiko SawadaДата:
Сообщение: Re: New IndexAM API controlling index vacuum strategies