Optimizing away second VACUUM heap scan when only BRIN indexes on table

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Optimizing away second VACUUM heap scan when only BRIN indexes on table
Дата
Msg-id CAM3SWZSvhJLdEPbB1v9bX_VDd78Dd0usWhdWcUyfmjcnAMNc6A@mail.gmail.com
обсуждение исходный текст
Ответы Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table
Список pgsql-hackers
The BRIN README notes:

"""
Since no heap TIDs are stored in a BRIN index, it's not necessary to scan the
index when heap tuples are removed.  It might be that some summary values can
be tightened if heap tuples have been deleted; but this would represent an
optimization opportunity only, not a correctness issue.  It's simpler to
represent this as the need to re-run summarization on the affected page range
rather than "subtracting" values from the existing one.  This is not
currently implemented.
"""

While I haven't studied the question in depth, I'm pretty sure that
what we do to "VACUUM" BRIN indexes has no dependency on the heap TIDs
that are pointed to by the index, because there really aren't any. The
README goes on to say:

"""
Note that if there are no indexes on the table other than the BRIN index,
usage of maintenance_work_mem by vacuum can be decreased significantly, because
no detailed index scan needs to take place (and thus it's not necessary for
vacuum to save TIDs to remove).  It's unlikely that BRIN would be the only
indexes in a table, though, because primary keys can be btrees only, and so
we don't implement this optimization.
"""

I don't think it's realistic to suppose that BRIN indexes will rarely
be the only indexes on a table. I'm not especially concerned about
maintenance_work_mem in this scenario, though -- I'm much more
concerned about the possibly unnecessary second heap scan during a
VACUUM of what is presumably a very large table. A second heap scan
occurs (lazy_vacuum_heap() is called), where that would not occur if
there were no indexes whatsoever (see commit ed8969b1).

I realize that the second scan performed by lazy_vacuum_heap() only
visits those pages known to contain dead tuples. However, the
experience of seeing problems with the random sampling of ANALYZE
makes me think that that might not be very helpful. There is no good
reason to think that there won't be a uniform distribution of dead
tuples across the heap, and so only visiting pages known to contain
dead tuples might be surprisingly little help even when there are
relatively few VACUUM-able tuples in the table.

Has any thought been given to how we could make VACUUM avoid a second
heap scan iff there are only BRIN indexes, without compromising
anything else? In other words, by killing heap TIDs *before* any
"VACUUM" of BRIN index(es) occurs, avoiding a call to
lazy_vacuum_heap(), just as when there are no indexes on the table
whatsoever?

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Fetter
Дата:
Сообщение: Re: parallelism and sorting
Следующее
От: Chapman Flack
Дата:
Сообщение: Re: problem with msvc linker - cannot build orafce