Re: Add index scan progress to pg_stat_progress_vacuum

Поиск
Список
Период
Сортировка
От Imseih (AWS), Sami
Тема Re: Add index scan progress to pg_stat_progress_vacuum
Дата
Msg-id 077D7D2E-FDC1-4884-A522-2E7770A2ED8C@amazon.com
обсуждение исходный текст
Ответ на Re: Add index scan progress to pg_stat_progress_vacuum  ("Bossart, Nathan" <bossartn@amazon.com>)
Ответы Re: Add index scan progress to pg_stat_progress_vacuum  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers

On 12/15/21, 4:10 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

    On 12/1/21, 3:02 PM, "Imseih (AWS), Sami" <simseih@amazon.com> wrote:
    > The current implementation of pg_stat_progress_vacuum does not
    > provide progress on which index is being vacuumed making it
    > difficult for a user to determine if the "vacuuming indexes" phase
    > is making progress. By exposing which index is being scanned as well
    > as the total progress the scan has made for the current cycle, a
    > user can make better estimations on when the vacuum will complete.

    +1

    > The proposed patch adds 4 new columns to pg_stat_progress_vacuum:
    >
    > 1. indrelid - the relid of the index being vacuumed
    > 2. index_blks_total - total number of blocks to be scanned in the
    > current cycle
    > 3. index_blks_scanned - number of blocks scanned in the current
    > cycle
    > 4. leader_pid - if the pid for the pg_stat_progress_vacuum entry is
    > a leader or a vacuum worker. This patch places an entry for every
    > worker pid ( if parallel ) as well as the leader pid

    nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed?  IMO it
    is more analogous to heap_blks_vacuumed.

No, What is being tracked is the number of index blocks scanned from the total index blocks. The block will be scanned
regardlessif it will be vacuumed or not. 
 

    This will tell us which indexes are currently being vacuumed and the
    current progress of those operations, but it doesn't tell us which
    indexes have already been vacuumed or which ones are pending vacuum.
    I think such information is necessary to truly understand the current
    progress of vacuuming indexes, and I can think of a couple of ways we
    might provide it:

      1. Make the new columns you've proposed return arrays.  This isn't
         very clean, but it would keep all the information for a given
         vacuum operation in a single row.  The indrelids column would be
         populated with all the indexes that have been vacuumed, need to
         be vacuumed, or are presently being vacuumed.  The other index-
         related columns would then have the associated stats and the
         worker PID (which might be the same as the pid column depending
         on whether parallel index vacuum was being done).  Alternatively,
         the index column could have an array of records, each containing
         all the information for a given index.
      2. Create a new view for just index vacuum progress information.
         This would have similar information as 1.  There would be an
         entry for each index that has been vacuumed, needs to be
         vacuumed, or is currently being vacuumed.  And there would be an
         easy way to join with pg_stat_progress_vacuum (e.g., leader_pid,
         which again might be the same as our index vacuum PID depending
         on whether we were doing parallel index vacuum).  Note that it
         would be possible for the PID of these entries to be null before
         and after we process the index.
      3. Instead of adding columns to pg_stat_progress_vacuum, adjust the
         current ones to be more general, and then add new entries for
         each of the indexes that have been, need to be, or currently are
         being vacuumed.  This is the most similar option to your current
         proposal, but instead of introducing a column like
         index_blks_total, we'd rename heap_blks_total to blks_total and
         use that for both the heap and indexes.  I think we'd still want
         to add a leader_pid column.  Again, we have to be prepared for
         the PID to be null in this case.  Or we could just make the pid
         column always refer to the leader, and we could introduce a
         worker_pid column.  That might create confusion, though.

    I wish option #1 was cleaner, because I think it would be really nice
    to have all this information in a single row.  However, I don't expect
    much support for a 3-dimensional view, so I suspect option #2
    (creating a separate view for index vacuum progress) is the way to go.
    The other benefit of option #2 versus option #3 or your original
    proposal is that it cleanly separates the top-level vacuum operations
    and the index vacuum operations, which are related at the moment, but
    which might not always be tied so closely together.

Option #1 is not clean as you will need to unnest the array to make sense out of it. It will be too complex to use.
Option #3 I am reluctant to spent time looking at this option. It's more valuable to see progress per index instead of
total.
 
Option #2 was one that I originally designed but backed away as it was introducing a new view. Thinking about it a bit
more,this is a cleaner approach. 
 
1. Having a view called pg_stat_progress_vacuum_worker to join with pg_stat_progress_vacuum is clean
2. No changes required to pg_stat_progress_vacuum
3. I’ll lean towards calling the view " pg_stat_progress_vacuum_worker" instead of " pg_stat_progress_vacuum_index", to
perhapsallow us to track other items a vacuum worker may do in future releases. As of now, only indexes are vacuumed by
workers.
I will rework the patch for option #2

    Nathan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Следующее
От: Greg Stark
Дата:
Сообщение: Re: Add index scan progress to pg_stat_progress_vacuum