Re: Set visibility map bit after HOT prune

From Pavan Deolasee
Subject Re: Set visibility map bit after HOT prune
Date
Msg-id CABOikdPpMWT4tDyEJ4j9XM1_rM=sL-L+Ds6hWOG4WcCbSMxWBA@mail.gmail.com
In reply to Set visibility map bit after HOT prune  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers

Sorry for the long pause on this thread. A new arrival at home kept me occupied all the time.

This thread saw a lot of ideas and suggestions from different people. I don't think we reached an agreement one way or the other on any of them, but let me summarize the discussion for the archives and for further action if deemed necessary.

Suggestion 1: Set the visibility map bit after HOT prune

The rationale for this idea is to improve the chances of an index-only scan after HOT prune. This is especially interesting when a table receives random updates or deletes, each of which clears the VM bit. The table may still not come up for vacuum, either because the number of updates/deletes is not over the vacuum threshold or because subsequent HOT prunes left no work for vacuum to do. Today the only place where we set VM bits again is during vacuum, so this idea would add another path where VM bits are set. It would also help vacuum avoid visiting heap pages that have no work to be done.
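To make the proposed extra step concrete, here is a minimal, self-contained model (not PostgreSQL source; all names and states are invented for illustration) of a page with a few tuples: prune removes the dead ones, and a follow-up pass sets the page's all-visible flag and the visibility-map bit only when every surviving tuple is visible to all transactions.

```c
#include <stdbool.h>

/* Simplified tuple states; TUPLE_RECENTLY_DEAD models a tuple that
 * prune cannot remove yet and that blocks setting the VM bit. */
enum tuple_state { TUPLE_LIVE_ALL_VISIBLE, TUPLE_RECENTLY_DEAD, TUPLE_DEAD };

#define MAX_TUPLES 16

struct page {
    enum tuple_state tuples[MAX_TUPLES];
    int ntuples;
    bool all_visible;       /* models the PD_ALL_VISIBLE page flag */
};

struct vm {
    bool bit;               /* models this page's visibility-map bit */
};

/* Prune: reclaim DEAD tuples, compacting the rest. */
static void prune_page(struct page *pg)
{
    int keep = 0;
    for (int i = 0; i < pg->ntuples; i++)
        if (pg->tuples[i] != TUPLE_DEAD)
            pg->tuples[keep++] = pg->tuples[i];
    pg->ntuples = keep;
}

/* The proposed extra step after pruning: if every remaining tuple is
 * visible to everyone, set the page flag and the VM bit so a later
 * index-only scan can skip the heap fetch. */
static void set_vm_after_prune(struct page *pg, struct vm *vm)
{
    for (int i = 0; i < pg->ntuples; i++)
        if (pg->tuples[i] != TUPLE_LIVE_ALL_VISIBLE)
            return;         /* e.g. a RECENTLY_DEAD tuple blocks the bit */
    pg->all_visible = true;
    vm->bit = true;
}
```

The point of the sketch is only the ordering: the visibility check runs on the post-prune state of the page, so a page that becomes all-visible purely through pruning gets its VM bit back without waiting for vacuum.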

The main objection to this idea is that it may cause too much flip-flopping of the bit, especially if the HOT prune is followed by an UPDATE to the page. This is a valid concern. But the way HOT prune works today, it has no linkage to future UPDATE operations other than the fact that it frees up space for them; pruning can happen even in a SELECT code path. Suggestion 2 below is about changing this behavior, but my point is to consider 1 unless and until we do 2. Tom and Simon opposed this, saying we need to take a holistic view. Another concern is that setting the VM bit now generates WAL, and repeated setting/clearing of the bit may increase WAL activity. I suggested piggybacking the VM-bit-set logging on the HOT prune WAL record. Robert raised some doubts regarding increased full-page writes if the VM-set LSN is recorded in the heap page LSN. I am not sure that applies if we piggyback the operation, though, because the HOT prune WAL record would record an LSN in the heap page anyway.

If we do this, we should also consider updating the FSM after pruning, because vacuum may not scan the page at all.

Suggestion 2: Move HOT prune logic somewhere else

Tom/Simon suggested that we ought to consider moving HOT prune to some other code path. When we implemented HOT a few years back, we wanted to make it as minimally invasive as possible. Now that the code has proven its stability, we can experiment with a few more things. In particular, we would like to prune only if the page is going to receive an UPDATE soon; otherwise, pruning may add unnecessary overhead to a simple read-only query, and the space freed up by the prune may not be used soon, or ever. Tom suggested that we could teach the planner/executor to distinguish between a scan on a normal relation and one on a result relation. I had some concerns that even such a mechanism may not be enough, because a scan does not guarantee that the tuple will finally be updated; it may fail qualification, etc.

Simon has strong views regarding burdening SELECTs with maintenance work, but Robert and I are not convinced that it's necessarily a bad idea to let SELECTs do a little extra work that helps keep the overall state healthy. In general, it might be a good idea to try such approaches and see if we can extract more out of the system. Suggestions 5 and 6 also popped up to handle this problem in slightly different ways.

Suggestion 3: Don't clear visibility map bit after HOT update

I proposed this during the course of the discussion, and Andreas F liked/supported the idea. This could be useful when most/all updates are HOT updates. In that case the page does not need any work during vacuum (assuming HOT prune will take care of it), and index-only scans still work because the index pointers will be pointing to a valid HOT chain. Tom/Simon didn't quite like it because they were worried that this would change the meaning of the VM. I (and I think even Andreas) don't see it that way. Of course, there are some concerns, because this would break the use of the PD_ALL_VISIBLE flag for avoiding MVCC checks during heap scans. There were a couple of suggestions to fix that, such as having another page-level bit to differentiate the two states. Doing that would also let us skip MVCC checks even when there are one or more DEAD line pointers in the page. We should also run some performance tests to see how much benefit is really gained by skipping MVCC checks in heap scans, and weigh that against the benefit of keeping the VM bit set.
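The two-bit idea above can be sketched with plain bitmask logic. PD_ALL_VISIBLE is a real PostgreSQL page flag; PD_ALL_VISIBLE_EXCEPT_DEAD and its value are invented here purely to illustrate how the two states would be consumed differently by a heap scan and by the visibility map.

```c
#include <stdbool.h>
#include <stdint.h>

/* PD_ALL_VISIBLE exists in PostgreSQL; the second flag and both values
 * are hypothetical, for illustration only. */
#define PD_ALL_VISIBLE              0x0004
#define PD_ALL_VISIBLE_EXCEPT_DEAD  0x0010   /* invented */

/* A heap scan may skip per-tuple MVCC checks only in the strict state:
 * every line pointer leads to a tuple visible to everyone. */
static bool can_skip_mvcc_checks(uint16_t pd_flags)
{
    return (pd_flags & PD_ALL_VISIBLE) != 0;
}

/* The VM bit (which drives index-only scans) could stay set in either
 * state, because index pointers still reach valid HOT chains even when
 * the page carries some DEAD line pointers. */
static bool vm_bit_may_stay_set(uint16_t pd_flags)
{
    return (pd_flags & (PD_ALL_VISIBLE | PD_ALL_VISIBLE_EXCEPT_DEAD)) != 0;
}
```

The design point is the asymmetry: the weaker flag preserves the index-only-scan benefit after a HOT update, while the stronger flag alone authorizes skipping MVCC checks in a sequential scan.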

Suggestion 4: Freeze tuples during HOT prune

I suggested that we could also freeze tuples during HOT prune. The rationale for doing it this way is to remove unnecessary work from vacuum by piggybacking the freeze logging on the HOT prune WAL record. Today vacuum generates additional WAL and dirties the buffers again just to freeze the tuples. There were a couple of objections to this idea. One is that it pushes background work into a foreground operation. My answer to that is that the additional work is not much, since we are already doing a lot of other things in HOT prune, and the extra work is justified because it saves us much more work later on. Another objection is that we might freeze a tuple that then becomes unfrozen again, say because it gets updated or deleted. Also, we may lose forensic information by overwriting the xmin too soon. A possible solution for the latter issue (and Robert mentioned that it was suggested even before) is to have a separate tuple header flag to designate frozen tuples. This might be good in any case, since we would then never lose the forensic information.
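The flag-based freeze can be sketched as follows. This is a simplified model, not PostgreSQL source: FrozenTransactionId is real, but the struct layout and the HEAP_XMIN_FROZEN_FLAG bit value are invented here to show why a flag preserves forensic information while the overwrite approach destroys it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FrozenTransactionId ((TransactionId) 2)

/* Hypothetical infomask bit; the value is invented for illustration. */
#define HEAP_XMIN_FROZEN_FLAG 0x2000

struct tuple_header {
    TransactionId xmin;
    uint16_t infomask;
};

/* Traditional freeze: the original xmin is overwritten and lost. */
static void freeze_by_overwrite(struct tuple_header *t)
{
    t->xmin = FrozenTransactionId;
}

/* Flag-based freeze: xmin survives for forensic inspection. */
static void freeze_by_flag(struct tuple_header *t)
{
    t->infomask |= HEAP_XMIN_FROZEN_FLAG;
}

/* Visibility code treats either representation as frozen. */
static bool xmin_is_frozen(const struct tuple_header *t)
{
    return t->xmin == FrozenTransactionId ||
           (t->infomask & HEAP_XMIN_FROZEN_FLAG) != 0;
}
```

Keeping the check as a disjunction means old tuples frozen the traditional way remain valid, so the flag could be introduced without rewriting existing data.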

Suggestion 5: Add a cost model so that a transaction does only limited maintenance activity such as hint bit setting and HOT prune

Simon liked the idea and suggested having a GUC like transaction_cleanup_cost. One could set it at the session level, and the amount of extra work done by any transaction would be governed by its value. The default would be the current behavior, and we might place some lower limit so that every transaction contributes at least that much toward maintenance activities. I did not see any objections to the idea.
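A minimal sketch of the proposed cost accounting, assuming invented cost units and names (only the GUC name transaction_cleanup_cost comes from the thread): each maintenance action charges against a per-transaction budget, with a floor so the budget can never be configured below the mandatory minimum contribution.

```c
#include <stdbool.h>

/* Models the proposed GUC; the default and all cost constants below
 * are invented for illustration. */
static int transaction_cleanup_cost = 100;

#define CLEANUP_COST_MINIMUM 10   /* floor every transaction must fund */
#define COST_HINT_BIT         1   /* e.g. setting one hint bit */
#define COST_HOT_PRUNE       10   /* e.g. pruning one heap page */

struct txn_cleanup_state {
    int spent;                    /* cost consumed so far this txn */
};

/* Charge `cost` against the budget; refuse once it is exhausted. */
static bool cleanup_allowed(struct txn_cleanup_state *st, int cost)
{
    int budget = transaction_cleanup_cost;
    if (budget < CLEANUP_COST_MINIMUM)
        budget = CLEANUP_COST_MINIMUM;  /* enforce the lower limit */
    if (st->spent + cost > budget)
        return false;
    st->spent += cost;
    return true;
}
```

A backend would consult cleanup_allowed() before each optional maintenance action (hint-bit setting, HOT prune), so a session with a small budget still does the mandatory minimum while an unconstrained session behaves as today.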

Suggestion 6: Move maintenance activity to background workers

The idea here is to leave maintenance activities to background workers. For example, when a backend finds that a heap page needs pruning, it could hand that information to a background worker instead of doing the work itself. Once 9.3 has the background worker infrastructure, this might be worth exploring further.

Thanks,
Pavan
