Re: visibility map
| From | Robert Haas |
|---|---|
| Subject | Re: visibility map |
| Date | |
| Msg-id | AANLkTimGPG+D=7g=MLDw+Yi7jhE6Tg3RphV+Z8PBJNNd@mail.gmail.com |
| In reply to | Re: visibility map (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
| List | pgsql-hackers |
On Tue, Nov 23, 2010 at 3:42 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> That's an interesting idea. You piggyback setting the vm bit on the
> freeze WAL record, on the assumption that you have to write the freeze
> record anyway. However, if that assumption doesn't hold, because the
> tuples are deleted before they reach vacuum_freeze_min_age, it's no
> better than the naive approach of WAL-logging the vm bit set
> separately. Whether that's acceptable or not, I don't know.

I don't know, either. I was trying to think of the cases where this
would generate a net increase in WAL before I sent the email, but
couldn't fully wrap my brain around it at the time. Thanks for
summarizing.

Here's another design to poke holes in (rough C sketches of the
pieces follow at the end of this mail):

1. Imagine that the visibility map is divided into granules. For the
sake of argument, let's suppose there are 8K bits per granule; thus
each granule covers 64MB of the underlying heap and 1KB of space in
the visibility map itself.

2. In shared memory, create a new array called the visibility vacuum
array (VVA), each element of which has room for a backend ID, a
relfilenode, a granule number, and an LSN. Before setting bits in the
visibility map, a backend is required to allocate a slot in this
array, XLOG the slot allocation, and fill in its backend ID,
relfilenode, and the granule number whose bits it will be
manipulating, plus the LSN of the slot-allocation XLOG record. It
then sets as many bits within that granule as it likes. When done, it
sets the backend ID of the VVA slot to InvalidBackendId but does not
remove the slot from the array immediately; such a slot is said to
have been "released".

3. When visibility map bits are set, the LSN of the page is set to
that of the new-VVA-slot XLOG record, so that the visibility map page
can't hit the disk before the new-VVA-slot XLOG record does. Also,
the contents of the VVA, sans backend IDs, are XLOG'd at each
checkpoint. Thus, on redo, we can compute a list of all VVA slots
whose visibility-bit changes might already be on disk; we go through
and clear both the visibility map bits and the PD_ALL_VISIBLE bits on
the underlying pages.

4. To free a VVA slot that has been released, we must flush XLOG as
far as the record that allocated the slot and sync the visibility map
and heap segments containing that granule. Thus, all slots released
before a checkpoint starts can be freed after it completes.
Alternatively, an individual backend can free a previously-released
slot by performing the XLOG flush and syncs itself. (This might
require a few more bookkeeping details to be stored in the VVA, but
it seems manageable.)

One problem with this design is that the visibility map bits never
get set on standby servers. If we don't XLOG setting the bit, then I
suppose that doesn't happen now either, but it's more sucky (that's
the technical term) if you're relying on it for index-only scans
(which are also relevant on the standby, either during HS or if
promoted) than if you're only relying on it for vacuum (which doesn't
happen on the standby anyway unless and until it's promoted).
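To put numbers on point 1: with 8K bits per granule and the standard
8K page size, one granule covers 8192 bits * 8192 bytes/page = 64MB
of heap and occupies 8192 / 8 = 1KB of the visibility map. A rough C
sketch of the geometry and of what a VVA slot might hold; every name
here is made up for illustration, and plain integer types stand in
for BackendId, Oid, and XLogRecPtr:

    /*
     * Hypothetical sketch only -- none of these symbols exist in
     * PostgreSQL as named.
     */
    #include <stdint.h>

    #define BLCKSZ              8192    /* standard PostgreSQL page size */
    #define BITS_PER_GRANULE    8192    /* 8K vm bits per granule */

    /*
     * One vm bit covers one heap page, so a granule spans
     * 8192 bits * 8192 bytes/page = 64MB of heap, and occupies
     * 8192 / 8 = 1KB of the visibility map itself.
     */
    #define HEAP_BYTES_PER_GRANULE  ((uint64_t) BITS_PER_GRANULE * BLCKSZ)
    #define VM_BYTES_PER_GRANULE    (BITS_PER_GRANULE / 8)

    typedef struct VVASlot
    {
        int32_t     backendId;      /* owner; InvalidBackendId once released */
        uint32_t    relfilenode;    /* relation whose vm bits are being set */
        uint32_t    granule;        /* granule number within that relation */
        uint64_t    slotLSN;        /* LSN of the slot-allocation XLOG record */
    } VVASlot;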
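The per-backend protocol from points 2 and 3 might then look roughly
like this. MyBackendId and InvalidBackendId are real PostgreSQL
symbols; everything else below is hypothetical:

    /* Hypothetical control flow; no such functions exist in PostgreSQL. */
    #define InvalidBackendId (-1)       /* matches the real definition */
    extern int32_t  MyBackendId;        /* stand-in for the real global */

    extern VVASlot *vva_allocate_slot(void);
    extern uint64_t xlog_vva_slot_allocation(VVASlot *slot);
    extern void     set_one_vm_bit(uint32_t relfilenode, uint32_t blkno,
                                   uint64_t page_lsn);

    void
    set_vm_bits_in_granule(uint32_t relfilenode, uint32_t granule,
                           uint32_t *blknos, int nblknos)
    {
        VVASlot    *slot = vva_allocate_slot();

        /* XLOG the allocation first; its LSN guards every bit we set. */
        slot->slotLSN = xlog_vva_slot_allocation(slot);
        slot->backendId = MyBackendId;
        slot->relfilenode = relfilenode;
        slot->granule = granule;

        /*
         * Set as many bits as we like within this one granule, stamping
         * each touched vm page with slotLSN so the page cannot reach
         * disk ahead of the slot-allocation record.
         */
        for (int i = 0; i < nblknos; i++)
            set_one_vm_bit(relfilenode, blknos[i], slot->slotLSN);

        /* "Release" the slot; it stays in the array until freed. */
        slot->backendId = InvalidBackendId;
    }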
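And freeing a released slot (point 4), plus the redo-time cleanup
from point 3, might be sketched like so. XLogFlush is the one real
PostgreSQL routine here (it actually takes an XLogRecPtr); the sync
and clear helpers, and the way a freed slot is marked, are made up:

    /* uint64_t stands in for XLogRecPtr in this declaration. */
    extern void XLogFlush(uint64_t lsn);
    extern void sync_granule_segments(uint32_t relfilenode, uint32_t granule);
    extern void clear_vm_and_heap_bits(uint32_t relfilenode, uint32_t granule);

    void
    vva_free_slot(VVASlot *slot)
    {
        /* The slot-allocation record must be durably on disk first... */
        XLogFlush(slot->slotLSN);
        /* ...then the vm and heap segments covering the granule. */
        sync_granule_segments(slot->relfilenode, slot->granule);
        /* Now the slot can actually be reused. */
        slot->relfilenode = 0;      /* however free slots are marked */
    }

    /*
     * On redo: every slot reconstructed from the last checkpoint's VVA
     * image (plus any later slot-allocation records) identifies a
     * granule whose vm bits may have reached disk half-set; clear them
     * all, both in the vm and in PD_ALL_VISIBLE on the heap pages.
     */
    void
    vva_redo_cleanup(VVASlot *slots, int nslots)
    {
        for (int i = 0; i < nslots; i++)
            clear_vm_and_heap_bits(slots[i].relfilenode, slots[i].granule);
    }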
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company