Re: Freeze avoidance of very large table.

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: Freeze avoidance of very large table.
Дата
Msg-id 55206743.7070401@BlueTreble.com
обсуждение исходный текст
Ответ на Re: Freeze avoidance of very large table.  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Список pgsql-hackers
On 4/4/15 5:10 PM, Jim Nasby wrote:
> On 4/3/15 12:59 AM, Sawada Masahiko wrote:
>> +                case HEAPTUPLE_LIVE:
>> +                case HEAPTUPLE_RECENTLY_DEAD:
>> +                case HEAPTUPLE_INSERT_IN_PROGRESS:
>> +                case HEAPTUPLE_DELETE_IN_PROGRESS:
>> +                    if (heap_prepare_freeze_tuple(tuple.t_data,
>> freezelimit,
>> +                                                  mxactcutoff,
>> &frozen[nfrozen]))
>> +                        frozen[nfrozen++].offset = offnum;
>> +                    break;
>
> This doesn't seem safe enough to me. Can't there be tuples that are
> still new enough that they can't be frozen, and are still live? I don't
> think it's safe to leave tuples as dead either, even if they're hinted.
> The hint may not be written. Also, the patch seems to be completely
> ignoring actually freezing the toast relation; I can't see how that's
> actually safe.
>
> I'd feel a heck of a lot safer if any time heap_prepare_freeze_tuple
> returned false we did a second check on the tuple to ensure it was truly
> frozen.
>
> Somewhat related... instead of forcing the freeze to happen
> synchronously, can't we set this up so a table is in one of three
> states? Read/Write, Read Only, Frozen. AT_SetReadOnly and
> AT_SetReadWrite would simply change to the appropriate state, and all
> the vacuum infrastructure would continue to process those tables as it
> does today. lazy_vacuum_rel would become responsible for tracking if
> there were any non-frozen tuples if it was also attempting a freeze. If
> it discovered there were none, AND the table was marked as ReadOnly,
> then it would change the table state to Frozen and set relfrozenxid =
> InvalidTransactionId and relminxid = InvalidMultiXactId. AT_SetReadWrite
> could change relfrozenxid to it's own Xid as an optimization. Doing it
> that way leaves all the complicated vacuum code in one place, and would
> eliminate concerns about race conditions with still running
> transactions, etc.
>
> BTW, you also need to put things in place to ensure it's impossible to
> unfreeze a tuple in a relation that's marked ReadOnly or Frozen. I'm not
> sure what the right way to do that would be.

Answering my own question... I think visibilitymap_clear() would be the 
right place. AFAICT this is basically as critical as clearing the VM, 
and that function has the Relation, so it can see what mode the relation 
is in.

There is another possibility here, too. We can completely divorce a 
ReadOnly mode (which I think is useful for other things besides 
freezing) from the question of whether we need to force-freeze a 
relation if we create a FrozenMap, similar to the visibility map. This 
has the added advantage of helping freeze scans on relations that are 
not ReadOnly in the case of tables that are insert-mostly or any other 
pattern where most pages stay all-frozen.

Prior to the visibility map this would have been a rather daunting 
project, but I believe this could piggyback on the VM code rather 
nicely. Anytime you clear the VM you clearly must clear the FrozenMap as 
well. The logic for setting the FM is clearly different, but that would 
be entirely self-contained to vacuum. Unlike the VM, I don't see any 
point to marking special bits in the page itself for FM.

It would be nice if each bit in the FM covered multiple pages, but that 
can be optimized later.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jim Nasby
Дата:
Сообщение: Re: Freeze avoidance of very large table.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: COALESCE() query yield different result with MJ vs. NLJ/HJ