Re: Freezing without write I/O

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: Freezing without write I/O
Дата	19 сентября 2013 г. 15:40:42
Msg-id	20130919124035.GI8288@awork2.anarazel.de обсуждение исходный текст
Ответ на	Re: Freezing without write I/O (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы	Re: Freezing without write I/O (Andres Freund <andres@2ndquadrant.com>) Re: Freezing without write I/O (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

Hi,

On 2013-09-19 14:42:19 +0300, Heikki Linnakangas wrote:
> On 18.09.2013 16:22, Andres Freund wrote:
> >* Why can we do a GetOldestXmin(allDbs = false) in
> >   BeginXidLSNRangeSwitch()?
> 
> Looks like a bug. I think I got the arguments backwards, was supposed to be
> allDbs = true and ignoreVacuum = false. I'm not sure if vacuums could be
> ignored, but 'false' is the safe option.

Not sure either...

> >* the lsn ranges file can possibly become bigger than 512bytes (the size
> >   we assume to be written atomically) and you write it inplace. If we
> >   fail halfway through writing, we seem to be able to recover by using
> >   the pageMatureLSN from the last checkpoint, but it seems better to
> >   do the fsync(),rename(),fsync() dance either way.
> 
> The lsn-range array is also written to the xlog in toto whenever it changes,
> so on recovery, the ranges will be read from the WAL, and the ranges file
> will be recreated at the end-of-recovery checkpoint.

But we will read the file before we read the WAL record covering it,
won't we?

> >* Should we preemptively freeze tuples on a page in lazy_scan_heap if we
> >   already have dirtied the page? That would make future modifcations
> >   cheaper.

> Hmm, dunno. Any future modification will also dirty the page, so you're not
> actually saving any I/O by freezing it earlier. You're just choosing to do
> the CPU work and WAL insertion earlier than you have to. And if the page is
> not modified later, it is a waste of cycles. That said, maybe it would
> indeed be better to do it in vacuum when you have a chance to reduce latency
> in the critical path, even if it might be a waste.

Well, if we already dirtied the page (as in vacuum itself, to remove
dead tuples) we already have to do WAL. True it's a separate record, but
that should probably be fixed.

> >* lazy_scan_heap now blocks acquiring a cleanup lock on every buffer
> >   that contains dead tuples. Shouldn't we use some kind of cutoff xid
> >   there? That might block progress too heavily. Also the comment above
> >   it still refers to the old logic.
> 
> Hmm, you mean like skip the page even if there are some dead tuples on it,
> as long as the dead tuples are not older than X xids? I guess we could do
> that. Or the other thing mentioned in the comments, ie. remember the page
> and come back to it later.
>
> For now though, I think it's good enough as it is.

I have seen vacuum being slowed down considerably because we couldn't
acquire a cleanup lock. It's not that infrequent to have pages that are
pinned most of the time. And that was when we only acquired the cleanup
lock when necessary for freezing.

Not processing the page will not mark it as all-visible which basically
is remembering it...

> >* switchFinishXmin and nextSwitchXid should probably be either volatile
> >   or have a compiler barrier between accessing shared memory and
> >   checking them. The compiler very well could optimize them away and
> >   access shmem all the time which could lead to weird results.
> 
> Hmm, that would be a weird "optimization". Is that really legal for the
> compiler to do? Better safe than sorry, I guess.

I think it is. The compiler doesn't know anything about shared memory or
threads, so it doesn't see that those parameters can change. It will be
forced to load them as soon as an non-inlined function is called which
actually might make it safe in this case - unless you compile in LTO
mode where it recognizes that TransactionIdFollowsOrEquals() won't
modify thigns...

> >* I wonder whether the fact that we're doing the range switches after
> >   acquiring an xid could be problematic if we're preventing xid
> >   allocation due to the checks earlier in that function?
> 
> You mean that you might get into a situation where you cannot assign a new
> XID because you've reached xidStopLimit, and you cannot advance xidStopLimit
> because you can't switch to a new xid-lsn range, because no new XIDs are
> being assigned? Yeah, that would be nasty. The range management stuff should
> really be moved out of GetNewTransaction(), maybe do them in walwriter or
> bgwriter as Alvaro suggested.

Yes, I think we should do that. I am not sure yet where to put it
best though. Doing it in the wal writer doesn't seem to be a good idea, doing
more there will increase latency for normal backends.

> >* I think heap_lock_tuple() needs to unset all-visible, otherwise we
> >   won't vacuum that page again which can lead to problems since we
> >   don't do full-table vacuums again?
> 
> It's OK if the page is never vacuumed again, the whole point of the patch is
> that old XIDs can be just left lingering in the table. The next time the
> page is updated, perhaps to lock a tuple again, the page will be frozen and
> the old xmax will be cleared.

Yes, you're right, it should be possible to make it work that way. But
currently, we look at xmax and infomask of a tuple in heap_lock_tuple()
*before* the PageUpdateNeedsFreezing() call. Currently we would thus
create a new multixact with long dead xids as members and such.

Or am I missing something?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Pavel Stehule
Дата: 19 сентября 2013 г., 15:38:28
Сообщение: Re: Assertions in PL/PgSQL

Следующее

От: Kevin Grittner
Дата: 19 сентября 2013 г., 15:41:49
Сообщение: Re: record identical operator

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Freezing without write I/O

Предыдущее

Следующее