Re: Index Scans become Seq Scans after VACUUM ANALYSE

Поиск
Список
Период
Сортировка
От Curt Sampson
Тема Re: Index Scans become Seq Scans after VACUUM ANALYSE
Дата
Msg-id Pine.NEB.4.43.0206211106390.437-100000@angelic.cynic.net
обсуждение исходный текст
Ответ на Re: Index Scans become Seq Scans after VACUUM ANALYSE  (Bruce Momjian <pgman@candle.pha.pa.us>)
Ответы Re: Index Scans become Seq Scans after VACUUM ANALYSE
Список pgsql-hackers
On Thu, 20 Jun 2002, Bruce Momjian wrote:

> > MS SQL Server has an interesting way of dealing with this. They have a
> > "torn" bit in each 512-byte chunk of a page, and this bit is set the
> > same for each chunk. When they are about to write out a page, they first
> > flip all of the torn bits and then do the write. If the write does not
> > complete due to a system crash or whatever, this can be detected later
> > because the torn bits won't match across the entire page.
>
> I was wondering, how does knowing the block is corrupt help MS SQL?

I'm trying to recall, but I can't off hand. I'll have to look it
up in my Inside SQL Server book, which is at home right now,
unfortunately. I'll bring the book into work and let you know the
details later.

> Right now, we write changed pages to WAL, then later write them to disk.

Ah. You write the entire page? MS writes only the changed tuple.
And DB2, in fact, goes one better and writes only the part of the
tuple up to the change, IIRC. Thus, if you put smaller and/or more
frequently changed columns first, you'll have smaller logs.

> I have always been looking for a way to prevent these WAL writes.  The
> 512-byte bit seems interesting, but how does it help?

Well, this would at least let you reduce the write to the 512-byte
chunk that changed, rather than writing the entire 8K page.

> And how does the bit help them with partial block writes?  Is the bit at
> the end of the block?  Is that reliable?

The bit is somewhere within every 512 byte "disk page" within the
8192 byte "filesystem/database page." So an 8KB page is divided up
like this:
   | <----------------------- 8 Kb ----------------------> |
   | 512b | 512b | 512b | 512b | 512b | 512b | 512b | 512b |

Thus, the tear bits start out like this:
   |  0   |  0   |  0   |  0   |  0   |  0   |  0   |  0   |

After a successful write of the entire page, you have this:
   |  1   |  1   |  1   |  1   |  1   |  1   |  1   |  1   |

If the write is unsuccessful, you end up with something like this:
   |  1   |  1   |  1   |  1   |  1   |  0   |  0   |  0   |

And now you know which parts of your page got written, and which
parts didn't.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Rod Taylor"
Дата:
Сообщение: Domain coercion
Следующее
От: David Ford
Дата:
Сообщение: Re: cvs read lock