Re: finding changed blocks using WAL scanning

From: Robert Haas
Subject: Re: finding changed blocks using WAL scanning
Date:
Msg-id: CA+TgmobvLUuu75QQQSsAe=+beB_GBQm1faY96iyqSBPeokp9EQ@mail.gmail.com
In reply to: finding changed blocks using WAL scanning  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: finding changed blocks using WAL scanning  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
On Wed, Apr 10, 2019 at 5:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
> There is one thing that does worry me about the file-per-LSN-range
> approach, and that is memory consumption when trying to consume the
> information.  Suppose you have a really high velocity system.  I don't
> know exactly what the busiest systems around are doing in terms of
> data churn these days, but let's say just for kicks that we are
> dirtying 100GB/hour.  That means, roughly 12.5 million block
> references per hour.  If each block reference takes 12 bytes, that's
> maybe 150MB/hour in block reference files.  If you run a daily
> incremental backup, you've got to load all the block references for
> the last 24 hours and deduplicate them, which means you're going to
> need about 3.6GB of memory.  If you run a weekly incremental backup,
> you're going to need about 25GB of memory.  That is not ideal.  One
> can keep the memory consumption to a more reasonable level by using
> temporary files.  For instance, say you realize you're going to need
> 25GB of memory to store all the block references you have, but you
> only have 1GB of memory that you're allowed to use.  Well, just
> hash-partition the data 32 ways by dboid/tsoid/relfilenode/segno,
> writing each batch to a separate temporary file, and then process each
> of those 32 files separately.  That does add some additional I/O, but
> it's not crazily complicated and doesn't seem too terrible, at least
> to me.  Still, it's something not to like.
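
(For concreteness, a hash-partitioning pass along those lines might look something like the sketch below.  The BlockRef layout and the hash are just made up for illustration -- nothing like this exists in the tree, and the record shown here is wider than the 12-byte estimate above.)

#include <stdio.h>
#include <stdint.h>

#define NUM_PARTITIONS 32

/* Hypothetical on-disk record identifying one modified block. */
typedef struct BlockRef
{
    uint32_t    dboid;
    uint32_t    tsoid;
    uint32_t    relfilenode;
    uint32_t    segno;
    uint32_t    blkno;
} BlockRef;

/* Hash only the relation-identifying fields, so that all references to
 * the same relation segment land in the same partition. */
static unsigned int
blockref_partition(const BlockRef *ref)
{
    uint64_t    h = 0;

    h = h * 31 + ref->dboid;
    h = h * 31 + ref->tsoid;
    h = h * 31 + ref->relfilenode;
    h = h * 31 + ref->segno;
    return (unsigned int) (h % NUM_PARTITIONS);
}

/* Scatter incoming block references across NUM_PARTITIONS temporary files.
 * Each partition can then be loaded and deduplicated on its own, so peak
 * memory drops to roughly 1/NUM_PARTITIONS of the total. */
void
partition_block_refs(FILE *in, FILE *out[NUM_PARTITIONS])
{
    BlockRef    ref;

    while (fread(&ref, sizeof(ref), 1, in) == 1)
        fwrite(&ref, sizeof(ref), 1, out[blockref_partition(&ref)]);
}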

Oh, I'm being dumb.  We should just have the process that writes out
these files sort the records first.  Then when we read them back in to
use them, we can just do a merge pass like MergeAppend would do.  Then
you never need very much memory at all.
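
Roughly like the sketch below: because each file is written in sorted order, the read side only needs a k-way merge that keeps one record per input file and emits each distinct block reference once.  BlockRef and the comparator are the same illustrative assumptions as in the earlier sketch, not existing code.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>

/* Same illustrative BlockRef as in the earlier sketch. */
typedef struct BlockRef
{
    uint32_t    dboid;
    uint32_t    tsoid;
    uint32_t    relfilenode;
    uint32_t    segno;
    uint32_t    blkno;
} BlockRef;

/* Total order used both when sorting the files at write time and when
 * merging them at read time. */
static int
blockref_cmp(const BlockRef *a, const BlockRef *b)
{
#define CMP_FIELD(f) \
    do { if (a->f != b->f) return (a->f < b->f) ? -1 : 1; } while (0)
    CMP_FIELD(dboid);
    CMP_FIELD(tsoid);
    CMP_FIELD(relfilenode);
    CMP_FIELD(segno);
    CMP_FIELD(blkno);
#undef CMP_FIELD
    return 0;
}

/* Merge nfiles already-sorted block-reference files into "out", writing
 * each distinct block reference exactly once.  Only one record per input
 * file is held in memory, so memory use is O(nfiles) no matter how much
 * WAL was scanned. */
void
merge_block_refs(FILE **files, int nfiles, FILE *out)
{
    BlockRef   *cur = malloc(nfiles * sizeof(BlockRef));
    bool       *valid = malloc(nfiles * sizeof(bool));
    BlockRef    prev;
    bool        have_prev = false;

    for (int i = 0; i < nfiles; i++)
        valid[i] = (fread(&cur[i], sizeof(BlockRef), 1, files[i]) == 1);

    for (;;)
    {
        int         best = -1;

        /* Find the smallest record among the current heads of all files. */
        for (int i = 0; i < nfiles; i++)
            if (valid[i] && (best < 0 || blockref_cmp(&cur[i], &cur[best]) < 0))
                best = i;
        if (best < 0)
            break;              /* every input is exhausted */

        /* Emit it unless it duplicates the record emitted just before. */
        if (!have_prev || blockref_cmp(&cur[best], &prev) != 0)
        {
            fwrite(&cur[best], sizeof(cur[best]), 1, out);
            prev = cur[best];
            have_prev = true;
        }

        /* Refill from the file we just consumed. */
        valid[best] = (fread(&cur[best], sizeof(BlockRef), 1, files[best]) == 1);
    }

    free(cur);
    free(valid);
}

A binary heap would make each selection O(log n) rather than a linear scan over the file heads, but for a handful of per-LSN-range files the simple loop seems good enough for a sketch.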

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


