Re: Millions of tables

From	Jim Nasby
Subject	Re: Millions of tables
Date	
Msg-id	b4b058f0-e81b-047f-c714-5f0c4f693b99@BlueTreble.com
In reply to	Re: Millions of tables  ("Alex Ignatov (postgrespro)" <a.ignatov@postgrespro.ru>)
Responses	Re: Millions of tables
List	pgsql-performance
On 9/29/16 6:11 AM, Alex Ignatov (postgrespro) wrote:
> With millions of tables you have to set autovacuum_max_workers
> sky-high =). We have seen situations where, with thousands of tables,
> autovacuum can't vacuum all the tables that need it. It simply vacuums
> some of the most-modified tables and never reaches the others. Only a
> manual vacuum helps in that situation. Combined with the wraparound
> issue it can be a nightmare.

Specifically, autovac isn't going to start worrying about anti-wrap
vacuums until tables start hitting autovacuum_freeze_max_age (or
autovacuum_multixact_freeze_max_age). Any tables that hit that threshold
go to the front of the line for being vacuumed. (But keep in mind that
there is no universal line, just what each worker computes on its own
when it starts.)
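A quick way to see which tables are closest to that threshold is to
query the catalogs directly. This is only a sketch; the column and
function names (pg_class.relfrozenxid, age(), current_setting()) are
standard, but adjust the LIMIT and filters to taste:

```sql
-- List the tables closest to the anti-wraparound threshold.
-- age(relfrozenxid) counts transactions since the table was last frozen;
-- autovacuum forces a vacuum once it exceeds autovacuum_freeze_max_age.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint
         - age(c.relfrozenxid) AS xids_until_forced_vacuum
FROM pg_class c
WHERE c.relkind IN ('r', 't')   -- plain tables and their toast tables
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```

With millions of tables you'd probably aggregate instead of listing:
a histogram of age(relfrozenxid) will show whether everything is
bunched up at the same age (the bulk-load scenario below).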

Where things will completely fall apart for you is if a lot of tables
all have roughly the same relfrozenxid (or relminmxid), like they would
immediately after a large load. In that scenario you'll suddenly have
loads of work for autovac to do, all at the same time. That will make
the database, DBAs and you Very Unhappy (tm).

Somehow, some way, you *must* do a vacuum of the entire database.
Luckily the freeze map in 9.6 means you'd only have to do that one time
(assuming the data really is static). In any older version, (auto)vacuum
will need to eventually *read everything in every table* at least once
every ~2B transactions.
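If you do that one-time pass with vacuumdb, something along these lines
would work (shown as a sketch; requires a running cluster, and --jobs
needs 9.5 or later):

```shell
# One-time freeze of everything, run during a quiet window after the
# bulk load. --jobs parallelizes across tables; on 9.6 the freeze map
# lets later vacuums skip pages that are already all-frozen.
vacuumdb --all --freeze --jobs=4 --echo
```

Spreading this over several nights (per-database or per-schema) also
avoids having every table's relfrozenxid land on the same value again.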

The only impact the number of tables is going to have on this is
granularity. If you have a small number of large tables, you'll have
(auto)vacuum processes that will need to run *uninterrupted* for a long
time to move the freeze threshold on each table. If you have tons of
small tables, you'll need tons of separate (auto)vacuums, but each one
will run for a shorter interval, and if one gets interrupted it won't be
as big a deal.

There is one potentially significant difference between autovac and
manual vacuums here; autovac treats toast tables as just another table,
with their own stats and their own freeze needs. If you're generating a
lot of toast records, that might make a difference.
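To check whether toast tables are lagging behind their parents, you can
join pg_class to itself through reltoastrelid (again just a sketch using
standard catalog columns):

```sql
-- Compare each table's toast table freeze age; autovacuum tracks and
-- vacuums these independently of the parent table.
SELECT c.oid::regclass AS main_table,
       t.oid::regclass AS toast_table,
       age(t.relfrozenxid) AS toast_xid_age
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relkind = 'r'
ORDER BY age(t.relfrozenxid) DESC
LIMIT 20;
```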

When it comes to vacuum, you might find
https://www.pgcon.org/2015/schedule/events/829.en.html useful.

On a different topic... I didn't see anything in the thread about what
you're storing, but with the row counts you're talking about I'm
guessing it's something that's time-series.
https://github.com/ElephantStack/ElephantStack is a project exploring
the idea of using Postgres array types as a far more efficient way to
store that kind of data; instead of an overhead of 24 bytes per row
(plus indexes) arrays give you essentially zero overhead per row.
There's no code yet, but a few of us have done testing on some real
world data (see the google group referenced from the README).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

