Re: partitioning an existing table - efficient pg_dump

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема Re: partitioning an existing table - efficient pg_dump
Дата
Msg-id 20171230184247.GH4172@telsasoft.com
обсуждение исходный текст
Ответ на Re: partitioning an existing table  (Robert Blayzor <rblayzor.bulk@inoc.net>)
Список pgsql-performance
On Sat, Dec 30, 2017 at 09:19:05AM -0500, Robert Blayzor wrote:
> On Dec 30, 2017, at 12:38 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:

> > BTW depending on your requirements, it may be possible to make pg_dump much
> > more efficient.  For our data, it's reasonable to assume that a table is
> > "final" if its constraints exclude data older than a few days ago, and it can
> > be permanently dumped and excluded from future, daily backups, which makes the
> > backups smaller and faster, and probably causes less cache churn, etc.  But I
> > imagine you might have different requirements, so that may be infeasible, or
> > you'd maybe have to track insertions, either via p
> 
> The idea is only only keep a # of months available for searching over a period of months. Those months could be 3 or
more,up to a year, etc. But being able to just drop and entire child table for pruning is very attractive. Right now
theaverage months data is about 2-3 million rows each. Data is just inserted and then only searched. Never updated…
 
> 
> I also like the idea of skipping all this older data from a PGdump. We archive records inserted into these tables
dailyinto cold storage. ie: export and compressed. So the data is saved cold. We dump the DB nightly also, but probably
wouldmake sense to skip anything outside of the newest child table. Just not sure how to make that happen, yet….
 

For us, I classify the tables as "partitioned" or "not partitioned" and
subdivide "partitioned" into "recent" or "historic" based on table names; but
if you design it from scratch then you'd have the opportunity to keep a list of
partitioned tables, their associated date range, date of most recent insertion,
and most recent "final" backup.

This is the essence of it:
snap= ... SELECT pg_export_snapshot();
pg_dump --snap "$snap" -T "$ptnreg" -f nonpartitioned.new
pg_dump --snap "$snap" -t "$recent" -f recent.new
loop around historic partitioned tables and run "final" pg_dump if it's been
  INSERTed more recently than it's been dumped.
remove any "final" pg_dump not included in any existing backup (assuming you
  keep multiple copies on different rotation).

Note that pg_dump -t/-T is different from "egrep" in a few special ways..

Justin


В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Blayzor
Дата:
Сообщение: Re: partitioning an existing table
Следующее
От: Hans Braxmeier
Дата:
Сообщение: Restoring a table is ten times slower on Ubuntu 14.04 than on Ubuntu16.04