Re: big database resulting in small dump

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: big database resulting in small dump
Дата
Msg-id 500A08B5.7050000@ringerc.id.au
обсуждение исходный текст
Ответ на big database resulting in small dump  (Ilya Ivanov <forn@ngs.ru>)
Список pgsql-general
On 07/21/2012 02:05 AM, Ilya Ivanov wrote:
I have a 8.4 database (installed on ubuntu 10.04 x86_64). It holds Zabbix database. The database on disk takes 10Gb. SQL dump takes only 2Gb. I've gone through http://archives.postgresql.org/pgsql-general/2008-08/msg00316.php and got some hints. Naturally, the biggest table is history (the second biggest is history_uint. Together they make about 95% of total size). I've tried to perform CLUSTER on it, but seemed to be taking forever (3 hours and still not completed). So I cancelled it and went with database drop and restore. It resulted in database taking up 6.4Gb instead of 10Gb. This is a good improvement, but still isn't quite what I expect. I would appreciate some clarification.

To elaborate on the answers already posted:

Plain text dumps only contain the data its self. In many databases the table contents are a small part of the overall database size. Additionally, data is stored on disk in a structure optimised for speed of access, not disk space consumption, so the same data can be much more compact in the dump format.

Finally, data in dumps is much, much more efficiently compressed than it is in the tables. In tables the main rows aren't compressed at all, and TOASTed values like big text fields are individually compressed, which is immensely less space efficient than compressing them all together. On the other hand, it allows random access where the dump format just doesn't - kind of important for a database!


The rest of the space is used by:

- Indexes, which can get quite big.

- Free space in tables from deleted rows that haven't yet been replaced by a new inserted row.
  If you have a non-default FILLFACTOR there can be lots of this.

- "bloat" - wasted space in tables and indexes, typically caused by insufficiently frequent autovacuum

- ... probably more I've forgotten


When you dump and reload you not only get rid of any bloat in your tables and indexes, but you effectively REINDEX your database. PostgreSQL can often create much more compact and efficient index structures when it does a CREATE INDEX on a full table (like when restoring a dump) than when it does a CREATE INDEX on an empty table followed by lots of inserts.

--
Craig Ringer

В списке pgsql-general по дате отправления:

Предыдущее
От: "David Johnston"
Дата:
Сообщение: Re: A Better Way? (Multi-Left Join Lookup)
Следующее
От: Chris Angelico
Дата:
Сообщение: Re: Select Rows With Only One of Two Values