Обсуждение: Increased size of database dump even though LESS consumed storage

Поиск
Список
Период
Сортировка

Increased size of database dump even though LESS consumed storage

От
Thorsten Schöning
Дата:
Hi all,

for various reasons I've migrated my database schema from 4 tables
with some additional indexes to keep integrity and stuff to 1 table
only. That made some of the former used indexes obsolete and resulted
in overall less consumed storage:

The old schema consumed ~42 GiB, while the new is ~16 GiB without the
formerly available indexes and ~25 GiB with the same logical indexes.
Though, a created dump of the new schema has increased from ~5,52 GiB
to 6,38 GiB. Of course I'm using the same settings to create both
dumps:

> pg_dump.exe "--username=%USERNAME%" "--encoding=UTF-8" "--compress=9" "--format=c" "--dbname=%DB_NAME%" >
"%DMP_PATH%"

My expectation was that the dump would be smaller as well, because the
data itself is the same, while lots of duplicate IDs, obsolete indexes
etc. in not available tables anymore have been removed.

Any explanation for the increased dump size? Thanks!

The tables themself were changed like the following:

> CREATE TABLE clt_rec
> (
>   id          bigserial                 NOT NULL,
>   oms_rec     bigint                    NOT NULL,
>   captured_at timestamp with time zone  NOT NULL,
>   rssi        smallint                  NOT NULL
> );

> CREATE TABLE oms_rec
> (
>   id        bigserial NOT NULL,
>   meter     integer   NOT NULL,
>   encrypted bytea,
>   decrypted bytea
> );

> CREATE TABLE clt_rec_src
> (
>   id          bigserial NOT NULL,
>   real_estate integer   NOT NULL,
>   collector   integer   NOT NULL,
>   clt_rec     bigserial
> );

> CREATE TABLE meter_status_history
> (
>   id      serial  NOT NULL,
>   oms_rec bigint  NOT NULL,
>   status  smallint
> );

vs.

> CREATE TABLE datagram
> (
>   id            bigserial                 NOT NULL,
>   src_re        integer                   NOT NULL,
>   src_clt       integer                   NOT NULL,
>   src_meter     integer                   NOT NULL,
>   captured_at   timestamp with time zone  NOT NULL,
>   captured_rssi smallint                  NOT NULL,
>   oms_status    smallint                  NOT NULL,
>   oms_enc       bytea,
>   oms_dec       bytea
> );

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska









Re: Increased size of database dump even though LESS consumed storage

От
Tom Lane
Дата:
=?utf-8?B?VGhvcnN0ZW4gU2Now7ZuaW5n?=<tschoening@am-soft.de> writes:
> for various reasons I've migrated my database schema from 4 tables
> with some additional indexes to keep integrity and stuff to 1 table
> only. That made some of the former used indexes obsolete and resulted
> in overall less consumed storage:
> The old schema consumed ~42 GiB, while the new is ~16 GiB without the
> formerly available indexes and ~25 GiB with the same logical indexes.
> Though, a created dump of the new schema has increased from ~5,52 GiB
> to 6,38 GiB. Of course I'm using the same settings to create both
> dumps:
>> pg_dump.exe "--username=%USERNAME%" "--encoding=UTF-8" "--compress=9" "--format=c" "--dbname=%DB_NAME%" >
"%DMP_PATH%"
> My expectation was that the dump would be smaller as well, because the
> data itself is the same, while lots of duplicate IDs, obsolete indexes
> etc. in not available tables anymore have been removed.

Removing indexes won't in itself make any noticeable difference in the
size of pg_dump output, since an index is just represented by a CREATE
INDEX (or equivalent) command.

My guess is that the rearrangement somehow made the table data less
amenable to compression.  gzip depends on finding similar substrings
within a fairly narrow window (a few KB), so at least in principle,
just changing the order of rows could make a difference.  I'd sort
of expect compression opportunity losses to more or less balance out
with opportunity gains over such a large data volume, but maybe you
were unlucky.  Or perhaps the move into a single table was done in
such a way that it actually decreased locality-of-reference, eg maybe
similar rows were grouped before and now they're not.

            regards, tom lane



Re: Increased size of database dump even though LESS consumed storage

От
Thorsten Schöning
Дата:
Guten Tag Thorsten Schöning,
am Dienstag, 9. Februar 2021 um 10:56 schrieben Sie:

> Any explanation for the increased dump size? Thanks!

Some more data:

> --compress=0:       20  vs. 25 GiB
> --compress=DEFAULT: 6,5 vs. 5,6 GiB
> --compress=9:       6,4 vs. 5,5 GiB

From the docs:

> For the custom archive format, this specifies compression of
> individual table-data segments[...]

https://www.postgresql.org/docs/10/app-pgdump.html

So the new table layout really seems to negatively impact compression,
even though in both cases the "bytea" are at the end of the table. The
only difference is the number of overall columns per table. Wouldn't
have expected that.

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska









Re: Increased size of database dump even though LESS consumed storage

От
Thorsten Schöning
Дата:
Guten Tag Thorsten Schöning,
am Dienstag, 9. Februar 2021 um 10:56 schrieben Sie:

> Any explanation for the increased dump size? Thanks!

While I don't have a detailed explanation myself, there's the
following hint [in the docs][1]:

> For the custom archive format, this specifies compression of
> individual table-data segments[...]

https://www.postgresql.org/docs/10/app-pgdump.html

I've changed the layout of my tables entirely and most likely have
changed the "individual table-data segments" as well. So I did some
further testing and got the following numbers. It's always OLD vs. NEW
schema for the schema documented earlier:

> --compress=0: 25  vs. 20  GiB
> --compress=?: 5,6 vs. 6,5 GiB
> --compress=9: 5,5 vs. 6,4 GiB

"?" means default settings, meaning "--compress" has been REMOVED from
the shell command.

As can be seen, pretty much as soon as compression comes into play,
the new table layout is less efficient, even though in OLD and NEW
schema exactly the same "bytea" were used, have been placed at the end
of their individual tables and alike. But things become more
interesting with another version of my schema, which as well indicates
a strong relationship of dump size, compression and table layout/size.

I've changed the new table "datagram" to be a partitioned one with
partitions containing rows per year and per half-year. Each partition
contains far less rows than before this way and while I only tested
with "--compress=9" this time, the numbers are quite interesting:

> unpartitioned:     6,4 GiB
> half-yearly parts: 4,8 GiB
> yearly parts:      4,8 GiB

The interesting part this time is that the table layout for all
partitions is the same like before, only the number of rows per table
is different. Though, the number of rows overall is the same like
before, the same data, IDs etc. Though, this time the dump really is
smaller than with the OLD schema containing far more data because of
duplicate IDs and stuff.

I wouldn't have expected table layout to be that important.

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska









Re: Increased size of database dump even though LESS consumed storage

От
Francisco Olarte
Дата:
Thorsten:

On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning <tschoening@am-soft.de> wrote:
...
> I've changed the new table "datagram" to be a partitioned one with
> partitions containing rows per year and per half-year. Each partition
> contains far less rows than before this way and while I only tested
> with "--compress=9" this time, the numbers are quite interesting:
>
> > unpartitioned:     6,4 GiB
> > half-yearly parts: 4,8 GiB
> > yearly parts:      4,8 GiB
>
> The interesting part this time is that the table layout for all
> partitions is the same like before, only the number of rows per table
> is different. Though, the number of rows overall is the same like
> before, the same data, IDs etc. Though, this time the dump really is
> smaller than with the OLD schema containing far more data because of
> duplicate IDs and stuff.
> I wouldn't have expected table layout to be that important.

Compresion is dependent on detectable redundancy on the input. pg_dump
more or less gzips per-table "copy to stdout" dumps. If your data
evolves in a determined way having it sorted by ingestion time may
increase detectable redundancy a lot, and partitioning sorts partially
by date ( or fully if you have made the partitions by range-querying
via index scan ). In this case it may not be the layout, but the
order.

Given you seem to be able to test, you may try sorting the full table
by the column you use for partitioning. IIRC cluster will do the trick
if it is indexed.

( This has happened to me compressing document dumps, presorting by
some chosen fields improved my compression ratio a bit, IIRC it was
due to the compressor searching for duplicates on a limited window
only, this is why I use "detectable" redundancy )

Francisco Olarte.



Re: Increased size of database dump even though LESS consumed storage

От
Thorsten Schöning
Дата:
Guten Tag Francisco Olarte,
am Mittwoch, 10. Februar 2021 um 10:38 schrieben Sie:

> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my
dump-size, but I'm running into the problem that queries spanning more
than one partition seem to prefer sequential scan over using indexes.
My indexe seems to only be used when querying the rows of one
partition.

So, the following is my definition, should that be "range-queried via
index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id             bigserial                 NOT NULL,
>   src_re         integer                   NOT NULL,
>   src_clt        integer                   NOT NULL,
>   src_meter      integer                   NOT NULL,
>   captured_at    timestamp with time zone  NOT NULL,
>   captured_rssi  smallint                  NOT NULL,
>   oms_status     smallint                  NOT NULL,
>   oms_enc        bytea,
>   oms_dec        bytea
> ) PARTITION BY RANGE (captured_at);

> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska









Re: Increased size of database dump even though LESS consumed storage

От
Thorsten Schöning
Дата:
Guten Tag Francisco Olarte,
am Mittwoch, 10. Februar 2021 um 10:38 schrieben Sie:

> [...]and partitioning sorts partially
> by date ( or fully if you have made the partitions by range-querying
> via index scan ).[...]

That statement is especially interesting not only because of my
dump-size, but I'm running into the problem that queries spanning more
than one partition seem to prefer sequential scan over using indexes.
My indexe seems to only be used when querying the rows of one
partition.

So, the following is my definition, should that be "range-queried via
index scan" properly? :-)

> CREATE TABLE datagram
> (
>   id             bigserial                 NOT NULL,
>   src_re         integer                   NOT NULL,
>   src_clt        integer                   NOT NULL,
>   src_meter      integer                   NOT NULL,
>   captured_at    timestamp with time zone  NOT NULL,
>   captured_rssi  smallint                  NOT NULL,
>   oms_status     smallint                  NOT NULL,
>   oms_enc        bytea,
>   oms_dec        bytea
> ) PARTITION BY RANGE (captured_at);

> CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> [...]
> INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;
> CREATE INDEX idx_datagram_for_time_window ON datagram USING btree (src_meter, captured_at DESC);

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska









Re: Increased size of database dump even though LESS consumed storage

От
Francisco Olarte
Дата:
Thorsten:

On Wed, Feb 10, 2021 at 12:36 PM Thorsten Schöning
<tschoening@am-soft.de> wrote:
> Guten Tag Francisco Olarte,
> am Mittwoch, 10. Februar 2021 um 10:38 schrieben Sie:
> > [...]and partitioning sorts partially
> > by date ( or fully if you have made the partitions by range-querying
> > via index scan ).[...]
> That statement is especially interesting not only because of my
> dump-size, but I'm running into the problem that queries spanning more
> than one partition seem to prefer sequential scan over using indexes.
> My indexe seems to only be used when querying the rows of one
> partition.

You know the drill, show your indexes, post some explain for these,
those statements are too fuzzy to infer any useful advice.

> So, the following is my definition, should that be "range-queried via
> index scan" properly? :-)
> > CREATE TABLE datagram
> > (
> >   id             bigserial                 NOT NULL,
> >   src_re         integer                   NOT NULL,
> >   src_clt        integer                   NOT NULL,
> >   src_meter      integer                   NOT NULL,
> >   captured_at    timestamp with time zone  NOT NULL,
> >   captured_rssi  smallint                  NOT NULL,
> >   oms_status     smallint                  NOT NULL,
> >   oms_enc        bytea,
> >   oms_dec        bytea
> > ) PARTITION BY RANGE (captured_at);
> > CREATE TABLE datagram_y1970_h1 PARTITION OF datagram FOR VALUES FROM ('1970-01-01') TO ('1970-07-01');
> > CREATE TABLE datagram_y1970_h2 PARTITION OF datagram FOR VALUES FROM ('1970-07-01') TO ('1971-01-01');
> > [...]

Given that definition there is no possible index scan, as there are no indexes.

> > INSERT INTO datagram([...]) SELECT * FROM datagram_unpart;

But given this what I was suggesting is not true. What I meant is that
if you had big datagram_unpart table indexed by captured_at with good
index correlation, which given the name of the field would be
expected, and populated the partitions one by one by doing range
queries over datagram unpart you may have hit index scans on each
population query and end up with perfectly sorted partitions, and that
could help compressions.

Given your query, unless the optimizer is performing some supernatural
( for me ) trickery my bet would be on a full table scan plus
insertion which would left you with whatever order you had originally
for each partition.

But if your data, or your "data shape" has some natural correlation
with captured timestamps ( i.e., I know that my CDR data has it ),
puttig similar data together could have improved your compression
ratios. Bear in mind I do not know it.

Anyway, I would not worry too much about the backup ratios with the
sizes you have, I would worry more on doing things like augmenting
fill factors and similar things if your data processing is similar to
mine ( data enters at time captured_at, sometimes is fixed due to
errors but normally never moves after being in the tables for a couple
of months, and when it does it is a special problems which can be
manually tuned ). It will not improve the backup too much, but can
improve other things. i.e., in my big CDR ( that means call detail
record, telephony ) tables I let them insert into "normal" monthly
partitions, but when they are some month olds I coalesce them into
fully sorted, unindexed, 100% filled yearly partitions. Those
partitions are then put into an "historic" schema which I do NOT
backup automatically, I only do it after the ( manually fired )
coalescing and the ( manually done ) very ocasional long term fixups (
I rarely have to touch nothing older than a month ).

Regards.
    Francisco Olarte.



Re: Increased size of database dump even though LESS consumed storage

От
Francisco Olarte
Дата:
Thorsten:

On Wed, Feb 10, 2021 at 12:46 PM Thorsten Schöning
<tschoening@am-soft.de> wrote:
....
UUppzz, completely missed the second message.

> > CREATE INDEX idx_datagram_for_time_window ON datagram USING btree (src_meter, captured_at DESC);

Depending on the query and data "shape", this may be chosen or not for
queries. You should be the one more probable to know it.

Francisco Olarte.



Re: Increased size of database dump even though LESS consumed storage

От
Thorsten Schöning
Дата:
Guten Tag Thorsten Schöning,
am Mittwoch, 10. Februar 2021 um 09:58 schrieben Sie:

>> unpartitioned:     6,4 GiB
>> half-yearly parts: 4,8 GiB
>> yearly parts:      4,8 GiB

The above number for `unpartitioned` might be wrong: I've re-created
the same database, applied the same data migration like in the past
and created a dump which was 4,8 GiB in size. Maybe I simply did
something wrong and left some renamed table with data in the formerly
used database, while NOT doing so for the other test databases. So my
entire question might simply be based on user errors. :-)

Sadly I don't seem to have the former dumps anymore to have a look at
its content.

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska