Discussion: Database Bloat
Hi,

I am using PostgreSQL 9.1 and loading very large tables (13 million rows each). The flat file size is only 25M, but the equivalent database table is 548MB. This is without any indexes applied and with autovacuum turned on. I have read that tables can be around 5 times larger than the equivalent flat files, so over 20 times seems quite excessive.

Any ideas on how to go about decreasing this bloat, or is this not unexpected for such large tables?

Thanks
On 08/20/12 10:53 AM, elliott wrote:
> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million
> rows each ).  The flat file size is only 25M.  However, the equivalent
> database table is 548MB.  This is without any indexes applied and auto
> vacuum turned on.  I have read that the bloat can be around 5 times
> greater for tables than flat files so over 20 times seems quite
> excessive.
>
> Any ideas on how to go about decreasing this bloat or is this not
> unexpected for such large tables?

What do the fields of this table look like? (Output from \d tablename would be useful.)

-- 
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of elliott
> Sent: Monday, August 20, 2012 1:54 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Database Bloat
>
> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows each ).
> The flat file size is only 25M.  However, the equivalent database table is
> 548MB.  This is without any indexes applied and auto vacuum turned on.  I
> have read that the bloat can be around 5 times greater for tables than flat
> files so over 20 times seems quite excessive.
>
> Any ideas on how to go about decreasing this bloat or is this not unexpected
> for such large tables?
>
> Thanks

Kinda guessing here, but that 5x estimate has some assumptions built in. I would guess that a table holding many large plain-text fields compresses much better than one with a mixture of numbers, varchars, dates, and other less-compressible datatypes. It would help to provide a general description of the structure of said tables and how large individual fields (if bytea or text) tend to be. Filesystem parameters come into play as well, and you do not specify the OS on which you are running.

Do you have any idea which specific tables are "bloated"? If there are only a few main contributors, what is different about them?

More questions than answers, but something to ponder while you wait for more knowledgeable persons to respond.

David J.
envdb=# \d astgtm2_n60e073
Table "public.astgtm2_n60e073"
 Column |  Type   | Modifiers
--------+---------+-----------
 lat    | real    |
 lon    | real    |
 alt    | integer |
Indexes:
    "q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER

On 8/20/2012 2:10 PM, John R Pierce wrote:
> On 08/20/12 10:53 AM, elliott wrote:
>> Hi,
>>
>> I am using PostgreSQL 9.1 and loading very large tables ( 13 million
>> rows each ).  The flat file size is only 25M.  However, the
>> equivalent database table is 548MB.  This is without any indexes
>> applied and auto vacuum turned on.  I have read that the bloat can
>> be around 5 times greater for tables than flat files so over 20 times
>> seems quite excessive.
>>
>> Any ideas on how to go about decreasing this bloat or is this not
>> unexpected for such large tables?
>
> what do the fields of this table look like (output from \d tablename
> would be useful)
On 08/20/12 11:46 AM, elliott wrote:
> envdb=# \d astgtm2_n60e073
> Table "public.astgtm2_n60e073"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  lat    | real    |
>  lon    | real    |
>  alt    | integer |
> Indexes:
>     "q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER

So, you DO have an index. What type does this function q3c_ang2ipix(real, real) return? Googling it suggested a bigint, which means every 12-byte (real, real, integer) row has a corresponding b-tree entry of 8 bytes plus b-tree overhead.

I see you used CLUSTER on this. Did you run the CLUSTER after populating the table, and before checking the pg_total_relation_size that you reported as 20x your data?

-- 
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
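If the function does indeed return a bigint, the index's leaf-level size can be sketched with some quick arithmetic. This is only a back-of-the-envelope estimate; the 8-byte index tuple header, 4-byte line pointer, 24-byte page header, and 90% default leaf fill factor are assumptions, not measured values:

```python
# Rough leaf-level size of a b-tree index over a bigint expression.
# Assumptions: 8-byte IndexTuple header + 8-byte bigint key per entry,
# 4-byte line pointer, 8 KB pages, 24-byte page header, 90% leaf fill.
ROWS = 13_000_000
entry = 8 + 8 + 4               # index tuple header + key + line pointer
usable = (8192 - 24) * 0.90     # usable bytes per leaf page at fillfactor 90
pages = ROWS * entry / usable
size_mb = pages * 8192 / 1024 / 1024
print(f"~{size_mb:.0f} MB of leaf pages")
```

So an index over a bigint key for 13 million rows would plausibly add a few hundred megabytes on its own, before counting internal pages.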
On Mon, Aug 20, 2012 at 2:33 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 08/20/12 11:46 AM, elliott wrote:
>>
>> envdb=# \d astgtm2_n60e073
>> Table "public.astgtm2_n60e073"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  lat    | real    |
>>  lon    | real    |
>>  alt    | integer |
>> Indexes:
>>     "q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER
>
> so, you DO have an index.  what type does this function
> q3c_ang2ipix(real,real) return ?  googling it suggested a bigint, which
> means every 12 byte (real,real,int) row has a corresponding b-tree entry of
> 8 bytes plus b-tree stuff.  I see you used cluster on this, did you do the
> CLUSTER after populating the table, and before checking the
> pg_total_relation_size that you reported as 20X your data ?

Apparently it returns a bigint:
https://www.google.com/search?q=q3c_ang2ipix&rlz=1C1CHKZ_enUS437US437&sugexp=chrome,mod=17&sourceid=chrome&ie=UTF-8

Anyways, the relative overhead is going to be high for very narrow tuples like yours. Your tuple carries only 12 bytes of column data, but each tuple typically has 23 bytes of header overhead; see:
http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html

Also, measuring total database size is pretty misleading, because a number of things are global and don't increase as your data increases. To get the table and index sizes separately, try:

SELECT pg_size_pretty(pg_relation_size('astgtm2_n60e073'));
SELECT pg_size_pretty(pg_relation_size('q3c_astgtm2_n60e073_idx'));

merlin
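The per-tuple overhead described above can be turned into a concrete size estimate. This is a sketch; the alignment padding rules (header padded to 24 bytes, whole tuple rounded up to a multiple of 8) are assumptions based on the storage-layout documentation, and the row count is the approximate figure from the thread:

```python
# Estimate the heap size of 13M (real, real, integer) rows.
# Assumptions: 23-byte tuple header padded to 24 for data alignment,
# whole tuple rounded up to a multiple of 8 (MAXALIGN), 4-byte line
# pointer per tuple, 8 KB pages with a 24-byte page header.
ROWS = 13_000_000
header = 24                                 # 23-byte header, padded
data = 4 + 4 + 4                            # real + real + integer
tuple_len = -(-(header + data) // 8) * 8    # 36 rounded up to 40
per_row = tuple_len + 4                     # plus the line pointer
rows_per_page = (8192 - 24) // per_row
pages = -(-ROWS // rows_per_page)           # ceiling division
size_mb = pages * 8192 / 1024 / 1024
print(f"~{size_mb:.0f} MB of heap")
```

The estimate lands within about a megabyte of the 548MB the original poster reported, which suggests that plain per-row overhead, rather than bloat, accounts for essentially all of the table size.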
elliott <elliott@cpi.com> writes:

> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million
> rows each ).  The flat file size is only 25M.  However, the equivalent
> database table is 548MB.  This is without any indexes applied and auto
> vacuum turned on.  I have read that the bloat can be around 5 times
> greater for tables than flat files so over 20 times seems quite
> excessive.
>
> Any ideas on how to go about decreasing this bloat or is this not
> unexpected for such large tables?

Well, check whether the table has a low fillfactor setting:

\d+ footable

This hasn't come up yet in the remarks by others on this thread, but it is worth verifying.

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 732.216.7255
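A non-default fillfactor would inflate the heap roughly in proportion to the free space it reserves. A sketch of the effect, assuming about 44 bytes per row on disk (aligned tuple plus line pointer) for a (real, real, integer) table and 8 KB pages:

```python
# Effect of a non-default table fillfactor on heap size, assuming
# ~44 bytes per row on disk and 8 KB pages with a 24-byte page header.
ROWS = 13_000_000
sizes = {}
for ff in (100, 50):
    usable = (8192 - 24) * ff // 100        # bytes available for rows per page
    rows_per_page = usable // 44
    pages = -(-ROWS // rows_per_page)       # ceiling division
    sizes[ff] = pages * 8192 // (1024 * 1024)
    print(f"fillfactor {ff}: ~{sizes[ff]} MB")
```

A fillfactor of 50 would roughly double the on-disk size, so it is a cheap thing to rule out with \d+.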
On Mon, Aug 20, 2012 at 10:53 AM, elliott <elliott@cpi.com> wrote:
> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows
> each ).  The flat file size is only 25M.

That is only 2 bytes per row. Is the size given for the flat file for a compressed file?

Cheers,

Jeff
Yes, it is a tif file.  Uncompressed it is around 85M.

On 8/22/2012 1:20 PM, Jeff Janes wrote:
> On Mon, Aug 20, 2012 at 10:53 AM, elliott<elliott@cpi.com> wrote:
>
>> Hi,
>>
>> I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows
>> each ).  The flat file size is only 25M.
>
> That is only 2 bytes per row.  Is the size given for the flat file for
> a compressed file?
>
> Cheers,
>
> Jeff
On Wed, Aug 22, 2012 at 12:25 PM, elliott <elliott@cpi.com> wrote:
> Yes, it is a tif file.  Uncompressed it is around 85M.

OK, 85MB -> 548MB is reasonable considering you have very narrow rows and an index that covers 2/3 of your column data. If you want to see a dramatic reduction in table size, you probably need to explore using arrays in some fashion.

merlin
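The win from arrays comes from amortizing the roughly 28 bytes of per-tuple overhead across many values instead of paying it per point. A hypothetical sketch of the saving (the 3600-values-per-row layout and the ~24-byte array header are illustrative assumptions, not measurements):

```python
# Per-value storage cost: one 12-byte tuple per point vs. one row per
# scan line holding N alt values in an int4 array.  Hypothetical layout;
# the ~24-byte array header is an approximation.
N = 3600                             # alt values packed per row (assumed)
flat_per_value = 40 + 4              # aligned tuple + line pointer, per point
array_row = 24 + 8 + 24 + 4 * N + 4  # tuple hdr + lat/lon + array hdr + data + line ptr
packed_per_value = array_row / N
print(f"one row per point: {flat_per_value} B/value; "
      f"array rows: {packed_per_value:.1f} B/value")
```

Under these assumptions the packed layout approaches the raw 4 bytes per alt value, an order-of-magnitude reduction, at the cost of more complex queries.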
On 08/22/12 10:25 AM, elliott wrote:
> Yes, it is a tif file.  Uncompressed it is around 85M.

A tif file is a pixel-map image, i.e. graphics, no? I thought we were talking about CSV data here?

-- 
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of John R Pierce
> Sent: Wednesday, August 22, 2012 4:32 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Database Bloat
>
> On 08/22/12 10:25 AM, elliott wrote:
> > Yes, it is a tif file.  Uncompressed it is around 85M.
>
> a tif file is a pixel map image, eg, graphics, no?  I thought we were
> talking about CSV data here?

If I had to hazard a guess, I think he meant "TAR", since he is talking about compression...

David J.