Thread: Database size with large objects

Database size with large objects

From:
"Michael Goldner"
Date:

I'm running PostgreSQL 8.1.8 on RHEL4.

I have a database with a single table that includes an oid reference to a
large object.  After loading 100GB of large objects using lo_import(), I
find that my total database size has grown by about 270GB.  What is the
reason for the difference in space usage?
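
For what it's worth, the size functions that ship with 8.1 should show where the
space actually sits; a rough sketch ("image" being the table with the oid column):

    -- break the total down by relation
    select pg_size_pretty(pg_database_size(current_database()))     as whole_database,
           pg_size_pretty(pg_total_relation_size('pg_largeobject')) as large_object_data,
           pg_size_pretty(pg_total_relation_size('image'))          as user_table;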

There are about 400,000 records in the table, and the large objects are
roughly the same size on average.  I have autovacuum running, and had done a
vacuum --full --analyze prior to the load.

max_fsm_pages is set very high (1,500,000), and the vacuum --analyze output
confirms that it is sufficient.

Is this just something that I have to live with?

Thanks,

Mike

Re: Database size with large objects

From:
Tom Lane
Date:
"Michael Goldner" <MGoldner@agmednet.com> writes:
> I have a database with a single table that includes an oid reference to a
> large object.  After loading 100GB of large objects using lo_import(), I
> find that my total database size has grown by about 270GB.  What is the
> reason for the difference in space usage?

Try VACUUM VERBOSE on your table and also on pg_largeobject to get a
sense of where the space went.  I'm wondering whether you allowed for
indexes and for other data in your table.
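
Something like this, with the actual table name substituted (vacuuming
pg_largeobject directly may need superuser rights):

    VACUUM VERBOSE yourtable;        -- the table with the oid column
    VACUUM VERBOSE pg_largeobject;   -- where the LO chunks themselves live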

            regards, tom lane

Re: Database size with large objects

From:
Michael Goldner
Date:
On 11/4/07 8:26 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Michael Goldner" <MGoldner@agmednet.com> writes:
>> I have a database with a single table that includes an oid reference to a
>> large object.  After loading 100GB of large objects using lo_import(), I
>> find that my total database size has grown by about 270GB.  What is the
>> reason for the difference in space usage?
>
> Try VACUUM VERBOSE on your table and also on pg_largeobject to get a
> sense of where the space went.  I'm wondering whether you allowed for
> indexes and for other data in your table.
>
> regards, tom lane

OK, I vacuumed both tables.  The single table vacuumed in about 2 seconds
and the stats look normal:

INFO:  vacuuming "public.image"
INFO:  index "image_pkey" now contains 386749 row versions in 1691 pages
DETAIL:  21799 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.10u sec elapsed 0.14 sec.
INFO:  "image": removed 21799 row versions in 716 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "image": found 21799 removable, 386749 nonremovable row versions in
19589 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 183153 unused item pointers.
0 pages are entirely empty.
CPU 0.09s/0.19u sec elapsed 0.32 sec.
INFO:  vacuuming "pg_toast.pg_toast_176283637"
INFO:  index "pg_toast_176283637_index" now contains 0 row versions in 1
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_176283637": found 0 removable, 0 nonremovable row versions
in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

The pg_largeobject table, however, seems a bit odd:

INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  index "pg_largeobject_loid_pn_index" now contains 105110204 row
versions in 404151 pages
DETAIL:  778599 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 21.24s/48.07u sec elapsed 273.15 sec.
INFO:  "pg_largeobject": removed 778599 row versions in 775264 pages
DETAIL:  CPU 54.73s/29.70u sec elapsed 2203.32 sec.
INFO:  "pg_largeobject": found 778599 removable, 105094846 nonremovable row
versions in 34803136 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 70 unused item pointers.
0 pages are entirely empty.
CPU 1031.40s/339.21u sec elapsed 10875.66 sec.
VACUUM

In particular, "105094846 nonremovable row versions in 34803136 pages" seems
really high given that there are only about 400,000 large objects.  The
majority of the large objects are less than 1MB.  However, a small subset can
range as high as 1GB.

I see that a single object is spread across multiple pages (loid, pageno) in
the pg_largeobject table.  Is it necessary or possible to adjust the page
size for greater efficiency?
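
For reference, the per-object chunk counts can be pulled with something along
these lines (just a sketch):

    -- chunks and payload per large object, biggest first
    select loid,
           count(*)          as chunks,
           sum(length(data)) as payload_bytes
    from pg_largeobject
    group by loid
    order by payload_bytes desc
    limit 10;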

Thanks,

Mike


Re: Database size with large objects

From:
Tom Lane
Date:
Michael Goldner <mgoldner@agmednet.com> writes:
> The pg_largeobject table, however, seems a bit odd:

> INFO:  vacuuming "pg_catalog.pg_largeobject"
> INFO:  index "pg_largeobject_loid_pn_index" now contains 105110204 row
> versions in 404151 pages
> DETAIL:  778599 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 21.24s/48.07u sec elapsed 273.15 sec.
> INFO:  "pg_largeobject": removed 778599 row versions in 775264 pages
> DETAIL:  CPU 54.73s/29.70u sec elapsed 2203.32 sec.
> INFO:  "pg_largeobject": found 778599 removable, 105094846 nonremovable row
> versions in 34803136 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 70 unused item pointers.
> 0 pages are entirely empty.
> CPU 1031.40s/339.21u sec elapsed 10875.66 sec.
> VACUUM

> In particular, "105094846 nonremovable row versions in 34803136 pages" seems
> really high given that there are only about 400,000 large objects.

Well, each row in pg_largeobject is a 2K (at most) chunk of a large
object.  There is something funny here because if there's only 100GB
of LO data, that would average out to less than 1K per row, which is
half what I'd expect.  Do you have another source of large objects
that are not-so-large-as-all-that and might be dragging down the
average?

It might be interesting to look at stats such as
    select sum(length(data)) from pg_largeobject;
to confirm that your 100GB estimate for the data payload is accurate.
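
Something along these lines would also give the per-row average, as a rough
cross-check:

    select sum(length(data))                     as total_bytes,
           count(*)                              as chunk_rows,
           sum(length(data))::numeric / count(*) as avg_bytes_per_row
    from pg_largeobject;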

            regards, tom lane

Re: Database size with large objects

От
Michael Goldner
Дата:


On 11/5/07 12:19 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Michael Goldner <mgoldner@agmednet.com> writes:
>> The pg_largeobject table, however, seems a bit odd:
>
>> INFO:  vacuuming "pg_catalog.pg_largeobject"
>> INFO:  index "pg_largeobject_loid_pn_index" now contains 105110204 row
>> versions in 404151 pages
>> DETAIL:  778599 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU 21.24s/48.07u sec elapsed 273.15 sec.
>> INFO:  "pg_largeobject": removed 778599 row versions in 775264 pages
>> DETAIL:  CPU 54.73s/29.70u sec elapsed 2203.32 sec.
>> INFO:  "pg_largeobject": found 778599 removable, 105094846 nonremovable row
>> versions in 34803136 pages
>> DETAIL:  0 dead row versions cannot be removed yet.
>> There were 70 unused item pointers.
>> 0 pages are entirely empty.
>> CPU 1031.40s/339.21u sec elapsed 10875.66 sec.
>> VACUUM
>
>> In particular, "105094846 nonremovable row versions in 34803136 pages" seems
>> really high given that there are only about 400,000 large objects.
>
> Well, each row in pg_largeobject is a 2K (at most) chunk of a large
> object.  There is something funny here because if there's only 100GB
> of LO data, that would average out to less than 1K per row, which is
> half what I'd expect.  Do you have another source of large objects
> that are not-so-large-as-all-that and might be dragging down the
> average?
>
> It might be interesting to look at stats such as
> select sum(length(data)) from pg_largeobject;
> to confirm that your 100GB estimate for the data payload is accurate.
>
> regards, tom lane

That select returns the following:

image=# select sum(length(data)) from pg_largeobject;
     sum
--------------
 215040008847
(1 row)

The actual on-disk size of the data before the import was about half that
(this is a production environment, so data has been added over the last 2
days):

# du -sh /pglog/image_lo/
102G    /pglog/image_lo/
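
In round numbers that sum is about 200GB of payload; the server can format it
directly, e.g.:

    select pg_size_pretty(sum(length(data))) from pg_largeobject;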


--
Mike Goldner
Vice President Networks and Technology
AG Mednet, Inc.
The Pilot House
Lewis Wharf
Boston, MA  02110
617.854.3225 (office)
617.909.3009 (mobile)



Re: Database size with large objects

From:
Tom Lane
Date:
Michael Goldner <mgoldner@agmednet.com> writes:
> On 11/5/07 12:19 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> It might be interesting to look at stats such as
>> select sum(length(data)) from pg_largeobject;
>> to confirm that your 100GB estimate for the data payload is accurate.

> That select returns the following:

> image=# select sum(length(data)) from pg_largeobject;
>      sum
> --------------
>  215040008847
> (1 row)

Hmm, so given that you had 34803136 pages in pg_largeobject, that works
out to just about 75% fill factor.  That is to say, you're only getting
3 2K rows per page and not 4.  If the rows were full-size then 4 would
obviously not fit (there is some overhead...) but the normal expectation
in pg_largeobject is that tuple compression will shave enough space to
make up for the overhead and let you get 4 rows per page.  Are your
large objects mostly pre-compressed data?
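
Spelled out, the arithmetic behind that estimate looks roughly like this
(8K pages, numbers from the VACUUM output and the sum above):

    select 215040008847.0 / (34803136::numeric * 8192) as page_fill_fraction,  -- ~0.75
           105094846.0 / 34803136                      as rows_per_page;       -- ~3.0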

            regards, tom lane