Discussion: An out of memory error when doing a vacuum full
To all,

The facts:

PostgreSQL 7.4.0 running on BSD 5.1 on a Dell 2650 with 4GB RAM and 5 SCSI drives in a hardware RAID 0 configuration. Database size with indexes is currently 122GB. DB size before we completed the vacuum full was 150GB.

We have recently done a major update to a table, f_pageviews, in our data warehouse. The f_pageviews table contains about 118 million rows (schema is at the end of this message). We probably touched 80% of those rows with the update. We then dropped all indexes on said table, except the primary key, and attempted a vacuum full on the entire DB. As you can see from the output below, it failed. We then tried the vacuum full on the f_pageviews table alone, and the same error occurred. I did a vacuum full on other tables in the schema, one of them about 8 times larger but with very few dead tuples, and all completed successfully.

We ended up dumping the table and reloading it to eliminate the dead tuples. After the reload we did the vacuum full with no problems.

Does anyone have an explanation as to why this might occur?

Thanks.

--sean

nohup /usr/local/pgsql/bin/vacuumdb -d tripmaster -U tripmaster -f -z -v -t f_pageviews >& /tmp/vacuum2.log &
tail -f /tmp/vacuum2.log

INFO:  vacuuming "public.f_pageviews"
INFO:  "f_pageviews": found 17736235 removable, 111796026 nonremovable row versions in 1552349 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 156 to 244 bytes long.
There were 134565418 unused item pointers.
Total free space (including removable row versions) is 27898448080 bytes.
583420 pages are or will become empty, including 0 at the end of the table.
1088195 pages containing 27860101432 free bytes are potential move destinations.
CPU 238.91s/27.44u sec elapsed 1261.80 sec.
INFO:  index "f_pageviews_pkey" now contains 111796026 row versions in 210003 pages
DETAIL:  15618120 index row versions were removed.
734 index pages have been deleted, 734 are currently reusable.
CPU 96.09s/139.03u sec elapsed 1569.93 sec.
vacuumdb: vacuuming of table "f_pageviews" in database "tripmaster" failed:
ERROR:  out of memory
DETAIL:  Failed on request of size 350.

\d f_pageviews
                              Table "public.f_pageviews"
         Column         |  Type   |                          Modifiers
------------------------+---------+-------------------------------------------------------------
 id                     | integer | not null default nextval('public.f_pageviews_id_seq'::text)
 date_key               | integer | not null
 time_key               | integer | not null
 content_key            | integer | not null
 location_key           | integer | not null
 session_key            | integer | not null
 subscriber_key         | text    | not null
 persistent_cookie_key  | integer | not null
 ip_key                 | integer | not null
 referral_key           | integer | not null
 servlet_key            | integer | not null
 tracking_key           | integer | not null
 provider_key           | text    | not null
 marketing_campaign_key | integer | not null
 orig_airport           | text    | not null
 dest_airport           | text    | not null
 commerce_page          | boolean | not null default false
 job_control_number     | integer | not null
 sequenceid             | integer | not null default 0
 url_key                | integer | not null
 useragent_key          | integer | not null
 web_server_name        | text    | not null default 'Not Available'::text
 cpc                    | integer | not null default 0
 referring_servlet_key  | integer | default 1
 first_page_key         | integer | default 1
 newsletterid_key       | text    | not null default 'Not Available'::text
Indexes:
    "f_pageviews_pkey" primary key, btree (id)
    "idx_page_views_content" btree (content_key)
    "idx_pageviews_date_dec_2003" btree (date_key) WHERE ((date_key >= 335) AND (date_key <= 365))
    "idx_pageviews_date_nov_2003" btree (date_key) WHERE ((date_key >= 304) AND (date_key <= 334))
    "idx_pageviews_referring_servlet" btree (referring_servlet_key)
    "idx_pageviews_servlet" btree (servlet_key)
    "idx_pageviews_session" btree (session_key)
Is your system using full RAM? I.e., what does limits -a show?

Regards.

Fernando.

In an earlier message, Sean Shanny wrote:
> To all,
>
> The facts:
>
> PostgreSQL 7.4.0 running on BSD 5.1 on Dell 2650 with 4GB RAM, 5 SCSI
> drives in hardware RAID 0 configuration. Database size with indexes is
> currently 122GB. DB size before we completed the vacuum full was 150GB.
Take a look at datasize: your processes are allowed a maximum of 512 MB RAM. Read the handbook to find out how to reconfigure your kernel, and the limits (and/or ulimit) man page to tweak the values for individual processes.

Good luck!

Fernando.

In an earlier message, Sean Shanny wrote:
> limits -a
> Resource limits (current):
>   cputime          infinity secs
>   filesize         infinity kb
>   datasize           524288 kb
>   stacksize           65536 kb
>   coredumpsize     infinity kb
>   memoryuse        infinity kb
>   memorylocked     infinity kb
>   maxprocesses         5547
>   openfiles           11095
>   sbsize           infinity bytes
>   vmemoryuse       infinity kb
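[Editor's note: the limit Fernando points at can also be inspected from the shell the postmaster is started from. A minimal sketch; the 1 GB figure is hypothetical, and on FreeBSD a permanent change goes through login.conf or the kernel configuration, per the handbook:]

```shell
# Print the data-segment limit (in kB, or "unlimited") that child
# processes of this shell -- e.g. a postmaster started from it -- inherit.
ulimit -d

# Raise it for this shell only; 1048576 kB = 1 GB is a hypothetical value,
# and the hard limit may prevent an unprivileged user from raising it.
# ulimit -d 1048576
```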
Sean Shanny <shannyconsulting@earthlink.net> writes:
> Does anyone have an explanation as to why this might occur?

What have you got vacuum_mem set to? Also, what ulimit settings is the postmaster running under? (I'm wondering exactly how large the backend process has grown when it gets the failure.)

			regards, tom lane
Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> Take a look at datasize: your processes are allowed a maximum of 512 Mb RAM.
> Read the handbook to find out how to reconfigure your kernel and the limits
> (and/or ulimit) man page to tweak the values for individual processes.

512MB does not seem like an unreasonable per-process limit (indeed, it may be too high considering he's got just 4GB RAM to go around among all the processes), so I'd not recommend Sean try to solve the problem by raising it. The real question is why the backend is getting that big. I'm suspicious that sort_mem and/or vacuum_mem are set to unrealistically large values, but we don't have that info yet.

			regards, tom lane
From postgresql.conf:

# - Memory -
shared_buffers = 10000		# min 16, at least max_connections*2, 8KB each
sort_mem = 64000		# min 64, size in KB
vacuum_mem = 32767		# min 1024, size in KB

The ulimit is set to unlimited as far as I can tell.

--sean

Tom Lane wrote:
> Sean Shanny <shannyconsulting@earthlink.net> writes:
>> Does anyone have an explanation as to why this might occur?
>
> What have you got vacuum_mem set to?  Also, what ulimit settings is the
> postmaster running under?  (I'm wondering exactly how large the backend
> process has grown when it gets the failure.)
>
> 			regards, tom lane
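[Editor's note: the units in these settings differ — shared_buffers is counted in 8 KB pages and is shared by all backends, while sort_mem and vacuum_mem are kilobytes per operation. A quick conversion of the values above, in whole MB rounded down:]

```shell
shared_buffers=10000   # 8 KB pages, shared across the whole cluster
sort_mem=64000         # KB, allocated per sort operation
vacuum_mem=32767       # KB, used by plain VACUUM
echo "shared_buffers = $(( shared_buffers * 8 / 1024 )) MB total"
echo "sort_mem       = $(( sort_mem / 1024 )) MB per sort"
echo "vacuum_mem     = $(( vacuum_mem / 1024 )) MB"
```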
Sean Shanny <shannyconsulting@earthlink.net> writes:
> sort_mem = 64000		# min 64, size in KB

You might want to lower that; a complex query could easily use several times sort_mem. Whether this is the immediate source of your problem with the other query is hard to tell.

> vacuum_mem = 32767		# min 1024, size in KB

That seems all right, but I recollect now that it only applies to plain VACUUM, not VACUUM FULL. VACUUM FULL needs to keep track of *all* the free space in a table, and so it's certainly possible that vacuuming a huge table with many dead tuples could require lots of memory. I can't recall anyone else ever complaining about VACUUM FULL running out of memory, though, so there may be some other contributing factor in your situation. Too bad you reloaded the table --- it would be interesting to see if increasing your 512MB datasize ulimit would have allowed the VACUUM FULL to complete. (Not but what it would've taken forever :-()

			regards, tom lane
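[Editor's note: Tom's warning about sort_mem is easy to quantify — the setting applies per sort operation, not per backend, so concurrent complex queries multiply it. A back-of-envelope sketch with the value from this thread; the sort and backend counts are hypothetical:]

```shell
sort_mem_kb=64000    # from Sean's postgresql.conf; applies per sort operation
sorts_per_query=4    # hypothetical: one complex query with several sort steps
backends=10          # hypothetical number of concurrent connections
echo "$(( sort_mem_kb * sorts_per_query * backends / 1024 )) MB worst case"
# prints "2500 MB worst case" -- far past a 512 MB per-process datasize limit
```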
Tom,

I will lower the sort_mem and see what happens. :-)

I agree that we probably pushed the limits of a vacuum full with the size of the table and the large percentage of change in it. We did NOT run any vacuum jobs during the update process; that may have helped to allow the updates to reuse the reclaimed space in place.

We are going to continue to run into this issue, as the table is only going to get larger and we are still making tweaks. An example would be adding a new column to the f_pageviews table; we have done this several times as we discover new data that needs to be captured. This entails an ALTER TABLE ADD COLUMN, then an ALTER TABLE ALTER COLUMN to set the default, then an UPDATE to set the 'historical' rows to the default value, and finally a SET NOT NULL on the column, as none of our data is allowed to have a value of null. This pretty much entails updating all the rows in the table.

We are thinking that instead of doing an UPDATE it would be better to make a new temp table, run the code that contains the update logic but write the updated and non-updated rows to the temp table rather than updating the real table, and then do a drop and rename?

Thanks again for your time.

--sean

p.s. By the way, I am unable to send mail to you directly. I get these errors (I assume you just don't want email from earthlink?):

tgl@sss.pgh.pa.us
  SMTP error from remote mailer after MAIL FROM:<shannyconsulting@earthlink.net>:
  host sss.pgh.pa.us [192.204.191.242]: 550 5.0.0 If you would like to talk
  to me, find a more responsible ISP than earthlink

Tom Lane wrote:
> Sean Shanny <shannyconsulting@earthlink.net> writes:
>> sort_mem = 64000		# min 64, size in KB
>
> You might want to lower that; a complex query could easily use several
> times sort_mem. Whether this is the immediate source of your problem
> with the other query is hard to tell.
>> vacuum_mem = 32767		# min 1024, size in KB
>
> That seems all right, but I recollect now that it only applies to plain
> VACUUM not VACUUM FULL. VACUUM FULL needs to keep track of *all* the
> free space in a table, and so it's certainly possible that vacuuming a
> huge table with many dead tuples could require lots of memory. I can't
> recall anyone else ever complaining about VACUUM FULL running out of
> memory, though, so there may be some other contributing factor in your
> situation. Too bad you reloaded the table --- it would be interesting
> to see if increasing your 512Mb datasize ulimit would have allowed the
> VACUUM FULL to complete. (Not but what it would've taken forever :-()
>
> 			regards, tom lane
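[Editor's note: Sean's drop-and-rename idea above can be sketched in outline. This is a hypothetical script — the column name new_col, its default, and the psql invocation are invented for illustration, and the primary key plus all the partial indexes from the schema would have to be recreated before the swap goes live:]

```shell
# Rebuild f_pageviews instead of UPDATE-ing ~118 million rows in place.
# The SQL is held in a variable here; in practice it would be piped to
# something like: psql -d tripmaster -U tripmaster
sql='
BEGIN;
CREATE TABLE f_pageviews_new AS
    SELECT f.*, 0 AS new_col       -- each row is written exactly once
    FROM f_pageviews f;
ALTER TABLE f_pageviews_new ALTER COLUMN new_col SET DEFAULT 0;
ALTER TABLE f_pageviews_new ALTER COLUMN new_col SET NOT NULL;
DROP TABLE f_pageviews;
ALTER TABLE f_pageviews_new RENAME TO f_pageviews;
-- recreate the primary key and indexes here before COMMIT
COMMIT;'
printf '%s\n' "$sql"
```

Because the replacement table is written fresh, it contains no dead tuples and needs no vacuum full afterward; the cost is roughly double the table's disk space while both copies exist.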
Sean Shanny <shannyconsulting@earthlink.net> writes:
> By the way I am unable to send mail to you directly. I get these
> errors: (I assume you just don't want email from earthlink?)
> tgl@sss.pgh.pa.us
>   SMTP error from remote mailer after MAIL FROM:<shannyconsulting@earthlink.net>:
>   host sss.pgh.pa.us [192.204.191.242]: 550 5.0.0 If you would like to
>   talk to me, find a more responsible ISP than earthlink

Sorry about that --- I blocked earthlink in disgust about six months ago, when I noticed that my virus-rejection script had been faithfully bouncing virus mails from the *same* earthlink account to abuse at abuse.earthlink.net every few days for the preceding nine months, and their abuse people quite obviously had taken zero action to get their customer disinfected. Assuming they have any abuse people, that is, and abuse@ isn't just an alias for /dev/null. I'm not planning to unblock until I see some evidence that they have a functioning abuse desk.

			regards, tom lane