Discussion: stale WAL files?
[1] 2019-03-15 12:09:15.230 MDT [29189] STATEMENT:  select count(*) from sui.probancset_group_member where group_id = '4ce6a94d-bb2d-43c2-a1f9-7b68f1618cd4';
[2] 2019-03-15 17:07:30.818 MDT [35020] ERROR:  could not write to hash-join temporary file: No space left on device
[3] 2019-03-15 17:07:30.818 MDT [35020] STATEMENT:  update segment as s set events_less = s.events_less + b.events_less, events_equal = s.events_equal + b.events_equal, events_greater = s.events_greater + b.events_greater, threshold_events = s.threshold_events + b.threshold_events from bulk."bc_1819_17_fd6dbc1e_57e5_4d49_b896_59d6687c8ee6" as b where s.markerset_id = '8d723d2f-1281-48c5-9016-2dab3f4d242b' and s.probandset_id = b.probandset_id and s.markerset_id = b.markerset_id and s.startbase = b.startbase and s.endbase = b.endbase and s.probandset_id >= '90000000-0000-0000-0000-000000000000' and s.probandset_id < 'a0000000-0000-0000-0000-000000000000'
2019-03-15 17:07:30.818 MDT [317161] PANIC:  could not write to file "pg_wal/xlogtemp.317161": No space left on device
[4] 2019-03-15 17:08:19.231 MDT [35166] ERROR:  unterminated dollar-quoted string at or near "$$

begin
--new probandset_group record
insert into probandset_group(id,name) select uuid_generate_v4(),'1808_p3c2n4';

--create necessary new probandset and probandset_group_member records
select addprobandset(a.name, people, groupname) from
  (select powerset((select array_agg(name order by name) lst from base.person
    where name in (superset))) as name
   except (select regexp_split_to_array(p.name,',') from probandset p)) a
where array_length(a.name,1) > 1;

--update superset field for new probandset_group
update probandset_group set proband_superset_id =
  (select id from probandset where name = superset);
end;

[1] Last reported normal operation
[2] death knell
[3] First failed operation from our application
[4] continued effort from create function work
On 3/25/19 5:10 PM, Rob Sargent wrote:
> PG10.7, Centos7
>
> On Mar15 we filled our default tablespace/WAL partition. Cleaned up
> some old dumps and restarted. pg_wal had apparently exploded but
> cleaned itself up by the next day. On Mar16 I ran CHECKPOINT in all
> databases on that server (except template0). All seems fine except for
> 271 WALs from Mar16 going nowhere fast. Of course we see transient WALs
> come and go every day. We did a restart of the server last Friday
> (Mar22) without issue.
>
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:32 0000000100000CE9000000DD
> ... (269 others)
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
>
> What's keeping these alive? Or can they be deleted? Any normal admin
> routine likely to clean these up?

wal_keep_segments?

Do you have replication set up and replication slots in use?

> One of our crew was apparently trying to create a function: he had a
> runaway JSON query which caused memory grief, but I don't see that
> reflected anywhere in the log. The log file for that day goes south as
> follows.
>
> [1] 2019-03-15 12:09:15.230 MDT [29189] STATEMENT:  select count(*) from sui.probancset_group_member where group_id = '4ce6a94d-bb2d-43c2-a1f9-7b68f1618cd4';
> [2] 2019-03-15 17:07:30.818 MDT [35020] ERROR:  could not write to hash-join temporary file: No space left on device
> [3] 2019-03-15 17:07:30.818 MDT [35020] STATEMENT:  update segment as s set events_less = s.events_less + b.events_less, events_equal = s.events_equal + b.events_equal, events_greater = s.events_greater + b.events_greater, threshold_events = s.threshold_events + b.threshold_events from bulk."bc_1819_17_fd6dbc1e_57e5_4d49_b896_59d6687c8ee6" as b where s.markerset_id = '8d723d2f-1281-48c5-9016-2dab3f4d242b' and s.probandset_id = b.probandset_id and s.markerset_id = b.markerset_id and s.startbase = b.startbase and s.endbase = b.endbase and s.probandset_id >= '90000000-0000-0000-0000-000000000000' and s.probandset_id < 'a0000000-0000-0000-0000-000000000000'
> 2019-03-15 17:07:30.818 MDT [317161] PANIC:  could not write to file "pg_wal/xlogtemp.317161": No space left on device
> [4] 2019-03-15 17:08:19.231 MDT [35166] ERROR:  unterminated dollar-quoted string at or near "$$
>
> begin
> --new probandset_group record
> insert into probandset_group(id,name) select uuid_generate_v4(),'1808_p3c2n4';
>
> --create necessary new probandset and probandset_group_member records
> select addprobandset(a.name, people, groupname) from
>   (select powerset((select array_agg(name order by name) lst from base.person
>     where name in (superset))) as name
>    except (select regexp_split_to_array(p.name,',') from probandset p)) a
> where array_length(a.name,1) > 1;
>
> --update superset field for new probandset_group
> update probandset_group set proband_superset_id =
>   (select id from probandset where name = superset);
> end;
>
> [1] Last reported normal operation
> [2] death knell
> [3] First failed operation from our application
> [4] continued effort from create function work
>
> Thanks in advance.

--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
> No, sorry I should have said that up front. We're simple folk.

What is the WAL position (LSN) where Postgres is writing to, and what
is the set of WAL segments in pg_wal (or pg_xlog if that's a server
older than 10)? Please double-check the configuration value of
wal_keep_segments, and as mentioned upthread, could you make sure that
you have no replication slots active? This can be done simply by
querying pg_replication_slots. Please note as well that checkpoints
are server-wide, so there is no point in running them on all
databases. One command will be effective for all databases.
--
Michael
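All three checks fit in one psql session; a minimal sketch using the stock PG10 functions and views:

-- Where WAL is being written right now:
SELECT pg_current_wal_lsn(), pg_current_wal_insert_lsn(), pg_current_wal_flush_lsn();

-- Extra segments kept for standbys (0 means none):
SHOW wal_keep_segments;

-- Any slot listed here, active or not, pins WAL back to its restart_lsn:
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;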
On 3/28/19 7:30 AM, Michael Paquier wrote:
> On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
>> No, sorry I should have said that up front. We're simple folk.
> What is the WAL position (LSN)

postgres=# select * from pg_current_wal_flush_lsn();
 pg_current_wal_flush_lsn
--------------------------
 CEA/E57EAA8
(1 row)

postgres=# select * from pg_current_wal_insert_lsn();
 pg_current_wal_insert_lsn
---------------------------
 CEA/E57EAA8
(1 row)

postgres=# select * from pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 CEA/E57EAA8
(1 row)

> where Postgres is writing to and what
> is the set of WAL segments in pg_wal (or pg_xlog if that's a server
> older than 10)?

select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

This is pg10 so it's pg_wal. ls -ltr:

-rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
-rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B2
... 217 more on through to ...
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E8
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
-rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E

> Please double-check the configuration value of
> wal_keep_segments,

#wal_keep_segments = 0          # in logfile segments, 16MB each

> and as mentioned upthread, could you make sure that
> you have no replication slots active?

This could be part of the problem?

#max_replication_slots = 10     # max number of replication slots

but

select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)

> This can be done simply by
> querying pg_replication_slots. Please note as well that checkpoints
> are server-wide, so there is no point in running them on all
> databases. One command will be effective for all databases.

I suspected as much, but there weren't many dbs so I went all in.

> --
> Michael
On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
> This is pg10 so it's pg_wal. ls -ltr:
>
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B2
> ... 217 more on through to ...
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E8
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E

In Postgres 10 and older versions, the server keeps WAL segments for
the last completed segment, and the previous completed segment. So
even if a checkpoint is issued, the current WAL insert point is never
really going to be on the first segment in pg_wal. Isn't that the
origin of what you think is a problem? So, say, if you issue a
checkpoint again, don't you see 0000000100000CEA000000B1 going away?

In Postgres 11, WAL segments worth only one checkpoint are kept
around.
--
Michael
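The mapping from an LSN to its segment file can be checked directly with the built-in pg_walfile_name(); for the LSN reported upthread it points at the one file in the listing with a current (Mar 28) timestamp, i.e. the segment currently being written:

postgres=# SELECT pg_walfile_name('CEA/E57EAA8'::pg_lsn);
     pg_walfile_name
--------------------------
 0000000100000CEA0000000E
(1 row)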
> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal. ls -ltr:
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B2
>> ... 217 more on through to ...
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E
>
> In Postgres 10 and older versions, the server keeps WAL segments for
> the last completed segment, and the previous completed segment. So
> even if a checkpoint is issued, the current WAL insert point is never
> really going to be on the first segment in pg_wal. Isn't that the
> origin of what you think is a problem? So, say, if you issue a
> checkpoint again, don't you see 0000000100000CEA000000B1 going away?
>
> In Postgres 11, WAL segments worth only one checkpoint are kept
> around.
> --
> Michael

I'm on the road all through the weekend with limited connectivity. But recall that I'm getting new, transient WAL files.
> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal. ls -ltr:
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B2
>> ... 217 more on through to ...
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E
>
> In Postgres 10 and older versions, the server keeps WAL segments for
> the last completed segment, and the previous completed segment. So
> even if a checkpoint is issued, the current WAL insert point is never
> really going to be on the first segment in pg_wal. Isn't that the
> origin of what you think is a problem? So, say, if you issue a
> checkpoint again, don't you see 0000000100000CEA000000B1 going away?

I had CEA000000015, generated today, as the only entry since Mar 16. Issued a checkpoint; soon had CEA0000016 and 15. Five minutes later I still have 15 and 16 (and of course all 271 from Mar 16).

> In Postgres 11, WAL segments worth only one checkpoint are kept
> around.
> --
> Michael
> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal. ls -ltr:
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B2
>> ... 217 more on through to ...
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E

Today there are 210 Mar 16 WAL files versus the originally reported 271. I cannot at this point confirm the original count, other than to say I used "ls -ltr | grep 'Mar 16' | wc -l" to get both numbers.

Is it interesting that the earliest files (by ls timestamp) are not the lowest numerically? Those two file names end "0000B[12]" (written at 16:33), in a range across the directory from "00001A" through "0000E9"; "0000B0" was written at 16:53 and "0000B3" at 16:54.

Is there any analysis of the file names I could do which might shed any light on the issue?
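One way to do that analysis from inside the database is PG10's pg_ls_waldir(); a sketch using the stock function and columns. The 24-hex-digit name splits into timeline, high, and low segment-number parts, and out-of-order mtimes are expected: at checkpoint time old segments are recycled by being renamed to future segment names, so a file's timestamp records when it was recycled, not when its name slot will actually be written.

SELECT name,
       substring(name from 1 for 8)  AS timeline,
       substring(name from 9 for 8)  AS seg_high,   -- "00000CEA" here
       substring(name from 17 for 8) AS seg_low,    -- "000000B1" etc.
       modification
FROM pg_ls_waldir()
ORDER BY modification;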
> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal. ls -ltr:
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33 0000000100000CEA000000B2
>> ... 217 more on through to ...
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46 0000000100000CEA0000000E

I'm now down to 208 Mar 16 WAL files, so they are being processed (at least deleted). I've taken a snapshot of the pg_wal dir so that I can see which files get processed. It's none of the files I've listed previously.
> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal. ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted). I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously
Two more have been cleaned up. 001C and 001D generated at 16:38 Mar 16
Please share your complete postgresql.conf file and the results from this query:

SELECT * FROM pg_settings;

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:
> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal. ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted). I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously
> Two more have been cleaned up. 001C and 001D generated at 16:38 Mar 16

Has someone in the past configured WAL archiving?

You've run out of disk space, as this log message you shared states:

    No space left on device

What's the output of df -h?
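A narrower cut than the full dump also answers the archiving question directly; a sketch against the stock pg_settings columns:

SELECT name, setting, unit, source, sourcefile, sourceline
FROM pg_settings
WHERE name IN ('archive_mode', 'wal_keep_segments', 'max_wal_size',
               'min_wal_size', 'checkpoint_timeout');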
I'm fighting a nasty cold. It may take a day or two for me to respond to recent questions and suggestions.
On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
> Please share your complete postgresql.conf file and the results from this query:
>
> SELECT * FROM pg_settings;
>
> On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:
> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal. ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted). I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously
>> Two more have been cleaned up. 001C and 001D generated at 16:38 Mar 16
>
> Has someone in the past configured WAL archiving?
>
> You've run out of disk space, as this log message you shared states:
>
>     No space left on device
>
> What's the output of df -h?

BTW, how spread apart are checkpoints happening? Do you have stats on that? Maybe they're too spread apart, and that's why WAL files cannot be recycled rapidly enough?
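Checkpoint spacing can be read from the cluster-wide stats; a sketch against PG10's pg_stat_bgwriter (timed vs. requested checkpoints since stats_reset; setting log_checkpoints = on adds per-checkpoint detail to the server log):

SELECT checkpoints_timed,   -- triggered by checkpoint_timeout
       checkpoints_req,     -- triggered by WAL volume or manual CHECKPOINT
       buffers_checkpoint,
       stats_reset
FROM pg_stat_bgwriter;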
I think my attempt to send postgresql.conf and pg_settings report ran afoul of the mailer. Here again by attachment.

On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
> On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
>> Please share your complete postgresql.conf file and the results from this query:
>>
>> SELECT * FROM pg_settings;
>>
>> On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:
> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal. ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted). I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously
> Two more have been cleaned up. 001C and 001D generated at 16:38 Mar 16
>
> Has someone in the past configured WAL archiving?
>
> You've run out of disk space, as this log message you shared states:
>
>     No space left on device
>
> What's the output of df -h?
>
> BTW, how spread apart are checkpoints happening? Do you have stats on that? Maybe they're too spread apart, and that's why WAL files cannot be recycled rapidly enough?
Two attempts (one in-line, one with attachment) at sending the postgresql.conf and pg_settings report have been sent to a moderator.

On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
> On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
>> Please share your complete postgresql.conf file and the results from this query:
>>
>> SELECT * FROM pg_settings;
>>
>> On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:
> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal. ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted). I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously
> Two more have been cleaned up. 001C and 001D generated at 16:38 Mar 16
>
> Has someone in the past configured WAL archiving?
>
> You've run out of disk space, as this log message you shared states:
>
>     No space left on device
>
> What's the output of df -h?
>
> BTW, how spread apart are checkpoints happening? Do you have stats on that? Maybe they're too spread apart, and that's why WAL files cannot be recycled rapidly enough?
On Apr 3, 2019, at 5:23 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:

On Wed, Apr 3, 2019 at 1:05 PM Rob Sargent <robjsargent@gmail.com> wrote:
> Two attempts (one in-line, one with attachment) at sending the postgresql.conf and pg_settings report have been sent to a moderator.
>
> On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
>> On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:
>>> Please share your complete postgresql.conf file and the results from this query:
>>>
>>> SELECT * FROM pg_settings;
>>>
>>> On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:
> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal. ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted). I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously
> Two more have been cleaned up. 001C and 001D generated at 16:38 Mar 16
>
> Has someone in the past configured WAL archiving?
>
> You've run out of disk space, as this log message you shared states:
>
>     No space left on device
>
> What's the output of df -h?

As per your configuration:

    max_wal_size = 50GB

This seems to be the cause for the WAL files piling up. It has been declared twice; the last one is taking effect.
--
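The double declaration can be confirmed from inside the server with the pg_file_settings view (present since 9.5), which lists every entry found in the config files and flags the one that takes effect:

SELECT sourcefile, sourceline, setting, applied
FROM pg_file_settings
WHERE name = 'max_wal_size';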
> As per your configuration:
>
>     max_wal_size = 50GB
>
> This seems to be the cause for the WAL files piling up. It has been declared twice; the last one is taking effect.
I've managed to generate another 359 WAL files in a 10-minute span yesterday (now only 357 remain, and I suspect they will wither away as before). Are these being held simply because of the high max_wal_size value?
This is a development environment, wherein I'm loading 4M+ records, first into 41 staging tables at 100K rows each. In a loop over each staging table, the data is then placed into application tables via selects: first select * into the "matching table", then select id into the intersection record (id, fixed groupId). Each such iteration is in its own transaction. I have dropped and recreated this same database numerous times, working my way up from 100K to 4M records, dialing in application parameters according to the number of primary records. I have not, however, dropped the last incarnation.
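For concreteness, one iteration of the loop described above amounts to something like the following sketch; every name here is a hypothetical stand-in, not the actual schema. Each such INSERT ... SELECT is fully WAL-logged, which fits the bursts of new segments seen during a load.

-- All names hypothetical; one iteration of the per-staging-table loop.
BEGIN;

INSERT INTO records                                 -- the "matching table"
SELECT * FROM staging_01;

INSERT INTO record_group_member (record_id, group_id)
SELECT id, '00000000-0000-0000-0000-000000000001'   -- the fixed group id
FROM staging_01;

COMMIT;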
What's your current max_wal_size parameter?

SHOW max_wal_size;

If it's 8GB, as in your configuration's previous value, you would get a constant share of 512 WAL files. If it's a development environment, set it to the desired size; the smaller the value, the more frequent the checkpoints. But your checkpoint_timeout value is 300 (5 minutes), which is likely to be reached first and thus be what triggers checkpoints that often.

On Wed, Apr 10, 2019 at 1:12 PM Rob Sargent <robjsargent@gmail.com> wrote:
>> As per your configuration:
>>
>>     max_wal_size = 50GB
>>
>> This seems to be the cause for the WAL files piling up. It has been declared twice; the last one is taking effect.
>
> I've managed to generate another 359 WAL files in a 10-minute span yesterday (now only 357 remain, and I suspect they will wither away as before). Are these being held simply because of the high max_wal_size value?
> This is a development environment, wherein I'm loading 4M+ records, first into 41 staging tables at 100K rows each. In a loop over each staging table, the data is then placed into application tables via selects: first select * into the "matching table", then select id into the intersection record (id, fixed groupId). Each such iteration is in its own transaction. I have dropped and recreated this same database numerous times, working my way up from 100K to 4M records, dialing in application parameters according to the number of primary records. I have not, however, dropped the last incarnation.
We have not yet reconfigured the max_wal_size parameter; it is still 50GB:
postgres=# show max_wal_size
postgres-# ;
max_wal_size
--------------
50GB
(1 row)
I'm sorry, I don't follow your thesis that this setting would lead to "a constant" 512 WAL files. There was a distinct burst of WALs in a 10-minute period yesterday, and no accumulation since then (though a manual checkpoint does generate another WAL).
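For the record, Rene's 512 is just max_wal_size divided by the 16MB segment size, and the same arithmetic for the 50GB setting can be checked from psql; a sketch using the stock pg_size_bytes() helper:

SELECT (pg_size_bytes('8GB')  / pg_size_bytes('16MB'))::int AS segments_at_8gb,   -- 512
       (pg_size_bytes('50GB') / pg_size_bytes('16MB'))::int AS segments_at_50gb;  -- 3200

With max_wal_size = 50GB, checkpoints are under no pressure to trim pg_wal until roughly 3200 segments have accumulated, which would explain a few hundred Mar 16 files lingering for weeks.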