Обсуждение: stale WAL files?

Поиск
Список
Период
Сортировка

stale WAL files?

От
Rob Sargent
Дата:
PG10.7, Centos7

On Mar15 we filled our default tablespace/WAL partition.  Cleaned up some old dumps and restarted.  pg_wal had apparently exploded but cleaned itself up by the next day. On Mar16 I ran CHECKPOINT in all databases on that server (except template0).  All seems fine except for 271 WALs from MAR16 going nowhere fast.  Of course we see transient WALS come and go every day.  We did a restart of the server last Friday (Mar22) without issue.

-rw-------. 1 postgres postgres 16777216 Mar 16 16:32 0000000100000CE9000000DD
 ...(269 others)
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9

What’s keeping these alive.  Or can they be deleted? Any normal admin routine likely to clean these up?


One of our crew was apparently trying to create a function: he had a run-away json query which caused memory grief but I don’t see that reflected anywhere in the log. The log file for that day goes south as follows.  

[1]2019-03-15 12:09:15.230 MDT [29189] STATEMENT:  select count(*) from sui.probancset_group_member where group_id = '4ce6a94d-bb2d-\
43c2-a1f9-7b68f1618cd4';
[2]2019-03-15 17:07:30.818 MDT [35020] ERROR:  could not write to hash-join temporary file: No space left on device
[3]2019-03-15 17:07:30.818 MDT [35020] STATEMENT:  update segment as s set events_less = s.events_less + b.events_less, events_equal\
 = s.events_equal + b.events_equal, events_greater = s.events_greater + b.events_greater, threshold_events = s.threshold_events +\
 b.threshold_events from bulk."bc_1819_17_fd6dbc1e_57e5_4d49_b896_59d6687c8ee6" as b where s.markerset_id = '8d723d2f-1281-48c5-9\
016-2dab3f4d242b' and s.probandset_id = b.probandset_id and s.markerset_id = b.markerset_id and s.startbase = b.startbase and s.e\
ndbase = b.endbase and  s.probandset_id >= '90000000-0000-0000-0000-000000000000' and s.probandset_id < 'a0000000-0000-0000-0000-\
000000000000'
2019-03-15 17:07:30.818 MDT [317161] PANIC:  could not write to file "pg_wal/xlogtemp.317161": No space left on device
[4]2019-03-15 17:08:19.231 MDT [35166] ERROR:  unterminated dollar-quoted string at or near "$$

        begin
              --new probandset_group record
              insert into probandset_group(id,name) select uuid_generate_v4(),'1808_p3c2n4';

              --create necessary new probandset and probandset_group_member records
              select addprobandset(a.name, people, groupname) from
                     (select powerset((select array_agg(name order by name) lst from base.person
                             where name in (superset))) as name
                             except (select regexp_split_to_array(p.name,',') from  probandset p  )) a
                     where array_length(a.name,1)>1;

              --update superset field for new probandset_group
              update probandset_group set proband_superset_id = (select id from probandset where name = superset);
        end;

[1] Last reported normal operation
[2] death knell
[3] First failed operation from our application
[4] continued effort from create function work
 
Thanks in advance.




Re: stale WAL files?

От
Adrian Klaver
Дата:
On 3/25/19 5:10 PM, Rob Sargent wrote:
> PG10.7, Centos7
> 
> On Mar15 we filled our default tablespace/WAL partition.  Cleaned up 
> some old dumps and restarted.  pg_wal had apparently exploded but 
> cleaned itself up by the next day. On Mar16 I ran CHECKPOINT in all 
> databases on that server (except template0).  All seems fine except for 
> 271 WALs from MAR16 going nowhere fast.  Of course we see transient WALS 
> come and go every day.  We did a restart of the server last Friday 
> (Mar22) without issue.
> 
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:32 
> 0000000100000CE9000000DD
>   ...(269 others)
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01 
> 0000000100000CEA000000E9
> 
> What’s keeping these alive.  Or can they be deleted? Any normal admin 
> routine likely to clean these up?

wal_keep_segments?

Do you have replication set up and replication slots in use?

> 
> 
> One of our crew was apparently trying to create a function: he had a 
> run-away json query which caused memory grief but I don’t see that 
> reflected anywhere in the log. The log file for that day goes south as 
> follows.
> 
>     [1]2019-03-15 12:09:15.230 MDT [29189] STATEMENT:  select count(*)
>     from sui.probancset_group_member where group_id = '4ce6a94d-bb2d-\
>     43c2-a1f9-7b68f1618cd4';
>     [2]2019-03-15 17:07:30.818 MDT [35020] ERROR:  could not write to
>     hash-join temporary file: No space left on device
>     [3]2019-03-15 17:07:30.818 MDT [35020] STATEMENT:  update segment as
>     s set events_less = s.events_less + b.events_less, events_equal\
>       = s.events_equal + b.events_equal, events_greater =
>     s.events_greater + b.events_greater, threshold_events =
>     s.threshold_events +\
>       b.threshold_events from
>     bulk."bc_1819_17_fd6dbc1e_57e5_4d49_b896_59d6687c8ee6" as b where
>     s.markerset_id = '8d723d2f-1281-48c5-9\
>     016-2dab3f4d242b' and s.probandset_id = b.probandset_id and
>     s.markerset_id = b.markerset_id and s.startbase = b.startbase and s.e\
>     ndbase = b.endbase and  s.probandset_id >=
>     '90000000-0000-0000-0000-000000000000' and s.probandset_id <
>     'a0000000-0000-0000-0000-\
>     000000000000'
>     2019-03-15 17:07:30.818 MDT [317161] PANIC:  could not write to file
>     "pg_wal/xlogtemp.317161": No space left on device
>     [4]2019-03-15 17:08:19.231 MDT [35166] ERROR:  unterminated
>     dollar-quoted string at or near "$$
> 
>              begin
>                    --new probandset_group record
>                    insert into probandset_group(id,name) select
>     uuid_generate_v4(),'1808_p3c2n4';
> 
>                    --create necessary new probandset and
>     probandset_group_member records
>                    select addprobandset(a.name, people, groupname) from
>                           (select powerset((select array_agg(name order
>     by name) lst from base.person
>                                   where name in (superset))) as name
>                                   except (select
>     regexp_split_to_array(p.name,',') from  probandset p  )) a
>                           where array_length(a.name,1)>1;
> 
>                    --update superset field for new probandset_group
>                    update probandset_group set proband_superset_id =
>     (select id from probandset where name = superset);
>              end;
> 
>     [1] Last reported normal operation
>     [2] death knell
>     [3] First failed operation from our application
>     [4] continued effort from create function work
> 
> Thanks in advance.
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: stale WAL files?

От
Michael Paquier
Дата:
On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
> No, sorry I should have said that up front. We’re simple folk.

What is the WAL position (LSN) where Postgres is writing to and what
is the set of WAL segments in pg_wal (or pg_xlog if that's a server
older than 10)?  Please double-check the configuration value of
wal_keep_segments, and as mentioned upthread, could you make sure that
you have no replication slots active?  This can be done simply by
querying pg_replication_slots.  Please note as well that checkpoints
are server-wide, so there is no point to run them on all databases.
Only one command will be effective for all databases.
--
Michael

Вложения

Re: stale WAL files?

От
Rob Sargent
Дата:
On 3/28/19 7:30 AM, Michael Paquier wrote:
> On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
>> No, sorry I should have said that up front. We’re simple folk.
> What is the WAL position (LSN)
postgres=# select * from pg_current_wal_flush_lsn();
  pg_current_wal_flush_lsn
--------------------------
  CEA/E57EAA8
(1 row)

postgres=# select * from pg_current_wal_insert_lsn();
  pg_current_wal_insert_lsn
---------------------------
  CEA/E57EAA8
(1 row)

postgres=# select * from pg_current_wal_lsn();
  pg_current_wal_lsn
--------------------
  CEA/E57EAA8
(1 row)

> where Postgres is writing to and what
> is the set of WAL segments in pg_wal (or pg_xlog if that's a server
> older than 10)?

  select version();
version
---------------------------------------------------------------------------------------------------------
  PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

This is pg10 so it's pg_wal.  ls -ltr


-rw-------. 1 postgres postgres 16777216 Mar 16 16:33 
0000000100000CEA000000B1
-rw-------. 1 postgres postgres 16777216 Mar 16 16:33 
0000000100000CEA000000B2

  ... 217 more on through to ...

-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 
0000000100000CEA000000E8
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 
0000000100000CEA000000E9
-rw-------. 1 postgres postgres 16777216 Mar 28 09:46 
0000000100000CEA0000000E

> Please double-check the configuration value of
> wal_keep_segments,
#wal_keep_segments = 0          # in logfile segments, 16MB each
> and as mentioned upthread, could you make sure that
> you have no replication slots active?

This could be part of the problem?

#max_replication_slots = 10     # max number of replication slots

but

   select * from pg_replication_slots;
  slot_name | plugin | slot_type | datoid | database | temporary | 
active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)


> This can be done simply by
> querying pg_replication_slots.  Please note as well that checkpoints
> are server-wide, so there is no point to run them on all databases.
> Only one command will be effective for all databases.
I suspected as much, but there weren't many dbs so I went all in.
> --
> Michael



Re: stale WAL files?

От
Michael Paquier
Дата:
On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
> This is pg10 so it's pg_wal.  ls -ltr
>
>
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
> 0000000100000CEA000000B1
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
> 0000000100000CEA000000B2
>
>  ... 217 more on through to ...
>
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
> 0000000100000CEA000000E8
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
> 0000000100000CEA000000E9
> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
> 0000000100000CEA0000000E

In Postgres 10 and older versions, the server keeps WAL segment for
the last completed segment, and the previous completed segment.  So
even if a checkpoint is issued, the current WAL insert point is never
really going to be on the first segment in pg_wal.  Isn't that the
origin of what you think is a problem?  So, say, if you issue a
checkpoint again, don't you see 0000000100000CEA000000B1 going away?

In Postgres 11, WAL segments worth only one checkpoint are kept
around.
--
Michael

Вложения

Re: stale WAL files?

От
Rob Sargent
Дата:

> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
> In Postgres 10 and older versions, the server keeps WAL segment for
> the last completed segment, and the previous completed segment.  So
> even if a checkpoint is issued, the current WAL insert point is never
> really going to be on the first segment in pg_wal.  Isn't that the
> origin of what you think is a problem?  So, say, if you issue a
> checkpoint again, don't you see 0000000100000CEA000000B1 going away?
>
> In Postgres 11, WAL segments worth only one checkpoint are kept
> around.
> --
> Michael

I’m on the road all through the weekend with limited connectivity. But recall that I’m getting new, transient WAL files



Re: stale WAL files?

От
Gmail
Дата:

> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
> In Postgres 10 and older versions, the server keeps WAL segment for
> the last completed segment, and the previous completed segment.  So
> even if a checkpoint is issued, the current WAL insert point is never
> really going to be on the first segment in pg_wal.  Isn't that the
> origin of what you think is a problem?  So, say, if you issue a
> checkpoint again, don't you see 0000000100000CEA000000B1 going away?
I had CEA000000015, generated today, as only entry since Mar 16. Issued checkpoint, soon had CEA0000016 and 15.  Five
minuteslater I still hav 15 and 16 (and of course all 271 from Mar 16). 

>
> In Postgres 11, WAL segments worth only one checkpoint are kept
> around.
> --
> Michael



Re: stale WAL files?

От
Gmail
Дата:

> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
Today there are 210 Mar 16 WAL files versus the originally reported 271.  I cannot at this point confirm the original
count,other that to say I used “ls -ltr | grep ‘Mar 16’ | wc -l” to get both numbers. 

Is it interesting that the earliest files (by ls time stamp) are not lowest numerically? Those two file names end
“0000B[12]”(written at 16:33) in a range across the directory from “00001A” through “0000E9”? “0000B0” was written
at16:53and “0000B3” was written at 16:54 
Is there any analysis of the file names I could do which might shed any light on the issue?


Re: stale WAL files?

От
Gmail
Дата:
> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the
pg_waldir such that I can see which files get processed. It’s none of the files I’ve listed previously 


Re: stale WAL files?

От
Gmail
Дата:

> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the
pg_waldir such that I can see which files get processed. It’s none of the files I’ve listed previously 

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16




Re: stale WAL files?

От
Rene Romero Benavides
Дата:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

Re: stale WAL files?

От
Rene Romero Benavides
Дата:

On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

Re: stale WAL files?

От
Gmail
Дата:
I’m under fighting a nasty cold.  It may take a day or two for me to response to recent questions and suggestion.




Re: stale WAL files?

От
Gmail
Дата:
Sorry folks, I’m still home nursing a nasty chest cold and my only tool today is an iPad.
I have failed to get the postgresql.conf into the copy buffer so that, along with the results of pg_settings, will have to wait for another day.

Today there are “only” 135 Mar 16 WAL files.  I haven’t sorted out which have been cleaned up but can do so if that’s thought to be helpful.

There is still 2.2G in the pg_wal directory but that disc has ~360G left. (I believe the burst of WAL files was the result of a novice using LIMIT with a to-Json function and the target table has >100M rows.)

Given that current WALs come and go regularly, I think the CHECKPOINT is running frequently enough (for normal loads at least).


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

Re: stale WAL files?

От
Rob Sargent
Дата:


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

I think my attempt to send postgresql.conf and pg_settings report ran afoul of the mailer. Here again by attachment.


Вложения

Re: stale WAL files?

От
Rob Sargent
Дата:


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

two attempts (one in-line, one with attachement) at sending postgresql.conf and pg_settings report have been sent to a moderator.


Re: stale WAL files?

От
Rene Romero Benavides
Дата:


On Wed, Apr 3, 2019 at 1:05 PM Rob Sargent <robjsargent@gmail.com> wrote:


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

two attempts (one in-line, one with attachement) at sending postgresql.conf and pg_settings report have been sent to a moderator.



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

Re: stale WAL files?

От
Rob Sargent
Дата:


On Apr 3, 2019, at 5:23 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:



On Wed, Apr 3, 2019 at 1:05 PM Rob Sargent <robjsargent@gmail.com> wrote:


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <rene.romero.b@gmail.com> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <rene.romero.b@gmail.com> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <robjsargent@gmail.com> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <robjsargent@gmail.com> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

two attempts (one in-line, one with attachement) at sending postgresql.conf and pg_settings report have been sent to a moderator.



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.

That’s an interesting catch.  Thank you.  I’ll have that reverted that to default.
Note that the WAL files are all the default 16M however.
Currently we’re down to 88 Mar 16 WAL files.  My inclination is to wait this out, to see if all of Mar 16 goes away quietly then reset our backups.


 
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


Re: stale WAL files?

От
Rob Sargent
Дата:



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

I've manage to generate another 359 WAL files in a 10 minute span yesterday (now only 357 remain and I suspect they will wither away as before).  Are these being held simply because of the high max_wal_size value?

This is a development environment, wherein I'm loading 4M+ records, first into 41 staging tables 100K rows per.  In a loop over each staging table, the data is then placed into application tables via selects. First select * into "matching table" then select id into intersection record (id, fixed groupId).  Each such iteration is in it's own transaction.  I have dropped and recreate this same database numerous times working my way up from 100K to 4M records, dialing in application parameters according to number of primary records.  I have not, however, dropped the last incarnation.

Re: stale WAL files?

От
Rene Romero Benavides
Дата:
What's your current max_wal_size parameter?
SHOW max_wal_size; 
If it's 8GB as your configuration's previous value, you would get a constant share of 512 WAL files. If it's a development environment set it to the desired size, the smaller the value, the more frequent the checkpoints, but your checkpoint_timeout value is 300 (5 minutes) which is likely to be happening first, and thus being the one triggering checkpoints that often.   

On Wed, Apr 10, 2019 at 1:12 PM Rob Sargent <robjsargent@gmail.com> wrote:



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

I've manage to generate another 359 WAL files in a 10 minute span yesterday (now only 357 remain and I suspect they will wither away as before).  Are these being held simply because of the high max_wal_size value?

This is a development environment, wherein I'm loading 4M+ records, first into 41 staging tables 100K rows per.  In a loop over each staging table, the data is then placed into application tables via selects. First select * into "matching table" then select id into intersection record (id, fixed groupId).  Each such iteration is in it's own transaction.  I have dropped and recreate this same database numerous times working my way up from 100K to 4M records, dialing in application parameters according to number of primary records.  I have not, however, dropped the last incarnation.



--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

Re: stale WAL files?

От
Rob Sargent
Дата:


On 4/10/19 6:03 PM, Rene Romero Benavides wrote:
What's your current max_wal_size parameter?
SHOW max_wal_size; 
If it's 8GB as your configuration's previous value, you would get a constant share of 512 WAL files. If it's a development environment set it to the desired size, the smaller the value, the more frequent the checkpoints, but your checkpoint_timeout value is 300 (5 minutes) which is likely to be happening first, and thus being the one triggering checkpoints that often.   

On Wed, Apr 10, 2019 at 1:12 PM Rob Sargent <robjsargent@gmail.com> wrote:



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

I've manage to generate another 359 WAL files in a 10 minute span yesterday (now only 357 remain and I suspect they will wither away as before).  Are these being held simply because of the high max_wal_size value?

This is a development environment, wherein I'm loading 4M+ records, first into 41 staging tables 100K rows per.  In a loop over each staging table, the data is then placed into application tables via selects. First select * into "matching table" then select id into intersection record (id, fixed groupId).  Each such iteration is in it's own transaction.  I have dropped and recreate this same database numerous times working my way up from 100K to 4M records, dialing in application parameters according to number of primary records.  I have not, however, dropped the last incarnation.


We have not yet reconfigured the max_wal_size parameter, it is still 50GB

postgres-# show max_wal_size
postgres-# ;
 max_wal_size
--------------
 50GB
(1 row)

I'm sorry, I  don't follow your thesis that this setting would lead to "a constant" 512 WAL files.  There was a distinct burst of WALs in a 10 minute period yesterday, and no accumulation since then (though a manual checkpoint does generate another WAL)