On Mon, Nov 14, 2022 at 5:47 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> On Mon, Nov 14, 2022 at 05:08:15PM +0530, Amit Kapila wrote:
> > > #v+
> > > 2022-11-10 21:01:07.531 UTC,,,28925,,636378d6.70fd,4586739,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled
write-aheadlog file ""000000010001039D00000082""",,,,,,,,,""
> ...
> > > #v-
> > >
> > > So, the whole set of missing files was recycled at the same time.
> > >
> > > One more thing that I have is copy of pg_replication_slots info. This data is taken periodically, around every 1
minute.Each line, aside from pg_replication_slots slots data has also output of pg_current_wal_lsn() and timestamp.
Lookslike this:
> > >
> > > #v+
> > > timestamp pg_current_wal_lsn slot_name plugin slot_type datoid database temporary active
active_pid xmin catalog_xmin restart_lsn confirmed_flush_lsn
> > ...
> > ...
> > > 2022-11-10 21:02:52 UTC 103A7/B45ECEA8 focal14 pgoutput logical 16607 canvas f f
\N \N 3241434855 1039D/83825958 1039D/911A8DB0
> > > 2022-11-10 21:03:52 UTC 103A7/C4355F60 focal14 pgoutput logical 16607 canvas f t
21748 \N 3241443528 1039D/83825958 1039D/955633D0
> > ...
> >
> > I think from the above two it is clear that the slot with restart_lsn
> > 1039D/83825958 is active and we seem to have recycled corresponding
> > segment.
> >
> > You may have the LOG for "attempting to remove WAL segments older than
> > log file %s", if so, please share. Also on similar lines, I think we
>
> Absolutely.
>
> There is something weird happening:
>
What exactly weird you are seeing in this? To me, it appears as if the
system due to some reason ignores an existing slot that has
restart_lsn as 1039D/83825958.
>
> If I'll add "recycled" lines we will get:
>
> #v+
> ...
> 2022-11-10 21:01:07.531 UTC,,,28925,,636378d6.70fd,4586739,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled
write-aheadlog file ""000000010001039D00000082""",,,,,,,,,""
> 2022-11-10 21:06:07.521 UTC,,,28925,,636378d6.70fd,4594902,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to
removeWAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
>
Actually, the last LOG was what I was not expecting because of the
existence of a corresponding slot. This is what we have to find out
how it has computed the value "000000000001039D0000008A" for the file
to be removed.
> > need to add some LOGs before calling RemoveOldXlogFiles() to know
> > where our computation to remove files goes off. Something, like the
> > attached, would be helpful but note I have added this quickly on my
> > working branch to show you what I have in mind. You may need small
> > changes based on the branch you are working on.
>
> That will be complicated.
>
> Changing pg (recompile, and rerun) requires careful preparation with our
> customers. Which takes time.
>
> Additionally we are using precompiled binaries from pgdg, that makes the
> process slightly more involved.
>
I can understand but OTOH it is not easy to proceed without more information.
--
With Regards,
Amit Kapila.