Discussion: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication)


Hi Team,

We are facing an issue related to index corruption after migrating our PostgreSQL 11 setup from RHEL 7 to RHEL 9 using streaming replication (base backup method).

After bringing up the standby on RHEL 9, we observed that certain tables are not returning results when queries use indexed scans. Upon investigation, the following check confirms index corruption:

The same indexes work fine on the RHEL 7 (primary) side. However, on the RHEL 9 replica, queries that rely on this index return zero records.
Rebuilding the indexes fixed the issue temporarily but we have many indexes and our DB size is more than 10TB.

Environment details:

  • PostgreSQL Version: 11.15

  • OS on primary: RHEL 7.9

  • OS on standby: RHEL 9.6

  • Replication Type: Streaming replication (initialized using pg_basebackup)

  • Data Directory initialized from RHEL 7 base backup

Issue Summary:

  • Indexes are present on the standby after the base backup restore and are the same size as on the primary (prod).

  • We are able to read the data from the tables on the RHEL 9 standby when index scans are not used.

  • No filesystem or WAL errors observed in logs.

Could this be related to OS-level binary or page layout differences between RHEL 7 and RHEL 9 for PostgreSQL 11 binaries?
Any insights or recommended actions would be greatly appreciated.


Thanks & Regards,
krishna.


Hi Krishna,

This might be due to the glibc upgrade between RHEL 7 and RHEL 8 (RHEL 9 inherits the newer locale data as well): the updated collations change sort order, which invalidates existing indexes and forces them to be rebuilt on the RHEL > 7 system if they contain any values that sort differently under the new collation than under the old one.

More details can be found here, including a query to identify affected indexes: https://wiki.postgresql.org/wiki/Locale_data_changes

"[RHEL] Version 8 uses the new locale data. Therefore, caution will be necessary when upgrading."
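For reference, the check is roughly of this shape (a sketch, not the wiki's exact query; the database name is a placeholder). It lists indexes that depend on a libc-provided collation other than C/POSIX, i.e. the ones whose ordering can change with a glibc update:

psql -d mydb -c "
SELECT DISTINCT i.indrelid::regclass   AS table_name,
                i.indexrelid::regclass AS index_name,
                c.collname
FROM pg_index i
JOIN pg_collation c ON c.oid = ANY (i.indcollation)
WHERE c.collprovider IN ('d', 'c')      -- database default or libc provider
  AND c.collname NOT IN ('C', 'POSIX');"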

Thanks, Chris.

On Wed, 22 Oct 2025 at 11:53, Bala M <krishna.pgdba@gmail.com> wrote:

Hi Team,

We are facing an issue related to index corruption after migrating our PostgreSQL 11 setup from RHEL 7 to RHEL 9 using streaming replication (base backup method).

After bringing up the standby on RHEL 9, we observed that certain tables are not returning results when queries use indexed scans. Upon investigation, the following check confirms index corruption:

The same indexes work fine on the RHEL 7 (primary) side. However, on the RHEL 9 replica, queries that rely on this index return zero records.
Rebuilding the indexes fixed the issue temporarily but we have many indexes and our DB size is more than 10TB.

Environment details:

  • PostgreSQL Version: 11.15

  • OS on primary: RHEL 7.9

  • OS on standby: RHEL 9.6

  • Replication Type: Streaming replication (initialized using pg_basebackup)

  • Data Directory initialized from RHEL 7 base backup

Issue Summary:

  • Indexes are present on the standby after the base backup restore and are the same size as on the primary (prod).

  • We are able to read the data from the tables on the RHEL 9 standby when index scans are not used.

  • No filesystem or WAL errors observed in logs.

Could this be related to OS-level binary or page layout differences between RHEL 7 and RHEL 9 for PostgreSQL 11 binaries?
Any insights or recommended actions would be greatly appreciated.


Thanks & Regards,
krishna.


On 10/22/25 02:03, Bala M wrote:
> Hi Team,
> 
> We are facing an issue related to index corruption after migrating our 
> PostgreSQL 11 setup from *RHEL 7* to *RHEL 9* using *streaming 
> replication* (base backup method).

Postgres 11 is two years past EOL per:

https://www.postgresql.org/support/versioning/


> 
> Thanks & Regards,
> *krishna.*
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Wed, Oct 22, 2025 at 6:53 AM Bala M <krishna.pgdba@gmail.com> wrote:
  • PostgreSQL Version: 11.15


Ouch! Not only is the major version end of life, but that's not even the latest revision of 11. At this point, you should use logical replication to migrate from your v11 to a shiny new Postgres v18 on your RHEL9 box. As Chris pointed out, the indexes are a libc problem, but your bigger problem is being on v11.15. If you want to trade complexity for time, you could consider using pg_dump to do the upgrade instead of logical replication.
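For the record, the bare bones of native logical replication look something like this (all host, user, and object names below are placeholders, and the v11 primary needs wal_level = logical, which requires a restart):

# On the old RHEL 7 / v11 primary:
psql -d mydb -c "CREATE PUBLICATION migrate_pub FOR ALL TABLES;"

# On the new server: load the schema first, then subscribe.
pg_dump -h my_rhel7_server -d mydb --schema-only | psql -d mydb
psql -d mydb -c "CREATE SUBSCRIPTION migrate_sub
    CONNECTION 'host=my_rhel7_server dbname=mydb user=replicator'
    PUBLICATION migrate_pub;"

Note that logical replication does not copy sequences; their current values have to be carried over at cutover.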


Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

On Wed, Oct 22, 2025 at 1:35 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Wed, Oct 22, 2025 at 6:53 AM Bala M <krishna.pgdba@gmail.com> wrote:
  • PostgreSQL Version: 11.15


Ouch! Not only is the major version end of life, but that's not even the latest revision of 11.
 
Do pre-built PG11 binaries even exist for RHEL9?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 10/22/25 11:18, Ron Johnson wrote:
> On Wed, Oct 22, 2025 at 1:35 PM Greg Sabino Mullane <htamfids@gmail.com 
> <mailto:htamfids@gmail.com>> wrote:
> 
>     On Wed, Oct 22, 2025 at 6:53 AM Bala M <krishna.pgdba@gmail.com
>     <mailto:krishna.pgdba@gmail.com>> wrote:
> 
>           *
> 
>             PostgreSQL Version: 11.15
> 
> 
>     Ouch! Not only is the major version end of life, but that's not even
>     the latest revision of 11.
> 
> Do pre-built PG11 binaries even exist for RHEL9?

I believe you can get them here:

https://yum.postgresql.org/repopackages/#archivedrepos

I don't use RH so I can't verify.

> 
> -- 
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!


-- 
Adrian Klaver
adrian.klaver@aklaver.com



  • Acceptable downtime: ~1 day

  • Logical replication: Not feasible due to the number of schemas, tables, and overall data volume

I'm not sure why this is not feasible. Can you expand on this?

* For a 15 TB database with roughly 1 day downtime, what would be the most reliable approach to migrate from RHEL 7 → RHEL 9 while avoiding collation/index corruption issues?

pg_dump is the most reliable, and the slowest. Keep in mind that only the actual data needs to move over (not the indexes, which get rebuilt after the data is loaded). You could also mix-n-match pg_logical and pg_dump if you have a few tables that are super large. Whether either approach fits in your 24 hour window is hard to say without you running some tests.
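As a rough illustration (hosts, paths, and job counts here are made up), a parallel dump/restore with the directory format looks like this; run the newer pg_dump/pg_restore from the RHEL 9 box, since a newer pg_dump can dump an older server:

pg_dump -h my_rhel7_server -d mydb -Fd -j 8 -f /backup/mydb.dir
pg_restore -h localhost -d mydb -j 16 /backup/mydb.dir   # target database must already exist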

* Would using pg_upgrade (with --check and --clone options) be safe when moving between OS versions with different glibc libraries?

No, you cannot use pg_upgrade for this. It can move your system across Postgres versions, but not across servers/operating systems.

* If we temporarily remain on PostgreSQL 11, is it mandatory to rebuild all indexes after restoring the base backup on RHEL 9 to ensure data consistency? Would running REINDEX DATABASE across all databases be sufficient?

Yes, and yes. 
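Something along these lines would do it (a sketch only; note that v11 has no REINDEX CONCURRENTLY, so each rebuild takes a lock on its table):

reindexdb --all --echo

The wiki query mentioned earlier can narrow that down to just the collation-dependent indexes if rebuilding everything in a 10+ TB cluster is too slow.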

* Are there any community-tested procedures or best practices for migrating large (15 TB+) environments between RHEL 7 and RHEL 9 with minimal downtime?

Yes - logical replication is both battle-tested and best practice for such an upgrade. But with such a large downtime window, investigate pg_dump to v18. You can find a large table and dump just that one table to start getting some measurements, e.g. run from the new server:

pg_dump -h my_rhel7_server -d mydb -t mybigtable | psql -h localhost -d mydb -f -

Make sure log_min_duration_statement is set on the new server to help you see how long each step takes.
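For example (the threshold is illustrative; pg_reload_conf() applies it without a restart):

psql -d mydb -c "ALTER SYSTEM SET log_min_duration_statement = '10s'" \
             -c "SELECT pg_reload_conf()"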


Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

On 10/23/25 07:51, Bala M wrote:
> Hi All,
> 
> Thank you for the valuable responses regarding the *index corruption 
> issue* we observed during our test migration of a *PostgreSQL 11* 
> environment from *RHEL 7 to RHEL 9* using *streaming replication*.
> 
> Based on the replies, I understand that:

>     Suggested upgrading to a newer PostgreSQL version (e.g., v16 or v18)
>     and using *logical replication* or *pg_dump/restore* for a clean
>     rebuild.

I am not sure version 18 would be a good choice at this time, it has just 
been released and has no bug fix releases against it yet. Given the 
other complications in your upgrade, OS upgrade and multi-version jump, 
I don't think you want to add a brand new version to the mix. Postgres 
17 on the other hand has been out a year and has had 6 bug/security 
fixes. It is probably a more stable target version.

-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Thu, Oct 23, 2025 at 11:21 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
  • Acceptable downtime: ~1 day

  • Logical replication: Not feasible due to the number of schemas, tables, and overall data volume

I'm not sure why this is not feasible. Can you expand on this?

* For a 15 TB database with roughly 1 day downtime, what would be the most reliable approach to migrate from RHEL 7 → RHEL 9 while avoiding collation/index corruption issues?

pg_dump is the most reliable, and the slowest. Keep in mind that only the actual data needs to move over (not the indexes, which get rebuilt after the data is loaded). You could also mix-n-match pg_logical and pg_dump if you have a few tables that are super large. Whether either approach fits in your 24 hour window is hard to say without you running some tests.

Last year, I did a dump/restore of a 4.3 TB (inclusive of indices; heavy on poorly-compressible BYTEA) database from RHEL 6 + 9.6.24 to RHEL 8 + 14.latest. It took just under 11 hours.

Gzip Level = 1
Remote database size: 4307406 MB
RemoteThreads: 16
LocalThreads: 24
SharedBuffs: 32 GB
MaintWorkMem: 3 GB
CheckPoint: 30 min
MaxWalSize: 36 GB
WalBuffs: 128 MB

Both systems were SAN-attached ESX VMs on the same virtual network

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
Hi,

This happens because of the glibc version difference between RHEL X and RHEL Y. At this point you either have to rebuild all indexes (sorry!) or redo the upgrade via logical replication (if it works for your app's behaviour).


Devrim


On 22 October 2025 12:03:06 EEST, Bala M <krishna.pgdba@gmail.com> wrote:

Hi Team,

We are facing an issue related to index corruption after migrating our PostgreSQL 11 setup from RHEL 7 to RHEL 9 using streaming replication (base backup method).

After bringing up the standby on RHEL 9, we observed that certain tables are not returning results when queries use indexed scans. Upon investigation, the following check confirms index corruption:

The same indexes work fine on the RHEL 7 (primary) side. However, on the RHEL 9 replica, queries that rely on this index return zero records.
Rebuilding the indexes fixed the issue temporarily but we have many indexes and our DB size is more than 10TB.

Environment details:

  • PostgreSQL Version: 11.15

  • OS on primary: RHEL 7.9

  • OS on standby: RHEL 9.6

  • Replication Type: Streaming replication (initialized using pg_basebackup)

  • Data Directory initialized from RHEL 7 base backup

Issue Summary:

  • Indexes are present on the standby after the base backup restore and are the same size as on the primary (prod).

  • We are able to read the data from the tables on the RHEL 9 standby when index scans are not used.

  • No filesystem or WAL errors observed in logs.

Could this be related to OS-level binary or page layout differences between RHEL 7 and RHEL 9 for PostgreSQL 11 binaries?
Any insights or recommended actions would be greatly appreciated.


Thanks & Regards,
krishna.


On Fri, 24 Oct 2025 at 07:17, Devrim Gündüz <devrim@gunduz.org> wrote:
> This happens because of the glibc version difference between RHEL X and RHEL Y. At this point you either have to
> rebuild all indexes (sorry!) or redo the upgrade via logical replication (if it works for your app's behaviour)

There was a mention in [1] about a wiki page which includes a query to
identify which indexes could be affected. While that *could* be all
indexes, it does seem unlikely.
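If it helps to see which indexes are actually broken (as opposed to merely at risk), contrib's amcheck is available in v11; a rough sketch, with the database name as a placeholder and typically run as a superuser:

# Create the extension on the primary (a hot standby is read-only, so it has
# to replicate over); the checks themselves only take AccessShareLock.
psql -d mydb -c "CREATE EXTENSION IF NOT EXISTS amcheck;"

# On the RHEL 9 standby: walk every btree index. bt_index_check raises an
# error at the first corrupted index it finds, so this stops at the first hit.
psql -d mydb -c "
DO \$\$
DECLARE r record;
BEGIN
  FOR r IN SELECT c.oid::regclass AS idx
           FROM pg_class c
           JOIN pg_am am ON am.oid = c.relam
           WHERE c.relkind = 'i' AND am.amname = 'btree'
  LOOP
    PERFORM bt_index_check(r.idx);
  END LOOP;
END
\$\$;"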

David

[1] https://postgr.es/m/CAOg7f80JJCMHXhNLUjnHMwBFV-f9Msv4L=PML0CdaZ4n47uEeA@mail.gmail.com



 

Just wanted to clarify, this failed because OP used streaming which may not work between OS versions?  But logical would have been just fine, between OS and PG versions or even to Windows?  I always assumed streaming would “just work” as long as it’s the same major PG version and Linux-to-Linux regardless of OS/glibc version.  That’s an awesome piece of design work BTW… congratulations and thank you for that.  It never occurred to me that there could be an OS influencing factor like the glibc version for streaming replication.

 

I almost took the same streaming path about two months ago to go from RH7 to RH8; instead opted to take some downtime overnight and do a dump/restore to go from RH7 to RH8 because logical replication was not an option and it seemed safer.  Sounds like I chose the better path by chance.

 

Thank you for sharing your knowledge. 

 

 

 


Scot Kreienkamp | Applications Infrastructure Architect | La-Z-Boy Corporate 
(734) 384-6403 | 1-734-915-1444 | Scot.Kreienkamp@la-z-boy.com
One La-Z-Boy Drive | Monroe, Michigan 48162 | la-z-boy.com
 facebook.com/lazboy  | instagram.com/lazboy | youtube.com/lazboy


From: Devrim Gündüz <devrim@gunduz.org>
Sent: Thursday, October 23, 2025 2:17 PM
To: pgsql-general@lists.postgresql.org; Bala M <krishna.pgdba@gmail.com>
Subject: Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication)

 

Hi,

This happens because of the glibc version difference between RHEL X and RHEL Y. At this point you either have to rebuild all indexes (sorry!) or redo the upgrade via logical replication (if it works for your app's behaviour).


Devrim

 

On 22 October 2025 12:03:06 EEST, Bala M <krishna.pgdba@gmail.com> wrote:

Hi Team,

We are facing an issue related to index corruption after migrating our PostgreSQL 11 setup from RHEL 7 to RHEL 9 using streaming replication (base backup method).

After bringing up the standby on RHEL 9, we observed that certain tables are not returning results when queries use indexed scans. Upon investigation, the following check confirms index corruption:

The same indexes work fine on the RHEL 7 (primary) side. However, on the RHEL 9 replica, queries that rely on this index return zero records. 
Rebuilding the indexes  fixed the issue temporarily but we have many indexes  and our DB size is more than 10TB.

Environment details:

· PostgreSQL Version: 11.15

· OS on primary: RHEL 7.9

· OS on standby: RHEL 9.6

· Replication Type: Streaming replication (initialized using pg_basebackup)

· Data Directory initialized from RHEL 7 base backup

Issue Summary:

· Indexes are present on the standby after the base backup restore and are the same size as on the primary (prod).

· We are able to read the data from the tables on the RHEL 9 standby when index scans are not used.

· No filesystem or WAL errors observed in logs.

Could this be related to OS-level binary or page layout differences between RHEL 7 and RHEL 9 for PostgreSQL 11 binaries?
Any insights or recommended actions would be greatly appreciated.

 

Thanks & Regards,
krishna.

 


On Thu, Oct 23, 2025 at 4:24 PM Scot Kreienkamp <Scot.Kreienkamp@la-z-boy.com> wrote:


I always assumed streaming would “just work” as long as it’s the same major PG version and Linux-to-Linux regardless of OS/glibc version....It never occurred to me that there could be an OS influencing factor like the glibc version for streaming replication.


In addition to the locale checking when things are accessed, when you bring up a database cluster it checks some pg_controldata entries to make sure they match what the server's source code was built with.  If any of them are off, it won't run against those databases.

As a simple example that happens sometimes, if your main Linux PG install changed the block size at compile time, a different PG binary built with the default sizes will fail trying to read data from the modified one.  Because all these compile options have to match, sometimes you can't migrate a database built with one Linux distribution to another.  When that happens it's sometimes possible to hack together a custom build that matches the origin primary better, but now you're into packaging your own PG binaries.
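For what it's worth, a quick way to compare those up front (illustrative; run on each host against its data directory and diff the output) is:

pg_controldata "$PGDATA" | grep -E 'version|block size|segment'

The block sizes, segment sizes, and catalog version number all have to match for the standby binaries to accept the primary's data directory.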

On Thu, Oct 23, 2025 at 11:49 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
I am not sure version 18 would be a good choice at this time, it has just been released and has no bug fix releases against it yet. Given the
other complications in your upgrade, OS upgrade and multi-version jump, I don't think you want to add a brand new version to the mix. Postgres
17 on the other hand has been out a year and has had 6 bug/security fixes. It is probably a more stable target version.

I hear your concern, but I'm pretty confident in v18. Additionally, they are on version 11 (eleven!) so obviously major upgrades are a rare thing, so might as well buy themselves another year. :)

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

On Thu, Oct 23, 2025 at 10:51 AM Bala M <krishna.pgdba@gmail.com> wrote:
Any advice, recommendations, or shared experiences from others who have performed similar migrations would be greatly appreciated.

Some related advice: put some system in place to make sure you are always running the latest revision of whatever major version you end up on. See:

https://www.postgresql.org/support/versioning/

In particular, note this part:

The community considers performing minor upgrades to be less risky than continuing to run an old minor version.

You also want to put something in place to make sure your major version does not  fall so far behind again. You don't need to upgrade every year, but certainly target a major upgrade every 2-3 years. As you will discover, the major upgrade process is going to be much easier than this current upgrade we are talking about in this thread.

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

On Fri, Oct 24, 2025 at 8:59 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Thu, Oct 23, 2025 at 10:51 AM Bala M <krishna.pgdba@gmail.com> wrote:
Any advice, recommendations, or shared experiences from others who have performed similar migrations would be greatly appreciated.

Some related advice: put some system in place to make sure you are always running the latest revision in whatever major version you end up. See:


In particular, note this part:

The community considers performing minor upgrades to be less risky than continuing to run an old minor version.

Not only that, but applying minor version update RPMs is fast.
Last weekend, this took 12 seconds to run:
alias suip='sudo -iupostgres'
suip pg_ctl stop -mfast -wt9999 \
    && sudo yum install -y Pg17.6_RHEL8/*rpm \
    && suip pg_ctl start -wt9999 \
    && psql -Xc "select version()"

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 10/24/25 05:53, Greg Sabino Mullane wrote:
> On Thu, Oct 23, 2025 at 11:49 AM Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     I am not sure version 18 would be a good choice at this time, it has
>     just been released and has no bug fix releases against it yet. Given
>     the
>     other complications in your upgrade, OS upgrade and multi-version
>     jump, I don't think you want to add a brand new version to the mix.
>     Postgres
>     17 on the other hand has been out a year and has had 6 bug/security
>     fixes. It is probably a more stable target version.
> 
> 
> I hear your concern, but I'm pretty confident in v18. Additionally, they 

Which is pretty much the criteria for a GA release, confidence that it 
is ready to face the general public. That is not the same thing as it 
being guaranteed bug free. Only time facing the manipulations of said 
public proves how many bugs there are and how bad they are.

> are on version 11 (eleven!) so obviously major upgrades are a rare 
> thing, so might as well buy themselves another year. :)

Major upgrade being the operative phrase here. The OP and company are 
going to face enough new hoops to jump through in the move from 11, 
adding a new release to that mix is pushing it a little too far in my 
estimation.

> 
> Cheers,
> Greg
> 
> --
> Crunchy Data - https://www.crunchydata.com <https://www.crunchydata.com>
> Enterprise Postgres Software Products & Tech Support
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Fri, Oct 24, 2025 at 10:54 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/24/25 05:53, Greg Sabino Mullane wrote:
> On Thu, Oct 23, 2025 at 11:49 AM Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     I am not sure version 18 would be a good choice at this time, it has
>     just been released and has no bug fix releases against it yet. Given
>     the
>     other complications in your upgrade, OS upgrade and multi-version
>     jump, I don't think you want to add a brand new version to the mix.
>     Postgres
>     17 on the other hand has been out a year and has had 6 bug/security
>     fixes. It is probably a more stable target version.
>
>
> I hear your concern, but I'm pretty confident in v18. Additionally, they

Which is pretty much the criteria for a GA release, confidence that it
is ready to face the general public. That is not the same thing as it
being guaranteed bug free. Only time facing the manipulations of said
public proves how many bugs there are and how bad they are.

"Never trust a .0 release with important data" is just as true in 2025 as it was in 1985.

That's a chicken and egg problem, though, isn't it?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 10/24/25 08:00, Ron Johnson wrote:
> On Fri, Oct 24, 2025 at 10:54 AM Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

> "Never trust a .0 release with important data" is just as true in 2025 
> as it was in 1985.
> 
> That's a chicken and egg problem, though, isn't it?

There is nothing stopping you from setting up a test instance and 
kicking the tires on a new release to see if your setup will work 
correctly. Though the issue in this case is deciding what part of the 15 
TB you would test against or finding the resources to replicate the 
entire setup.


-- 
Adrian Klaver
adrian.klaver@aklaver.com




On Thu, 23 Oct 2025 at 17:21, Greg Sabino Mullane <htamfids@gmail.com> wrote

pg_dump is the most reliable, and the slowest. Keep in mind that only the actual data needs to move over (not the indexes, which get rebuilt after the data is loaded). You could also mix-n-match pg_logical and pg_dump if you have a few tables that are super large. Whether either approach fits in your 24 hour window is hard to say without you running some tests.

Long time ago I had a similar problem and did a "running with scissors" restore. This means:

1.- Prepare normal configuration, test, etc for the new version.
2.- Prepare a restore configuration, with fsync=off, wal_level=minimal, whatever option gives you any speed advantage.

As the target was empty, if restore failed we could just clean and restart.

3.- Dump, boot with the restore configuration, restore, clean shutdown, switch to production configuration, boot again and follow on.

Time has passed and I lost my notes, but I remember the restore was much faster than doing it with the normal production configuration. Given current machine speeds, it may be doable.
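To make that concrete, the throwaway restore-only settings might look roughly like this (values are illustrative; wal_level = minimal also requires max_wal_senders = 0, and those two need a restart):

psql -d postgres -c "ALTER SYSTEM SET fsync = off" \
                 -c "ALTER SYSTEM SET full_page_writes = off" \
                 -c "ALTER SYSTEM SET wal_level = minimal" \
                 -c "ALTER SYSTEM SET max_wal_senders = 0" \
                 -c "ALTER SYSTEM SET maintenance_work_mem = '2GB'"
pg_ctl restart -D "$PGDATA"
# ...run the restore here, then ALTER SYSTEM RESET ALL (or edit
# postgresql.auto.conf), put the production settings back, and restart
# before going live.

As the target is empty, losing it on a crash costs nothing; just never run production traffic with fsync off.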


Francisco Olarte.

On Sat, 25 Oct 2025 at 04:51, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 10/24/25 08:00, Ron Johnson wrote:
> > On Fri, Oct 24, 2025 at 10:54 AM Adrian Klaver
> > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
> > "Never trust a .0 release with important data" is just as true in 2025
> > as it was in 1985.
> >
> > That's a chicken and egg problem, though, isn't it?
>
> There is nothing stopping you from setting up a test instance and
> kicking the tires on a new release to see if your setup will work
> correctly.

I'd say it's exactly that attitude that causes people to think .0
should be avoided. Beta versions are meant for test instances. It'd be
good if people encouraged their use more often rather than pushing
people to defer til GA.

David



On Friday, October 24, 2025, David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, 25 Oct 2025 at 04:51, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 10/24/25 08:00, Ron Johnson wrote:
> > On Fri, Oct 24, 2025 at 10:54 AM Adrian Klaver
> > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
> > "Never trust a .0 release with important data" is just as true in 2025
> > as it was in 1985.
> >
> > That's a chicken and egg problem, though, isn't it?
>
> There is nothing stopping you from setting up a test instance and
> kicking the tires on a new release to see if your setup will work
> correctly.

I'd say it's exactly that attitude that causes people to think .0
should be avoided. Beta versions are meant for test instances. It'd be
good if people encouraged their use more often rather than pushing
people to defer til GA.

I’d say it’s negativity bias that causes this.  Remembering the times serious bugs were discovered in .0-.2 releases strongly leans one to avoid .0-.2 releases.  Counteracting that bias with data seems needed.  Yes, it would be great if more people tested betas; but there is no way for users to have any idea of how much or how effective such efforts have been for a given release.

If someone comes here asking for advice I’m going to be conservative.  The ones who are going to be pushing the envelope or doing beta testing are not asking for this kind of advice.  They do it likely to benefit from new features and some subset does it to be the canary for the community because they can handle the rare negative outcome.

David J.

On 10/24/25 15:37, David Rowley wrote:
> On Sat, 25 Oct 2025 at 04:51, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 10/24/25 08:00, Ron Johnson wrote:
>>> On Fri, Oct 24, 2025 at 10:54 AM Adrian Klaver
>>> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>>
>>> "Never trust a .0 release with important data" is just as true in 2025
>>> as it was in 1985.
>>>
>>> That's a chicken and egg problem, though, isn't it?
>>
>> There is nothing stopping you from setting up a test instance and
>> kicking the tires on a new release to see if your setup will work
>> correctly.
> 
> I'd say it's exactly that attitude that causes people to think .0
> should be avoided. Beta versions are meant for test instances. It'd be
> good if people encouraged their use more often rather than pushing
> people to defer til GA.

1) From previous posts to this list, folks have mentioned their organizations prohibit touching anything less than a GA or maybe a late RC. That comes from on high, and I doubt the folks issuing the orders are on this list.

2) The attitude comes from lessons learned in the School of Hard Knocks. Until someone can guarantee a new GA release will not eat your data or spring security leaks, the prudent thing to do is wait and see what happens when it hits the world at large. I learned this lesson, the pitfalls of jumping into something new, in fields outside of software as well. In other words, 'new and improved' is not always the case; see the 737 MAX as a case in point.

3) Progress happens and you need to keep up. A little caution is a good thing though, especially if you are the one who is being held responsible for any adverse outcomes.

> 
> David


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Sat, 25 Oct 2025 at 13:40, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 10/24/25 15:37, David Rowley wrote:
> > On Sat, 25 Oct 2025 at 04:51, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> >>
> >> On 10/24/25 08:00, Ron Johnson wrote:
> > I'd say it's exactly that attitude that causes people to think .0
> > should be avoided. Beta versions are meant for test instances. It'd be
> > good if people encouraged their use more often rather than pushing
> > people to defer til GA.
>
> 1) From previous posts to this list folks have mentioned their
> organizations prohibit touching anything less then a GA or maybe a late
> RC. That comes from on high and I doubt the folks issuing the orders are
> on this list.

That seems bizarre to me. If they want new releases of PostgreSQL to
be as stable as possible as early as possible, then beta and RC are
much better times to test than .0 is.

> 2) The attitude comes from lessons learned in the School of Hard Knocks.
> Until someone or someones can guarantee a new GA release will not eat
> your data or spring security leaks then the prudent thing to do is wait
> to see what happens when it hits the world at large. I learned this
> lesson, pitfalls of jumping into something new, across fields outside of
> software as well. In other words 'new and improved' is not always the
> case, see 737 MAX as case in point.

I don't see why this reason is applicable at all to my statement. I
didn't state that everyone should go and run with .0. I said we
shouldn't encourage people to test beta and RC versions, as if they
don't do that then .0 won't be as stable as if they did test (and
report issues). It seems like simple cause and effect to me.

> 3) Progress happens and you need to keep up. A little caution is good
> thing though, especially if you are the one who is being held
> responsible for any adverse outcomes.

We're talking test servers here. I assume they can be recreated
without too much trouble.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, 25 Oct 2025 at 13:40, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> 1) From previous posts to this list folks have mentioned their
>> organizations prohibit touching anything less then a GA or maybe a late
>> RC. That comes from on high and I doubt the folks issuing the orders are
>> on this list.

> That seems bizarre to me. If they want new releases of PostgreSQL to
> be as stable as possible as early as possible, then beta and RC are
> much better times to test than .0 is.

I think the folks issuing that sort of order believe that testing is
Somebody Else's Problem.  The folly of that approach is pretty evident
to those of us toiling in the software trenches, but maybe not so much
from the C-suite.

            regards, tom lane



On Sat, 25 Oct 2025 at 14:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Sat, 25 Oct 2025 at 13:40, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> >> 1) From previous posts to this list folks have mentioned their
> >> organizations prohibit touching anything less then a GA or maybe a late
> >> RC. That comes from on high and I doubt the folks issuing the orders are
> >> on this list.
>
> > That seems bizarre to me. If they want new releases of PostgreSQL to
> > be as stable as possible as early as possible, then beta and RC are
> > much better times to test than .0 is.
>
> I think the folks issuing that sort of order believe that testing is
> Somebody Else's Problem.  The folly of that approach is pretty evident
> to those of us toiling in the software trenches, but maybe not so much
> from the C-suite.

Yeah, I expect that'll be true all too often. However, it doesn't mean
that people around here can't attempt to propagate an education to
them on why they're wrong. There certainly is much less hope if they
only receive confirmation.

David



On 10/24/25 18:06, David Rowley wrote:
> On Sat, 25 Oct 2025 at 13:40, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>

>> 2) The attitude comes from lessons learned in the School of Hard Knocks.
>> Until someone or someones can guarantee a new GA release will not eat
>> your data or spring security leaks then the prudent thing to do is wait
>> to see what happens when it hits the world at large. I learned this
>> lesson, pitfalls of jumping into something new, across fields outside of
>> software as well. In other words 'new and improved' is not always the
>> case, see 737 MAX as case in point.
> 
> I don't see why this reason is applicable at all to my statement. I
> didn't state that everyone should go and run with .0. I said we
> shouldn't encourage people to test beta and RC versions, as if they
> don't do that then .0 won't be as stable as if they did test (and
> report issues). It seems like simple cause and effect to me.

I am not following, from your previous post:

"Beta versions are meant for test instances. It'd be
good if people encouraged their use more often rather than pushing
people to defer til GA"

That seems to be the opposite of what you say above.

> 
>> 3) Progress happens and you need to keep up. A little caution is good
>> thing though, especially if you are the one who is being held
>> responsible for any adverse outcomes.
> 
> We're talking test servers here. I assume they can be recreated
> without too much trouble.

Yes, but the OP was talking about upgrading a production database 
directly to 18. That was what my reply was referring to and what I was 
counseling against.

> 
> David


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Sat, 25 Oct 2025 at 17:36, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> I am not following, from your previous post:
>
> "Beta versions are meant for test instances. It'd be
> good if people encouraged their use more often rather than pushing
> people to defer til GA"
>
> That seems to be the opposite of what you say above.

I think you think that because you misunderstood what I said in [1].
I'll rephrase it for you:

Because people promote the .0 as not yet production-ready, it means
that fewer people bother testing with beta and RC versions. Lack of
beta testing is what causes .0 to contain more bugs than it otherwise
might, so my suggestion is that we should be encouraging people to run
beta and RC in their test environments to try to increase the
stability of .0 versions.

I struggle to imagine anyone with any respect for the PostgreSQL
project disagreeing with that, so I suspect you and Johnson must have
misunderstood.

Also, I was highlighting that I agreed with the paradox pointed out by Ron.

Clear?

> >> 3) Progress happens and you need to keep up. A little caution is good
> >> thing though, especially if you are the one who is being held
> >> responsible for any adverse outcomes.
> >
> > We're talking test servers here. I assume they can be recreated
> > without too much trouble.
>
> Yes, but the OP was talking about upgrading a production database
> directly to 18. That was what my reply was referring to and what I was
> counseling against.

You should pay more attention to the quotes above my reply. I don't
see any of my replies quoting anything about upgrading a production
database to 18.0.

David

[1] https://postgr.es/m/CAApHDvo5pLSwD4JVxW8t94Vc88djKrrSyrL-GbQJ=hb2NVT5zg@mail.gmail.com



On 10/24/25 18:53, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> On Sat, 25 Oct 2025 at 13:40, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> 1) From previous posts to this list folks have mentioned their
>>> organizations prohibit touching anything less then a GA or maybe a late
>>> RC. That comes from on high and I doubt the folks issuing the orders are
>>> on this list.
> 
>> That seems bizarre to me. If they want new releases of PostgreSQL to
>> be as stable as possible as early as possible, then beta and RC are
>> much better times to test than .0 is.
> 
> I think the folks issuing that sort of order believe that testing is
> Somebody Else's Problem.  The folly of that approach is pretty evident
> to those of us toiling in the software trenches, but maybe not so much
> from the C-suite.

Yes and no. Yes someone needs to test new software, no that is not for 
everyone. The presence and uptake of LTS software indicates that being 
on the edge of development is something folks would like to avoid. I 
know, as I use LTS versions of Ubuntu and Django for that very reason: I want to get work done without worrying about having to pick up 'shiny bits and pieces'. That being said, it allows me the time to kick the tires on new software, of current interest Polars and DuckDB. It comes down to time available and what you want to spend that time on. That could very well mean making testing 'Somebody Else's Problem', in the way that PostgreSQL Core has made packaging someone else's problem:

https://www.postgresql.org/message-id/427155.1755538776@sss.pgh.pa.us

"
The core project wouldn't document that, because we just ship source
code not RPMs.  There might be a README in the core postgresql RPM
package telling you what's what.
"

> 
>             regards, tom lane


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On 2025-10-25 17:50:59 +1300, David Rowley wrote:
> On Sat, 25 Oct 2025 at 17:36, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> > I am not following, from your previous post:
> >
> > "Beta versions are meant for test instances. It'd be
> > good if people encouraged their use more often rather than pushing
> > people to defer til GA"
> >
> > That seems to be the opposite of what you say above.
>
> I think you think that because you misunderstood what I said in [1].
> I'll rephrase it for you:
>
> Because people promote the .0 as not yet production-ready, it means
> that fewer people bother testing with beta and RC versions. Lack of
> beta testing is what causes .0 to contain more bugs than it otherwise
> might, so my suggestion is that we should be encouraging people to run
> beta and RC in their test environments to try to increase the
> stability of .0 versions.

Yes, but in your previous message you wrote:

| I said we shouldn't encourage people to test beta and RC versions,

Note: "shouldn't" instead of "should". Which is exactly the opposite.

I (apparently correctly) assumed that this was a typo, but that wasn't
entirely clear and Adrian obviously took it at face value.

> > > We're talking test servers here.
> >
> > Yes, but the OP was talking about upgrading a production database
> > directly to 18. That was what my reply was referring to and what I was
> > counseling against.
>
> You should pay more attention to the quotes above my reply. I don't
> see any of my replies quoting anything about upgrading a production
> database to 18.0.

Well, You came into this thread replying to Adrian, so maybe you should
have paid more attention to what he was (originally) replying to? Or at
least make it clearer that you are changing the subject? Something like
"I agree in this specific case, but in general ..." is often helpful to
alert people that you aren't commenting on the specific problem they are
currently thinking about.

        hjp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

On 10/24/25 21:50, David Rowley wrote:
> On Sat, 25 Oct 2025 at 17:36, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> I am not following, from your previous post:
>>
>> "Beta versions are meant for test instances. It'd be
>> good if people encouraged their use more often rather than pushing
>> people to defer til GA"
>>
>> That seems to be the opposite of what you say above.
> 
> I think you think that because you misunderstood what I said in [1].
> I'll rephrase it for you:
> 
> Because people promote the .0 as not yet production-ready, it means
> that fewer people bother testing with beta and RC versions. Lack of
> beta testing is what causes .0 to contain more bugs than it otherwise
> might, so my suggestion is that we should be encouraging people to run
> beta and RC in their test environments to try to increase the
> stability of .0 versions.

Alright that I understand, though not necessarily agree with. I would 
say lack of testing has more to do with time/money management. 
Organizations don't want to spend either until: 1) They see the dust 
settle on what is going to end up in the release. 2) Whether there is 
anything interesting enough to invest both in moving to a new release. 
Maybe there is a compelling argument that can be made to get those 
organizations off the fence. I just don't know what it is, as you would have 
to convince them to spend time and money rather than just wait and let 
the community as a whole do the work.

> 
> I struggle to imagine anyone with any respect for the PostgreSQL
> project disagreeing with that, so I suspect you and Johnson must have
> misunderstood.

No, we didn't misunderstand; we were responding to what the OP was proposing, which was jumping a production instance from 11 --> 18. That is a different case than promoting testing of betas and RCs.

> David

-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Sat, Oct 25, 2025 at 10:21 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/24/25 21:50, David Rowley wrote:
> On Sat, 25 Oct 2025 at 17:36, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> I am not following, from your previous post:
>>
>> "Beta versions are meant for test instances. It'd be
>> good if people encouraged their use more often rather than pushing
>> people to defer til GA"
>>
>> That seems to be the opposite of what you say above.
>
> I think you think that because you misunderstood what I said in [1].
> I'll rephrase it for you:
>
> Because people promote the .0 as not yet production-ready, it means
> that fewer people bother testing with beta and RC versions. Lack of
> beta testing is what causes .0 to contain more bugs than it otherwise
> might, so my suggestion is that we should be encouraging people to run
> beta and RC in their test environments to try to increase the
> stability of .0 versions.

Alright that I understand, though not necessarily agree with. I would
say lack of testing has more to do with time/money management.
Organizations don't want to spend either until: 1) They see the dust
settle on what is going to end up in the release. 2) Whether there is
anything interesting enough to invest both in moving to a new release.
Maybe there is a compelling argument that can be made to get those
organizations off the fence. I just don't know what it is, as you would have
to convince them to spend time and money rather than just wait and let
the community as a whole do the work.

Contractual requirements to not run EOL software are a strong motivator to migrate to newer versions of OS and RDBMS.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 2025-10-25 10:39:35 -0400, Ron Johnson wrote:
> On Sat, Oct 25, 2025 at 10:21 AM Adrian Klaver <adrian.klaver@aklaver.com>
> wrote:
>
>     On 10/24/25 21:50, David Rowley wrote:
>     > Because people promote the .0 as not yet production-ready, it means
>     > that fewer people bother testing with beta and RC versions. Lack of
>     > beta testing is what causes .0 to contain more bugs than it otherwise
>     > might, so my suggestion is that we should be encouraging people to run
>     > beta and RC in their test environments to try to increase the
>     > stability of .0 versions.
>
>     Alright that I understand, though not necessarily agree with. I would
>     say lack of testing has more to do with time/money management.
>     Organizations don't want to spend either until: 1) They see the dust
>     settle on what is going to end up in the release. 2) Whether there is
>     anything interesting enough to invest both in moving to a new release.
>     Maybe there is a compelling argument that can be made to get those
>     organizations off the fence. I just don't what it is as you would have
>     to convince them to spend time and money rather then just wait and let
>     the community as a whole do the work.\
>
>
> Contractual requirements to not run EOL software are a strong motivator to
> migrate to newer versions of OS and RDBMS.

But not to test beta or RC versions. With PostgreSQL's 5 year maintenance
window there is plenty of time to test with final versions before doing
the upgrade. For example, if a company adopted a policy of waiting for
the x.2 release, then testing for 6 months before upgrading their production
servers, they would still get 4 years of production use out of that major
version before having to upgrade again.

Personally, I usually use the current version for new databases (so if
I started a new project right now I'd use 18.0), but I'm conservative
about upgrades (so I usually don't upgrade to a new major version until
EOL is near and if I still had a 13.x, I'd probably go for 17.6 instead of
18.0). I also admit that I test beta versions only very rarely (basically
only if there is a new feature I'm really excited about).

        hjp
--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Thank you all for your suggestions, 

Thanks for your quick response and for sharing the details.
After reviewing the options, the logical replication approach seems to be the most feasible one with minimal downtime.

However, we currently have 7 streaming replication setups running from production, with a total database size of around 15 TB. Out of this, there are about 10 large tables ranging from 1 TB (max) to 50 GB (min) each, along with approximately 150+ sequences.

Could you please confirm if there are any successful case studies or benchmarks available for a similar setup?
Additionally, please share any recommended parameter tuning or best practices for handling logical replication at this scale.

Current server configuration:

CPU: 144 cores

RAM: 512 GB


Thanks & Regards
Krishna.


On Fri, 24 Oct 2025 at 21:55, Francisco Olarte <folarte@peoplecall.com> wrote:

On Thu, 23 Oct 2025 at 17:21, Greg Sabino Mullane <htamfids@gmail.com> wrote

pg_dump is the most reliable, and the slowest. Keep in mind that only the actual data needs to move over (not the indexes, which get rebuilt after the data is loaded). You could also mix-n-match pg_logical and pg_dump if you have a few tables that are super large. Whether either approach fits in your 24 hour window is hard to say without you running some tests.

Long time ago I had a similar problem and did a "running with scissors" restore. This means:

1.- Prepare normal configuration, test, etc for the new version.
2.- Prepare a restore configuration, with fsync=off, wal_level=minimal, whatever option gives you any speed advantage.

As the target was empty, if restore failed we could just clean and restart.

3.- Dump, boot with the restore configuration, restore, clean shutdown, switch to production configuration, boot again and follow on.

Time has passed and I lost my notes, but I remember the restore was much faster than doing it with the normal production configuration. Given current machine speeds, it may be doable.


Francisco Olarte.