Don Seiler <don@seiler.us> writes:
> On Tue, Mar 17, 2020 at 8:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yikes. Well, if there aren't obvious operational problems, it might be
>> that the data is actually UTF8-clean, or almost entirely so. Maybe you
>> could look at the problem as being one of validation.
> For this test, would we restore into an en_US.UTF-8/UTF8 database? Then,
> assuming no errors (or fixing any errors until clean), we change the
> datcollate/datctype settings in prod and proceed with pg_upgrade (obviously
> after testing all of that heavily)?
Yeah, that's the basic idea.
> What are the ramifications of changing collation like that? Should we
> consider rebuilding indexes ASAP after that?
Text indexes would definitely be at risk here. I'm not really certain
how bad the problem would be. Do you have a feeling for how much of
the data is 100% ASCII? If you could be sure of that for any given
column, you wouldn't have to reindex indexes on that column.
regards, tom lane