On Fri, Oct 8, 2021 at 02:27:55PM -0700, Wells Oliver wrote:
> Hi: I am restoring a ~100GB backup using 16 jobs from an EC2 instance to an RDS
> instance (db.m6g.xlarge, which is 16GB RAM and 4 CPU) and it's dying midway
> with the dreaded "SSL SYSCALL error: EOF detected" error.
>
> I did create a parameter group to hopefully speed the restoration process, it
> includes:
>
> - wal_buffers 8192 (64MB)
> - checkpoint_timeout 3600 (1h)
> - min_wal_size 192 (192MB)
> - max_wal_size 102400 (100GB)
> - shared_buffers 524288 (4GB)
> - synchronous_commit 0 (off)
> - autovacuum 0 (off)
> - maintenance_work_mem 2097152 (2GB)
> - work_mem 32768 (32MB)
>
> I sourced these from a few different folks as well as some trial and error, but
> now it's blowing up on me.
>
> If I revert the RDS instance back to default PG parameters, it restores, but it
> takes 3x the time.

Wow, that is weird.  I see that error string happening when PG can't
extend the receipt buffer on the client side:

    appendPQExpBufferStr(&conn->errorMessage,
                         libpq_gettext("SSL SYSCALL error: EOF detected\n"));

Can you check the server logs to see if there is any error there?  If I
had to take a guess, I would reduce maintenance_work_mem to 1GB and
retest.
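
The reasoning behind lowering maintenance_work_mem can be sketched with
rough arithmetic.  This is only a back-of-the-envelope estimate under
stated assumptions, not a measurement of the RDS instance: it assumes
each of the 16 restore jobs is its own connection that may use up to
maintenance_work_mem while building an index, that shared_buffers is
counted in 8kB pages, and it ignores work_mem, the OS, and whatever
memory RDS itself reserves.  The function names are made up for
illustration.

```python
# Rough worst-case memory estimate for a parallel restore.
# Assumptions (not from the server): every job builds an index at the
# same time, and nothing else on the instance uses memory.

KB_PER_GB = 1024 * 1024
PAGE_KB = 8  # default PostgreSQL block size in kB


def worst_case_gb(jobs, maintenance_work_mem_kb, shared_buffers_pages):
    """Upper bound: jobs * maintenance_work_mem + shared_buffers, in GB."""
    per_job_gb = maintenance_work_mem_kb / KB_PER_GB
    shared_gb = shared_buffers_pages * PAGE_KB / KB_PER_GB
    return jobs * per_job_gb + shared_gb


def jobs_that_fit(ram_gb, maintenance_work_mem_kb, shared_buffers_pages):
    """How many concurrent index builds fit in RAM under the same assumptions."""
    per_job_gb = maintenance_work_mem_kb / KB_PER_GB
    shared_gb = shared_buffers_pages * PAGE_KB / KB_PER_GB
    return int((ram_gb - shared_gb) // per_job_gb)


RAM_GB = 16            # db.m6g.xlarge, per the original message
JOBS = 16              # number of restore jobs, per the original message
SHARED_BUFFERS = 524288  # pages, from the parameter group

current = worst_case_gb(JOBS, 2097152, SHARED_BUFFERS)  # maintenance_work_mem = 2GB
reduced = worst_case_gb(JOBS, 1048576, SHARED_BUFFERS)  # maintenance_work_mem = 1GB

print(f"RAM: {RAM_GB} GB")
print(f"worst case at 2GB maintenance_work_mem: {current} GB")
print(f"worst case at 1GB maintenance_work_mem: {reduced} GB")
print(f"jobs that fit at 2GB: {jobs_that_fit(RAM_GB, 2097152, SHARED_BUFFERS)}")
print(f"jobs that fit at 1GB: {jobs_that_fit(RAM_GB, 1048576, SHARED_BUFFERS)}")
```

Both upper bounds are above the 16GB on the instance, so the estimate
only suggests that memory pressure is plausible.  It does not prove
that memory is the cause; the server logs are still the place to
confirm it.
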
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.