Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections

From	Tom Lane
Subject	Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections
Date	7 October 2011, 05:36:23
Msg-id	8919.1317965775@sss.pgh.pa.us
In reply to	Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections (Sean Laurent <sean@studyblue.com>)
Responses	Re: Postgres 9.01, Amazon EC2/EBS, XFS, JDBC and lost connections
List	pgsql-general

Sean Laurent <sean@studyblue.com> writes:
> We've been running into a particularly strange problem that I'm trying to
> better understand. The super short version is that our application servers
> lose their connection to the database when I run a backup during periods of
> higher load and fail to reconnect.

> Here's an overview of the setup:

> - PostgreSQL 9.0.1 hosted on a cc1.4xlarge Amazon EC2 instance running
> CentOS 5.6
> - 8 disk RAID-0 array of EBS volumes used for primary data storage
> - 4 disk RAID-0 array of EBS volumes used for transaction logs
> - Root partition is ext3
> - RAID arrays are xfs

> Backups are taken using a script that runs the following workflow:

> - Tell Postgres to start a backup: SELECT pg_start_backup('RAID backup');
> - Run "xfs_freeze" on the primary RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the primary RAID array
> - Run "xfs_freeze" on the transaction log RAID array
> - Tell Amazon to take snapshots of each of the EBS volumes
> - Run "xfs_freeze -u" to thaw the transaction log RAID array
> - Tell Postgres the backup is finished: SELECT pg_stop_backup();
> - Remove old WAL files

> The whole process takes roughly 7 seconds on average. The RAID arrays are
> frozen for roughly 2 seconds on average.

> Within a few seconds of the backup, our application servers start throwing
> exceptions that indicate the database connection was closed. Meanwhile,
> Postgres still shows the connections and we start seeing a really high
> number (for us) of locks in the database. The application servers refuse to
> recover and must be killed and restarted. Once they're killed off, the
> connections actually go away and the locks disappear.

That's just weird.  It sounds like the "xfs_freeze" operation, or the
snapshotting operation, is somehow interrupting network traffic.  I'd
not expect such a thing on a normal server, but who knows what's
connected to what in an Amazon EC2 instance?

Anyway, I'd suggest trying to instrument something to prove or disprove
that there's a networking failure involved.  It might be as simple as
watching "ping" behavior ...

            regards, tom lane
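
A minimal sketch of the kind of instrumentation suggested above, not something posted in the thread. It swaps ICMP "ping" for repeated TCP connects to the database port, since raw ICMP sockets usually need elevated privileges; this is an assumption about what is convenient, not a claim that it is equivalent. The function names (probe_once, find_outages, monitor) and the intervals are hypothetical.

```python
import socket
import time


def probe_once(host, port, timeout=1.0):
    """Attempt one TCP connect; return (ok, seconds_taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        ok = False
    return ok, time.monotonic() - start


def find_outages(samples):
    """Group consecutive failed probes into (first_failure, last_failure) windows.

    `samples` is an iterable of (timestamp, ok) pairs in time order.
    """
    outages = []
    start = last = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            outages.append((start, last))
            start = None
    if start is not None:
        # The run ended while probes were still failing.
        outages.append((start, last))
    return outages


def monitor(host, port, duration=60.0, interval=0.2):
    """Probe repeatedly for `duration` seconds, log each result, return the samples."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        ok, elapsed = probe_once(host, port)
        samples.append((time.time(), ok))
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} ok={ok} connect={elapsed * 1000:.1f}ms", flush=True)
        time.sleep(interval)
    return samples
```

Running monitor("db-host", 5432) from an application server while the backup script executes (the host name is a placeholder; 5432 is the default PostgreSQL port) and passing the result to find_outages would show whether failed or slow connects line up with the xfs_freeze and snapshot steps. Connects that never send a startup packet may add noise to the server log.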
