Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception

From: david@lang.hm
Subject: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception
Msg-id: alpine.DEB.1.10.0808281936090.2713@asgard.lang.hm
In reply to: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception  ("Scott Marlowe" <scott.marlowe@gmail.com>)
Responses: Re: select on 22 GB table causes "An I/O error occured while sending to the backend." exception  (Alvaro Herrera <alvherre@commandprompt.com>)
List: pgsql-performance
On Thu, 28 Aug 2008, Scott Marlowe wrote:

> On Thu, Aug 28, 2008 at 7:53 PM, Matthew Dennis <mdennis@merfer.net> wrote:
>> On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe <scott.marlowe@gmail.com>
>> wrote:
>>>
>>>> wait a min here, postgres is supposed to be able to survive a complete
>>>> box
>>>> failure without corrupting the database, if killing a process can
>>>> corrupt
>>>> the database it sounds like a major problem.
>>>
>>> Yes it is a major problem, but not with postgresql.  It's a major
>>> problem with the linux OOM killer killing processes that should not be
>>> killed.
>>>
>>> Would it be postgresql's fault if it corrupted data because my machine
>>> had bad memory?  Or a bad hard drive?  This is the same kind of
>>> failure.  The postmaster should never be killed.  It's the one thing
>>> holding it all together.
>>
>> I fail to see the difference between the OOM killing it and the power going
>> out.
>
> Then you fail to understand.
>
> scenario 1:  There's a postmaster, it owns all the child processes.
> It gets killed.  The Postmaster gets restarted.  Since there isn't one

when the postmaster gets killed, doesn't that kill all its children as
well?
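
On a POSIX system, killing a parent with SIGKILL does not by itself take
its children down; they are simply re-parented. A minimal Python sketch of
that behaviour follows. The "postmaster" and "backend" here are stand-in
processes that only sleep; nothing in it is PostgreSQL code, and the
function name is made up for the illustration.

```python
import os
import signal
import time


def child_survives_parent_kill() -> bool:
    """Fork a stand-in 'postmaster' that forks a stand-in 'backend',
    SIGKILL the postmaster, and report whether the backend is still alive."""
    read_fd, write_fd = os.pipe()
    postmaster_pid = os.fork()
    if postmaster_pid == 0:
        # Stand-in "postmaster": fork one "backend" and report its pid.
        os.close(read_fd)
        backend_pid = os.fork()
        if backend_pid == 0:
            os.close(write_fd)
            time.sleep(30)              # the backend just idles
            os._exit(0)
        os.write(write_fd, str(backend_pid).encode())
        os.close(write_fd)
        time.sleep(30)                  # wait around to be killed
        os._exit(0)

    # Original process: learn the backend's pid, then kill the postmaster.
    os.close(write_fd)
    backend_pid = int(os.read(read_fd, 32).decode())
    os.close(read_fd)

    os.kill(postmaster_pid, signal.SIGKILL)   # roughly what the OOM killer does
    os.waitpid(postmaster_pid, 0)             # reap the dead "postmaster"

    try:
        os.kill(backend_pid, 0)               # signal 0 only checks existence
    except ProcessLookupError:
        return False
    os.kill(backend_pid, signal.SIGKILL)      # clean up the orphan
    return True
```

The orphaned "backend" keeps running after its parent is gone, which is
the situation scenario 1 describes.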

> running, it comes up.  starts new child processes.  Meanwhile, the old
> child processes that don't belong to it are busy writing to the data
> store.  Instant corruption.

if so, then the postmaster should not only check whether an existing
postmaster is running, it should check for the presence of the child
processes as well.
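
One way such a check could look, as a sketch only: every child holds a
shared flock() on a file in the data directory, and a starting postmaster
probes for an exclusive lock on it. The file name and the helper names
below are hypothetical, and this is not a description of what PostgreSQL
actually does.

```python
import fcntl
import os


def open_liveness_file(path: str) -> int:
    """Open (creating if needed) the hypothetical per-cluster liveness file."""
    return os.open(path, os.O_RDWR | os.O_CREAT, 0o600)


def hold_as_backend(path: str) -> int:
    """Called by each child: take a shared lock and keep the fd open.
    The kernel drops the lock when the process exits, however it dies."""
    fd = open_liveness_file(path)
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd


def safe_to_start(path: str) -> bool:
    """Called by a starting postmaster: an exclusive, non-blocking lock
    can only be granted if no child still holds its shared lock."""
    fd = open_liveness_file(path)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False                    # some orphaned child is still alive
    finally:
        os.close(fd)                    # this probe does not keep the lock
    return True
```

A postmaster that refuses to start while safe_to_start() returns False
would not bring up new children next to orphaned ones. The sketch only
probes; a real implementation would also have to hold a lock itself to
close the window between the check and the first fork.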

> scenario 2: Someone pulls the plug.  Every postgres child dies a quick
> death.  Data on the drives is coherent and recoverable.
>
>>  And yes, if the power went out and PG came up with a corrupted DB
>> (assuming I didn't turn off fsync, etc) I *would* blame PG.
>
> Then you might be wrong.  If you were using the LVM, or certain levels
> of SW RAID, or a RAID controller with cache with no battery backing
> that is set to write-back, or if you were using an IDE or SATA drive /
> controller that didn't support write barriers, or using NFS mounts for
> database storage, and so on.

these all fall under "(assuming I didn't turn off fsync, etc)"

> My point being that PostgreSQL HAS to
> make certain assumptions about its environment that it simply cannot
> directly control or test for.  Not having the postmaster shot in the
> head while the children keep running is one of those things.
>
>>  I understand
>> that killing the postmaster could stop all useful PG work, that it could
>> cause it to stop responding to clients, that it could even "crash" PG, et
>> cetera, but if a particular process dying causes corrupted DBs, that sounds
>> borked to me.
>
> Well, design a better method and implement it.  If everything went
> through the postmaster you'd be lucky to get 100 transactions per
> second.

well, if you aren't going through the postmaster, what process is
receiving network messages? it can't be a group of processes, only one can
be listening to a socket at one time.
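
For reference, the usual pattern in a process-per-connection server is
that one process listens and accepts, then forks a child that inherits the
already-connected socket and talks to the client directly from then on. A
minimal Python sketch of that hand-off, assuming a POSIX system; the
function name is made up and the "backend" only sends its own pid:

```python
import os
import socket


def accept_and_hand_off(listener: socket.socket) -> int:
    """Accept one connection on the listening socket, fork a child to serve
    it, and return the child's pid. Only the parent ever calls accept()."""
    conn, _addr = listener.accept()
    pid = os.fork()
    if pid == 0:
        # Child ("backend"): owns the connected socket from here on.
        try:
            listener.close()            # the child does not need the listener
            conn.sendall(str(os.getpid()).encode())
            conn.close()
        finally:
            os._exit(0)
    # Parent ("postmaster"): drops its copy and goes back to listening.
    conn.close()
    return pid
```

After the fork the parent is out of the data path for that client, so
traffic on established connections does not pass through it.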

and if the postmaster isn't needed for the child processes to write to the
datastore, how are multiple child processes prevented from writing to the
datastore normally? and why doesn't that mechanism continue to work?

>  There are compromises between performance and reliability
> under fire that have to be made.  It is not unreasonable to assume
> that your OS is not going to randomly kill off processes because of a
> dodgy VM implementation quirk.
>
> P.s. I'm a big fan of linux, and I run my dbs on it.  But I turn off
> overcommit and make a few other adjustments to make sure my database
> is safe.  The OOM killer as a default is fine for workstations, but
> it's an insane setting for servers, much like swappiness=60 is an
> insane setting for a server too.
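
The two kernel settings mentioned above can be read from /proc/sys/vm on
Linux. A small Python sketch for checking them follows; the helper names
are made up, and it assumes that "overcommit turned off" means
vm.overcommit_memory = 2 (strict accounting).

```python
from pathlib import Path
from typing import Optional


def read_vm_setting(name: str, root: str = "/proc/sys/vm") -> Optional[int]:
    """Return the integer value of a vm.* sysctl, or None if unreadable."""
    try:
        return int(Path(root, name).read_text().strip())
    except (OSError, ValueError):
        return None


def overcommit_disabled(root: str = "/proc/sys/vm") -> bool:
    """True when vm.overcommit_memory is 2, i.e. strict accounting."""
    return read_vm_setting("overcommit_memory", root) == 2
```

Calling overcommit_disabled() and read_vm_setting("swappiness") on the
database host shows whether the defaults are still in place.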

so are you saying that the only possible thing that can kill the
postmaster is the OOM killer? it can't possibly exit in any other
situation without the children being shut down first?

I would be surprised if that was really true.

David Lang
