Thread: a provocative question?

a provocative question?

From
TJ O'Donnell
Date:
I am getting in the habit of storing much of my day-to-day
information in postgres, rather than "flat" files.
I have not had any problems of data corruption or loss,
but others have warned me against abandoning files.
I like the benefits of enforced data types, powerful searching,
data integrity, etc.
But I worry a bit about the "safety" of my data, residing
in a big scary database, instead of a simple friendly
folder-based file system.

I ran across this quote on Wikipedia at
http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29

"Text files are also much safer than databases, in that should disk
corruption occur, most of the mail is likely to be unaffected, and any
that is damaged can usually be recovered."

How naive (optimistic?) is it to think that "the database" can
replace "the filesystem"?

TJ O'Donnell
http://www.gnova.com/

Re: a provocative question?

From
Ron Johnson
Date:
On 09/06/07 10:43, TJ O'Donnell wrote:
> I am getting in the habit of storing much of my day-to-day
> information in postgres, rather than "flat" files.
> I have not had any problems of data corruption or loss,
> but others have warned me against abandoning files.
> I like the benefits of enforced data types, powerful searching,
> data integrity, etc.
> But I worry a bit about the "safety" of my data, residing
> in a big scary database, instead of a simple friendly
> folder-based file system.
>
> I ran across this quote on Wikipedia at
> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
>
> "Text files are also much safer than databases, in that should disk
> corruption occur, most of the mail is likely to be unaffected, and any
> that is damaged can usually be recovered."
>
> How naive (optimistic?) is it to think that "the database" can
> replace "the filesystem"?

Text files are *simple*.  When fsck repairs the disk and creates a
bunch of recovery files, just fire up $EDITOR (or cat, for that
matter) and piece your text files back together.  You may lose a
block of data, but the rest is there, easy to read.

Database files are *complex*.  Pointers and half-vacuumed freespace
and binary fields and indexes and WALs, yadda yadda yadda.  And, by
design, it's all got to be internally consistent.  Any little
corruption and *poof*, you've lost a table.  A strategically placed
corruption and you've lost your database.

But... that's why database vendors create backup/restore commands.

You *do* back up your database(s), right??????

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

Re: a provocative question?

From
Tom Lane
Date:
"TJ O'Donnell" <tjo@acm.org> writes:
> I ran across this quote on Wikipedia at
> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
> "Text files are also much safer than databases, in that should disk
> corruption occur, most of the mail is likely to be unaffected, and any
> that is damaged can usually be recovered."

This is mostly FUD.  You can get data out of a damaged database, too.
(I'd also point out that modern filesystems are nearly as complicated
as databases --- try getting your "simple" text files back if the
filesystem metadata is fried.)

In the end there is no substitute for a good backup policy...

            regards, tom lane

Re: a provocative question?

From
Kenneth Downs
Date:
Tom Lane wrote:
> "TJ O'Donnell" <tjo@acm.org> writes:
>> I ran across this quote on Wikipedia at
>> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
>> "Text files are also much safer than databases, in that should disk
>> corruption occur, most of the mail is likely to be unaffected, and any
>> that is damaged can usually be recovered."

I should probably also insert the standard disclaimer about Wikipedia.
It is a great source of info, but that particular sentence has not yet
been corrected by the
forces-that-dictate-everything-ends-up-correct-sooner-or-later to point
out the design trade-offs between simple systems like files (or paper,
for that matter) and more complex but safer systems such as databases.

And no, I won't write it.... :)

> This is mostly FUD.  You can get data out of a damaged database, too.
> (I'd also point out that modern filesystems are nearly as complicated
> as databases --- try getting your "simple" text files back if the
> filesystem metadata is fried.)
>
> In the end there is no substitute for a good backup policy...
>
>             regards, tom lane

-- 
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com    www.andromeda-project.org
631-689-7200   Fax: 631-689-0527
cell: 631-379-0010

Re: a provocative question?

From
Chris Browne
Date:
tjo@acm.org ("TJ O'Donnell") writes:
> I am getting in the habit of storing much of my day-to-day
> information in postgres, rather than "flat" files.
> I have not had any problems of data corruption or loss,
> but others have warned me against abandoning files.
> I like the benefits of enforced data types, powerful searching,
> data integrity, etc.
> But I worry a bit about the "safety" of my data, residing
> in a big scary database, instead of a simple friendly
> folder-based file system.
>
> I ran across this quote on Wikipedia at
> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
>
> "Text files are also much safer than databases, in that should disk
> corruption occur, most of the mail is likely to be unaffected, and any
> that is damaged can usually be recovered."
>
> How naive (optimistic?) is it to think that "the database" can
> replace "the filesystem"?

There is certainly some legitimacy to the claim; the demerits of
things like the Windows Registry as compared to "plain text
configuration" have been pretty clear.

If the "monstrous fragile binary data structure" gets stomped on, by
any means, then you can lose data in pretty massive and invisible
ways.  It's most pointedly true if the data representation conflates
data and indexes in some attempt to "simplify" things by having Just
One File.  In such a case, if *any* block gets corrupted, that has the
potential to irretrievably destroy the database.

However, the argument may also be taken too far.

-> A PostgreSQL database does NOT assemble data into "one monstrous
   fragile binary data structure."

   Each table consists of data files that are separate from index
   files.  Blowing up an index file *doesn't* blow up the data.

-> You are taking regular backups, right???

   If you are, that's a considerable mitigation of risks.  I don't
   believe it's typical to set up off-site backups of one's Windows
   Registry, in contrast...

-> In the case of PostgreSQL, mail stored in tuples is likely to get
   TOASTed, which changes the shape of things further; the files get
   smaller (due to compression), which changes the "target profile"
   for this data.

-> In the contrary direction, storing the data as a set of files, each
   of which requires storing metadata in binary filesystem data
   structures, provides an (invisible-to-the-user) interface to
   what is no more and no less than a "monstrous fragile binary data
   structure."

   That is, after all, what a filesystem is, if you strip out the
   visible APIs that turn it into open()/close()/mkdir() calls.

   If the wrong directory block gets "crunched," then /etc could get
   munched just like the Windows Registry could.

Much of the work going into filesystem efforts over the last dozen
years is *exceedingly* similar to the work going into managing storage
in DBMSes.  People working in both areas borrow from each other.

The natural result is that they live in fairly transparent homes in
relation to one another.  Someone who "casts stones" of the sort in
your quote is making a fallacious assumption: because the fact that a
filesystem is a database of file information is kept fairly much
invisible, a filesystem must somehow be fundamentally less vulnerable
to the same kinds of corruption.

Reality is that they are vulnerable in similar ways.

The one thing I could point to, in Eudora, as a *further* visible
merit that DOES retain validity is that there is not terribly much
metadata entrusted to the filesystem.  Much the same is true for the
Rand MH "Mail Handler", where each message is a file with very little
filesystem-based metadata.

If you should have a filesystem failure, and discover you have a
zillion no-longer-named files in lost+found, and decline to recover
from a backup, it should nonetheless be possible to re-process them
through any mail filters, and rebuild a mail filesystem that will
appear roughly similar to what it was like before.
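
A minimal sketch of that kind of re-processing, assuming each recovered
file holds exactly one RFC 822 style message. The function name
`reindex_recovered` and the choice to key on Message-ID are illustrative
assumptions, not anything Eudora or MH actually does; the point is only
that the headers inside each file are enough to rebuild an index without
any filesystem names.

```python
import email
import os


def reindex_recovered(directory):
    """Map Message-ID -> (path, From, Subject) for files that parse as mail."""
    index = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fh:
            msg = email.message_from_binary_file(fh)
        # Treat a file as mail only if it carries the headers we rely on.
        msg_id = msg["Message-ID"]
        sender = msg["From"]
        if msg_id is None or sender is None:
            continue
        subject = msg["Subject"] or ""
        index[str(msg_id).strip()] = (path, str(sender), str(subject))
    return index
```

Files that do not look like mail are skipped rather than guessed at, so
the index only contains messages whose own headers identify them.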

That actually implies that there is *more* "conservatism of format"
than first meets the eye; in effect, the data is left in raw form,
replete with redundancies that can, in order to retain the ability to
perform this recovery process, *never* be taken out.

There is, in effect, more than meets the eye here...
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://linuxfinances.info/info/advocacy.html
"Lumping configuration data,  security data, kernel tuning parameters,
etc. into one monstrous fragile binary data structure is really dumb."
- David F. Skoll

Re: a provocative question?

From
"Trevor Talbot"
Date:
There's also a point in regard to how modifications are made to your
data store.  In general, things working with text files don't go to
much effort to maintain durability like a real database would.  The
most direct way of editing a text file is to make all the changes in
memory, then write the whole thing out.  Some editors make backup
files, or use a create-delete-rename cycle, but they won't necessarily
force the data to disk -- if it's entirely in cache you could end up
losing the contents of the file anyway.
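
For illustration, here is a sketch of what "going to the effort" could
look like for a plain file on a POSIX system: write a temporary file,
force it to disk, then rename it over the original. The helper name
`durable_write` is hypothetical, and this is one common pattern under
those assumptions rather than a description of what any particular
editor or mail client does.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents out of the OS cache
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a crash (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Skipping either fsync call still "works" in normal operation, which is
why the gap only shows up after a crash or power failure.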

In the general case on the systems I work with, corruption is a
relatively low concern due to the automatic error detection and
correction my disks perform, and the consistency guarantees of modern
filesystems.  Interruptions (e.g. crashes or power failures) are much
more likely, and in that regard the typical modification process of
text files is more of a risk than working with a database.

I've also had times where faulty RAM corrupted gigabytes of data on
disk due to cache churn alone.

It will always depend on your situation.  In both cases, you
definitely want backups just for the guarantees neither approach can
make.


[way off topic]
In regard to the Windows Registry in particular...

> There is certainly some legitimacy to the claim; the demerits of
> things like the Windows Registry as compared to "plain text
> configuration" have been pretty clear.

> -> You are taking regular backups, right???
>
>    If you are, that's a considerable mitigation of risks.  I don't
>    believe it's typical to set up off-site backups of one's Windows
>    Registry, in contrast...

Sometimes I think most people get their defining impressions of the
Windows Registry from experience with the Windows 9x line.  I'll
definitely agree that it was simply awful there, and there's much to
complain about still, but...

The Windows Registry in NT is an actual database, with a WAL,
structured and split into several files, replication of some portions
in certain network arrangements, redundant backup of key parts in a
local system, and any external storage or off-site backup system for
Windows worth its salt does, indeed, back it up.

It's been that way for about a decade.

Re: a provocative question?

From
Chris Browne
Date:
quension@gmail.com ("Trevor Talbot") writes:
> There's also a point in regard to how modifications are made to your
> data store.  In general, things working with text files don't go to
> much effort to maintain durability like a real database would.  The
> most direct way of editing a text file is to make all the changes in
> memory, then write the whole thing out.  Some editors make backup
> files, or use a create-delete-rename cycle, but they won't
> necessarily force the data to disk -- if it's entirely in cache you
> could end up losing the contents of the file anyway.

In the case of Eudora, if its filesystem access protocol involves
writing a new text file, and completing that before unlinking the old
version, then the risk of "utter destruction" remains fairly low
specifically because of the nature of that access protocol.

> In the general case on the systems I work with, corruption is a
> relatively low concern due to the automatic error detection and
> correction my disks perform, and the consistency guarantees of
> modern filesystems.  Interruptions (e.g. crashes or power failures)
> are much more likely, and in that regard the typical modification
> process of text files is more of a risk than working with a
> database.

Error rates are not so low that it's safe to be cavalier about this.

> I've also had times where faulty RAM corrupted gigabytes of data on
> disk due to cache churn alone.

Yeah, and there is the factor that as disk capacities grow, the
chances of there being errors grow (more bytes, more opportunities)
and along with that, the number of opportunities for broken checksums
to match by accident also grow.  (Ergo "don't be cavalier" unless you
can be pretty sure that your checksums are getting more careful...)
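
As a rough back-of-the-envelope sketch of that scaling: assume every
block is corrupted independently with some fixed probability, and that
a corrupted block slips past an n-bit checksum with probability 2**-n.
Both are simplifying assumptions, and the function name and parameters
are made up for illustration; real error rates and checksum behaviour
differ by device and algorithm.

```python
def expected_undetected_errors(capacity_bytes, block_bytes,
                               p_block_corrupt, checksum_bits):
    """Expected number of corrupted blocks whose checksum still matches."""
    blocks = capacity_bytes / block_bytes
    corrupted_blocks = blocks * p_block_corrupt
    # Assumption: the checksum behaves like a uniform hash, so a corrupted
    # block passes an n-bit check with probability 2**-n.
    return corrupted_blocks * 2.0 ** -checksum_bits
```

Under this model, doubling the capacity doubles the expected number of
silent errors, while each extra checksum bit halves it, which is the
sense in which checksums have to get "more careful" as disks grow.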

> It will always depend on your situation.  In both cases, you
> definitely want backups just for the guarantees neither approach can
> make.

Certainly.

> [way off topic]
> In regard to the Windows Registry in particular...
>
>> There is certainly some legitimacy to the claim; the demerits of
>> things like the Windows Registry as compared to "plain text
>> configuration" have been pretty clear.
>
>> -> You are taking regular backups, right???
>>
>>    If you are, that's a considerable mitigation of risks.  I don't
>>    believe it's typical to set up off-site backups of one's Windows
>>    Registry, in contrast...
>
> Sometimes I think most people get their defining impressions of the
> Windows Registry from experience with the Windows 9x line.  I'll
> definitely agree that it was simply awful there, and there's much to
> complain about still, but...
>
> The Windows Registry in NT is an actual database, with a WAL,
> structured and split into several files, replication of some portions
> in certain network arrangements, redundant backup of key parts in a
> local system, and any external storage or off-site backup system for
> Windows worth its salt does, indeed, back it up.
>
> It's been that way for about a decade.

I guess I deserve that :-).

There is a further risk, that is not directly mitigated by backups,
namely that if you don't have some lowest common denominator that's
easy to recover from, you may not have a place to recover that data.

In the old days, Unix filesystems were sufficiently buggy and
corruptible that it was worthwhile to have an /sbin partition, all
statically linked, generally read-only, and therefore seldom corrupted,
to have as a base for recovering the rest of the system.

Using files in /etc, for config, and /sbin for enough tools to recover
with, provided a basis for recovery.

In contrast, there is definitely risk to stowing all config in a DBMS
such that you may have the recursive problem that you can't get the
parts of the system up to help you recover it without having the DBMS
running, but since it's corrupted, you don't have the config needed to
get the system started, and so we recurse...
--
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
As of next Monday, TRIX will be flushed in favor of VISI-CALC.
Please update your programs.

Re: a provocative question?

From
Ron Johnson
Date:
On 09/06/07 20:45, Chris Browne wrote:
> quension@gmail.com ("Trevor Talbot") writes:
>> There's also a point in regard to how modifications are made to your
>> data store.  In general, things working with text files don't go to
>> much effort to maintain durability like a real database would.  The
>> most direct way of editing a text file is to make all the changes in
>> memory, then write the whole thing out.  Some editors make backup
>> files, or use a create-delete-rename cycle, but they won't
>> necessarily force the data to disk -- if it's entirely in cache you
>> could end up losing the contents of the file anyway.
>
> In the case of Eudora, if its filesystem access protocol involves
> writing a new text file, and completing that before unlinking the old
> version, then the risk of "utter destruction" remains fairly low
> specifically because of the nature of access protocol.

mbox is a monolithic file also, and you need to copy/delete,
copy/delete, yadda yadda yadda.  Just to do anything, you need 2x as
much free disk space as your biggest mbox file.  What a PITA.

mh and Maildir are, as has been partially mentioned, much more
efficient in that regard.
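
To make the layout difference concrete, here is a small sketch using
Python's standard `mailbox` module. The addresses and paths are
placeholders, and the example only shows how the two formats lay
messages out on disk; it does not measure the disk-space cost itself.

```python
import mailbox
import os
import tempfile

root = tempfile.mkdtemp()
messages = [
    "From: a@example.org\nSubject: one\n\nfirst body\n",
    "From: b@example.org\nSubject: two\n\nsecond body\n",
]

# mbox: every message is appended to one monolithic file.
mbox_path = os.path.join(root, "inbox.mbox")
box = mailbox.mbox(mbox_path)
for text in messages:
    box.add(text)
box.close()  # flushes pending changes to disk

# Maildir: each message becomes its own file under new/.
maildir_path = os.path.join(root, "Maildir")
md = mailbox.Maildir(maildir_path, create=True)
for text in messages:
    md.add(text)
md.close()

maildir_files = sorted(os.listdir(os.path.join(maildir_path, "new")))
print("mbox is a single file:", os.path.isfile(mbox_path))
print("Maildir message files:", len(maildir_files))
```

Deleting a message from the Maildir is a single unlink, whereas
changing the mbox means rewriting the one big file.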

(Yes... mbox is an excellent transport format.)

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
