Re: Hadoop backend?
From | Markus Wanner |
---|---|
Subject | Re: Hadoop backend? |
Date | |
Msg-id | 49A3036D.9060901@bluegap.ch |
In reply to | Re: Hadoop backend? (Paul Sheer <paulsheer@gmail.com>) |
List | pgsql-hackers |
Hi,

Paul Sheer wrote:
> This is not problem: Performance is a secondary consideration (at least
> as far as the problem I was referring to).

Well, if you don't mind your database running .. ehm.. creeping several orders of magnitude slower, you might also be interested in Single-System Image Clustering Systems [1], like Beowulf, Kerrighed [2], OpenSSI [3], etc. Besides distributed filesystems, those also provide transparent shared memory across nodes.

> The primary usefulness is to have the data be a logical entity rather
> than a physical entity so that one can maintain physical machines
> without having to worry to much about where-is-the-data.

There are lots of solutions offering that already. In what way should Hadoop be better than any of those existing ones? For a slightly different example, you can get equivalent functionality on the block device layer with DRBD [4], which is in successful use for Postgres as well.

The main challenge with distributed filesystems remains reliable failure detection and ensuring that exactly one node is alive at any time.

> At the moment, most databases suffer from the problem of occasionally
> having to move the data from one place to another. This is a major
> nightmare that happens once every few years for most DBAs.
> It happens because a system needs a soft/hard upgrade, or a disk
> enlarged, or because a piece of hardware fails.

You are comparing to standalone nodes here, which doesn't make much sense, IMO.

> I have also found it's no use having RAID or ZFS. Each of these ties
> the data to an OS installation. If the OS needs to be reinstalled, all
> the data has to be manually moved in a way that is, well... dangerous.

I'm thinking more of Lustre, GFS, OCFS, AFS or some such. Compare with those!

> If there is only one machine running postgres that is fine too: I can have
> a second identical machine on standby in case of a hardware failure.
> That means a short amount of downtime - most people can live
> with that.

What most people have trouble with is a master that revives and suddenly confuses the new master (old slave).

> I read somewhere that replication was one of the goals of postgres's
> coming development efforts. Personally I think hadoop might be
> a better solution - *shrug*.

I'm not convinced at all. The trouble is not the replication of the data on disk; it's rather the data in shared memory which poses the hard problems (locks, caches, etc.). The former is solved already, the latter is a tad harder to solve. See [5] for my approach (showing my bias).

What I do find interesting about Hadoop is the MapReduce approach, but lots more than writing another "storage backend" is required if you want to make use of that for Postgres. Greenplum claims to have implemented MapReduce for their Database [6]; however, to me it looks like that is working a couple of layers above the filesystem.

Regards

Markus Wanner

[1]: Wikipedia: Single-System Image Clustering
     http://en.wikipedia.org/wiki/Single-system_image
[2]: http://www.kerrighed.org/
[3]: http://www.openssi.org/
[4]: http://www.drbd.org/
[5]: Postgres-R: http://www.postgres-r.org/
[6]: Greenplum MapReduce
     http://www.greenplum.com/resources/mapreduce/
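
Editorial illustration, not part of the original message: the "revived old master" problem mentioned above is commonly handled by fencing, where every write carries an epoch number and the shared storage rejects writes from any node whose epoch is stale. The following is a minimal sketch of that idea only. The class `FencedStore` and its methods `promote` and `write` are hypothetical names invented for this example; they are not an API of Postgres, DRBD, Hadoop, or Postgres-R.

```python
class FencedStore:
    """Hypothetical shared storage that rejects writes from stale masters."""

    def __init__(self):
        self.current_epoch = 0  # epoch of the node currently allowed to write
        self.data = {}

    def promote(self):
        """Hand a new, strictly larger epoch to a newly promoted master."""
        self.current_epoch += 1
        return self.current_epoch

    def write(self, epoch, key, value):
        """Apply the write only if it carries the current epoch."""
        if epoch != self.current_epoch:
            return False  # stale master: write is refused
        self.data[key] = value
        return True


store = FencedStore()

old_master = store.promote()       # the original master receives epoch 1
store.write(old_master, "x", 1)

new_master = store.promote()       # failover: the old slave receives epoch 2
accepted_new = store.write(new_master, "x", 2)

# The old master revives and still believes it is in charge.
accepted_old = store.write(old_master, "x", 3)
```

In this sketch the revived node's write is refused because its epoch is no longer current, so it cannot overwrite the new master's data. A real system would additionally need reliable failure detection to decide when to promote, which is the hard part the message points out.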