Re: Hadoop backend?

From Hans-Jürgen Schönig
Subject Re: Hadoop backend?
Date
Msg-id 136182E5-BC7E-4AEB-A2E8-4C225B2F9095@cybertec.at
In reply to Re: Hadoop backend?  (pi song <pi.songs@gmail.com>)
List pgsql-hackers
hi ...

i think the easiest way to do this is to simply add a mechanism to functions which allows a function to "stream" data through.
it would basically mean losing join support, as you cannot "read data again" in a way which is good enough for joining with the function providing the data from hadoop.
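
A minimal sketch of the "cannot read data again" problem, written in Python rather than against PostgreSQL internals. The names stream_rows and nested_loop_join and the sample rows are hypothetical, made up only for illustration; stream_rows stands in for a function that streams rows from Hadoop and can be consumed only once.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def stream_rows() -> Iterator[Row]:
    """Hypothetical stand-in for a function streaming rows from Hadoop.

    The rows can be consumed exactly once; there is no way to rescan.
    """
    yield from [(1, "a"), (2, "b"), (3, "c")]


def nested_loop_join(
    outer: Iterable[Row], inner: Iterable[Row]
) -> List[Tuple[Row, Row]]:
    """Naive nested-loop join on the first column.

    It tries to re-read `inner` from the start for every outer row.
    """
    result = []
    for o in outer:
        for i in inner:  # a one-shot stream is already exhausted here
            if o[0] == i[0]:
                result.append((o, i))
    return result


local_table: List[Row] = [(1, "x"), (2, "y"), (3, "z")]

# Streaming inner side: used up while processing the first outer row,
# so matches for the remaining outer rows are silently lost.
streamed = nested_loop_join(local_table, stream_rows())

# Materialized inner side: a list can be rescanned, so all matches are found,
# at the cost of holding the whole stream in memory.
materialized = nested_loop_join(local_table, list(stream_rows()))
```

The streamed join only finds matches for the first outer row, while the materialized one finds all of them. Materializing the stream is one possible workaround, but it gives up the benefit of streaming.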

hannu (I think) brought up a similar concept some time ago.

i think a straightforward implementation would not be too hard.

best regards,

hans



On Feb 22, 2009, at 3:37 AM, pi song wrote:

1) The Hadoop file system is heavily optimized for read-mostly workloads.
2) As of a few months ago, HDFS doesn't support file appending.

There might be a bit of an impedance mismatch in making them work together.

However, I think it would be a very good initiative to come up with ideas for running postgres on a distributed file system (it doesn't have to be specifically Hadoop).

Pi Song

On Sun, Feb 22, 2009 at 7:17 AM, Paul Sheer <paulsheer@gmail.com> wrote:
Hadoop backend for PostgreSQL....

A problem that my client has, and one that I come across often,
is that a database seems to always be associated with a particular
physical machine, a physical machine that has to be upgraded,
replaced, or otherwise maintained.

Even if the database is replicated, it just means there are two or
more machines. Replication is also a difficult thing to properly
manage.

With a distributed data store, the data would become a logical
object - no adding or removal of machines would affect the data.
This is an ideal that would remove a tremendous maintenance
burden from many sites; well, at least the ones I have worked
at as far as I can see.

Does anyone know of plans to implement PostgreSQL over Hadoop?

Yahoo seems to be doing this:
     http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html

But they store tables column-wise for their performance situation.
If one is doing a lot of inserts, I don't think this is the most efficient layout?

Has Yahoo put the source code for their work online?

Many thanks for any pointers.

-paul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
