Re: Hadoop backend?

From: pi song
Subject: Re: Hadoop backend?
Date:
Msg-id: 1b29507a0902221418u4fcb57b9ub891b69efe516ccc@mail.gmail.com
In reply to: Re: Hadoop backend? (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: Hadoop backend?
List: pgsql-hackers
One more problem is that data placement on HDFS is implicit, meaning you have no explicit control over it. Thus, you cannot place two sets of data that are likely to be joined together on the same node, which leads to uncontrollable latency during query processing.

Pi Song

On Mon, Feb 23, 2009 at 7:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Feb 21, 2009 at 9:37 PM, pi song <pi.songs@gmail.com> wrote:
> 1) The Hadoop file system is heavily optimized for read-mostly workloads.
> 2) As of a few months ago, HDFS did not support appending to files.
> There might be a bit of an impedance mismatch in making them work together.
> However, I think it would be a very good initiative to come up with ideas to
> be able to run Postgres on a distributed file system (it doesn't have to be
> Hadoop specifically).

In theory, I think you could make postgres work on any type of
underlying storage you like by writing a second smgr implementation
that would exist alongside md.c.  The fly in the ointment is that
you'd need a more sophisticated implementation of this line of code,
from smgropen:

   reln->smgr_which = 0;   /* we only have md.c at present */

Logically, it seems like the choice of smgr should track with the
notion of a tablespace.  IOW, you might want to have one tablespace that is
stored on a magnetic disk (md.c) and another that is stored on your
hypothetical distributed filesystem (hypodfs.c).  I'm not sure how
hard this would be to implement, but I don't think smgropen() is in a
position to do syscache lookups, so probably not that easy.

...Robert
