Re: [HACKERS] [hackers]development suggestion needed

From Hannu Krosing
Subject Re: [HACKERS] [hackers]development suggestion needed
Date
Msg-id 38810C3C.F15758B3@tm.ee
In reply to [hackers]development suggestion needed  (xun@cs.ucsb.edu (Xun Cheng))
Responses Re: [HACKERS] hybrid hash, cont. of development suggestion needed  (Xun Cheng <xun@cs.ucsb.edu>)
List pgsql-hackers
Tom Lane wrote:
> 
> xun@cs.ucsb.edu (Xun Cheng) writes:
> > I want to experiment with some new fast join algorithms.
> 
> Cool.  Welcome aboard!
> 
> > Could anyone tell me if
> > the directory /docs/pgsql/src/backend/executor is the
> > right place to start
> 
> The executor is only half the problem: you must also teach the
> planner/optimizer how and when to use the new join type.
> 
> Hiroshi Inoue has recently been down this path (adding support
> for TID-based scans), and might be able to give you more specific
> advice.
> 
> > 1. Does postgresql do raw storage device management or it relies
> >    on file system? My impression is no raw device. If no,
> >    is it difficult to add it and possibly how?
> 
> Postgres uses Unix files.  We have avoided raw-device access mostly on
> grounds of portability.  To persuade people that such a change should go
> into the distribution, you'd need to prove that *significantly* better
> performance is obtained with raw access.  I for one don't think it's a
> foregone conclusion; Postgres gets considerable benefit from sitting
> atop Unix kernel device schedulers and disk buffer caches.
> 
> As far as the actual implementation goes, the low level access methods
> go through a "storage manager" switch that was intended to allow for
> the addition of a new storage manager, such as a raw-device manager.
> So you could get a good deal of stuff working by implementing code that
> parallels md.c/fd.c.  The main problem at this point is that there is a
> fair amount of utility code that goes out and does its own manipulation
> of the database file structure.  You'd need to clean that up by pushing
> it all down below the storage manager switch (inventing new storage
> manager calls as needed).
> 
> >    that the available join algos implemented are nested loop
> >    join (including index-based), hash join (which one? hybrid),
> >    sort-merge join?
> 
> Right.  The hash join uses batching if it estimates that the relation
> is too large to fit in memory; is that what you call "hybrid"?

I've heard the word "hybrid" used for a scheme where you hash each
key of a multi-key index separately and then concatenate the hashes for
the index. That way you can also use the index to access subsets of the
keys, by examining only the buckets with matching hash sections.

Does postgres do it even when generating the keys?

I'd guess it does, as each hashable type has a hashing function.

OTOH pg probably does not use it for lookups by the 3rd field of an index?
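
To make the scheme I mean concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how postgres implements anything: the names ConcatenatedHashIndex, section_hash, bucket_number and the BITS_PER_KEY constant are all made up for this example, and Python's built-in hash() stands in for the per-type hashing functions. A lookup that leaves some keys unknown (None) has to enumerate every possible hash section for those keys, so it visits more buckets, but still far fewer than a full scan when the known keys are selective.

```python
from collections import defaultdict
from itertools import product

BITS_PER_KEY = 4  # hypothetical: number of hash bits kept per key column


def section_hash(value):
    """Hash a single key column down to BITS_PER_KEY bits."""
    return hash(value) & ((1 << BITS_PER_KEY) - 1)


def bucket_number(sections):
    """Concatenate per-column hash sections into one bucket number."""
    bucket = 0
    for section in sections:
        bucket = (bucket << BITS_PER_KEY) | section
    return bucket


class ConcatenatedHashIndex:
    """Toy multi-key hash index whose bucket number is the
    concatenation of the separately hashed key columns."""

    def __init__(self, n_keys):
        self.n_keys = n_keys
        self.buckets = defaultdict(list)

    def insert(self, key, row):
        if len(key) != self.n_keys:
            raise ValueError("wrong number of key columns")
        sections = [section_hash(v) for v in key]
        self.buckets[bucket_number(sections)].append((key, row))

    def candidate_buckets(self, partial):
        """Yield every bucket that could hold a match for `partial`,
        a tuple with None in the positions of unknown key columns."""
        every_section = range(1 << BITS_PER_KEY)
        choices = [
            every_section if v is None else [section_hash(v)]
            for v in partial
        ]
        for sections in product(*choices):
            yield bucket_number(sections)

    def lookup(self, partial):
        """Return rows whose known key columns equal those in `partial`."""
        rows = []
        for b in self.candidate_buckets(partial):
            for key, row in self.buckets.get(b, ()):
                # Recheck the real values: different keys can share a section.
                if all(p is None or p == k for p, k in zip(partial, key)):
                    rows.append(row)
        return rows
```

With three key columns and 4 bits each there are 4096 possible buckets; a search on the 3rd field alone, e.g. lookup((None, None, 10)), has to look at 16 * 16 = 256 of them, those whose last hash section matches.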

--------
Hannu

