> I have a background in relational database management system
> research and I want to try to be a developer for PostgreSQL.
> Right now I am only trying to get familiar with your code base. I
> plan to start with a specific function module in the backend.
> I'm thinking of /docs/pgsql/src/backend/executor because
> I want to experiment with some new fast join algorithms.
> My long-term objective is to introduce a materialized view
> subsystem into PostgreSQL. Could anyone tell me if
> the directory /docs/pgsql/src/backend/executor is the
> right place to start or just give me some general suggestions
> which are not in the FAQs? Oh one more thing I want to
> mention is that those join algorithms I want to experiment
> with may have some special data access paths similar to an index.
Good.
>
> Further if it doesn't bother you much, could someone
> answer the following question(s) for me? (Sorry if
> some are already in the docs)
> 1. Does PostgreSQL do raw storage device management, or does it rely
> on the file system? My impression is no raw device. If not,
> is it difficult to add it, and possibly how?
No, only file system. We don't see much advantage to raw i/o.
> 2. Do you have standard benchmark results for postgresql?
> I guess not since it only implements a subset of SQL'92.
> What about subset of a benchmark or something repeatable?
We run the Wisconsin benchmark. I think it is in the source tree.
> 3. Suppose I have added a new two rel. join algorithm, how
> would I proceed to compare the performance of it with
> the existing two-relation join algorithms under
> different scenarios? Are there any existing facilities
> in the current code base for this purpose? Am I right
> that the available join algos implemented are nested loop
> join (including index-based), hash join (which one? hybrid),
> sort-merge join?
You can control the join types used with flags to postgres. Very easy.
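To make the comparison in question 3 concrete, here is a minimal sketch of two of the join strategies named above, run over in-memory tuples. It is not PostgreSQL code: the function names, the tuple layout, and the comparison/probe counters are all assumptions chosen for illustration, and a real measurement would be done inside the backend, not with a toy like this.

```python
from collections import defaultdict

def nested_loop_join(outer, inner, key_outer, key_inner):
    """Toy nested loop join: compare every outer row with every inner row."""
    result = []
    comparisons = 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key_outer] == i[key_inner]:
                result.append(o + i)
    return result, comparisons

def hash_join(outer, inner, key_outer, key_inner):
    """Toy in-memory hash join: build a table on inner, probe it with outer."""
    table = defaultdict(list)
    for i in inner:
        table[i[key_inner]].append(i)
    result = []
    probes = 0
    for o in outer:
        probes += 1
        for i in table.get(o[key_outer], []):
            result.append(o + i)
    return result, probes

# Hypothetical sample relations: emp(id, name, dept_id) and dept(id, name).
emp = [(1, "a", 10), (2, "b", 20), (3, "c", 10)]
dept = [(10, "x"), (20, "y"), (30, "z")]

nl_rows, nl_work = nested_loop_join(emp, dept, 2, 0)
hj_rows, hj_work = hash_join(emp, dept, 2, 0)
print(nl_rows == hj_rows, nl_work, hj_work)
```

Both functions return the same rows; the counters only hint at why the hash variant does less work per outer row once the inner input fits in memory.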
> 4. Usually a single sequential pass of a large joining relation
> is preferred to random access in large join operation.
> It's mostly because of the current disk access characteristics.
> Is it possible for me to do some benchmarking about this
> using PostgreSQL? What I'm actually asking about is
> how to control the flow of data from disk to buffers,
> how to stop file system interference and how to arrange
> actual data placement on the disk.
Good idea. We deal with this regularly in deciding to use an index in
the optimizer or a sequential scan. Our optimizer is quite good.
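The index-versus-sequential-scan decision mentioned above can be sketched as a toy cost comparison. This is only an illustration under stated assumptions, not the optimizer's actual formula: the function names, the per-page cost constants, and the worst-case assumption of one random page fetch per matching tuple are all hypothetical.

```python
def seq_scan_cost(pages, seq_page_cost=1.0):
    """Toy cost of reading every page of the relation once, in order."""
    return pages * seq_page_cost

def index_scan_cost(tuples, selectivity, random_page_cost=4.0):
    """Toy cost assuming one random page fetch per matching tuple."""
    return tuples * selectivity * random_page_cost

def choose_scan(pages, tuples, selectivity):
    """Pick whichever access path the toy model estimates as cheaper."""
    if index_scan_cost(tuples, selectivity) < seq_scan_cost(pages):
        return "index"
    return "seq"

# Hypothetical relation: 1000 pages holding 100000 tuples.
print(choose_scan(1000, 100000, 0.001))  # few matches: random fetches are cheap overall
print(choose_scan(1000, 100000, 0.1))    # many matches: one sequential pass wins
```

With these made-up constants the model switches to a sequential scan once the predicate matches more than a small fraction of the tuples, which is the effect described in question 4.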
--
  Bruce Momjian                      |  http://www.op.net/~candle
  pgman@candle.pha.pa.us             |  (610) 853-3000
  +  If your life is a hard drive,   |  830 Blythe Avenue
  +  Christ can be your backup.      |  Drexel Hill, Pennsylvania 19026