Hi everyone, I like your work very much and hope PostgreSQL can
grow into something as competitive with Oracle as Linux is with
Windows.
I have a background in relational database management system
research and I would like to become a developer for PostgreSQL.
Right now I am only trying to get familiar with your code base. I
plan to start with a specific functional module in the backend.
I'm thinking of /docs/pgsql/src/backend/executor because
I want to experiment with some new fast join algorithms.
My long-term objective is to introduce a materialized view
subsystem into PostgreSQL. Could anyone tell me whether
the directory /docs/pgsql/src/backend/executor is the
right place to start, or just give me some general suggestions
that are not in the FAQs? One more thing I want to
mention is that the join algorithms I want to experiment
with may need some special data access paths similar to an index.
Further, if it is not too much trouble, could someone
answer the following questions for me? (Sorry if
some are already covered in the docs.)
1. Does PostgreSQL manage raw storage devices itself, or does it rely on the file system? My impression is that it does not use raw devices. If
not, how difficult would it be to add that, and how might it be done?
2. Do you have standard benchmark results for PostgreSQL? I guess not, since it only implements a subset of SQL'92.
What about a subset of a benchmark, or something repeatable?
3. Suppose I have added a new two-relation join algorithm. How would I go about comparing its performance with the
existing two-relation join algorithms under different scenarios? Are there any existing facilities in the current code
base for this purpose? Am I right that the join algorithms currently implemented are nested loop join (including
index-based), hash join (which one? hybrid?), and sort-merge join?
4. Usually a single sequential pass over a large joining relation is preferred to random access in a large join operation,
mostly because of current disk access characteristics. Is it possible for me to do some benchmarking of
this using PostgreSQL? What I'm actually asking about is how to control the flow of data from disk to
buffers, how to stop file system interference, and how to arrange the actual data placement on the disk.
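To make question 3 a bit more concrete, the kind of comparison I have in mind could be sketched roughly as follows. This is only a toy Python illustration working on in-memory lists of (key, payload) rows, not PostgreSQL code. All names in it (nested_loop_join, hash_join, sort_merge_join, make_relation, time_join) are made up for illustration, the hash join is a plain in-memory one rather than a hybrid one, and it says nothing about disk access or buffering, which is what question 4 is about.

```python
import random
import time
from collections import defaultdict


def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row: O(|outer| * |inner|)."""
    return [(k1, a, b) for k1, a in outer for k2, b in inner if k1 == k2]


def hash_join(build, probe):
    """Plain in-memory hash join: hash one input, probe with the other."""
    table = defaultdict(list)
    for k, a in build:
        table[k].append(a)
    return [(k, a, b) for k, b in probe for a in table.get(k, ())]


def sort_merge_join(left, right):
    """Sort both inputs on the key, then merge runs of equal keys."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, a in left[i:i_end]:
                for _, b in right[j:j_end]:
                    out.append((lk, a, b))
            i, j = i_end, j_end
    return out


def make_relation(n_rows, key_range, seed):
    """Hypothetical test relation: n_rows rows with keys drawn from [0, key_range)."""
    rng = random.Random(seed)
    return [(rng.randrange(key_range), i) for i in range(n_rows)]


def time_join(join_fn, r, s, repeats=3):
    """Return (best wall-clock seconds over `repeats` runs, number of result rows)."""
    best = float("inf")
    result = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = join_fn(r, s)
        best = min(best, time.perf_counter() - start)
    return best, len(result)


if __name__ == "__main__":
    # Scenarios vary relation sizes and key range (a rough proxy for selectivity).
    scenarios = [(200, 200, 50), (500, 500, 500), (100, 1000, 1000)]
    for n_r, n_s, key_range in scenarios:
        r = make_relation(n_r, key_range, seed=1)
        s = make_relation(n_s, key_range, seed=2)
        for fn in (nested_loop_join, hash_join, sort_merge_join):
            seconds, rows = time_join(fn, r, s)
            print(f"{fn.__name__:18s} |R|={n_r} |S|={n_s} "
                  f"keys={key_range} rows={rows} best={seconds:.6f}s")
```

What I would like to know is whether something comparable already exists inside the backend, so that the same scenarios can be run against each join method on real relations.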
Sorry again if my questions are unclear. I'd be glad
to explain them further if necessary.
Thanks for any help,
xun