Re: GSoC 2012

Поиск
Список
Период
Сортировка
От Maxim Smyatkin
Тема Re: GSoC 2012
Дата
Msg-id CAAo+-C1P_+2XoTAvrU_AXNe73npEVGzK-hx-eAz7o8DS60SCog@mail.gmail.com
обсуждение исходный текст
Ответ на GSoC 2012  (Maxim Smyatkin <smyatkinmaxim@gmail.com>)
Ответы Re: GSoC 2012  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-students
Hello all, again.

Finally, I have made a decision to work on Firebird FDW. I'm sorry for length of the letter, here is a list of content (if you wont read whole the message):
1. What I was doing during this time (from my first message to this one) and Why I finally decided to implement Firebird FDW?
2. What i want from Firebird FDW?
3. Specific questions I have to solve before proposing quantifiable results and schedule.

1. There is many interesting projects in TODO list, but I have not found any with which I am at least on 50% familiar as I am with Firebird's architecture or with Firebird's API, so I can't assure the completeness of such projects. Furthermore, I still remember how I was digging into internals of another projects and I am not as familiar with PostgreSQL that I have to be for doing something in its Core. So, I started looking for information about FDWs. I have done:
1) I got PostgreSQL source code and built it on Ubuntu. I opened project using Eclipse and I tried to debug forked postgresql process with client connected to it, to be sure that it works fine.
2) I used file_fdw as client, looked at its sources to get first impression about FDW internals and also tried to debug it, to be sure that I understand how to do it within Eclipse.
3) I have read client's and developer's documentation about FDWs; looked at FDW list on wiki (to choose something more complex than file_fdw to study and to estimate complexity of Firebird's one); Have read 2011 mail archive with FDW questions and looked at 2 presentations.
4) Tried to use mysql_fdw, but as I understood it uses old FDW API, so I downloaded Oracle_fdw to be familiar with modern API. Unfortunaly, Oracle-xe was not working on my Ubuntu (Probably, it works well, but I have to spend more time to set it up. May be I'll return to it, if it will be necessary). Anyway I was studying source code, readme and changelog and I found there several interesting optimizations, such as where push-down and connection pool (in library cache). At this moment my interest in Firebird FDW strongly grew up :) and I started to think about "quantifiable results".

2. Of course, I want Firebird FDW to be as good as Oracle's one is:) But, I cant be sure, that I'll complete it all in 3-4 month. So I have to choose most important of features to be implemented first. Next list contains features I want to implement marked as "+" for most important features and "?" for features I can implement later:
+ Types compatibility should be as full as it is possible to implement.
+ Translate most common Firebird SQL's and API's errors to PostgreSQL's analogues.
+ Plan and Cost output [3.1].
? Connections pool optimization (as it is in Oracle).
? Predicates push-down (may be not only "where", to handle situations like it was described here: http://archives.postgresql.org/pgsql-students/2011-03/msg00036.php).
? Field compatibility (link fields not only on their position, but on type, name, and some other properties [3.2]).

3. Here is several (2 at this moment) questions I have to solve before cost estimation and schedule creation can be done:
1) Firebird shows very simple query plan without cardinality or selectivity values. Furthermore, it even does not maintain it internally (as I know it can be implemented in Firebird 3, but anyway I have to think about cost estimation depending on lack of statistics). So. probably, I have to get cardinality as Count(...) and then let PostgreSQL to estimate cost and selectivity based on heuristic rules. I'm pretty sure in positive answer, but "Can PostgreSQL do it?"
2) According to http://people.planetpostgresql.org/andrew/uploads/fdw2.pdf FKs, PKs, Constraints and defaults are not implemented because PostgreSQL can't manage them on foreign tables. NOT NULL is maintained because it can't be changed in foreign DBMS, so we can be sure in it (actually field can be recreated, so we can't be sure. Btw is it taken into account in Oracle FDW?), if I am right. But we can map all of these field properties from Firebird's metadata to PostgreSQL's (FK only between foreign tables, not between servers). To check it for every user's query is too high prise for such mapping, but (I am not sure about it, because we have something like read-only transactions) as Firebird implements multi-version transaction system - we can be sure that in context of one transaction this values cant be changed. And we can track it inside of connection pool. We can get several advantages from it:
- It will be necessary if FDWs will be not read-only in future.
- I think, PostgreSQL optimizer can make better decisions based on constraints sometimes?
- And, finally, it can be useful for database user/client to know a character of data.


--
Thank you!
Smyatkin Maxim.

В списке pgsql-students по дате отправления:

Предыдущее
От: Dave Page
Дата:
Сообщение: Re: Regarding GSoC 2012 project
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: GSoC 2012