Re: About method of PostgreSQL's Optimizer

Поиск
Список
Период
Сортировка
От Jonah H. Harris
Тема Re: About method of PostgreSQL's Optimizer
Дата
Msg-id 36e6829205091409361035d32a@mail.gmail.com
обсуждение исходный текст
Ответ на Re: About method of PostgreSQL's Optimizer  (Pryscila B Guttoski <pryscila.lista@gmail.com>)
Ответы Re: About method of PostgreSQL's Optimizer  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Pryscila,

Step 2 is basically where you find the difference between a cost-based optimizer (CBO) and a rule-based optimizer (RBO).  A CBO is based on the computed execution cost of the query whereas an RBO uses more generalized heuristics.

Let's get an example of what you're proposing and see if we can work it out from there.

Say we have the following (this is a generalized CBO approach, not PostgreSQL specific):

Oracle's SCOTT.EMP table with cardinality of 1 million and an index on empno and ename.  For storage purposes say that the empno index takes up 3600 blocks, the ename index takes up 7800 blocks, and the table itself takes up 17000 blocks.  We'll also say that we have a 256 megabyte buffer cache of which we have cached 50% of the empno index, 10% of the ename index, and 5% of the emp table data.

A user then issues the following query:

SELECT empno, ename FROM emp;

A cost-based optimizer will see the following:
1. See that the query is a full table scan (FTS) and calculate the cost of retrieving all 17000 blocks from disk.
2. See that the query is a FTS and that it can retrieve all data from the indexes (11400 blocks) and join the data (which join algorithm?)

Without performing a breadth-first algorithm, how can one evaluate both options in a way that would allow you to perform heuristic transformations dynamically?  What transformation/heuristic/rule can you use?  A CBO implementation has to calculate the amount of I/O needed on each plan based on several statistics such as what's *potentially* in the cache, what's the access time for block I/O (including prefetching if the storage manager has it), and other factors.  If you could name a database that uses a best-first algorithm, such as A*, please send me the link to their docs; I'd be interested in reading the implementation.

As for using both in the same optimizer, I could only see an algorithm such as a customized-A* being used to planning *some* large queries.  The reason I say this is because the cost calculation, which would still need to be breadth-first, could calculate and cache the cost of most nodes thereby allowing you to possibly perform transformations at the tail of calculation.

As for references to query optimization possibly using best-first algorithms, I think I saw several types of algorithms used in work from a university query optimization engine.  I can't remember if it was Cornell, Stanford, or Wisconsin... I'll try and get you a link to their info.

-Jonah

On 9/14/05, Pryscila B Guttoski <pryscila.lista@gmail.com> wrote:
Hi Jonah,

Thank's for your email, I really appreciate your opinions.

Is it interesting to use both techniques? For example:
Given a query, an optimizer:
1. Generates one of the possible execution plans.
2. Does transformations on the original plan, based on rules and
heuristics, resulting in new alternative plans.
3. Evaluates the cost of generated plans by using statistics.
4. Keeps plans that have lower cost than the original plan
5. Repeat 2-4 over the new alternative plans.
What do you think about it? Are there any restrictions that I haven't seen?

About other method...
Have you heard about using PDDL ("Planning Domain Definition
Language") for query optimization?

[]'s
Pryscila


On 9/14/05, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> Pryscila,
>
>  While I haven't been too involved in the open source PostgreSQL optimizer,
> I have done some work on it and optimizers in other database systems.
>
>  Based on my work, it is my opinion that PostgreSQL, as-well-as other
> databases which use a cost-based optimizer, prefer a breadth-first algorithm
> because one cannot determine the "real" cost of each node at run-time
> without systematically examining all possibilities through calculation.
> This is the opposite of a rule-based optimizer which defines heuristics
> which can be evaulated by a best-first algorithm such as A*.
>
>  In a cost-based optimizer, the system must calculate the "cost" of each
> path based on data that changes during run-time including indexing,
> cardinality, tuple size, available memory, CPU usage, disk access times,
> etc.  To a cost-based optimizer, every query is unique and therefore cannot
> follow a weighted path in the same fashion.  I can certainly see A* being
> used in a rule-based optimizer but not in a real-time cost-based optimizer.
>
>  Perhaps Tom, Bruce, et al have more specifics on PostgreSQL's
> implementation.
>
>  -Jonah
>
>
>
>
> On 9/13/05, Pryscila B Guttoski <pryscila.lista@gmail.com> wrote:
> > Hello all!
> >
> > On my master course, I'm studying the PostgreSQL's optimizer.
> > I don't know if anyone in this list have been participated from the
> PostgreSQL's Optimizer development, but maybe someone can help me on this
> question.
> > PostgreSQL generates all possible plans of executing the query (using an
> almost exhaustive search), then gives a cost to each plan and finally the
> cheapest one is selected for execution.
> > There are other methods for query optimization, one of them is based on
> plan transformations (for example, using A-Star algorithm) instead of plan
> constructions used by PostgreSQL.
> > Does anyone know why this method was choosen? Are there any papers or
> researches about it?
> >
> > Thank's a lot,
> > Pryscila.
> >
>
>
>
> --
> Respectfully,
>
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> http://www.enterprisedb.com/
>



--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Spinlocks, yet again: analysis and proposed patches
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Spinlocks, yet again: analysis and proposed patches