Re: Bunch o' dead code in GEQO

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Bunch o' dead code in GEQO
Дата
Msg-id 12735.1074790315@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Bunch o' dead code in GEQO  ("scott.marlowe" <scott.marlowe@ihs.com>)
Список pgsql-hackers
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Thu, 22 Jan 2004, Tom Lane wrote:
>> I'm assuming that the original author of the GEQO code already did that
>> testing ...

> Hmmm.  I was figuring he wasn't sure so he left them in for other people 
> to test.  Is this a part of the code that eats up much time, or something 
> simple and fast that isn't part of the "GEQO takes 8 seconds to plan" 
> problem?

Well, the basic plan of the GEQO code is

Step 1: generate a bunch of possible join paths at random.

Step 2: randomly select a pair of paths from the current population,
generate a new path that is some combination of these, and push it back
into the population, dropping the worst path from the population.
Repeat for a bunch of generations.

Step 3: take the best path in the final population.

The different recombination algorithms simply are different ways of
generating a "child" path given two "parent" paths in step 2.  Changing
them wouldn't affect the runtime noticeably at all --- the primary cost
is in evaluating each generated path, which is why the runtime is 
approximately the sum of the population size (step 1) and the number
of generations (step 2).  Possibly a different recombiner would give a
better chance of finding a good plan, but I'm unconvinced.  Arguably the
recombiners that are there are all wrong anyway, since they were all
invented to solve Traveling Salesman problems, which this is not quite.

The only way we can do much about the runtime is to reduce the default
population size.  With the current default parameters, a large query
will have population size 1024 and 400 generations, so about two-thirds
of the runtime is in generating the initial random population; if we
can't make a dent in that then we're not going to gain much.  The
question is what will this do to the average quality of the selected
plans.  One thing I'm thinking about is trying to improve the quality of
the initial population by immediately discarding any really bad plans
(for instance, use a heuristic that pays attention to which relations
are linked by WHERE clauses).
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "scott.marlowe"
Дата:
Сообщение: Re: Bunch o' dead code in GEQO
Следующее
От:
Дата:
Сообщение: Re: [GENERAL] tablespaces a priority for 7.5?