Обсуждение: Proposed cleanup of index-related planner estimation procedures

Поиск

Список

Период

Сортировка

Proposed cleanup of index-related planner estimation procedures

От

Tom Lane

Дата:

05 января 2000 г., 23:25:23

I am thinking about redefining and simplifying the planner's interface
to index-type-dependent estimation routines.

Currently, each defined index type is supposed to supply two routines:
an "amopselect" routine and an "amopnpages" routine.  (The existing
actual routines of this kind are btreesel, btreenpages, etc in
src/backend/utils/adt/selfuncs.c.)  These things are called by
index_selectivity() in src/backend/optimizer/util/plancat.c.  amopselect
tries to determine the selectivity of an indexqual (the fraction of
main-table tuples it will select) and amopnpages tries to determine
the number of index pages that will be read to do it.

Now, this collection of code is largely redundant with
optimizer/path/clausesel.c, which also tries to estimate the selectivity
of qualification conditions.  Furthermore, the interface to these
routines is fundamentally misdesigned, because there is no way to deal
with interrelated qualification conditions --- for example, if we have
a range query like "... WHERE x > 10 AND x < 20", the code estimates
the selectivity as the product of the selectivities of the two terms
independently, but the actual selectivity is very different from that.
I am working on fixing clausesel.c to be smarter about correlated
conditions, but it won't do much good to fix that code without fixing
the index-related code.

What I'm thinking about doing is replacing these two per-index-type
routines with a single routine, which is called once per proposed
indexscan rather than once per qual clause.  It would receive the
whole indexqual list as a parameter, instead of just one qual.
A typical implementation would just call clausesel.c's general-purpose
code to estimate the selectivity, and then do a little bit of extra
work to derive the estimated number of index pages from that number.

I suppose the original reason for having amopselect at all was to allow
exploitation of index-specific knowledge during selectivity estimation
--- but none of the existing index types actually provide any such
knowledge in their amopselect routines.  Still, this redesign preserves
the flexibility for an index type to do something specialized.

A possible objection to this scheme is that the inputs and outputs
of these routines would be structs that aren't full-fledged SQL types
(and no, I'm not willing to promote parser expression trees into an
SQL type ;-)).  But I don't think that's a real problem.  No one is
going to be inventing new index types without doing a lot of C coding,
so having to write the amopselect routines in C doesn't seem like a
big drawback.

Comments, objections, better ideas?
        regards, tom lane

Re: [HACKERS] Proposed cleanup of index-related planner estimation procedures

От

Bernard Adrian Frankpitt

Дата:

06 января 2000 г., 12:59:32

Tom Lane wrote:
> 
> I am thinking about redefining and simplifying the planner's interface
> to index-type-dependent estimation routines.

Good Idea. I looked at that code quite closely in the past year, and I
agree that, though harmless, much of it seems bogus --- or at least not
well motivated by thorough analysis.  One reason for this is that the
problem of
computing index search costs is very difficult, the theoretical papers
that I could find on the subject are not terribly satisfactory. 
Currently only equals operators for hash and linear comparison operators
for nbtree really implement anything, the other access method operator
classes merely copy the nbtree operator algorithms.

> 
> What I'm thinking about doing is replacing these two per-index-type
> routines with a single routine, which is called once per proposed
> indexscan rather than once per qual clause.  It would receive the
> whole indexqual list as a parameter, instead of just one qual.
> A typical implementation would just call clausesel.c's general-purpose
> code to estimate the selectivity, and then do a little bit of extra
> work to derive the estimated number of index pages from that number.
> 
> I suppose the original reason for having amopselect at all was to allow
> exploitation of index-specific knowledge during selectivity estimation
> --- but none of the existing index types actually provide any such
> knowledge in their amopselect routines.  Still, this redesign preserves
> the flexibility for an index type to do something specialized.
>

Good, I have a special access method that needs to subvert the normal
optimizer in a way that ensures an index scan is used for every heap
access predicated on a particular attribute.  The reason for this is
that the data (representations of cells in a partition of a high
dimensional space) that would normally sit in heap tuples are actually
distributed through the index structure.  A query predicated on a
property of my special attributes needs to be executed with an index
scan, so I subvert the optimizer by coding amopselect and amopnpages to
always give zero cost for an index scan.  Since index scans are always
considered first, this hairy hack works. I would certainly breath more
easily at each new release of PostgreSQl if the ability of the system to
support this type of hack were a recognized feature.

> 
> A possible objection to this scheme is that the inputs and outputs
> of these routines would be structs that aren't full-fledged SQL types
> (and no, I'm not willing to promote parser expression trees into an
> SQL type ;-)).  But I don't think that's a real problem.  No one is
> going to be inventing new index types without doing a lot of C coding,

Yes, a lot of C-code, I have 10K lines in my access method.  What is
remarkable about Postgresql is that the interface between index access
methods and the rest of the system is so clean that this sort of project
is feasible. 

> so having to write the amopselect routines in C doesn't seem like a
> big drawback.

One thing that I have on my `really cool ideas' list would be to link
somehing like a Python interpreter and compiler into the backend.  one
would use the scripting language to write and debug stuff like this, and
then compile and dynamically link the debugged code into the backend. 
What I ended up doing in my index scheme was to code all the
mathematical algorithms in MATLAB and get them working there, and then
hand translate the MATLAB code to C (there is a lot of linear algebra in
the algorithms).  Debugging was a major pain.  With my idea you could
write the whole access method in a high-level language, and the low
level backend interfaces to things like buffer locking, MVCC and
postgresql memory management would be encapsulated by interfaces in the
high-level language. If there were something like that in PostgreSQL I
bet a lot more people would be rolling their own access methods.  I am
not sure that people who value stability over new features would see
this as a step in the right direction. ;-)

> 
> Comments, objections, better ideas?
> 
>                         regards, tom lane
> 
> ************

RE: [HACKERS] Proposed cleanup of index-related planner estimation procedures

От

"Hiroshi Inoue"

Дата:

06 января 2000 г., 19:05:37

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> 
> I am thinking about redefining and simplifying the planner's interface
> to index-type-dependent estimation routines.
> 
> Currently, each defined index type is supposed to supply two routines:
> an "amopselect" routine and an "amopnpages" routine.  (The existing
> actual routines of this kind are btreesel, btreenpages, etc in
> src/backend/utils/adt/selfuncs.c.)  These things are called by
> index_selectivity() in src/backend/optimizer/util/plancat.c.  amopselect
> tries to determine the selectivity of an indexqual (the fraction of
> main-table tuples it will select) and amopnpages tries to determine
> the number of index pages that will be read to do it.
> 
> Now, this collection of code is largely redundant with
> optimizer/path/clausesel.c, which also tries to estimate the selectivity
> of qualification conditions.  Furthermore, the interface to these
> routines is fundamentally misdesigned, because there is no way to deal
> with interrelated qualification conditions --- for example, if we have
> a range query like "... WHERE x > 10 AND x < 20", the code estimates
> the selectivity as the product of the selectivities of the two terms
> independently, but the actual selectivity is very different from that.
> I am working on fixing clausesel.c to be smarter about correlated
> conditions, but it won't do much good to fix that code without fixing
> the index-related code.
> 
> What I'm thinking about doing is replacing these two per-index-type
> routines with a single routine, which is called once per proposed
> indexscan rather than once per qual clause.  It would receive the
> whole indexqual list as a parameter, instead of just one qual.
> A typical implementation would just call clausesel.c's general-purpose
> code to estimate the selectivity, and then do a little bit of extra
> work to derive the estimated number of index pages from that number.
>

Seems good to me.

I have also been suspicious about per qual selectivity and have
another exmaple.
For the following queryselect * from .. where col1=val1 and col2=val2;

the selectivity is selectivity of (col1=val1) * selectivity of (col2=val2)
currently. But it's not right in many cases.

Though it's almost impossible to hold disbursions for all combination
of columns,it may be possible to hold multi-column disbursions for
multi-columns indexes,

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Proposed cleanup of index-related planner estimation procedures

Proposed cleanup of index-related planner estimation procedures

Re: [HACKERS] Proposed cleanup of index-related planner estimation procedures

RE: [HACKERS] Proposed cleanup of index-related planner estimation procedures