Обсуждение: Bitmap index scans use of filters on available columns

Поиск
Список
Период
Сортировка

Bitmap index scans use of filters on available columns

От
Jeff Janes
Дата:
create table f as select (random()*100)::int as x, md5(random()::text)
as y from generate_series(1,1000000);

create index on f (x, y);
analyze verbose f; --dont vacuum

explain select * from f where x=5 and y like '%abc%';
                                 QUERY PLAN
------------------------------------------------------------------------------Bitmap Heap Scan on f
(cost=382.67..9314.72rows=1 width=37)  Recheck Cond: (x = 5)  Filter: (y ~~ '%abc%'::text)  ->  Bitmap Index Scan on
f_x_y_idx (cost=0.00..382.67 rows=10433 width=0)        Index Cond: (x = 5)
 

Is there a fundamental reason the filter on y is not being applied to
the index scan, rather than the heap scan?  Should this be a to-do
item?  This could avoid a lot of heap block reads, and also might
prevent the bitmap from overflowing work_mem and turning lossy.

Cheers,

Jeff



Re: Bitmap index scans use of filters on available columns

От
Antonin Houska
Дата:
While prefix expression

y like 'abc%'

can be converted to

y >= 'abc'

(see expand_indexqual_opclause()), I'm not sure any kind of expansion is
possible for '%abc%' which would result in a b-tree searchable condition.

Jeff Janes <jeff.janes@gmail.com> wrote:

> Is there a fundamental reason the filter on y is not being applied to
> the index scan, rather than the heap scan?

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at



Re: Bitmap index scans use of filters on available columns

От
Nicolas Barbier
Дата:
2015-11-04 Antonin Houska <ah@cybertec.at>:

> While prefix expression
>
> y like 'abc%'
>
> can be converted to
>
> y >= 'abc'
>
> (see expand_indexqual_opclause()), I'm not sure any kind of expansion is
> possible for '%abc%' which would result in a b-tree searchable condition.

I think the question is not about using the b-tree for checking the
condition, but about just retrieving the value for y from the index,
and just using that to check the condition before fetching the
corresponding tuple from the heap.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Bitmap index scans use of filters on available columns

От
Tom Lane
Дата:
Nicolas Barbier <nicolas.barbier@gmail.com> writes:
> 2015-11-04 Antonin Houska <ah@cybertec.at>:
>> (see expand_indexqual_opclause()), I'm not sure any kind of expansion is
>> possible for '%abc%' which would result in a b-tree searchable condition.

> I think the question is not about using the b-tree for checking the
> condition, but about just retrieving the value for y from the index,
> and just using that to check the condition before fetching the
> corresponding tuple from the heap.

Bitmap index scans only return TID bitmaps, not index tuples; indeed
the underlying index may not store anything recognizable as tuples.
        regards, tom lane



Re: Bitmap index scans use of filters on available columns

От
Simon Riggs
Дата:
On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nicolas Barbier <nicolas.barbier@gmail.com> writes:
> 2015-11-04 Antonin Houska <ah@cybertec.at>:
>> (see expand_indexqual_opclause()), I'm not sure any kind of expansion is
>> possible for '%abc%' which would result in a b-tree searchable condition.

> I think the question is not about using the b-tree for checking the
> condition, but about just retrieving the value for y from the index,
> and just using that to check the condition before fetching the
> corresponding tuple from the heap.

Bitmap index scans only return TID bitmaps, not index tuples; indeed
the underlying index may not store anything recognizable as tuples.

Agreed, though the OP's question was asking why a Filter condition can't be added to a BitmapIndexScan, so that non-qualifying rows can be excluded from the TID bitmap it generates. 

We generate this plan

postgres=# explain select * from f where x=5 and y like '%abc%';

                                QUERY PLAN                                

--------------------------------------------------------------------------

 Index Scan using f_x_y_idx on f  (cost=0.42..26075.71 rows=209 width=37)
   Index Cond: (x = 5)
   Filter: (y ~~ '%abc%'::text)

So it should be possible to do the Filter condition on the BitmapIndexScan.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Bitmap index scans use of filters on available columns

От
Tom Lane
Дата:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We generate this plan
>  Index Scan using f_x_y_idx on f  (cost=0.42..26075.71 rows=209 width=37)
>    Index Cond: (x = 5)
>    Filter: (y ~~ '%abc%'::text)

> So it should be possible to do the Filter condition on the BitmapIndexScan.

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases.  In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

In the case at hand, the planner should have considered a plan of this
shape as well.  Presumably it concluded it was more expensive than using
the bitmap approach.  Jeff might try "set enable_bitmapscan = 0" and
compare the estimated and actual costs.
        regards, tom lane



Re: Bitmap index scans use of filters on available columns

От
Simon Riggs
Дата:
On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We generate this plan
>  Index Scan using f_x_y_idx on f  (cost=0.42..26075.71 rows=209 width=37)
>    Index Cond: (x = 5)
>    Filter: (y ~~ '%abc%'::text)

> So it should be possible to do the Filter condition on the BitmapIndexScan.

You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases.  In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.

Still misunderstanding each other... sorry about that

If a btree can Filter y like that on an IndexScan, then it can also apply that Filter on y when it is looking through rows before it adds them to the bitmap.

I completely understand that it cannot return the value (of y) via the bitmap.

We should be able to get a plan like this

explain select * from f where x=5 and y like '%abc%';

                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on f  (cost=382.67..9314.72 rows=1 width=37)
   ->  Bitmap Index Scan on f_x_y_idx  (cost=0.00..382.67 rows=10433 width=0)
         Index Cond: (x = 5) 
>>      Filter: (y ~~ '%abc%'::text)

if the bitmap stays non-lossy.

I see that plannodes.h says
"In a BitmapIndexScan plan node, the targetlist and qual fields are not used and are always NIL. "
but it doesn't say why.

Why can't the BitmapIndexScan execute arbitrary expressions? (or What did you mean by that phrase?)

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Bitmap index scans use of filters on available columns

От
Tom Lane
Дата:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You're missing my point: that is possible in an indexscan, but *not* in a
>> bitmap indexscan, because the index AM APIs are totally different in the
>> two cases.  In a bitmap scan, nothing more than a TID bitmap is ever
>> returned out to anyplace that could execute arbitrary expressions.

> Still misunderstanding each other... sorry about that

> If a btree can Filter y like that on an IndexScan, then it can also apply
> that Filter on y when it is looking through rows before it adds them to the
> bitmap.

btree applies no such filter in either case.  "Filter" conditions are
applied outside the index AM --- and yes, I will resist any proposal to
relocate that responsibility.
        regards, tom lane



Re: Bitmap index scans use of filters on available columns

От
Simon Riggs
Дата:
On 4 November 2015 at 16:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You're missing my point: that is possible in an indexscan, but *not* in a
>> bitmap indexscan, because the index AM APIs are totally different in the
>> two cases.  In a bitmap scan, nothing more than a TID bitmap is ever
>> returned out to anyplace that could execute arbitrary expressions.

> Still misunderstanding each other... sorry about that

> If a btree can Filter y like that on an IndexScan, then it can also apply
> that Filter on y when it is looking through rows before it adds them to the
> bitmap.

btree applies no such filter in either case.  "Filter" conditions are
applied outside the index AM --- and yes, I will resist any proposal to
relocate that responsibility.

I don't think anyone has argued that point, yet, we just don't have enough info to agree yet.

As Jeff said in his OP, "Is there a fundamental reason the filter on y is not being applied to
the index scan, rather than the heap scan?", which still stands.

Why would you resist? And/Or Why is that important?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Bitmap index scans use of filters on available columns

От
Robert Haas
Дата:
On Wed, Nov 4, 2015 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> You're missing my point: that is possible in an indexscan, but *not* in a
>>> bitmap indexscan, because the index AM APIs are totally different in the
>>> two cases.  In a bitmap scan, nothing more than a TID bitmap is ever
>>> returned out to anyplace that could execute arbitrary expressions.
>
>> Still misunderstanding each other... sorry about that
>
>> If a btree can Filter y like that on an IndexScan, then it can also apply
>> that Filter on y when it is looking through rows before it adds them to the
>> bitmap.
>
> btree applies no such filter in either case.  "Filter" conditions are
> applied outside the index AM --- and yes, I will resist any proposal to
> relocate that responsibility.

I don't understand what you're arguing against here - as Simon I think
correctly points out, there seems to be a substantial performance
improvement possible here.  It's not so much that we would apply the
Filter conditions outside the index AM as that we would have two kinds
of Index Quals that could be enforced by the index AM.  Some quals
(such as x = 5) could be enforced by scanning only the relevant
portion of the index, while others (such as y like '%abc%') don't
abridge the portion of the index that needs to be scanned, but could
disqualify some index tuples before we go to the trouble of hitting
the heap.

The problem I see is this might involve testing quals against an index
tuple when the corresponding heap tuple isn't visible.  This probably
isn't safe except for leakproof quals, and there might be some other
problems as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bitmap index scans use of filters on available columns

От
Jeff Janes
Дата:
On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We generate this plan
>>  Index Scan using f_x_y_idx on f  (cost=0.42..26075.71 rows=209 width=37)
>>    Index Cond: (x = 5)
>>    Filter: (y ~~ '%abc%'::text)
>
>> So it should be possible to do the Filter condition on the BitmapIndexScan.
>
> You're missing my point: that is possible in an indexscan, but *not* in a
> bitmap indexscan, because the index AM APIs are totally different in the
> two cases.  In a bitmap scan, nothing more than a TID bitmap is ever
> returned out to anyplace that could execute arbitrary expressions.

I had thought it must already be able to execute arbitrary
expressions, due to the ability to already support user-defined btree
ops (and ops of non-btree types in the case of other index types).
Does that only require a restricted execution environment?


> In the case at hand, the planner should have considered a plan of this
> shape as well.  Presumably it concluded it was more expensive than using
> the bitmap approach.  Jeff might try "set enable_bitmapscan = 0" and
> compare the estimated and actual costs.

It is a simplified test case, I can get it to do about anything I want
by tweaking the size, selectivity, columns selected, and cache warmth.

My understanding is that the normal (not index-only) index scan, which
is thought to be slower and actually is slower with a hot cache, also
applies the filter on the heap tuple, not the index tuple.  But since
it only matters much for large queries and large queries tend to
prefer bitmaps, I was less interested in that case.  And bitmaps can
feed into BitmapAnd, which is very helpful in avoiding index-creep
where every query gets a new index tailored for it.  I think we are
missing a trick here that could apply to lots of situations.

Cheers,

Jeff



Re: Bitmap index scans use of filters on available columns

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You're missing my point: that is possible in an indexscan, but *not* in a
>> bitmap indexscan, because the index AM APIs are totally different in the
>> two cases.  In a bitmap scan, nothing more than a TID bitmap is ever
>> returned out to anyplace that could execute arbitrary expressions.

> I had thought it must already be able to execute arbitrary
> expressions, due to the ability to already support user-defined btree
> ops (and ops of non-btree types in the case of other index types).

No.  An index AM is only expected to be able to evaluate clauses of
the form <indexed_column> <indexable_operator> <constant>, and the key
restriction there is that the operator is one that the AM has volunteered
to support.  Well, actually, it's the opclass more than the AM that
determines this, but anyway it's not just some random operator; more than
likely, the AM and/or opclass has got special logic about the operator.

This also ties into Robert's point about evaluation of operators against
index entries for dead or invisible rows.  Indexable operators are much
less likely than others to have unexpected side-effects.
        regards, tom lane



Re: Bitmap index scans use of filters on available columns

От
Tomas Vondra
Дата:
Hi,

On 11/04/2015 11:32 PM, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> You're missing my point: that is possible in an indexscan, but
>>> *not* in a bitmap indexscan, because the index AM APIs are
>>> totally different in the two cases. In a bitmap scan, nothing
>>> more than a TID bitmap is ever returned out to anyplace that
>>> could execute arbitrary expressions.
>
>> I had thought it must already be able to execute arbitrary
>> expressions, due to the ability to already support user-defined
>> btree ops (and ops of non-btree types in the case of other index
>> types).
>
> No. An index AM is only expected to be able to evaluate clauses of
> the form <indexed_column> <indexable_operator> <constant>, and the
> key restriction there is that the operator is one that the AM has
> volunteered to support. Well, actually, it's the opclass more than
> the AM that determines this, but anyway it's not just some random
> operator; more than likely, the AM and/or opclass has got special
> logic about the operator.

Isn't that pretty much exactly the point made by Jeff and Simon, that 
index AM is currently only allowed to handle the indexable operators, 
i.e. operators that it can explicitly optimize (e.g. use to walk the 
btree and such), and completely ignores the other operators despite 
having all the columns in the index. Which means we'll have to do the 
heap fetch, which usually means a significant performance hit.

>
> This also ties into Robert's point about evaluation of operators
> against index entries for dead or invisible rows. Indexable operators
> are much less likely than others to have unexpected side-effects.

I certainly understand there are cases that require care - like the 
leakproof thing pointed out by Robert for example. I don't immediately 
see why evaluation against dead rows would be a problem.

But maybe we can derive a set of rules required from the operators? Say 
only those marked as leakproof when RLS is enabled on the table, and 
perhaps additional things.

A "bruteforce" way would be to extend each index AM with every possible 
operator, but that's not quite manageable I guess. But why couldn't we 
provide a generic infrastructure that would allow filtering "safe" 
expressions and validating them on an index tuple?


kind regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bitmap index scans use of filters on available columns

От
Robert Haas
Дата:
On Wed, Nov 4, 2015 at 9:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I certainly understand there are cases that require care - like the
> leakproof thing pointed out by Robert for example. I don't immediately see
> why evaluation against dead rows would be a problem.

Well, one thing is that you might leak information about
already-deleted rows, which could be a security vulnerability, or more
mundanely cause a function to error out when there are no actually
visible rows that could trigger such an error.  It would be surprising
if you could add a CHECK(b != 0) constraint to a table, query it for
rows where a/b>1, and get a division by zero error.

But I think it's even worse: I believe there can be dead rows in the
heap whose tuple descriptor doesn't even match the current
pg_attribute contents.  Consider BEGIN; ALTER TABLE ... ADD COLUMN ...
int8; INSERT ...; ROLLBACK; ALTER TABLE .. ADD COLUMN .. text; SELECT
...  If the SELECT tries to decode one of the tuples added by the
failed transaction considering the new column as text when the dead
tuples in question had it as an int8, I suspect that you can crash the
server.  Nothing good will happen, anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bitmap index scans use of filters on available columns

От
Tomas Vondra
Дата:
Hi,

On 11/05/2015 06:51 PM, Robert Haas wrote:
> On Wed, Nov 4, 2015 at 9:15 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> I certainly understand there are cases that require care - like the
>> leakproof thing pointed out by Robert for example. I don't immediately see
>> why evaluation against dead rows would be a problem.
>
> Well, one thing is that you might leak information about
> already-deleted rows, which could be a security vulnerability, or more
> mundanely cause a function to error out when there are no actually
> visible rows that could trigger such an error.  It would be surprising
> if you could add a CHECK(b != 0) constraint to a table, query it for
> rows where a/b>1, and get a division by zero error.

Yes, I guess we don't have this problem when evaluating the expression 
on heap because we get to check visibility first, and after moving the 
expression to the index we can't do that.

But then again, can we come up with a way to distinguish operators that 
are safe to evaluate on indexes - either automatic or manual? We already 
do that with the indexable operators with explicitly listing them in the 
opclass, and we also explicitly use the LEAKPROOF for a similar thing. I 
don't think extending the opclass is a really good solution, but why not 
to invent another *PROOF flag?


> But I think it's even worse: I believe there can be dead rows in the
> heap whose tuple descriptor doesn't even match the current
> pg_attribute contents.  Consider BEGIN; ALTER TABLE ... ADD COLUMN ...
> int8; INSERT ...; ROLLBACK; ALTER TABLE .. ADD COLUMN .. text; SELECT
> ...  If the SELECT tries to decode one of the tuples added by the
> failed transaction considering the new column as text when the dead
> tuples in question had it as an int8, I suspect that you can crash the
> server.  Nothing good will happen, anyway.

But that is about heap, and we're discussing indexes here, right? I 
don't think you can break the index descriptor like this, because 
otherwise we'd already have problems with that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bitmap index scans use of filters on available columns

От
Robert Haas
Дата:
On Thu, Nov 5, 2015 at 1:29 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>> Well, one thing is that you might leak information about
>> already-deleted rows, which could be a security vulnerability, or more
>> mundanely cause a function to error out when there are no actually
>> visible rows that could trigger such an error.  It would be surprising
>> if you could add a CHECK(b != 0) constraint to a table, query it for
>> rows where a/b>1, and get a division by zero error.
>
> Yes, I guess we don't have this problem when evaluating the expression on
> heap because we get to check visibility first, and after moving the
> expression to the index we can't do that.

Right.

> But then again, can we come up with a way to distinguish operators that are
> safe to evaluate on indexes - either automatic or manual? We already do that
> with the indexable operators with explicitly listing them in the opclass,
> and we also explicitly use the LEAKPROOF for a similar thing. I don't think
> extending the opclass is a really good solution, but why not to invent
> another *PROOF flag?

I think LEAKPROOF is probably fine for this.  How would the new thing
be different?

> But that is about heap, and we're discussing indexes here, right? I don't
> think you can break the index descriptor like this, because otherwise we'd
> already have problems with that.

Oh, right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bitmap index scans use of filters on available columns

От
Amit Kapila
Дата:
On Thu, Nov 5, 2015 at 7:45 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

On 11/04/2015 11:32 PM, Tom Lane wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
You're missing my point: that is possible in an indexscan, but
*not* in a bitmap indexscan, because the index AM APIs are
totally different in the two cases. In a bitmap scan, nothing
more than a TID bitmap is ever returned out to anyplace that
could execute arbitrary expressions.

I had thought it must already be able to execute arbitrary
expressions, due to the ability to already support user-defined
btree ops (and ops of non-btree types in the case of other index
types).

No. An index AM is only expected to be able to evaluate clauses of
the form <indexed_column> <indexable_operator> <constant>, and the
key restriction there is that the operator is one that the AM has
volunteered to support. Well, actually, it's the opclass more than
the AM that determines this, but anyway it's not just some random
operator; more than likely, the AM and/or opclass has got special
logic about the operator.

Isn't that pretty much exactly the point made by Jeff and Simon, that index AM is currently only allowed to handle the indexable operators, i.e. operators that it can explicitly optimize (e.g. use to walk the btree and such), and completely ignores the other operators despite having all the columns in the index. Which means we'll have to do the heap fetch, which usually means a significant performance hit.


This also ties into Robert's point about evaluation of operators
against index entries for dead or invisible rows. Indexable operators
are much less likely than others to have unexpected side-effects.

I certainly understand there are cases that require care - like the leakproof thing pointed out by Robert for example. I don't immediately see why evaluation against dead rows would be a problem.


Apart from other problems discussed, I think it could also lead to
a performance penality for the cases when the qual condition is
costly as evaluating such a qual against lot of dead rows would be a
time consuming operation.  I am not sure, but I think some of the
other well know databases might not have any such problems as
they store visibility info inside index due to which they don't need to
fetch the heap tuple for evaluating such quals.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Bitmap index scans use of filters on available columns

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Fri, 6 Nov 2015 09:49:30 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1L_5T-UYsQeMOrX54g3QeXGhhAk5YFmzZqu5MidzxzkRg@mail.gmail.com>
> Apart from other problems discussed, I think it could also lead to
> a performance penality for the cases when the qual condition is
> costly as evaluating such a qual against lot of dead rows would be a
> time consuming operation.  I am not sure, but I think some of the
> other well know databases might not have any such problems as
> they store visibility info inside index due to which they don't need to
> fetch the heap tuple for evaluating such quals.

I was junst thinking of the same thing. Can we estimate the
degree of the expected penalty using heap statistics? Of couse
not so accurate though.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Bitmap index scans use of filters on available columns

От
Tomas Vondra
Дата:
Hi,

On 11/05/2015 07:36 PM, Robert Haas wrote:
> On Thu, Nov 5, 2015 at 1:29 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>
>> But then again, can we come up with a way to distinguish operators that are
>> safe to evaluate on indexes - either automatic or manual? We already do that
>> with the indexable operators with explicitly listing them in the opclass,
>> and we also explicitly use the LEAKPROOF for a similar thing. I don't think
>> extending the opclass is a really good solution, but why not to invent
>> another *PROOF flag?
>
> I think LEAKPROOF is probably fine for this.  How would the new thing
> be different?

I don't think so - AFAIK "leakproof" is about not revealing information 
about arguments, nothing more and nothing less. It does not say whether 
it's safe to evaluate on indexes, and I don't think we should extend the 
meaning in this direction.

I find it perfectly plausible that there will be leakproof functions 
that can't be pushed to indexes, and that would not be possible with a 
single marker. Of course, "leakproof" may be a minimum requirement, but 
another marker is needed to actually enable pushing the function to index.

Of course, we may eventually realize that leakproof really is 
sufficient, and that we can push all leakproof functions to indexes. In 
that case we may deprecate the other marker and just use leakproof. But 
if we go just with leakproof and later find that we really need two 
markers because not all leakproof functions are not index-pushable, 
it'll be much harder to fix because it will cause performance 
regressions for the users (some of the expressions won't be pushed to 
indexes anymore).

But I think the marker is the last thing we need to worry about.

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bitmap index scans use of filters on available columns

От
Robert Haas
Дата:
On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>> I think LEAKPROOF is probably fine for this.  How would the new thing
>> be different?
>
> I don't think so - AFAIK "leakproof" is about not revealing information
> about arguments, nothing more and nothing less. It does not say whether it's
> safe to evaluate on indexes, and I don't think we should extend the meaning
> in this direction.

That seems like a non-answer answer.

Clearly, if a function can leak information about it's arguments, for
example by throwing an error, we can't call it on tuples that might
not even be visible, or the behavior of the query might be change.  So
any function that is not leakproof is also not safe for this purpose.

Now that doesn't rule out the possibility that the functions for which
this optimization is safe are a subset of the leakproof functions -
but offhand I don't see why that should be the case.  The index tuple
is just a tuple, and the values it contains are just datums, no more
or less than in a heap tuple.  There could be a reason for concern
here, but saying that it might not be "safe to evaluate on indexes"
just begs the question: WHY wouldn't that be safe?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bitmap index scans use of filters on available columns

От
Tomas Vondra
Дата:
Hi,

On 11/07/2015 02:18 AM, Robert Haas wrote:
> On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>> I think LEAKPROOF is probably fine for this.  How would the new thing
>>> be different?
>>
>> I don't think so - AFAIK "leakproof" is about not revealing information
>> about arguments, nothing more and nothing less. It does not say whether it's
>> safe to evaluate on indexes, and I don't think we should extend the meaning
>> in this direction.
>
> That seems like a non-answer answer.

I'm not claiming I have an answer, really. My knowledge of leakproof 
stuff is a bit shallow. Also, I had a few beers earlier today, which 
does not really improve the depth of knowledge on any topic except 
politics and football (aka soccer). So you may be perfectly right.

> Clearly, if a function can leak information about it's arguments,
> for example by throwing an error, we can't call it on tuples that
> might not even be visible, or the behavior of the query might be
> change. So any function that is not leakproof is also not safe for
> this purpose.
>
> Now that doesn't rule out the possibility that the functions for
> which this optimization is safe are a subset of the leakproof
> functions - but offhand I don't see why that should be the case. The
> index tuple is just a tuple, and the values it contains are just
> datums, no more or less than in a heap tuple. There could be a reason
> for concern here, but saying that it might not be "safe to evaluate
> on indexes" just begs the question: WHY wouldn't that be safe?

Ah. For some reason I thought the "division by zero" example is one of 
the non-safe cases, but now I see int4div is not marked as leakproof, so 
we couldn't push it to index anyway.

I've however also noticed that all the 'like' procedures are marked as 
not leak proof, which is a bit unfortunate because that's the example 
from Jeff's e-mail that started this thread.

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bitmap index scans use of filters on available columns

От
Amit Kapila
Дата:
On Fri, Nov 6, 2015 at 10:28 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello,
>
> At Fri, 6 Nov 2015 09:49:30 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1L_5T-UYsQeMOrX54g3QeXGhhAk5YFmzZqu5MidzxzkRg@mail.gmail.com>
> > Apart from other problems discussed, I think it could also lead to
> > a performance penality for the cases when the qual condition is
> > costly as evaluating such a qual against lot of dead rows would be a
> > time consuming operation.  I am not sure, but I think some of the
> > other well know databases might not have any such problems as
> > they store visibility info inside index due to which they don't need to
> > fetch the heap tuple for evaluating such quals.
>
> I was junst thinking of the same thing. Can we estimate the
> degree of the expected penalty using heap statistics? Of couse
> not so accurate though.
>

I think so. Information related to number of tuples inserted, updated,
hot-updated, deleted and so on is maintained and is shown by
pg_stat_all_tables.  By the way, whats your point here, do you mean to
say that we can estimate approximate penality and then decide whether
quals should be evaluated at index level?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Bitmap index scans use of filters on available columns

От
Jeff Janes
Дата:
On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hi,
>
> On 11/07/2015 02:18 AM, Robert Haas wrote:
>>
>> On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>>>
>>>> I think LEAKPROOF is probably fine for this.  How would the new thing
>>>> be different?
>>>
>>>
>>> I don't think so - AFAIK "leakproof" is about not revealing information
>>> about arguments, nothing more and nothing less. It does not say whether
>>> it's
>>> safe to evaluate on indexes, and I don't think we should extend the
>>> meaning
>>> in this direction.
>>
>>
>> That seems like a non-answer answer.
>
>
> I'm not claiming I have an answer, really. My knowledge of leakproof stuff
> is a bit shallow. Also, I had a few beers earlier today, which does not
> really improve the depth of knowledge on any topic except politics and
> football (aka soccer). So you may be perfectly right.
>
>> Clearly, if a function can leak information about it's arguments,
>> for example by throwing an error, we can't call it on tuples that
>> might not even be visible, or the behavior of the query might be
>> change. So any function that is not leakproof is also not safe for
>> this purpose.
>>
>> Now that doesn't rule out the possibility that the functions for
>> which this optimization is safe are a subset of the leakproof
>> functions - but offhand I don't see why that should be the case. The
>> index tuple is just a tuple, and the values it contains are just
>> datums, no more or less than in a heap tuple. There could be a reason
>> for concern here, but saying that it might not be "safe to evaluate
>> on indexes" just begs the question: WHY wouldn't that be safe?
>
>
> Ah. For some reason I thought the "division by zero" example is one of the
> non-safe cases, but now I see int4div is not marked as leakproof, so we
> couldn't push it to index anyway.
>
> I've however also noticed that all the 'like' procedures are marked as not
> leak proof, which is a bit unfortunate because that's the example from
> Jeff's e-mail that started this thread.

Is there a reason they aren't leak proof?  I don't see that they have
side effects, or that they can throw errors.  What do they leak?

Anyway, I just picked that for convenience.  One of the original
pieces that lead me on this quest had a range inclusion rather than a
LIKE.  I just thought that changing it to a LIKE made the data
generator simpler and easier to reason about, without changing the
fundamental principle.

Cheers,

Jeff



Re: Bitmap index scans use of filters on available columns

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> I've however also noticed that all the 'like' procedures are marked as not
>> leak proof, which is a bit unfortunate because that's the example from
>> Jeff's e-mail that started this thread.

> Is there a reason they aren't leak proof?  I don't see that they have
> side effects, or that they can throw errors.  What do they leak?

Huh?

regression=# select 'z' like '\';
ERROR:  LIKE pattern must not end with escape character
        regards, tom lane



Re: Bitmap index scans use of filters on available columns

От
Jeff Janes
Дата:
On Sun, Nov 8, 2015 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> I've however also noticed that all the 'like' procedures are marked as not
>>> leak proof, which is a bit unfortunate because that's the example from
>>> Jeff's e-mail that started this thread.
>
>> Is there a reason they aren't leak proof?  I don't see that they have
>> side effects, or that they can throw errors.  What do they leak?
>
> Huh?
>
> regression=# select 'z' like '\';
> ERROR:  LIKE pattern must not end with escape character

Ah, I was only thinking of the pattern as a constant.  And there is no
way to declare a function leakproof on one input but not another.

Cheers,

Jeff