Обсуждение: limit clause breaks query planner?

Поиск

Список

Период

Сортировка

limit clause breaks query planner?

От

"David West"

Дата:

01 сентября 2008 г., 12:31:31

Hi,

I have a single table with about 10 million rows, and two indexes. Index A is on a column A with 95% null values. Index B is on a column B with about 10 values, ie. About a million rows of each value.

When I do a simple query on the table (no joins) with the following condition:

A is null AND

B = ‘21’

it uses the correct index, index B. However, when I add a limit clause of 15, postgres decides to do a sequential scan :s. Looking at the results from explain:

"Limit (cost=0.00..3.69 rows=15 width=128)"

" -> Seq Scan on my_table this_ (cost=0.00..252424.24 rows=1025157 width=128)"

" Filter: ((A IS NULL) AND ((B)::text = '21'::text))"

It appears that postgres is (very incorrectly) assuming that it will only have to retrieve 15 rows on a sequential scan, and gives a total cost of 3.69. In reality, it has to scan many rows forward until it finds the correct value, yielding very poor performance for my table.

If I disable sequential scan (set enable_seqscan=false) it then incorrectly uses the index A that has 95% null values: it seems to incorrectly apply the same logic again that it will only have to retrieve 15 rows with the limit clause, and thinks that the index scan using A is faster than index scan B.

Only by deleting the index on A and disabling sequential scan will it use the correct index, which is of course by far the fastest.

Is there an assumption in the planner that a limit of 15 will mean that postgres will only have to read 15 rows? If so is this a bad assumption? If a particular query is faster without a limit, then surely it will also be faster with the limit.

Any workarounds for this?

Thanks

David

Re: limit clause breaks query planner?

От

"Pavel Stehule"

Дата:

01 сентября 2008 г., 12:53:00

Hello

you should partial index

create index foo(b) on mytable where a is null;

regards
Pavel Stehule

2008/9/1 David West <david.west@cusppoint.com>:
> Hi,
>
>
>
> I have a single table with about 10 million rows, and two indexes.  Index A
> is on a column A with 95% null values.  Index B is on a column B with about
> 10 values, ie. About a million rows of each value.
>
>
>
> When I do a simple query on the table (no joins) with the following
> condition:
>
> A is null AND
>
> B = '21'
>
>
>
> it uses the correct index, index B.  However, when I add a limit clause of
> 15, postgres decides to do a sequential scan :s.  Looking at the results
> from explain:
>
>
>
> "Limit  (cost=0.00..3.69 rows=15 width=128)"
>
> "  ->  Seq Scan on my_table this_  (cost=0.00..252424.24 rows=1025157
> width=128)"
>
> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>
>
>
> It appears that postgres is (very incorrectly) assuming that it will only
> have to retrieve 15 rows on a sequential scan, and gives a total cost of
> 3.69.  In reality, it has to scan many rows forward until it finds the
> correct value, yielding very poor performance for my table.
>
>
>
> If I disable sequential scan (set enable_seqscan=false) it then incorrectly
> uses the index A that has 95% null values: it seems to incorrectly apply the
> same logic again that it will only have to retrieve 15 rows with the limit
> clause, and thinks that the index scan using A is faster than index scan B.
>
>
>
> Only by deleting the index on A and disabling sequential scan will it use
> the correct index, which is of course by far the fastest.
>
>
>
> Is there an assumption in the planner that a limit of 15 will mean that
> postgres will only have to read 15 rows?  If so is this a bad assumption?
> If a particular query is faster without a limit, then surely it will also be
> faster with the limit.
>
>
>
> Any workarounds for this?
>
>
>
> Thanks
>
> David

Re: limit clause breaks query planner?

От

"David West"

Дата:

01 сентября 2008 г., 14:44:35

Thanks for your suggestion but the result is the same.

Here is the explain analyse output from different queries.
Select * from my_table where A is null and B = '21' limit 15

"Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
"  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
"        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
"Total runtime: 85896.214 ms"

As you can see the estimated cost was 3.68: a long way from the true value.

Doing 'set enable_seqscan=false' and repeating the select:
"Limit  (cost=0.00..5.58 rows=15 width=128) (actual time=4426.438..4426.834 rows=15 loops=1)"
"  ->  Index Scan using idx_A on my_table this_  (cost=0.00..392956.76 rows=1055970 width=128) (actual
time=4426.433..4426.768rows=15 loops=1)" 
"        Index Cond: (A IS NULL)"
"        Filter: ((B)::text = '21'::text)"
"Total runtime: 4426.910 ms"

Probably some caching made this query faster, but it's still too slow, and using the wrong index.

Deleting index A gives:
"Limit  (cost=0.00..56.47 rows=15 width=128) (actual time=10.298..10.668 rows=15 loops=1)"
"  ->  Index Scan using idx_B on my_table this_  (cost=0.00..3982709.15 rows=1057960 width=128) (actual
time=10.293..10.618rows=15 loops=1)" 
"        Index Cond: ((B)::text = '21'::text)"
"        Filter: (A IS NULL)"
"Total runtime: 10.735 ms"
Much better.  However I need index A for another query so I can't just delete it.

Looking at the estimated cost, you can see why it's choosing the order that it is choosing, but it just doesn't seem to
reflectreality at all. 

Now here's the result of the query, with both indexes in place and sequential scan enabled
Select * from my_table where A is null and B = '21'
"Bitmap Heap Scan on my_table this_  (cost=20412.89..199754.37 rows=1060529 width=128) (actual time=470.772..7432.062
rows=1020062loops=1)" 
"  Recheck Cond: ((B)::text = '21'::text)"
"  Filter: (A IS NULL)"
"  ->  Bitmap Index Scan on idx_B (cost=0.00..20147.76 rows=1089958 width=0) (actual time=466.545..466.545 rows=1020084
loops=1)"
"        Index Cond: ((B)::text = '21'::text)"
"Total runtime: 8940.119 ms"

In this case it goes for the correct index.  It appears that the query planner makes very simplistic assumptions when
itcomes to LIMIT? 

Thanks
David

-----Original Message-----
From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
Sent: 01 September 2008 13:53
To: David West
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?

Hello

you should partial index

create index foo(b) on mytable where a is null;

regards
Pavel Stehule

2008/9/1 David West <david.west@cusppoint.com>:
> Hi,
>
>
>
> I have a single table with about 10 million rows, and two indexes.  Index A
> is on a column A with 95% null values.  Index B is on a column B with about
> 10 values, ie. About a million rows of each value.
>
>
>
> When I do a simple query on the table (no joins) with the following
> condition:
>
> A is null AND
>
> B = '21'
>
>
>
> it uses the correct index, index B.  However, when I add a limit clause of
> 15, postgres decides to do a sequential scan :s.  Looking at the results
> from explain:
>
>
>
> "Limit  (cost=0.00..3.69 rows=15 width=128)"
>
> "  ->  Seq Scan on my_table this_  (cost=0.00..252424.24 rows=1025157
> width=128)"
>
> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>
>
>
> It appears that postgres is (very incorrectly) assuming that it will only
> have to retrieve 15 rows on a sequential scan, and gives a total cost of
> 3.69.  In reality, it has to scan many rows forward until it finds the
> correct value, yielding very poor performance for my table.
>
>
>
> If I disable sequential scan (set enable_seqscan=false) it then incorrectly
> uses the index A that has 95% null values: it seems to incorrectly apply the
> same logic again that it will only have to retrieve 15 rows with the limit
> clause, and thinks that the index scan using A is faster than index scan B.
>
>
>
> Only by deleting the index on A and disabling sequential scan will it use
> the correct index, which is of course by far the fastest.
>
>
>
> Is there an assumption in the planner that a limit of 15 will mean that
> postgres will only have to read 15 rows?  If so is this a bad assumption?
> If a particular query is faster without a limit, then surely it will also be
> faster with the limit.
>
>
>
> Any workarounds for this?
>
>
>
> Thanks
>
> David

Re: limit clause breaks query planner?

От

"Pavel Stehule"

Дата:

01 сентября 2008 г., 19:17:27

Hello

2008/9/1 David West <david.west@cusppoint.com>:
> Thanks for your suggestion but the result is the same.
>
> Here is the explain analyse output from different queries.
> Select * from my_table where A is null and B = '21' limit 15
>
> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
> "Total runtime: 85896.214 ms"
>

I see it - problem is in statistics - system expect 1055580, but there
is only 15 values.

try
a) increase statistics on column a and b - probably there are strong
dependency between column a nad b, because statistic are totally out
b) try cursors
declare cursor c as Select * from my_table where A is null and B =
'21' limit 15;
fetch forward 15 from c;

http://www.postgresql.org/docs/8.2/static/sql-fetch.html

maybe

select * from (select * from mytable where b = '21' offset 0) where a
is null limit 15

regards
Pavel Stehule




> As you can see the estimated cost was 3.68: a long way from the true value.
>
> Doing 'set enable_seqscan=false' and repeating the select:
> "Limit  (cost=0.00..5.58 rows=15 width=128) (actual time=4426.438..4426.834 rows=15 loops=1)"
> "  ->  Index Scan using idx_A on my_table this_  (cost=0.00..392956.76 rows=1055970 width=128) (actual
time=4426.433..4426.768rows=15 loops=1)" 
> "        Index Cond: (A IS NULL)"
> "        Filter: ((B)::text = '21'::text)"
> "Total runtime: 4426.910 ms"
>
> Probably some caching made this query faster, but it's still too slow, and using the wrong index.
>
> Deleting index A gives:
> "Limit  (cost=0.00..56.47 rows=15 width=128) (actual time=10.298..10.668 rows=15 loops=1)"
> "  ->  Index Scan using idx_B on my_table this_  (cost=0.00..3982709.15 rows=1057960 width=128) (actual
time=10.293..10.618rows=15 loops=1)" 
> "        Index Cond: ((B)::text = '21'::text)"
> "        Filter: (A IS NULL)"
> "Total runtime: 10.735 ms"
> Much better.  However I need index A for another query so I can't just delete it.
>
> Looking at the estimated cost, you can see why it's choosing the order that it is choosing, but it just doesn't seem
toreflect reality at all. 
>
> Now here's the result of the query, with both indexes in place and sequential scan enabled
> Select * from my_table where A is null and B = '21'
> "Bitmap Heap Scan on my_table this_  (cost=20412.89..199754.37 rows=1060529 width=128) (actual time=470.772..7432.062
rows=1020062loops=1)" 
> "  Recheck Cond: ((B)::text = '21'::text)"
> "  Filter: (A IS NULL)"
> "  ->  Bitmap Index Scan on idx_B (cost=0.00..20147.76 rows=1089958 width=0) (actual time=466.545..466.545
rows=1020084loops=1)" 
> "        Index Cond: ((B)::text = '21'::text)"
> "Total runtime: 8940.119 ms"
>
> In this case it goes for the correct index.  It appears that the query planner makes very simplistic assumptions when
itcomes to LIMIT? 
>
> Thanks
> David
>
> -----Original Message-----
> From: Pavel Stehule [mailto:pavel.stehule@gmail.com]
> Sent: 01 September 2008 13:53
> To: David West
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] limit clause breaks query planner?
>
> Hello
>
> you should partial index
>
> create index foo(b) on mytable where a is null;
>
> regards
> Pavel Stehule
>
> 2008/9/1 David West <david.west@cusppoint.com>:
>> Hi,
>>
>>
>>
>> I have a single table with about 10 million rows, and two indexes.  Index A
>> is on a column A with 95% null values.  Index B is on a column B with about
>> 10 values, ie. About a million rows of each value.
>>
>>
>>
>> When I do a simple query on the table (no joins) with the following
>> condition:
>>
>> A is null AND
>>
>> B = '21'
>>
>>
>>
>> it uses the correct index, index B.  However, when I add a limit clause of
>> 15, postgres decides to do a sequential scan :s.  Looking at the results
>> from explain:
>>
>>
>>
>> "Limit  (cost=0.00..3.69 rows=15 width=128)"
>>
>> "  ->  Seq Scan on my_table this_  (cost=0.00..252424.24 rows=1025157
>> width=128)"
>>
>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>
>>
>>
>> It appears that postgres is (very incorrectly) assuming that it will only
>> have to retrieve 15 rows on a sequential scan, and gives a total cost of
>> 3.69.  In reality, it has to scan many rows forward until it finds the
>> correct value, yielding very poor performance for my table.
>>
>>
>>
>> If I disable sequential scan (set enable_seqscan=false) it then incorrectly
>> uses the index A that has 95% null values: it seems to incorrectly apply the
>> same logic again that it will only have to retrieve 15 rows with the limit
>> clause, and thinks that the index scan using A is faster than index scan B.
>>
>>
>>
>> Only by deleting the index on A and disabling sequential scan will it use
>> the correct index, which is of course by far the fastest.
>>
>>
>>
>> Is there an assumption in the planner that a limit of 15 will mean that
>> postgres will only have to read 15 rows?  If so is this a bad assumption?
>> If a particular query is faster without a limit, then surely it will also be
>> faster with the limit.
>>
>>
>>
>> Any workarounds for this?
>>
>>
>>
>> Thanks
>>
>> David
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: limit clause breaks query planner?

От

Russell Smith

Дата:

02 сентября 2008 г., 04:29:36

Pavel Stehule wrote:
> Hello
>
> 2008/9/1 David West <david.west@cusppoint.com>:
>
>> Thanks for your suggestion but the result is the same.
>>
>> Here is the explain analyse output from different queries.
>> Select * from my_table where A is null and B = '21' limit 15
>>
>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>> "Total runtime: 85896.214 ms"
>>
>>
[snip]

Further to Pavel's comments;

(actual time=85837.038..85896.091 rows=15 loops=1)

That's 85 seconds on a sequence scan to return the first tuple.  The table is not bloated by any chance is it?

Regards

Russell

Re: limit clause breaks query planner?

От

Guillaume Cottenceau

Дата:

02 сентября 2008 г., 07:44:45

"Pavel Stehule" <pavel.stehule 'at' gmail.com> writes:

> Hello
>
> 2008/9/1 David West <david.west@cusppoint.com>:
>> Thanks for your suggestion but the result is the same.
>>
>> Here is the explain analyse output from different queries.
>> Select * from my_table where A is null and B = '21' limit 15
>>
>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>> "Total runtime: 85896.214 ms"
>>
>
> I see it - problem is in statistics - system expect 1055580, but there
> is only 15 values.

Aren't you rather seeing the effect of the limit clause?

gc=# create table foo ( bar int );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000 );
INSERT 0 10000001
gc=# analyze foo;
ANALYZE
gc=# explain analyze select * from foo where bar = 8 limit 15;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.30 rows=15 width=4) (actual time=2379.878..2379.921 rows=15 loops=1)
   ->  Seq Scan on foo  (cost=0.00..164217.00 rows=1070009 width=4) (actual time=2379.873..2379.888 rows=15 loops=1)
         Filter: (bar = 8)
 Total runtime: 2379.974 ms

(on 8.3.1)

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: limit clause breaks query planner?

От

"Pavel Stehule"

Дата:

02 сентября 2008 г., 07:47:10

2008/9/2 Guillaume Cottenceau <gc@mnc.ch>:
> "Pavel Stehule" <pavel.stehule 'at' gmail.com> writes:
>
>> Hello
>>
>> 2008/9/1 David West <david.west@cusppoint.com>:
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
>>>
>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>> "Total runtime: 85896.214 ms"
>>>
>>
>> I see it - problem is in statistics - system expect 1055580, but there
>> is only 15 values.
>
> Aren't you rather seeing the effect of the limit clause?

yes, true, my mistake

Pavel

>
> gc=# create table foo ( bar int );
> CREATE TABLE
> gc=# insert into foo ( select generate_series(0, 10000000) / 1000000 );
> INSERT 0 10000001
> gc=# analyze foo;
> ANALYZE
> gc=# explain analyze select * from foo where bar = 8 limit 15;
>                                                     QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..2.30 rows=15 width=4) (actual time=2379.878..2379.921 rows=15 loops=1)
>   ->  Seq Scan on foo  (cost=0.00..164217.00 rows=1070009 width=4) (actual time=2379.873..2379.888 rows=15 loops=1)
>         Filter: (bar = 8)
>  Total runtime: 2379.974 ms
>
> (on 8.3.1)
>
> --
> Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
> Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36
>

Re: limit clause breaks query planner?

От

Guillaume Cottenceau

Дата:

02 сентября 2008 г., 09:07:02

Russell Smith <mr-russ 'at' pws.com.au> writes:

> Pavel Stehule wrote:
>> Hello
>>
>> 2008/9/1 David West <david.west@cusppoint.com>:
>>
>>> Thanks for your suggestion but the result is the same.
>>>
>>> Here is the explain analyse output from different queries.
>>> Select * from my_table where A is null and B = '21' limit 15
>>>
>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>> "Total runtime: 85896.214 ms"
>>>
>>>
> [snip]
>
> Further to Pavel's comments;
>
> (actual time=85837.038..85896.091 rows=15 loops=1)
>
> That's 85 seconds on a sequence scan to return the first tuple.  The table is not bloated by any chance is it?

Wouldn't this be e.g. normal if the distribution of values would
be uneven, e.g. A IS NULL AND B = '21' not near the beginning of
the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the distribution of the
data, am I wrong? E.g. in that situation, when a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seqscan becomes a totally different story.. (the
correlation changes, but may not change a lot if only a few rows
move).

However, I cannot reproduce a similar situation to David's.

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000, case when random() < 0.05 then 'Today
Alcatel-Lucenthas announced that P******* C**** is appointed non-executive Chairman and B** V******** is appointed
ChiefExecutive Officer.' else null end ); 
INSERT 0 10000001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
         ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=50000 width=0)
               Index Cond: (bar = 8)
         ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=50000 width=0)
               Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgot to tell:

- any other columns in your table?
- is table bloated?
- has table never been analyzed?
- what version of postgresql? what overriden configuration?

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: limit clause breaks query planner?

От

"David West"

Дата:

02 сентября 2008 г., 09:48:19

Yes I inserted values in big batches according to a single value of 'B', so
indeed a sequence scan may have to scan forward many millions of rows before
finding the required value.

I have been doing regular analyse commands on my table.  I don't think my
table is bloated, I haven't been performing updates.  However I'm doing a
vacuum analyse now and I'll see if that makes any difference.

I am using postgres 8.3.1 with a default install on windows - no tweaks to
the configuration at all.

There are many other columns in my table, but none of them are used in this
query.

Guillaume in your example you didn't add the limit clause?  Postgres chooses
the correct index in my case without the limit clause, the problem is with
the limit clause.  One other difference with your example is both my columns
are varchar columns, not integer and text, I don't know if that would make a
difference.

From looking at the plans, it seems to be postgres is assuming it will only
have to sequentially scan 15 rows, which is not true in my case because
column B is not distributed randomly (nor will it be in production).  Would
postgres not be best to ignore the limit when deciding the best index to use
-  in this simple query wouldn't the best plan to use always be the same
with or without a limit?

Thanks to all of you for your interest in my problem
David

-----Original Message-----
From: Guillaume Cottenceau [mailto:gc@mnc.ch]
Sent: 02 September 2008 10:07
To: David West; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?

Wouldn't this be e.g. normal if the distribution of values would
be uneven, e.g. A IS NULL AND B = '21' not near the beginning of
the table data?

By the way, my newbie eyes on "pg_stats" seem to tell me that PG
doesn't collect/use statistics about the distribution of the
data, am I wrong? E.g. in that situation, when a few A IS NULL
AND B = '21' rows move from the beginning to the end of the table
data, a seqscan becomes a totally different story.. (the
correlation changes, but may not change a lot if only a few rows
move).

However, I cannot reproduce a similar situation to David's.

gc=# create table foo ( bar int, baz text );
CREATE TABLE
gc=# insert into foo ( select generate_series(0, 10000000) / 1000000, case
when random() < 0.05 then 'Today Alcatel-Lucent has announced that P*******
C**** is appointed non-executive Chairman and B** V******** is appointed
Chief Executive Officer.' else null end );
INSERT 0 10000001
gc=# create index foobar on foo(bar);
CREATE INDEX
gc=# create index foobaz on foo(baz);
CREATE INDEX
gc=# explain select * from foo where baz is null and bar = '8';
                                   QUERY PLAN

----------------------------------------------------------------------------
-----
 Bitmap Heap Scan on foo  (cost=1297.96..1783.17 rows=250 width=36)
   Recheck Cond: ((bar = 8) AND (baz IS NULL))
   ->  BitmapAnd  (cost=1297.96..1297.96 rows=250 width=0)
         ->  Bitmap Index Scan on foobar  (cost=0.00..595.69 rows=50000
width=0)
               Index Cond: (bar = 8)
         ->  Bitmap Index Scan on foobaz  (cost=0.00..701.90 rows=50000
width=0)
               Index Cond: (baz IS NULL)
(7 rows)

gc=# analyze foo;
ANALYZE
gc=# explain select * from foo where baz is null and bar = '8';
                                  QUERY PLAN

----------------------------------------------------------------------------
--
 Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089
width=154)
   Index Cond: (bar = 8)
   Filter: (baz IS NULL)
(3 rows)

This is using pg 8.3.1 and:

random_page_cost = 2
effective_cache_size = 256MB
shared_buffers = 384MB

David, is there relevant information you've forgot to tell:

- any other columns in your table?
- is table bloated?
- has table never been analyzed?
- what version of postgresql? what overriden configuration?

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: limit clause breaks query planner?

От

Gregory Stark

Дата:

02 сентября 2008 г., 09:53:17


>>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)"
>>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091
rows=15loops=1)" 
>>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>>> "Total runtime: 85896.214 ms"

Postgres does collect and use statistics about what fraction of the "A" column
is null. It also collects and uses statistics about what fraction of the "B"
column is 21 (using a histogram). And it does take the LIMIT into account.

I think the other poster might well be right about this table being extremely
bloated. You could test by running and posting the results of:

VACUUM VERBOSE my_table

What it doesn't collect is where in the table those records are -- so if there
are a lot of them then it might use a sequential scan regardless of whether
they're at the beginning or end of the table. That seems unlikely to be the
problem though.

The other thing it doesn't collect is how many of the B=21 records have null
As. So if a large percentage of the table has A as null then it will assume
that's true for the B=21 records and if there are a lot of B=21 records then
it will assume a sequential scan will find matches quickly. If in fact the two
columns are highly correlated and B=21 records almost never have A null
whereas records with other values of B have lots of null values then Postgres
might make a bad decision here.

Also, it only has the statitics for B=21 via a histogram. If the distribution
of B is highly skewed so that, for example values between 20 and 25 are very
common but B=21 happens to be quite rare then Postgres might get a bad
estimate here. You could improve this by raising the statistics target for the
B column and re-analyzing.

That brings up another question -- when was the last time this table was
analyzed?

What estimates and actual results does postgres get for simple queries like:

EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE B=21;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL AND B=21;

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: limit clause breaks query planner?

От

Guillaume Cottenceau

Дата:

02 сентября 2008 г., 10:16:25

"David West" <david.west 'at' cusppoint.com> writes:

> Yes I inserted values in big batches according to a single value of 'B', so
> indeed a sequence scan may have to scan forward many millions of rows before
> finding the required value.

That may well be why the seqscan is so slow to give your results;
that said, it doesn't explain why the indexscane is not
preferred.

> I have been doing regular analyse commands on my table.  I don't think my

Like, recently? Can you post the stats?

gc=# select * from pg_stats where tablename = 'foo';

You should try to ANALYZE again and see if that makes a
difference, to be sure.

> table is bloated, I haven't been performing updates.  However I'm doing a

Maybe you've been DELETE'ing then INSERT'ing some? That creates
bloat too. Btw, don't forget to prefer TRUNCATE to remove
everything from the table, and ANALYZE after large INSERT's.

> vacuum analyse now and I'll see if that makes any difference.

A single VACUUM may not report how bloated your table is, if it's
been VACUUM'ed some before, but not frequently enough. If you
have time for it, and you can afford a full lock on the table,
only a VACUUM FULL VERBOSE will tell you the previous bloat (the
"table .. truncated to .." line IIRC).

> I am using postgres 8.3.1 with a default install on windows - no tweaks to
> the configuration at all.

With a default install, the free space map settings may well be
too small for tracking free space on a table as large as 10M
rows. Performing VACUUM VERBOSE on database 'template1' will show
you interesting information about the current and ideal FSM
settings, at the end of the output. Something like:

   INFO:  free space map contains 37709 pages in 276 relations
   DETAIL:  A total of 42080 page slots are in use (including overhead).
   42080 page slots are required to track all free space.
   Current limits are:  204800 page slots, 1000 relations, using 1265 kB.

Of course, this also depends on the frequency of your VACUUMing
(if autovacuuming is not configured or badly configured) against
the frequency of your UPDATE's and DELETE's.

> There are many other columns in my table, but none of them are used in this
> query.

Can you show us the table definition? I am too ignorant in PG to
know if that would make a difference, but it might ring a bell
for others.. AFAIK, more column data may mean larger resultsets
and may change the plan (though "width=128" in the log of your
explanation wouldn't mean a lot of data are stored per row).

> Guillaume in your example you didn't add the limit clause?  Postgres chooses
> the correct index in my case without the limit clause, the problem is with
> the limit clause.

Duh, forgot about that, sorry! But I did try it and it was the same.

gc=# explain select * from foo where baz is null and bar = '8' limit 15;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.42 rows=15 width=154)
   ->  Index Scan using foobar on foo  (cost=0.00..30398.66 rows=1079089 width=154)
         Index Cond: (bar = 8)
         Filter: (baz IS NULL)
(4 rows)

> One other difference with your example is both my columns are
> varchar columns, not integer and text, I don't know if that
> would make a difference.

It is always useful to know as much about the actual table
definition and data, to isolate a performance problem... I know
it may clash with privacy :/ but that kind of information
probably will not, isn't it?

With:

gc=# create table foo ( bar varchar(64), baz varchar(256) );

it doesn't make a difference yet:

gc=# explain select * from foo where baz is null and bar = '8';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using foobar on foo  (cost=0.00..27450.05 rows=982092 width=149)
   Index Cond: ((bar)::text = '8'::text)
   Filter: (baz IS NULL)
(3 rows)

gc=# explain select * from foo where baz is null and bar = '8' limit 15;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=0.00..0.42 rows=15 width=149)
   ->  Index Scan using foobar on foo  (cost=0.00..27450.05 rows=982092 width=149)
         Index Cond: ((bar)::text = '8'::text)
         Filter: (baz IS NULL)
(4 rows)

Btw, it would help if you could reproduce my test scenario and
see if PG uses "correctly" the indexscan. It is better to try on
your installation, to take care of any configuration/whatever
variation which may create your problem.

>From looking at the plans, it seems to be postgres is assuming it will only
> have to sequentially scan 15 rows, which is not true in my case because
> column B is not distributed randomly (nor will it be in production).  Would

Why do you say that? The explanation seems to rather tell that it
(correctly) assumes that the seqscan would bring up about 1M rows
for the selected values of A and B, and then it will limit to 15
rows.

> postgres not be best to ignore the limit when deciding the best index to use
> -  in this simple query wouldn't the best plan to use always be the same
> with or without a limit?

I am not too sure, but I'd say no: when PG considers the LIMIT,
then it knows that (potentially) less rows are to be actually
used from the inner resultset, so a different plan may be
devised.

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: limit clause breaks query planner?

От

"David West"

Дата:

02 сентября 2008 г., 11:16:53

Here is the results of 'vacuum analyse verbose' on my table:

INFO:  vacuuming "public.jbpm_taskinstance"
INFO:  scanned index "jbpm_taskinstance_pkey" to remove 928153 row versions
DETAIL:  CPU 0.70s/2.40u sec elapsed 46.49 sec.
INFO:  scanned index "idx_tskinst_tminst" to remove 928153 row versions
DETAIL:  CPU 0.78s/2.34u sec elapsed 88.99 sec.
INFO:  scanned index "idx_tskinst_slinst" to remove 928153 row versions
DETAIL:  CPU 0.63s/2.37u sec elapsed 92.54 sec.
INFO:  scanned index "idx_taskinst_tokn" to remove 928153 row versions
DETAIL:  CPU 0.99s/2.30u sec elapsed 110.29 sec.
INFO:  scanned index "idx_taskinst_tsk" to remove 928153 row versions
DETAIL:  CPU 0.92s/2.63u sec elapsed 89.16 sec.
INFO:  scanned index "idx_pooled_actor" to remove 928153 row versions
DETAIL:  CPU 0.32s/1.65u sec elapsed 2.56 sec.
INFO:  scanned index "idx_task_actorid" to remove 928153 row versions
DETAIL:  CPU 0.09s/1.88u sec elapsed 2.69 sec.
INFO:  "jbpm_taskinstance": removed 928153 row versions in 13685 pages
DETAIL:  CPU 0.84s/0.82u sec elapsed 26.42 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  927985 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages

Afterwards I ran a 'vacuum full verbose'

INFO:  vacuuming "public.jbpm_taskinstance"
INFO:  "jbpm_taskinstance": found 0 removable, 7555748 nonremovable row
versions in 166156 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 88 to 209 bytes long.
There were 8470471 unused item pointers.
Total free space (including removable row versions) is 208149116 bytes.
9445 pages are or will become empty, including 0 at the end of the table.
119104 pages containing 206008504 free bytes are potential move
destinations.
CPU 2.44s/1.60u sec elapsed 127.89 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.87s/2.16u sec elapsed 120.81 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  0 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 0.79s/1.95u sec elapsed 103.52 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  0 index row versions were removed.
28343 index pages have been deleted, 28343 are currently reusable.
CPU 0.62s/1.93u sec elapsed 99.21 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  0 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.10s/2.18u sec elapsed 108.29 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.01s/1.73u sec elapsed 64.73 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 20896
pages
DETAIL:  0 index row versions were removed.
136 index pages have been deleted, 136 are currently reusable.
CPU 0.26s/1.57u sec elapsed 3.01 sec.
INFO:  index "idx_task_actorid" now contains 7555748 row versions in 20885
pages
DETAIL:  0 index row versions were removed.
121 index pages have been deleted, 121 are currently reusable.
CPU 0.23s/1.52u sec elapsed 2.77 sec.
INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
140279 pages
DETAIL:  CPU 26.50s/138.35u sec elapsed 735.02 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.04s/1.38u sec elapsed 117.72 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  1374243 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 1.37s/1.01u sec elapsed 123.56 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  1374243 index row versions were removed.
28560 index pages have been deleted, 28560 are currently reusable.
CPU 1.20s/1.27u sec elapsed 105.67 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  1374243 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.29s/0.96u sec elapsed 112.62 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.48s/1.12u sec elapsed 70.56 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 25534
pages
DETAIL:  1374243 index row versions were removed.
3769 index pages have been deleted, 3769 are currently reusable.
CPU 0.48s/0.82u sec elapsed 6.89 sec.
INFO:  index "idx_task_actorid" now contains 7555748 row versions in 25545
pages
DETAIL:  1374243 index row versions were removed.
3790 index pages have been deleted, 3790 are currently reusable.
CPU 0.37s/1.24u sec elapsed 7.93 sec.
INFO:  vacuuming "pg_toast.pg_toast_560501"
INFO:  "pg_toast_560501": found 0 removable, 0 nonremovable row versions in
0 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_560501_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

Query returned successfully with no result in 1910948 ms.

So it looks to me like it was bloated by about 25%.  Performance seems
better for 'some' values of column B (I believe all the vacuumed rows were
at the beginning), but I've still done the same query which can take over a
minute in some cases - depending on caching etc.  It is still choosing
sequential scan.

Here are the queries you requested:
EXPLAIN ANALYZE SELECT count(*) FROM jbpm_taskinstance WHERE actorid_ IS
NULL;
"Aggregate  (cost=234297.69..234297.70 rows=1 width=0) (actual
time=109648.328..109648.330 rows=1 loops=1)"
"  ->  Seq Scan on jbpm_taskinstance  (cost=0.00..215836.48 rows=7384484
width=0) (actual time=13.684..99002.279 rows=7315726 loops=1)"
"        Filter: (actorid_ IS NULL)"
"Total runtime: 109648.403 ms"

EXPLAIN ANALYZE SELECT count(*) FROM jbpm_taskinstance WHERE
pooledactor_='21';
"Aggregate  (cost=180929.77..180929.78 rows=1 width=0) (actual
time=6739.215..6739.217 rows=1 loops=1)"
"  ->  Bitmap Heap Scan on jbpm_taskinstance  (cost=23839.23..178127.84
rows=1120769 width=0) (actual time=633.808..5194.672 rows=1020084 loops=1)"
"        Recheck Cond: ((pooledactor_)::text = '21'::text)"
"        ->  Bitmap Index Scan on idx_pooled_actor  (cost=0.00..23559.04
rows=1120769 width=0) (actual time=612.546..612.546 rows=1020084 loops=1)"
"              Index Cond: ((pooledactor_)::text = '21'::text)"
"Total runtime: 6739.354 ms"

EXPLAIN ANALYZE SELECT count(*) FROM jbpm_taskinstance WHERE actorid_ IS
NULL AND pooledactor_='21';
"Aggregate  (cost=180859.91..180859.92 rows=1 width=0) (actual
time=4358.316..4358.318 rows=1 loops=1)"
"  ->  Bitmap Heap Scan on jbpm_taskinstance  (cost=23832.88..178121.49
rows=1095365 width=0) (actual time=377.206..2929.735 rows=1020062 loops=1)"
"        Recheck Cond: ((pooledactor_)::text = '21'::text)"
"        Filter: (actorid_ IS NULL)"
"        ->  Bitmap Index Scan on idx_pooled_actor  (cost=0.00..23559.04
rows=1120769 width=0) (actual time=373.160..373.160 rows=1020084 loops=1)"
"              Index Cond: ((pooledactor_)::text = '21'::text)"
"Total runtime: 4366.766 ms"

Many thanks,
David





-----Original Message-----
From: Greg Stark [mailto:greg.stark enterprisedb.com] On Behalf Of Gregory
Stark
Sent: 02 September 2008 10:45
To: Guillaume Cottenceau
Cc: David West; pgsql-performance@postgresql.org
Subject: Re: limit clause breaks query planner?



>>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual
time=85837.043..85896.140 rows=15 loops=1)"
>>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580
width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>>> "        Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>>> "Total runtime: 85896.214 ms"

Postgres does collect and use statistics about what fraction of the "A"
column
is null. It also collects and uses statistics about what fraction of the "B"
column is 21 (using a histogram). And it does take the LIMIT into account.

I think the other poster might well be right about this table being
extremely
bloated. You could test by running and posting the results of:

VACUUM VERBOSE my_table

What it doesn't collect is where in the table those records are -- so if
there
are a lot of them then it might use a sequential scan regardless of whether
they're at the beginning or end of the table. That seems unlikely to be the
problem though.

The other thing it doesn't collect is how many of the B=21 records have null
As. So if a large percentage of the table has A as null then it will assume
that's true for the B=21 records and if there are a lot of B=21 records then
it will assume a sequential scan will find matches quickly. If in fact the
two
columns are highly correlated and B=21 records almost never have A null
whereas records with other values of B have lots of null values then
Postgres
might make a bad decision here.

Also, it only has the statitics for B=21 via a histogram. If the
distribution
of B is highly skewed so that, for example values between 20 and 25 are very
common but B=21 happens to be quite rare then Postgres might get a bad
estimate here. You could improve this by raising the statistics target for
the
B column and re-analyzing.

That brings up another question -- when was the last time this table was
analyzed?

What estimates and actual results does postgres get for simple queries like:

EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE B=21;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL AND B=21;

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: limit clause breaks query planner?

От

"David West"

Дата:

02 сентября 2008 г., 11:53:15

>A single VACUUM may not report how bloated your table is, if it's
>been VACUUM'ed some before, but not frequently enough. If you
>have time for it, and you can afford a full lock on the table,
>only a VACUUM FULL VERBOSE will tell you the previous bloat (the
>"table .. truncated to .." line IIRC).

Here's the output of vacuum full verbose (after running a plain vacuum
verbose)

INFO:  vacuuming "public.jbpm_taskinstance"
INFO:  "jbpm_taskinstance": found 0 removable, 7555748 nonremovable row
versions in 166156 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 88 to 209 bytes long.
There were 8470471 unused item pointers.
Total free space (including removable row versions) is 208149116 bytes.
9445 pages are or will become empty, including 0 at the end of the table.
119104 pages containing 206008504 free bytes are potential move
destinations.
CPU 2.44s/1.60u sec elapsed 127.89 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.87s/2.16u sec elapsed 120.81 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  0 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 0.79s/1.95u sec elapsed 103.52 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  0 index row versions were removed.
28343 index pages have been deleted, 28343 are currently reusable.
CPU 0.62s/1.93u sec elapsed 99.21 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  0 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.10s/2.18u sec elapsed 108.29 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.01s/1.73u sec elapsed 64.73 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 20896
pages
DETAIL:  0 index row versions were removed.
136 index pages have been deleted, 136 are currently reusable.
CPU 0.26s/1.57u sec elapsed 3.01 sec.
INFO:  index "idx_task_actorid" now contains 7555748 row versions in 20885
pages
DETAIL:  0 index row versions were removed.
121 index pages have been deleted, 121 are currently reusable.
CPU 0.23s/1.52u sec elapsed 2.77 sec.
INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
140279 pages
DETAIL:  CPU 26.50s/138.35u sec elapsed 735.02 sec.
INFO:  index "jbpm_taskinstance_pkey" now contains 7555748 row versions in
62090 pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.04s/1.38u sec elapsed 117.72 sec.
INFO:  index "idx_tskinst_tminst" now contains 7555748 row versions in 65767
pages
DETAIL:  1374243 index row versions were removed.
26024 index pages have been deleted, 26024 are currently reusable.
CPU 1.37s/1.01u sec elapsed 123.56 sec.
INFO:  index "idx_tskinst_slinst" now contains 7555748 row versions in 56031
pages
DETAIL:  1374243 index row versions were removed.
28560 index pages have been deleted, 28560 are currently reusable.
CPU 1.20s/1.27u sec elapsed 105.67 sec.
INFO:  index "idx_taskinst_tokn" now contains 7555748 row versions in 65758
pages
DETAIL:  1374243 index row versions were removed.
26012 index pages have been deleted, 26012 are currently reusable.
CPU 1.29s/0.96u sec elapsed 112.62 sec.
INFO:  index "idx_taskinst_tsk" now contains 7555748 row versions in 64516
pages
DETAIL:  1374243 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.48s/1.12u sec elapsed 70.56 sec.
INFO:  index "idx_pooled_actor" now contains 7555748 row versions in 25534
pages
DETAIL:  1374243 index row versions were removed.
3769 index pages have been deleted, 3769 are currently reusable.
CPU 0.48s/0.82u sec elapsed 6.89 sec.
INFO:  index "idx_task_actorid" now contains 7555748 row versions in 25545
pages
DETAIL:  1374243 index row versions were removed.
3790 index pages have been deleted, 3790 are currently reusable.
CPU 0.37s/1.24u sec elapsed 7.93 sec.
INFO:  vacuuming "pg_toast.pg_toast_560501"
INFO:  "pg_toast_560501": found 0 removable, 0 nonremovable row versions in
0 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_560501_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

>Can you show us the table definition? I am too ignorant in PG to
>know if that would make a difference, but it might ring a bell
>for others.. AFAIK, more column data may mean larger resultsets
>and may change the plan (though "width=128" in the log of your
>explanation wouldn't mean a lot of data are stored per row).

Yep, the table is from the jboss jbpm (business process management) schema.
It has one modification, I've added the pooledactor_ column (column B as
I've been referring to it until now) to remove a many-to-many relationship
to simplify my query.  Column A as I've been referring to is the actorid_
column. Here's the ddl:

CREATE TABLE jbpm_taskinstance
(
  id_ bigint NOT NULL,
  class_ character(1) NOT NULL,
  version_ integer NOT NULL,
  name_ character varying(255),
  description_ character varying(4000),
  actorid_ character varying(255),
  create_ timestamp without time zone,
  start_ timestamp without time zone,
  end_ timestamp without time zone,
  duedate_ timestamp without time zone,
  priority_ integer,
  iscancelled_ boolean,
  issuspended_ boolean,
  isopen_ boolean,
  issignalling_ boolean,
  isblocking_ boolean,
  task_ bigint,
  token_ bigint,
  procinst_ bigint,
  swimlaninstance_ bigint,
  taskmgmtinstance_ bigint,
  pooledactor_ character varying(255),
  processname_ character varying(255),
  CONSTRAINT jbpm_taskinstance_pkey PRIMARY KEY (id_),
  CONSTRAINT fk_taskinst_slinst FOREIGN KEY (swimlaninstance_)
      REFERENCES jbpm_swimlaneinstance (id_) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_taskinst_task FOREIGN KEY (task_)
      REFERENCES jbpm_task (id_) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_taskinst_tminst FOREIGN KEY (taskmgmtinstance_)
      REFERENCES jbpm_moduleinstance (id_) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_taskinst_token FOREIGN KEY (token_)
      REFERENCES jbpm_token (id_) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_tskins_prcins FOREIGN KEY (procinst_)
      REFERENCES jbpm_processinstance (id_) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (OIDS=FALSE);

-- Index: idx_pooled_actor

-- DROP INDEX idx_pooled_actor;

CREATE INDEX idx_pooled_actor
  ON jbpm_taskinstance
  USING btree
  (pooledactor_);

-- Index: idx_taskinst_tokn

-- DROP INDEX idx_taskinst_tokn;

CREATE INDEX idx_taskinst_tokn
  ON jbpm_taskinstance
  USING btree
  (token_);

-- Index: idx_taskinst_tsk

-- DROP INDEX idx_taskinst_tsk;

CREATE INDEX idx_taskinst_tsk
  ON jbpm_taskinstance
  USING btree
  (task_, procinst_);

-- Index: idx_tskinst_slinst

-- DROP INDEX idx_tskinst_slinst;

CREATE INDEX idx_tskinst_slinst
  ON jbpm_taskinstance
  USING btree
  (swimlaninstance_);

-- Index: idx_tskinst_tminst

-- DROP INDEX idx_tskinst_tminst;

CREATE INDEX idx_tskinst_tminst
  ON jbpm_taskinstance
  USING btree
  (taskmgmtinstance_);



>Btw, it would help if you could reproduce my test scenario and
>see if PG uses "correctly" the indexscan. It is better to try on
>your installation, to take care of any configuration/whatever
>variation which may create your problem.

I have tried your example and I get the same results as you.

db=# explain select * from foo where baz is null and bar = '8' limit 15;

                                    QUERY PLAN

----------------------------------------------------------------------------
----
---
 Limit  (cost=0.00..0.53 rows=15 width=154)
   ->  Index Scan using foobar on foo  (cost=0.00..33159.59 rows=934389
width=15
4)
         Index Cond: (bar = 8)
         Filter: (baz IS NULL)
(4 rows)

db=# drop index foobar;
DROP INDEX
db=# explain select * from foo where baz is null and bar = '8' limit 15;

                             QUERY PLAN
---------------------------------------------------------------------
 Limit  (cost=0.00..2.87 rows=15 width=154)
   ->  Seq Scan on foo  (cost=0.00..178593.35 rows=934389 width=154)
         Filter: ((baz IS NULL) AND (bar = 8))
(3 rows)

It's choosing the index because of a cost of 0.53 vs a cost of 2.87 for
sequential scan.  I wonder why in my real tables the index scan cost is
higher than the sequential scan cost.  Perhaps because of the extra width of
my rows?

>> From looking at the plans, it seems to be postgres is assuming it will
>> only
>> have to sequentially scan 15 rows, which is not true in my case
>> because column B is not distributed randomly (nor will it be in
>> production).  Would
>
>Why do you say that? The explanation seems to rather tell that it
>(correctly) assumes that the seqscan would bring up about 1M rows for the
selected values of A and B, and then it will limit to 15 rows.

I say that because the plan gives a really really low number (3.21) for the
estimated cost after the limit on sequential scan:

Select * from JBPM_TASKINSTANCE this_ where actorid_ is null and
this_.POOLEDACTOR_ in ('21') limit 15
"Limit  (cost=0.00..3.21 rows=15 width=128) (actual
time=84133.211..84187.247 rows=15 loops=1)"
"  ->  Seq Scan on jbpm_taskinstance this_  (cost=0.00..234725.85
rows=1095365 width=128) (actual time=84133.205..84187.186 rows=15 loops=1)"
"        Filter: ((actorid_ IS NULL) AND ((pooledactor_)::text =
'21'::text))"
"Total runtime: 84187.335 ms"

It just seems to me it is not taking into account at all that it might have
to scan thousands or millions of rows before it gets the 15 rows it needs.

If I disable sequence scan, it equally uselessly chooses the wrong index,
based on a far too low estimated cost of 5.27:
Set enable_seqscan=false
"Limit  (cost=0.00..5.27 rows=15 width=128) (actual
time=120183.610..120183.707 rows=15 loops=1)"
"  ->  Index Scan using idx_task_actorid on jbpm_taskinstance this_
(cost=0.00..384657.95 rows=1095365 width=128) (actual
time=120183.604..120183.653 rows=15 loops=1)"
"        Index Cond: (actorid_ IS NULL)"
"        Filter: ((pooledactor_)::text = '21'::text)"
"Total runtime: 120183.788 ms"





Many thanks
David

Re: limit clause breaks query planner?

От

Guillaume Cottenceau

Дата:

02 сентября 2008 г., 13:55:52

"David West" <david.west 'at' cusppoint.com> writes:

> INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
> 140279 pages

nothing which would explain so much planning off :/

> Yep, the table is from the jboss jbpm (business process management) schema.

I've went to that kind of test then, but it didn't help much:

  create table foo ( bar character varying(255), baz character varying(255),
    id_ bigint NOT NULL,
    class_ character(1) NOT NULL,
    version_ integer NOT NULL,
    name_ character varying(255),
    description_ character varying(4000),
    create_ timestamp without time zone,
    start_ timestamp without time zone,
    end_ timestamp without time zone,
    duedate_ timestamp without time zone,
    priority_ integer,
    iscancelled_ boolean,
    issuspended_ boolean,
    isopen_ boolean,
    issignalling_ boolean,
    isblocking_ boolean,
    task_ bigint,
    token_ bigint,
    procinst_ bigint,
    swimlaninstance_ bigint,
    taskmgmtinstance_ bigint,
    processname_ character varying(255) );

  insert into foo ( select generate_series(0, 10000000) / 1000000, case when random() < 0.05 then 'Today Alcatel-Lucent
hasannounced that Philippe Camus is appointed non-executive Chairman and Ben Verwaayen is appointed Chief Executive
Officer.'else null end, 1, 'a', 1  ); 

  create index foobaz on foo(baz);
  create index foobar on foo(bar);
  analyze foo;

Estimated costs still look correct on my side:

  gc=# explain select * from foo where baz is null and bar in ('8') limit 15;
                                       QUERY PLAN
  ------------------------------------------------------------------------------------
   Limit  (cost=0.00..0.46 rows=15 width=1795)
     ->  Index Scan using foobar on foo  (cost=0.00..26311.70 rows=860238 width=1795)
           Index Cond: ((bar)::text = '8'::text)
           Filter: (baz IS NULL)
  (4 rows)

  gc=# set enable_indexscan = off;
  SET
  gc=# explain select * from foo where baz is null and bar in ('8') limit 15;
                                QUERY PLAN
  ----------------------------------------------------------------------
   Limit  (cost=0.00..3.46 rows=15 width=1795)
     ->  Seq Scan on foo  (cost=0.00..198396.62 rows=860238 width=1795)
           Filter: ((baz IS NULL) AND ((bar)::text = '8'::text))
  (3 rows)


>>Btw, it would help if you could reproduce my test scenario and
>>see if PG uses "correctly" the indexscan. It is better to try on
>>your installation, to take care of any configuration/whatever
>>variation which may create your problem.
>
> I have tried your example and I get the same results as you.
>
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>
>                                     QUERY PLAN
>
> ----------------------------------------------------------------------------
> ----
> ---
>  Limit  (cost=0.00..0.53 rows=15 width=154)
>    ->  Index Scan using foobar on foo  (cost=0.00..33159.59 rows=934389
> width=15
> 4)
>          Index Cond: (bar = 8)
>          Filter: (baz IS NULL)
> (4 rows)
>
> db=# drop index foobar;
> DROP INDEX
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Limit  (cost=0.00..2.87 rows=15 width=154)
>    ->  Seq Scan on foo  (cost=0.00..178593.35 rows=934389 width=154)
>          Filter: ((baz IS NULL) AND (bar = 8))
> (3 rows)
>
> It's choosing the index because of a cost of 0.53 vs a cost of 2.87 for
> sequential scan.  I wonder why in my real tables the index scan cost is
> higher than the sequential scan cost.  Perhaps because of the extra width of
> my rows?

You may try to crosscheck with the new test I've put upper, but
I'm skeptical :/

I think I've unfortunately more than reached my level of
incompetence on that subject, sorry I wasn't able to better
locate your problem :/

>>> From looking at the plans, it seems to be postgres is assuming it will
>>> only
>>> have to sequentially scan 15 rows, which is not true in my case
>>> because column B is not distributed randomly (nor will it be in
>>> production).  Would
>>
>>Why do you say that? The explanation seems to rather tell that it
>>(correctly) assumes that the seqscan would bring up about 1M rows for the
> selected values of A and B, and then it will limit to 15 rows.
>
> I say that because the plan gives a really really low number (3.21) for the
> estimated cost after the limit on sequential scan:
>
> Select * from JBPM_TASKINSTANCE this_ where actorid_ is null and
> this_.POOLEDACTOR_ in ('21') limit 15
> "Limit  (cost=0.00..3.21 rows=15 width=128) (actual
> time=84133.211..84187.247 rows=15 loops=1)"
> "  ->  Seq Scan on jbpm_taskinstance this_  (cost=0.00..234725.85
> rows=1095365 width=128) (actual time=84133.205..84187.186 rows=15 loops=1)"
> "        Filter: ((actorid_ IS NULL) AND ((pooledactor_)::text =
> '21'::text))"
> "Total runtime: 84187.335 ms"
>
> It just seems to me it is not taking into account at all that it might have
> to scan thousands or millions of rows before it gets the 15 rows it needs.

Well, if your have 95% of NULL actorid_ and 10% for each value of
pooledactor_, then it makes sense to assume it will have to fetch
about 150 rows to find the 15 awaited ones...

In the end, if PG doesn't know about data distribution, its
behavior makes total sense to me: 150 rows of width=128 bytes
need only 3 disk pages, so it shouldn't be faster than with a
seqscan, theoretically; however, I am not sure then why on my
simple "foo" test it isn't using the same decision..


Btw, that should not solve your problem, but normally, to help PG
choose indexscan often enough, it's good to reduce
random_page_cost which is 4 by default (a high value for nowadays
servers), increase effective_cache_size to what's available on
your machine, and potentially the shared_buffers which normally
helps for a good deal of matters, performance-wise.

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: limit clause breaks query planner?

От

"David West"

Дата:

02 сентября 2008 г., 14:09:10

Thanks very much for your help Guillaume, I appreciate you spending time on
this.

> Well, if your have 95% of NULL actorid_ and 10% for each value of
> pooledactor_, then it makes sense to assume it will have to fetch
> about 150 rows to find the 15 awaited ones...

This is only true if the data is randomly distributed, which it isn't
unfortunately.

To any postgres developers reading this, two words: planning hints :-).  In
the face of imperfect information it's not possible to write a perfect
planner, please give us the ability to use the heuristic information we have
as developers, and the planner will never know about.  Even if we force
postgres to use queries that give sub-optimal performance some of the time,
once we can write sufficiently performant queries, we're happy.  In cases
like this where postgres gets it very wrong (and this is just a very simple
query after all), well, we're screwed.

I'm going to try partitioning my database along the pooledactor_ column to
see if I can get reasonable performance for my purposes, even if I can't
reach 10 million rows.

Thanks
David

-----Original Message-----
From: Guillaume Cottenceau [mailto:gc@mnc.ch]
Sent: 02 September 2008 14:56
To: David West
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] limit clause breaks query planner?

"David West" <david.west 'at' cusppoint.com> writes:

> INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156
to
> 140279 pages

nothing which would explain so much planning off :/

> Yep, the table is from the jboss jbpm (business process management)
schema.

I've went to that kind of test then, but it didn't help much:

  create table foo ( bar character varying(255), baz character varying(255),
    id_ bigint NOT NULL,
    class_ character(1) NOT NULL,
    version_ integer NOT NULL,
    name_ character varying(255),
    description_ character varying(4000),
    create_ timestamp without time zone,
    start_ timestamp without time zone,
    end_ timestamp without time zone,
    duedate_ timestamp without time zone,
    priority_ integer,
    iscancelled_ boolean,
    issuspended_ boolean,
    isopen_ boolean,
    issignalling_ boolean,
    isblocking_ boolean,
    task_ bigint,
    token_ bigint,
    procinst_ bigint,
    swimlaninstance_ bigint,
    taskmgmtinstance_ bigint,
    processname_ character varying(255) );

  insert into foo ( select generate_series(0, 10000000) / 1000000, case when
random() < 0.05 then 'Today Alcatel-Lucent has announced that Philippe Camus
is appointed non-executive Chairman and Ben Verwaayen is appointed Chief
Executive Officer.' else null end, 1, 'a', 1  );

  create index foobaz on foo(baz);
  create index foobar on foo(bar);
  analyze foo;

Estimated costs still look correct on my side:

  gc=# explain select * from foo where baz is null and bar in ('8') limit
15;
                                       QUERY PLAN


----------------------------------------------------------------------------
--------
   Limit  (cost=0.00..0.46 rows=15 width=1795)
     ->  Index Scan using foobar on foo  (cost=0.00..26311.70 rows=860238
width=1795)
           Index Cond: ((bar)::text = '8'::text)
           Filter: (baz IS NULL)
  (4 rows)

  gc=# set enable_indexscan = off;
  SET
  gc=# explain select * from foo where baz is null and bar in ('8') limit
15;
                                QUERY PLAN
  ----------------------------------------------------------------------
   Limit  (cost=0.00..3.46 rows=15 width=1795)
     ->  Seq Scan on foo  (cost=0.00..198396.62 rows=860238 width=1795)
           Filter: ((baz IS NULL) AND ((bar)::text = '8'::text))
  (3 rows)


>>Btw, it would help if you could reproduce my test scenario and
>>see if PG uses "correctly" the indexscan. It is better to try on
>>your installation, to take care of any configuration/whatever
>>variation which may create your problem.
>
> I have tried your example and I get the same results as you.
>
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>
>                                     QUERY PLAN
>
>
----------------------------------------------------------------------------
> ----
> ---
>  Limit  (cost=0.00..0.53 rows=15 width=154)
>    ->  Index Scan using foobar on foo  (cost=0.00..33159.59 rows=934389
> width=15
> 4)
>          Index Cond: (bar = 8)
>          Filter: (baz IS NULL)
> (4 rows)
>
> db=# drop index foobar;
> DROP INDEX
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Limit  (cost=0.00..2.87 rows=15 width=154)
>    ->  Seq Scan on foo  (cost=0.00..178593.35 rows=934389 width=154)
>          Filter: ((baz IS NULL) AND (bar = 8))
> (3 rows)
>
> It's choosing the index because of a cost of 0.53 vs a cost of 2.87 for
> sequential scan.  I wonder why in my real tables the index scan cost is
> higher than the sequential scan cost.  Perhaps because of the extra width
of
> my rows?

You may try to crosscheck with the new test I've put upper, but
I'm skeptical :/

I think I've unfortunately more than reached my level of
incompetence on that subject, sorry I wasn't able to better
locate your problem :/

>>> From looking at the plans, it seems to be postgres is assuming it will
>>> only
>>> have to sequentially scan 15 rows, which is not true in my case
>>> because column B is not distributed randomly (nor will it be in
>>> production).  Would
>>
>>Why do you say that? The explanation seems to rather tell that it
>>(correctly) assumes that the seqscan would bring up about 1M rows for the
> selected values of A and B, and then it will limit to 15 rows.
>
> I say that because the plan gives a really really low number (3.21) for
the
> estimated cost after the limit on sequential scan:
>
> Select * from JBPM_TASKINSTANCE this_ where actorid_ is null and
> this_.POOLEDACTOR_ in ('21') limit 15
> "Limit  (cost=0.00..3.21 rows=15 width=128) (actual
> time=84133.211..84187.247 rows=15 loops=1)"
> "  ->  Seq Scan on jbpm_taskinstance this_  (cost=0.00..234725.85
> rows=1095365 width=128) (actual time=84133.205..84187.186 rows=15
loops=1)"
> "        Filter: ((actorid_ IS NULL) AND ((pooledactor_)::text =
> '21'::text))"
> "Total runtime: 84187.335 ms"
>
> It just seems to me it is not taking into account at all that it might
have
> to scan thousands or millions of rows before it gets the 15 rows it needs.

Well, if your have 95% of NULL actorid_ and 10% for each value of
pooledactor_, then it makes sense to assume it will have to fetch
about 150 rows to find the 15 awaited ones...

In the end, if PG doesn't know about data distribution, its
behavior makes total sense to me: 150 rows of width=128 bytes
need only 3 disk pages, so it shouldn't be faster than with a
seqscan, theoretically; however, I am not sure then why on my
simple "foo" test it isn't using the same decision..


Btw, that should not solve your problem, but normally, to help PG
choose indexscan often enough, it's good to reduce
random_page_cost which is 4 by default (a high value for nowadays
servers), increase effective_cache_size to what's available on
your machine, and potentially the shared_buffers which normally
helps for a good deal of matters, performance-wise.

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: limit clause breaks query planner?

От

"Matt Smiley"

Дата:

02 сентября 2008 г., 20:13:33

Hi David,

Early in this thread, Pavel suggested:

> you should partial index
>
> create index foo(b) on mytable where a is null;

Rather, you might try the opposite partial index (where a is NOT null) as a replacement for the original unqualified
indexon column A.  This new index will be ignored by the query you're trying to tune, but it'll be available to the
otherqueries that filter to a non-null value of column A.  (Omitting NULL from that index should be ok because you
normallywouldn't want to use an index when 95% of the table's rows match the filtered key.) 

Then you can temporarily disable Seq Scans in your session for just this one query, as follows:

SQL> create table my_table ( a int, b int ) ;
CREATE TABLE

SQL> create index idx_a_not_null on my_table ( a ) where a is not null ;
CREATE INDEX

SQL> create index idx_b on my_table ( b ) ;
CREATE INDEX

SQL> insert into my_table (a, b)
select
  case when random() <= 0.95 then null else i end as a,
  mod(i, 10) as b
from generate_series(1, 10000000) s(i)
;
INSERT 0 10000000

SQL> analyze my_table ;
ANALYZE


Review the statistics available to the optimizer:

SQL> select attname, null_frac, n_distinct, most_common_vals, most_common_freqs, histogram_bounds, correlation
from pg_stats
where tablename = 'my_table'
order by attname
;
 attname | null_frac | n_distinct |   most_common_vals    |                                  most_common_freqs
                        |                                    histogram_bounds                                    |
correlation

---------+-----------+------------+-----------------------+--------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-------------
 a       |     0.945 |         -1 |                       |
                        | {2771,1301755,2096051,3059786,3680728,4653531,5882434,6737141,8240245,9428702,9875768} |
    1 
 b       |         0 |         10 | {9,4,3,1,2,6,8,5,7,0} |
{0.110333,0.104,0.102333,0.100333,0.100333,0.0996667,0.0986667,0.0983333,0.096,0.09}|
                                                    |    0.127294 
(2 rows)

SQL> select relname, reltuples, relpages from pg_class where relname in ('my_table', 'idx_a_not_null', 'idx_b') order
byrelname ; 
    relname     | reltuples | relpages
----------------+-----------+----------
 idx_a_not_null |    499955 |     1100
 idx_b          |     1e+07 |    21946
 my_table       |     1e+07 |    39492
(3 rows)


Run the test query, first without disabling Seq Scan to show this example reproduces the plan you're trying to avoid.

SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.66 rows=15 width=8) (actual time=0.070..0.263 rows=15 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.00 rows=929250 width=8) (actual time=0.061..0.159 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 5))
 Total runtime: 0.371 ms
(4 rows)


Now run the same query without the Seq Scan option.

SQL> set enable_seqscan = false ;
SET

SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..46.33 rows=15 width=8) (actual time=0.081..0.232 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..2869913.63 rows=929250 width=8) (actual time=0.072..0.130
rows=15loops=1) 
         Index Cond: (b = 5)
         Filter: (a IS NULL)
 Total runtime: 0.341 ms
(5 rows)

SQL> reset enable_seqscan ;
RESET


Yes, it's unsavory to temporarily adjust a session-level parameter to tune a single query, but I don't know of a less
intrusiveway to avoid the SeqScan.  Here's why I think it might be your simplest option: 

As far as I can tell, the plan nodes for accessing the table/index are unaware of the LIMIT.  The cost of the Limit
nodeis estimated as the cost of its input row-source multiplied by the ratio of requested/returned rows.  For example,
fromthe preceding plan output: 
    2869913.63 for "Index Scan" upper cost * (15 row limit / 929250 returned rows) = 46.326 upper cost for the "Limit"
node
The underlying plan nodes each assume that all the rows matching their filter predicates will be returned up the
pipeline;the cost estimate is only reduced at the Limit node.  A Seq Scan and an Index Scan (over a complete index)
willboth expected the same number of input rows (pg_class.reltuples).  They also produce the same estimated result set,
sinceboth apply the same filters before outputing rows to the next node.  So an Index Scan is always going to have a
highercost estimate than an equivalent Seq Scan returning the same result rows (unless random_page_cost is < 1).
That'swhy I think the planner is always preferring the plan that uses a Seq Scan. 

Hope this helps!

Re: limit clause breaks query planner?

От

Tom Lane

Дата:

02 сентября 2008 г., 21:00:58

"Matt Smiley" <mss@rentrak.com> writes:
>  So an Index Scan is always going to have a higher cost estimate than
>  an equivalent Seq Scan returning the same result rows (unless
>  random_page_cost is < 1).  That's why I think the planner is always
>  preferring the plan that uses a Seq Scan.

If that were the case, we'd never choose an indexscan at all...

It's true that a plain indexscan is not preferred for queries that will
return a large fraction of the table.  However, it should be willing to
use a bitmap scan for this query, given default cost settings (the
default cost settings will cause it to prefer bitmap scan for retrieving
up to about a third of the table, in my experience).  I too am confused
about why it doesn't prefer that choice in the OP's example.  It would
be interesting to alter the random_page_cost setting and see if he gets
different results.

            regards, tom lane

Re: limit clause breaks query planner?

От

"Matt Smiley"

Дата:

04 сентября 2008 г., 06:10:56

"Tom Lane" <tgl@sss.pgh.pa.us> writes:
>  "Matt Smiley" <mss@rentrak.com> writes:
>  >  So an Index Scan is always going to have a higher cost estimate than
>  >  an equivalent Seq Scan returning the same result rows (unless
>  >  random_page_cost is < 1).  That's why I think the planner is always
>  >  preferring the plan that uses a Seq Scan.
>
>  If that were the case, we'd never choose an indexscan at all...

You're right, that was a silly guess.

>  It's true that a plain indexscan is not preferred for queries that will
>  return a large fraction of the table.  However, it should be willing to
>  use a bitmap scan for this query, given default cost settings (the
>  default cost settings will cause it to prefer bitmap scan for retrieving
>  up to about a third of the table, in my experience).  I too am confused
>  about why it doesn't prefer that choice in the OP's example.

It looks like the bitmap scan has a higher cost estimate because the entire bitmap index must be built before beginning
theheap scan and returning rows up the pipeline.  The row-count limit can't be pushed lower than the bitmap-heap-scan
likeit can for the basic index-scan. 

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# set enable_indexscan = false ;
SET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086 rows=15 loops=1)
   ->  Bitmap Heap Scan on my_table  (cost=17070.22..69478.96 rows=988217 width=8) (actual time=606.892..606.983
rows=15loops=1) 
         Recheck Cond: (b = 3)
         Filter: (a IS NULL)
         ->  Bitmap Index Scan on idx_b  (cost=0.00..16823.17 rows=1033339 width=0) (actual time=592.657..592.657
rows=1000000loops=1) 
               Index Cond: (b = 3)
 Total runtime: 607.340 ms
(7 rows)


>  It would be interesting to alter the random_page_cost setting and see if he gets
>  different results.

Using an unmodified postgresql.conf, the cost estimate for an index-scan were so much higher than for a seqscan that
random_page_costhad to be set below 0.2 before the index-scan was preferred.  However, it looks like this was mainly
becauseeffective_cache_size was too small.  The planner thought the cache was only 128 MB, and the size of the complete
table+indexwas 39492 + 21946 pages * 8 KB/block = 330 MB.  It makes sense for the cost estimate to be so much higher if
blocksare expected to be repeatedly re-fetched from disk.  I wonder if David's effective_cache_size is too small. 

test_8_3_3=# reset all ;
RESET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.74 rows=988217 width=8) (actual time=0.028..0.138 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 3))
 Total runtime: 0.338 ms
(4 rows)

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# show random_page_cost ;
 random_page_cost
------------------
 4
(1 row)

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..3029924.36 rows=988217 width=8) (actual time=0.043..0.100
rows=15loops=1) 
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.308 ms
(5 rows)

test_8_3_3=# set random_page_cost = 0.19 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..161190.65 rows=988217 width=8) (actual time=0.042..0.097 rows=15
loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.307 ms
(5 rows)


Now fix effective_cache_size and try again.

test_8_3_3=# reset all ;
RESET

test_8_3_3=# set effective_cache_size = '500MB' ;
SET
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.78 rows=15 width=8) (actual time=0.051..0.204 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..183361.21 rows=988217 width=8) (actual time=0.043..0.103 rows=15
loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.311 ms
(5 rows)

That's better, but still not quite low enough cost estimate to beat the seqscan.  Try adjusting random_page_cost again.

test_8_3_3=# set random_page_cost = 3 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.16 rows=15 width=8) (actual time=0.052..0.202 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..142053.51 rows=988217 width=8) (actual time=0.043..0.100 rows=15
loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.311 ms
(5 rows)

That's enough: index-scan's 142053.51 beats seqscan's 164492.74.  We no longer need to set enable_seqscan=false.

Re: limit clause breaks query planner?

От

Tom Lane

Дата:

04 сентября 2008 г., 15:32:29

"Matt Smiley" <mss@rentrak.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> default cost settings will cause it to prefer bitmap scan for retrieving
>> up to about a third of the table, in my experience).  I too am confused
>> about why it doesn't prefer that choice in the OP's example.

> It looks like the bitmap scan has a higher cost estimate because the
> entire bitmap index must be built before beginning the heap scan and
> returning rows up the pipeline.

Oh, of course.  The LIMIT is small enough to make it look like we can
get the required rows after scanning only a small part of the table,
so the bitmap scan will lose out in the cost comparison because of its
high startup cost.

Ultimately the only way that we could get the right answer would be if
the planner realized that the required rows are concentrated at the end
of the table instead of being randomly scattered.  This isn't something
that is considered at all right now in seqscan cost estimates.  I'm not
sure offhand whether the existing correlation stats would be of use for
it, or whether we'd have to get ANALYZE to gather additional data.

            regards, tom lane

Re: limit clause breaks query planner?

От

Matthew Wakeling

Дата:

04 сентября 2008 г., 15:50:12

On Thu, 4 Sep 2008, Tom Lane wrote:
> Ultimately the only way that we could get the right answer would be if
> the planner realized that the required rows are concentrated at the end
> of the table instead of being randomly scattered.  This isn't something
> that is considered at all right now in seqscan cost estimates.  I'm not
> sure offhand whether the existing correlation stats would be of use for
> it, or whether we'd have to get ANALYZE to gather additional data.

Using the correlation would help, I think, although it may not be the best
solution possible. At least, if the correlation is zero, you could behave
as currently, and if the correlation is 1, then you know (from the
histogram) where in the table the values are.

Matthew

--
X's book explains this very well, but, poor bloke, he did the Cambridge Maths
Tripos...                               -- Computer Science Lecturer

Re: limit clause breaks query planner?

От

Guillaume Cottenceau

Дата:

04 сентября 2008 г., 16:08:22

Matthew Wakeling <matthew 'at' flymine.org> writes:

> On Thu, 4 Sep 2008, Tom Lane wrote:
>> Ultimately the only way that we could get the right answer would be if
>> the planner realized that the required rows are concentrated at the end
>> of the table instead of being randomly scattered.  This isn't something
>> that is considered at all right now in seqscan cost estimates.  I'm not
>> sure offhand whether the existing correlation stats would be of use for
>> it, or whether we'd have to get ANALYZE to gather additional data.
>
> Using the correlation would help, I think, although it may not be the
> best solution possible. At least, if the correlation is zero, you
> could behave as currently, and if the correlation is 1, then you know
> (from the histogram) where in the table the values are.

It seems to me that if the correlation is 0.99[1], and you're
looking for less than 1% of rows, the expected rows may be at the
beginning or at the end of the heap?

Ref:
[1] or even 1, as ANALYZE doesn't sample all the rows?

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

Re: limit clause breaks query planner?

От

Matthew Wakeling

Дата:

04 сентября 2008 г., 16:20:23

On Thu, 4 Sep 2008, Guillaume Cottenceau wrote:
> It seems to me that if the correlation is 0.99, and you're
> looking for less than 1% of rows, the expected rows may be at the
> beginning or at the end of the heap?

Not necessarily. Imagine for example that you have a table with 1M rows,
and one of the fields has unique values from 1 to 1M, and the rows are
ordered in the table by that field. So the correlation would be 1. If you
were to SELECT from the table WHERE the field = 500000 LIMIT 1, then the
database should be able to work out that the rows will be right in the
middle of the table, not at the beginning or end. It should set the
startup cost of a sequential scan to the amount of time required to
sequential scan half of the table.

Of course, this does bring up a point - if the matching rows are
concentrated at the end of the table, the database could perform a
sequential scan backwards, or even a scan from the middle of the table
onwards.

This improvement of course only actually helps if the query has a LIMIT
clause, and presumably would muck up simultaneous sequential scans.

Matthew

--
Picard: I was just paid a visit from Q.
Riker:  Q! Any idea what he's up to?
Picard: No. He said he wanted to be "nice" to me.
Riker:  I'll alert the crew.

Re: limit clause breaks query planner?

От

Tom Lane

Дата:

04 сентября 2008 г., 17:14:48

Guillaume Cottenceau <gc@mnc.ch> writes:
> It seems to me that if the correlation is 0.99[1], and you're
> looking for less than 1% of rows, the expected rows may be at the
> beginning or at the end of the heap?

Right, but if you know the value being searched for, you could then
estimate where it is in the table by consulting the histogram.

Actually, an even easier hack (which would have the nice property of not
needing to know the exact value being searched for), would simply use
the existing cost estimates if the WHERE variables have low correlation
(meaning the random-locations assumption is probably good), but apply
some sort of penalty factor if the correlation is high.  This would
amount to assuming that the universe is perverse and high correlation
will always mean that the rows we want are at the wrong end of the table
not the right one.  But any DBA will tell you that the universe is
indeed perverse ;-)

OTOH, since indexscans get a cost estimate reduction in the presence of
high correlation, we're already biasing the choice in the direction of
indexscans for high correlation.  We may not need to do it twice.
I don't recall whether the OP ever showed us his statistics for the
table in question --- did it even appear to have high correlation?

            regards, tom lane

Re: limit clause breaks query planner?

От

"Scott Carey"

Дата:

04 сентября 2008 г., 18:04:43

On Thu, Sep 4, 2008 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Actually, an even easier hack (which would have the nice property of not
needing to know the exact value being searched for), would simply use
the existing cost estimates if the WHERE variables have low correlation
(meaning the random-locations assumption is probably good), but apply
some sort of penalty factor if the correlation is high. This would
amount to assuming that the universe is perverse and high correlation
will always mean that the rows we want are at the wrong end of the table
not the right one. But any DBA will tell you that the universe is
indeed perverse ;-)

As a user, I prefer this solution. For one, statistics get out of date. A few updates/ inserts (maybe 3% of the table) can greatly
affect an individual value in the histogram, breaking the location assumptions and create a worst case result.
But when 3% of rows change, the correlation cannot change as drastically as the histogram in the worst case. When deciding how to go about using the statistics for a query plan, its best to assume that they are not perfect, and that there has been some change since they were last gathered. This is but one thing that makes the universe perverse :)
The safe assumption is that if the distribution is not random, the expected average number of rows to scan goes up -- this is statistically correct. With perfect correlation and unknown locations the average expected value number of scanned rows would be ~ half the table. Equal likelihood of the first 15 rows as the last 15. Thus, for perfect correlation the average penalty would be half the table scanned, and worst case would be the whole table. In this case, if the statistics are out of date somewhat, the cost estimate is not likely to be more than a factor of 2 off. If one were to use the histogram, the cost estimate could be many orders of magnitude off.
I'm fairly sure the penalty function to use for correlation values can be derived statistically -- or at least approximated well enough for a look up table.
If the histogram is used, the odds of it being wrong or out of date have to be taken into account since the penalty for being incorrect is potentially very large -- its not a gradual increase in cost for a small error, it is a big and uncertain increase. I see the query planner's main goal is to avoid the worst outcomes more than finding the absolute best one. Better to produce 90% optimized queries 100% of the time than make 100% perfect plans 90% of the time and then 10% of the time produce very bad plans. Your suggestion above would do the best job avoiding bad plans but could miss squeezing out that last few % in rare cases that probably don't matter that much.

OTOH, since indexscans get a cost estimate reduction in the presence of
high correlation, we're already biasing the choice in the direction of
indexscans for high correlation. We may not need to do it twice.

Because the full scan is compared against more than just index scans (potentially), it makes sense to adjust each accordingly and independently. Additionally, the index scan cost reduction function and the full table scan cost increase function as correlation changes may have very different, nonlinear 'shapes' between 0 and 1.

I don't recall whether the OP ever showed us his statistics for the
table in question --- did it even appear to have high correlation?

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: limit clause breaks query planner?

От

"Matt Smiley"

Дата:

04 сентября 2008 г., 18:45:56

"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> I'm not sure offhand whether the existing correlation stats would be of use for
> it, or whether we'd have to get ANALYZE to gather additional data.

Please forgive the tangent, but would it be practical to add support for gathering statistics on an arbitrary
expressionassociated with a table, rather than just on materialized columns?  For example: 
    analyze my_tab for expression 'my_func(my_tab.col)' ;
It seems like any time you'd consider using a functional index, this feature would let the planner calculate decent
selectivityestimates for the expression's otherwise opaque data distribution.  The expression might be treated as a
virtualcolumn on the table; not sure if that helps or hurts.  Should I post this question on pgsql-hackers?

Re: limit clause breaks query planner?

От

Gregory Stark

Дата:

04 сентября 2008 г., 19:55:04

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Guillaume Cottenceau <gc@mnc.ch> writes:
>> It seems to me that if the correlation is 0.99[1], and you're
>> looking for less than 1% of rows, the expected rows may be at the
>> beginning or at the end of the heap?
>
> Right, but if you know the value being searched for, you could then
> estimate where it is in the table by consulting the histogram.
>
> Actually, an even easier hack (which would have the nice property of not
> needing to know the exact value being searched for), would simply use
> the existing cost estimates if the WHERE variables have low correlation
> (meaning the random-locations assumption is probably good), but apply
> some sort of penalty factor if the correlation is high.

Fwiw this will have all the same problems our existing uses of the correlation
have. That doesn't mean we shouldn't do it but I would expect it to be
improved along with the other uses when we find a better metric.

I did happen to speak to a statistician the other day and was given some terms
to google. I'll investigate and see if I get anything useful.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!