Обсуждение: index usage on queries on inherited tables

Поиск
Список
Период
Сортировка

index usage on queries on inherited tables

От
Joseph Shraibman
Дата:
When I do a query on a table with child tables on certain queries pg
uses indexes and on others it doesn't. Why does this happen? For example:


[local]:playpen=> explain analyze select * from vis where id > 10747 ;
                                                               QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=4.29..115.11 rows=325 width=634) (actual
time=0.063..0.116 rows=5 loops=1)
   ->  Append  (cost=4.29..115.11 rows=325 width=634) (actual
time=0.053..0.090 rows=5 loops=1)
         ->  Bitmap Heap Scan on vis  (cost=4.29..23.11 rows=5
width=948) (actual time=0.051..0.058 rows=5 loops=1)
               Recheck Cond: (id > 10747)
               ->  Bitmap Index Scan on vis_pkey  (cost=0.00..4.29
rows=5 width=0) (actual time=0.037..0.037 rows=5 loops=1)
                     Index Cond: (id > 10747)
         ->  Seq Scan on vis_for_seg_1_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id > 10747)
         ->  Seq Scan on vis_for_seg_4_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id > 10747)
         ->  Seq Scan on vis_for_seg_66_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id > 10747)
         ->  Seq Scan on vis_for_seg_69_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id > 10747)
         ->  Seq Scan on vis_for_seg_79_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id > 10747)
         ->  Seq Scan on vis_for_seg_80_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id > 10747)
         ->  Seq Scan on vis_for_seg_82_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id > 10747)
         ->  Seq Scan on vis_for_seg_87_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id > 10747)
 Total runtime: 0.724 ms
(23 rows)

Time: 5.804 ms
[local]:playpen=> explain analyze select * from vis where id = 10747 ;

QUERY
PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..74.41 rows=9 width=664) (actual time=0.060..0.503
rows=1 loops=1)
   ->  Append  (cost=0.00..74.41 rows=9 width=664) (actual
time=0.053..0.493 rows=1 loops=1)
         ->  Index Scan using vis_pkey on vis  (cost=0.00..8.27 rows=1
width=948) (actual time=0.051..0.055 rows=1 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_1_2011_03_pkey on
vis_for_seg_1_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.122..0.122 rows=0 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_4_2011_03_pkey on
vis_for_seg_4_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.043..0.043 rows=0 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_66_2011_03_pkey on
vis_for_seg_66_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.041..0.041 rows=0 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_69_2011_03_pkey on
vis_for_seg_69_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.041..0.041 rows=0 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_79_2011_03_pkey on
vis_for_seg_79_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.043..0.043 rows=0 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_80_2011_03_pkey on
vis_for_seg_80_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.041..0.041 rows=0 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_82_2011_03_pkey on
vis_for_seg_82_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.049..0.049 rows=0 loops=1)
               Index Cond: (id = 10747)
         ->  Index Scan using vis_for_seg_87_2011_03_pkey on
vis_for_seg_87_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.043..0.043 rows=0 loops=1)
               Index Cond: (id = 10747)
 Total runtime: 1.110 ms
(21 rows)

[local]:playpen=> select version();

version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20080704 (Red Hat 4.1.2-48), 32-bit
(1 row)


Re: index usage on queries on inherited tables

От
Robert Haas
Дата:
On Fri, Apr 1, 2011 at 2:41 AM, Joseph Shraibman <jks@selectacast.net> wrote:
> When I do a query on a table with child tables on certain queries pg
> uses indexes and on others it doesn't. Why does this happen? For example:
>
>
> [local]:playpen=> explain analyze select * from vis where id > 10747 ;
>                                                               QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------
>  Result  (cost=4.29..115.11 rows=325 width=634) (actual
> time=0.063..0.116 rows=5 loops=1)
>   ->  Append  (cost=4.29..115.11 rows=325 width=634) (actual
> time=0.053..0.090 rows=5 loops=1)
>         ->  Bitmap Heap Scan on vis  (cost=4.29..23.11 rows=5
> width=948) (actual time=0.051..0.058 rows=5 loops=1)
>               Recheck Cond: (id > 10747)
>               ->  Bitmap Index Scan on vis_pkey  (cost=0.00..4.29
> rows=5 width=0) (actual time=0.037..0.037 rows=5 loops=1)
>                     Index Cond: (id > 10747)
>         ->  Seq Scan on vis_for_seg_1_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_4_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_66_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_69_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_79_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_80_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_82_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_87_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>  Total runtime: 0.724 ms
> (23 rows)
>
> Time: 5.804 ms
> [local]:playpen=> explain analyze select * from vis where id = 10747 ;
>
> QUERY
> PLAN
>
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Result  (cost=0.00..74.41 rows=9 width=664) (actual time=0.060..0.503
> rows=1 loops=1)
>   ->  Append  (cost=0.00..74.41 rows=9 width=664) (actual
> time=0.053..0.493 rows=1 loops=1)
>         ->  Index Scan using vis_pkey on vis  (cost=0.00..8.27 rows=1
> width=948) (actual time=0.051..0.055 rows=1 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_1_2011_03_pkey on
> vis_for_seg_1_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.122..0.122 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_4_2011_03_pkey on
> vis_for_seg_4_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.043..0.043 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_66_2011_03_pkey on
> vis_for_seg_66_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.041..0.041 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_69_2011_03_pkey on
> vis_for_seg_69_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.041..0.041 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_79_2011_03_pkey on
> vis_for_seg_79_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.043..0.043 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_80_2011_03_pkey on
> vis_for_seg_80_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.041..0.041 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_82_2011_03_pkey on
> vis_for_seg_82_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.049..0.049 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_87_2011_03_pkey on
> vis_for_seg_87_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.043..0.043 rows=0 loops=1)
>               Index Cond: (id = 10747)
>  Total runtime: 1.110 ms
> (21 rows)
>
> [local]:playpen=> select version();
>
> version
> ------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
> 20080704 (Red Hat 4.1.2-48), 32-bit
> (1 row)

In the first case, PostgreSQL evidently thinks that using the indexes
will be slower than just ignoring them.  You could find out whether
it's right by trying it with enable_seqscan=off.

If it turns out that using the indexes really is better, then you
probably want to adjust random_page_cost and seq_page_cost.  The
defaults assume a mostly-not-cached database, so if your database is
heavily or completely cached you might need significantly lower
values.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: index usage on queries on inherited tables

От
Joseph Shraibman
Дата:
On 04/27/2011 04:32 PM, Robert Haas wrote:
> In the first case, PostgreSQL evidently thinks that using the indexes
> will be slower than just ignoring them.  You could find out whether
> it's right by trying it with enable_seqscan=off.

My point is that this is just a problem with inherited tables.  It
should be obvious to postgres that few rows are being returned, but in
the inherited tables case it doesn't use indexes.  This was just an
example.  In a 52 gig table I have a "select id from table limit 1 order
by id desc" returns instantly, but as soon as you declare a child table
it tries to seq scan all the tables.

Re: index usage on queries on inherited tables

От
Samuel Gendler
Дата:


On Wed, Apr 27, 2011 at 2:11 PM, Joseph Shraibman <jks@selectacast.net> wrote:
On 04/27/2011 04:32 PM, Robert Haas wrote:
> In the first case, PostgreSQL evidently thinks that using the indexes
> will be slower than just ignoring them.  You could find out whether
> it's right by trying it with enable_seqscan=off.

My point is that this is just a problem with inherited tables.  It
should be obvious to postgres that few rows are being returned, but in
the inherited tables case it doesn't use indexes.  This was just an
example.  In a 52 gig table I have a "select id from table limit 1 order
by id desc" returns instantly, but as soon as you declare a child table
it tries to seq scan all the tables.


If I'm understanding correctly, this kind of obviates the utility of partitioning if you structure a warehouse in a traditional manner.  Assuming a fact table partitioned by time, but with foreign keys to a time dimension, it is now not possible to gain any advantage from the partitioning if selecting on columns in the time dimension.

"select * from fact_table f join time_dimension t on f.time_id = t.time_id where t.quarter=3 and t.year = 2010" will scan all partitions of the fact table despite the fact that all of the rows would come from 3 partitions, assuming a partitioning schema that uses one partition for each month.  

I use a time id that is calculable from the from the timestamp so it doesn't need to be looked up, and partitioning on time_id directly is easy enough to handle, but if I'm understanding the problem, it sounds like nothing short of computing the appropriate time ids before issuing the query and then including a 'where f.time_id between x and y' clause to the query will result in the partitions being correctly excluded.  Is that what people are doing to solve this problem?  The alternative is to leave a timestamp column in the fact table (something I tend to do since it makes typing ad-hoc queries in psql much easier) and partition on that column and then always include a where clause for that column that is at least as large as the requested row range.  Both result in fairly ugly queries, though I can certainly see how I might structure my code to always build queries which adhere to this.

I'm just in the process of designing a star schema for a project and was intending to use exactly the structure I described at the top of the email. Is there a postgres best-practices for solving this problem? There's no way I can get away without partitioning.  I'm looking at a worst case table of 100,000 rows being written every 5 minutes, 24x7 - 29 million rows per day, a billion rows per month - with most queries running over a single month or comparing same months from differing years and quarters - so a month based partitioning.  Normal case is closer to 10K rows per 5 minutes.

Suggestions?

--sam


Re: index usage on queries on inherited tables

От
Greg Smith
Дата:
Joseph Shraibman wrote:
> In a 52 gig table I have a "select id from table limit 1 order
> by id desc" returns instantly, but as soon as you declare a child table
> it tries to seq scan all the tables.
>

This is probably the limitation that's fixed in PostgreSQL 9.1 by this
commit (following a few others leading up to it):
http://archives.postgresql.org/pgsql-committers/2010-11/msg00028.php

There was a good example showing what didn't work as expected before
(along with an earlier patch that didn't everything the larger 9.1
improvement does) at
http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php ;
"ORDER BY x DESC LIMIT 1" returns the same things as MAX(x).

It's a pretty serious issue with the partitioning in earlier versions.
I know of multiple people, myself included, who have been compelled to
apply this change to an earlier version of PostgreSQL to make larger
partitioned databases work correctly.  The other option is to manually
decompose the queries into ones that target each of the child tables
individually, then combine the results, which is no fun either.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: index usage on queries on inherited tables

От
Robert Haas
Дата:
On Apr 27, 2011, at 11:11 PM, Joseph Shraibman <jks@selectacast.net> wrote:
> On 04/27/2011 04:32 PM, Robert Haas wrote:
>> In the first case, PostgreSQL evidently thinks that using the indexes
>> will be slower than just ignoring them.  You could find out whether
>> it's right by trying it with enable_seqscan=off.
>
> My point is that this is just a problem with inherited tables.  It
> should be obvious to postgres that few rows are being returned, but in
> the inherited tables case it doesn't use indexes.  This was just an
> example.  In a 52 gig table I have a "select id from table limit 1 order
> by id desc" returns instantly, but as soon as you declare a child table
> it tries to seq scan all the tables.

Oh, sorry, I must have misunderstood. As Greg says, this is fixed in 9.1.

...Robert