Обсуждение: Partitioned Tables and ORDER BY

Поиск
Список
Период
Сортировка

Partitioned Tables and ORDER BY

От
Joe Uhl
Дата:
We have been using partitioning for some time with great success.  Up
until now our usage has not included ordering and now that we are trying
to use an ORDER BY against an indexed column a rather significant
shortcoming seems to be kicking in.

Parent table (have cut all but 4 columns to make it easier to post about)
CREATE TABLE people
(
   person_id character varying(36) NOT NULL,
   list_id character varying(36) NOT NULL,
   first_name character varying(255),
   last_name character varying(255),
   CONSTRAINT people_pkey (person_id, list_id)
);

A partition looks like this:
CREATE TABLE people_list1
(
   -- inherited columns omitted
   CONSTRAINT people_list1_list_id_check CHECK (list_id::text =
'the_unique_list_id'::text)
)
INHERITS (people);

Both the parent and the children have indexes on all 4 columns mentioned
above.  The parent table is completely empty.

If I run this query, directly against the partition, performance is
excellent:
select * from people_list1 order by first_name asc limit 50;

The explain analyze output:
  Limit  (cost=0.00..4.97 rows=50 width=34315) (actual
time=49.616..522.464 rows=50 loops=1)
    ->  Index Scan using idx_people_first_name_list1 on people_list1
(cost=0.00..849746.98 rows=8544854 width=34315) (actual
time=49.614..522.424 rows=50 loops=1)
  Total runtime: 522.773 ms

If I run this query, against the parent, performance is terrible:
select * from people where list_id = 'the_unique_list_id' order by
first_name asc limit 50;

The explain analyze output:
  Limit  (cost=726844.88..726845.01 rows=50 width=37739) (actual
time=149864.869..149864.884 rows=50 loops=1)
    ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
(actual time=149864.868..149864.876 rows=50 loops=1)
          Sort Key: public.people.first_name
          Sort Method:  top-N heapsort  Memory: 50kB
          ->  Result  (cost=0.00..442990.94 rows=8544855 width=37739)
(actual time=0.081..125837.332 rows=8545138 loops=1)
                ->  Append  (cost=0.00..442990.94 rows=8544855
width=37739) (actual time=0.079..111103.743 rows=8545138 loops=1)
                      ->  Index Scan using people_pkey on people
(cost=0.00..4.27 rows=1 width=37739) (actual time=0.008..0.008 rows=0
loops=1)
                            Index Cond: ((list_id)::text =
'the_unique_list_id'::text)
                      ->  Seq Scan on people_list1 people
(cost=0.00..442986.67 rows=8544854 width=34315) (actual
time=0.068..109781.308 rows=8545138 loops=1)
                            Filter: ((list_id)::text =
'the_unique_list_id'::text)
  Total runtime: 149865.411 ms

Just to show that partitions are setup correctly, this query also has
excellent performance:
select * from people where list_id = 'the_unique_list_id' and first_name
= 'JOE';

Here is the explain analyze for that:
  Result  (cost=0.00..963.76 rows=482 width=37739) (actual
time=6.031..25.394 rows=2319 loops=1)
    ->  Append  (cost=0.00..963.76 rows=482 width=37739) (actual
time=6.029..21.340 rows=2319 loops=1)
          ->  Index Scan using idx_people_first_name on people
(cost=0.00..4.27 rows=1 width=37739) (actual time=0.010..0.010 rows=0
loops=1)
                Index Cond: ((first_name)::text = 'JOE'::text)
                Filter: ((list_id)::text = 'the_unique_list_id'::text)
          ->  Bitmap Heap Scan on people_list1 people
(cost=8.47..959.49 rows=481 width=34315) (actual time=6.018..20.968
rows=2319 loops=1)
                Recheck Cond: ((first_name)::text = 'JOE'::text)
                Filter: ((list_id)::text = 'the_unique_list_id'::text)
                ->  Bitmap Index Scan on idx_people_first_name_list1
(cost=0.00..8.35 rows=481 width=0) (actual time=5.566..5.566 rows=2319
loops=1)
                      Index Cond: ((first_name)::text = 'JOE'::text)
  Total runtime: 25.991 ms


This is Postgres 8.3.7 on the 2.6.28 kernel with constraint_exclusion
on.  Our partitions are in the 8 - 15 million row range.

I realize one option is to hit the partition directly instead of hitting
the parent table with the check constraint in the WHERE clause, but up
until now we have been able to avoid needing partition-awareness in our
code.  Perhaps we have hit upon something that will require breaking
that cleanliness but wanted to see if there were any workarounds.

Re: Partitioned Tables and ORDER BY

От
Michal Szymanski
Дата:
We have similar problem and now we are try to find solution. When you
execute query on partion there is no sorting - DB use index to
retrieve data and if you need let say 50 rows it reads 50 rows using
index. But when you execute on parent table query optymizer do this:

  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
(actual time=149864.868..149864.876 rows=50 loops=1)

it means 8544855 rows should be sorted and it takes long minutes. We
have simpler situation than you and I will try to find solution
tommorow :)

Michal Szymanski
http://blog.szymanskich.net
http://techblog.freeconet.pl/

Re: Partitioned Tables and ORDER BY

От
Michal Szymanski
Дата:

Re: Partitioned Tables and ORDER BY

От
Joe Uhl
Дата:
This seems like a pretty major weakness in PostgreSQL partitioning.  I
have essentially settled on not being able to do queries against the
parent table when I want to order the results.  Going to have to use a
Hibernate interceptor or something similar to rewrite the statements so
they hit specific partitions, will be working on this in the coming week.

This weakness is a bummer though as it makes partitions a lot less
useful.  Having to hit specific child tables by name isn't much
different than just creating separate tables and not using partitions at
all.

Michal Szymanski wrote:
> I've described our problem here
> http://groups.google.pl/group/pgsql.performance/browse_thread/thread/54a7419381bd1565?hl=pl#
>   Michal Szymanski
> http://blog.szymanskich.net
> http://techblog.freeconet.pl/
>
>
>

Re: Partitioned Tables and ORDER BY

От
Grzegorz Jaśkiewicz
Дата:


On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski <mich20061@gmail.com> wrote:
We have similar problem and now we are try to find solution. When you
execute query on partion there is no sorting - DB use index to
retrieve data and if you need let say 50 rows it reads 50 rows using
index. But when you execute on parent table query optymizer do this:

 ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
(actual time=149864.868..149864.876 rows=50 loops=1)

it means 8544855 rows should be sorted and it takes long minutes.
The figures in first parenthesis are estimates, not the actual row count. 
If you think it is too low, increase statistic target for that column. 

We
have simpler situation than you and I will try to find solution
tommorow :)

Michal Szymanski
http://blog.szymanskich.net
http://techblog.freeconet.pl/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--
GJ

Re: Partitioned Tables and ORDER BY

От
Robert Haas
Дата:
2009/10/19 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>
>
> On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski <mich20061@gmail.com>
> wrote:
>>
>> We have similar problem and now we are try to find solution. When you
>> execute query on partion there is no sorting - DB use index to
>> retrieve data and if you need let say 50 rows it reads 50 rows using
>> index. But when you execute on parent table query optymizer do this:
>>
>>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
>> (actual time=149864.868..149864.876 rows=50 loops=1)
>>
>> it means 8544855 rows should be sorted and it takes long minutes.
>
> The figures in first parenthesis are estimates, not the actual row count.
> If you think it is too low, increase statistic target for that column.

It's true that the figures in parentheses are estimates, it's usually
bad when the estimated and actual row counts are different by 5 orders
of magnitude, and that large of a difference is not usually fixed by
increasing the statistics target.

...Robert

Re: Partitioned Tables and ORDER BY

От
Grzegorz Jaśkiewicz
Дата:


2009/10/19 Robert Haas <robertmhaas@gmail.com>
2009/10/19 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>
>
> On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski <mich20061@gmail.com>
> wrote:
>>
>> We have similar problem and now we are try to find solution. When you
>> execute query on partion there is no sorting - DB use index to
>> retrieve data and if you need let say 50 rows it reads 50 rows using
>> index. But when you execute on parent table query optymizer do this:
>>
>>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
>> (actual time=149864.868..149864.876 rows=50 loops=1)
>>
>> it means 8544855 rows should be sorted and it takes long minutes.
>
> The figures in first parenthesis are estimates, not the actual row count.
> If you think it is too low, increase statistic target for that column.

It's true that the figures in parentheses are estimates, it's usually
bad when the estimated and actual row counts are different by 5 orders
of magnitude, and that large of a difference is not usually fixed by
increasing the statistics target.

I thought that this means, that either analyze was running quite a long time ago, or that the value didn't made it to histogram. In the later case, that's mostly case when your statistic target is low, or that the value is really 'rare'.
 


--
GJ

Re: Partitioned Tables and ORDER BY

От
Craig James
Дата:
Joe Uhl wrote:
> This seems like a pretty major weakness in PostgreSQL partitioning.  I
> have essentially settled on not being able to do queries against the
> parent table when I want to order the results.  Going to have to use a
> Hibernate interceptor or something similar to rewrite the statements so
> they hit specific partitions, will be working on this in the coming week.
>
> This weakness is a bummer though as it makes partitions a lot less
> useful.  Having to hit specific child tables by name isn't much
> different than just creating separate tables and not using partitions at
> all.

I wonder if the "offset 0" trick would work here?  I was told (for a different question) that the planner can't merge
levelsif there's an offset or limit on a subquery.  So you might be able to do something like this: 

  select ... from (select ...  offset 0) as foo order by ...

In other words, put your primary query as a sub-select without the sort criterion, with the "offset 0" as a sort of
roadblockthat the planner can't get past.  Then the outer select does the sorting, without affecting the plan for the
innerselect. 

Craig

Re: Partitioned Tables and ORDER BY

От
Robert Haas
Дата:
2009/10/19 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>
>
> 2009/10/19 Robert Haas <robertmhaas@gmail.com>
>>
>> 2009/10/19 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>> >
>> >
>> > On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski <mich20061@gmail.com>
>> > wrote:
>> >>
>> >> We have similar problem and now we are try to find solution. When you
>> >> execute query on partion there is no sorting - DB use index to
>> >> retrieve data and if you need let say 50 rows it reads 50 rows using
>> >> index. But when you execute on parent table query optymizer do this:
>> >>
>> >>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
>> >> (actual time=149864.868..149864.876 rows=50 loops=1)
>> >>
>> >> it means 8544855 rows should be sorted and it takes long minutes.
>> >
>> > The figures in first parenthesis are estimates, not the actual row
>> > count.
>> > If you think it is too low, increase statistic target for that column.
>>
>> It's true that the figures in parentheses are estimates, it's usually
>> bad when the estimated and actual row counts are different by 5 orders
>> of magnitude, and that large of a difference is not usually fixed by
>> increasing the statistics target.
>>
> I thought that this means, that either analyze was running quite a long time
> ago, or that the value didn't made it to histogram. In the later case,
> that's mostly case when your statistic target is low, or that the value is
> really 'rare'.

It's possible, but (1) most people are running autovacuum these days,
in which case this isn't likely to occur and (2) most people do not
manage to expand the size of a table by five orders of magnitude
without analyzing it.  Generally these kinds of problems come from bad
selectivity estimates.

In this case, though, I think that the actual number is less than the
estimate because of the limit node immediately above.  The problem is
just that a top-N heapsort requires scanning the entire set of rows,
and scanning 8 million rows is slow.

...Robert

Re: Partitioned Tables and ORDER BY

От
Scott Carey
Дата:

On 10/19/09 12:10 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:

> 2009/10/19 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>>
>>
>> 2009/10/19 Robert Haas <robertmhaas@gmail.com>
>>>
>>> 2009/10/19 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>>>>
>>>>
>>>> On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski <mich20061@gmail.com>
>>>> wrote:
>>>>>
>>>>> We have similar problem and now we are try to find solution. When you
>>>>> execute query on partion there is no sorting - DB use index to
>>>>> retrieve data and if you need let say 50 rows it reads 50 rows using
>>>>> index. But when you execute on parent table query optymizer do this:
>>>>>
>>>>>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
>>>>> (actual time=149864.868..149864.876 rows=50 loops=1)
>>>>>
>>>>> it means 8544855 rows should be sorted and it takes long minutes.
>>>>
>>>> The figures in first parenthesis are estimates, not the actual row
>>>> count.
>>>> If you think it is too low, increase statistic target for that column.
>>>
>>> It's true that the figures in parentheses are estimates, it's usually
>>> bad when the estimated and actual row counts are different by 5 orders
>>> of magnitude, and that large of a difference is not usually fixed by
>>> increasing the statistics target.
>>>
>> I thought that this means, that either analyze was running quite a long time
>> ago, or that the value didn't made it to histogram. In the later case,
>> that's mostly case when your statistic target is low, or that the value is
>> really 'rare'.
>
> It's possible, but (1) most people are running autovacuum these days,
> in which case this isn't likely to occur and (2) most people do not
> manage to expand the size of a table by five orders of magnitude
> without analyzing it.  Generally these kinds of problems come from bad
> selectivity estimates.
>

Also, with partitioning the "combined" statistics of multiple tables is just
plain wrong much of the time.  It makes some worst case assumptions about
the number of distinct values when merging multiple table results (even with
100% overlap and all unique values in the stats columns), and at least in
8.3 (haven't looked in 8.4) the row width estimate is the max of all the
child tables, not an average or weighted average.  So even with 100% perfect
statistics on each individual table, do a scan over a few dozen partitions
(or a couple hundred) and the summary stats can be way off.  The tendency is
to sometimes significantly overestimate the number of distinct values.



> In this case, though, I think that the actual number is less than the
> estimate because of the limit node immediately above.  The problem is
> just that a top-N heapsort requires scanning the entire set of rows,
> and scanning 8 million rows is slow.
>
> ...Robert
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>