Discussion: Estimating the execution cost of a query in a partitioned schema: Weird execution plans and totally wrong execution costs (pg_class and pg_statistic)

Hello,

I've been taking part in some interesting research on estimating the execution cost of a query by suitably updating the statistics stored in pg_class and pg_statistic. The idea is to range-partition the tables in a schema and then update the statistics in the two system catalogs mentioned above without transferring any data into the new schema. This keeps the algorithm's execution time low, which matters since we use global optimization methods.

Using EXPLAIN, we have managed to get execution plans and costs from the planner that are very similar to what we would get if there were data in the newly created partitions. This method works fine as long as the query to be estimated joins no more than two tables.
When we try to estimate the cost of a query that joins three or more tables, we get wrong plans with huge costs (on the order of millions of cost units). When data is actually loaded into all of the partitions, the cost does not exceed a few thousand cost units.
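For reference, the estimated plans come from plain EXPLAIN, which invokes the planner but never executes the query, so nothing is read from the empty partitions. A minimal example (the predicate is illustrative):

-- EXPLAIN without ANALYZE only plans; the empty relation files are
-- never scanned. FORMAT JSON matches the plan dumps linked later in
-- this thread.
EXPLAIN (COSTS true, FORMAT JSON)
SELECT count(*) FROM lineorder WHERE lo_orderdate = 19940101;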

The strategy for determining which partitions a query should be executed against uses a simple interval tree to find overlapping intervals of the range values. For each partitioned table, a new master table is created and its statistics are set to zero, meaning the master (parent) table contains no data, while the statistics of the child tables that inherit from it are updated appropriately.
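A minimal sketch of this setup (table names, columns, and ranges here are illustrative, not our actual schema; the catalog update requires superuser):

-- Parent ("master") table: holds no rows itself.
CREATE TABLE lineorder_master (
    lo_orderdate integer,
    lo_revenue   numeric
    -- ... remaining columns
);

-- Child partition with a CHECK constraint describing its range,
-- so the planner can prune it from queries outside that range.
CREATE TABLE lineorder_1992 (
    CHECK (lo_orderdate >= 19920101 AND lo_orderdate < 19930101)
) INHERITS (lineorder_master);

-- Tell the planner the parent is empty.
UPDATE pg_class
   SET reltuples = 0, relpages = 0
 WHERE oid = 'lineorder_master'::regclass;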

We use the Star Schema Benchmark (a modification of TPC-H) and assume uniformly distributed data.

The question: Is the following list of updated statistics enough to fool the planner into generating accurate execution plans, as if there were data in those partitions?

I've been searching around for a few weeks but I'm getting nowhere. Here's what we update (a rough sketch of the pg_class side follows the list):
  • pg_statistic
    • stawidth
    • staopN
    • stanumbersN
    • stakindN
    • stavaluesN
  • pg_class
    • relfilenode (we create an empty relation file and make the file system report a non-zero size for it, so the planner sees data physically stored on disk even though the file is actually empty)
    • reltuples
    • relpages
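As a rough sketch, the pg_class side looks like this (the numbers are made up, and updating system catalogs directly requires superuser and is, of course, unsupported territory):

-- Pretend the empty child partition holds ~1M rows on 12500 8 kB pages.
UPDATE pg_class
   SET reltuples = 1000000,
       relpages  = 12500
 WHERE oid = 'lineorder_1992'::regclass;

(The relfilenode/sparse-file part of the trick happens at the file-system level, outside SQL.)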
This is probably both a tough and demanding question, but I guess this is the right place to ask.

Best regards,
Nino Arsov
Nino Arsov <nino.arsov@gmail.com> writes:
> Using EXPLAIN, we have managed to get execution plans and costs from the
> planner that are very similar to what we would get if there were data in
> the newly created partitions. This method works fine as long as the query
> to be estimated joins no more than two tables.
> When we try to estimate the cost of a query that joins three or more
> tables, we get wrong plans with huge costs (on the order of millions of
> cost units). When data is actually loaded into all of the partitions,
> the cost does not exceed a few thousand cost units.

Have you modified the planner at all?  Because it's a bit hard to see how
inserting faked-up statistics as described would work properly in a 2-way
join but not in a 3-way join.  A bug in some hack or other would be a
much more plausible explanation.

I'm also curious about exactly how you're inserting new data into
pg_statistic --- the "anyarray" columns that are used there are not
readily modifiable from SQL.

(This is pretty off-topic for pgsql-admin, btw.)

            regards, tom lane



---------- Forwarded message ----------
From: Nino Arsov <nino.arsov@gmail.com>
Date: Wed, Feb 4, 2015 at 12:24 PM
Subject: Re: [ADMIN] Estimating the execution cost of a query in a partitioned schema: Weird execution plans and totally wrong execution costs (pg_class and pg_statistic)
To: Tom Lane <tgl@sss.pgh.pa.us>


Hi Mr. Lane,

I guess I misinterpreted the "admin" part of the title. Is there a way for this thread to be moved to pgsql-hackers (which I think is more appropriate)?

On the pg_statistic data insertion: we did struggle with that a little, but we do it as follows.
To insert ["AB", "BC", "CD"] into stavaluesN, we use
array_in(anyarray_out('{"AB", "BC", "CD"}'::character[]), 'character'::regtype, 1). For numeric data types, only the corresponding type is changed in the argument list.
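For completeness, here is roughly what one such update looks like. The relation and column names are illustrative; the key point is that array_in() is declared as returning anyarray, which is why its result can be assigned to a stavaluesN column at all:

-- Superuser only. A complete entry also needs the matching stakindN,
-- staopN, and stanumbersN values, as listed earlier in the thread.
UPDATE pg_statistic
   SET stavalues1 = array_in(
           anyarray_out('{"AB", "BC", "CD"}'::character[]),
           'character'::regtype,
           -1)  -- -1 = no typmod; use the column's atttypmod if it has one
 WHERE starelid  = 'customer_1'::regclass
   AND staattnum = (SELECT attnum
                      FROM pg_attribute
                     WHERE attrelid = 'customer_1'::regclass
                       AND attname  = 'c_region');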

No modifications have been made to the planner's source code, nor to any parameters that can be set via SQL. I'm using a clean, out-of-the-box 9.3 installation on Linux.

I did some analysis last night, and it's hard to draw a conclusion. I suspect the large cost offset with multi-way joins is the result of an accumulating estimation error.
I've reproduced exactly the same execution plan without any modifications to the planner, and I've tracked the problem down to the row-estimation phase of the individual plan nodes.
Namely, "Plan Rows" has a value of 1 in many nodes of the plan (Hash Join, Seq Scan, etc.). I've checked pg_statistic and pg_class, and everything looks fine as far as I can see.
Below is an example query from the Star Schema Benchmark and the associated execution plans. I've also noticed that "Plan Rows" becomes 1 in some two-way joins, not just in this four-way join.

select c_nation, s_nation, d_year, sum(lo_revenue) as revenue
from customer, lineorder, supplier, dates
where lo_custkey=c_custkey 
and lo_suppkey = s_suppkey
and lo_orderdate = d_datekey
and c_region = 'ASIA' and s_region = 'ASIA'
and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, revenue desc;

This query is executed against all valid partitions of the four tables, i.e. those containing data that satisfies the constraints in the WHERE clause. You can examine the plans below:
  1. Estimated execution plan: http://codebeautify.org/jsonviewer/fa36b7
  2. Actual execution plan: http://codebeautify.org/jsonviewer/a370ce
Another point to take into account is the uniformity assumption. I've compared the histograms of the non-partitioned tables against their partitioned counterparts, and the shapes are essentially the same, with a small, inevitable deviation in bar heights due to the range partitioning.

The changes to the system catalogs remain the same as before. The only difference is "Plan Rows" = 1, and the problem is proving hard to track down. If I manage to resolve this issue in the meantime, I'll post the solution here, as others might also find it useful.

Best regards,
Nino Arsov

