Re: A wrong index choose issue because of inaccurate statistics

Поиск

Список

Период

Сортировка

От	Andy Fan
Тема	Re: A wrong index choose issue because of inaccurate statistics
Дата	9 июня 2020 г. 06:33:03
Msg-id	CAKU4AWrRA4oXrORsCvXgy1r3oC_eGZ5DZBR_TBc0W+O9z_fcaw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: A wrong index choose issue because of inaccurate statistics (David Rowley <dgrowleyml@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Fri, Jun 5, 2020 at 2:19 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Mon, 1 Jun 2020 at 01:24, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> The one-line fix describe the exact idea in my mind:
>
> +++ b/src/backend/optimizer/path/costsize.c
> @@ -730,6 +730,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
>
> cpu_run_cost += cpu_per_tuple * tuples_fetched;
>
> + /*
> + * To make the planner more robust to handle some inaccurate statistics
> + * issue, we will add a extra cost to qpquals so that the less qpquals
> + * the lower cost it has.
> + */
> + cpu_run_cost += 0.01 * list_length(qpquals);
> +
>
> This change do fix the issue above, but will it make some other cases worse? My
> answer is no based on my current knowledge, and this is most important place
> I want to get advised. The mainly impact I can figure out is: it not only
> change the cost difference between (a, b) and (a, c) it also cause the cost
> difference between Index scan on (a, c) and seq scan. However the
> cost different between index scan and seq scan are huge by practice so
> I don't think this impact is harmful.

Didn't that idea already get shot down in the final paragraph on [1]?

Thanks for chiming in. I treat this as I didn't describe my idea clearly enough then

both Tomas and Tom didn't spend much time to read it (no offense, and I

understand they need to do lots of things every day), so I re-summarize the

issue to make it easier to read.

In Tomas's reply, he raises concerns about how to fix the issue, since we

don't know how much it errored 1%, 10% and so on, so I emphasized I don't

touch that part actually. Even the wrong estimation still plays a bad role on

later join, but that would not be the issue I would fix here.

I understand that you wish to increase the cost by some seemingly
innocent constant to fix your problem case. Here are my thoughts
about that: Telling lies is not really that easy to pull off. Bad
liers think it's easy and good ones know it's hard. The problem is
that the lies can start small, but then at some point the future you
must fashion some more lies to account for your initial lie. Rinse and
repeat that a few times and before you know it, your course is set
well away from the point of truth. I feel the part about "rinse and
repeat" applies reasonably well to how join costing works. The lie is
likely to be amplified as the join level gets deeper.

I agree with that to some extent. However we can just provide more options to

users. At the same time, I still believe we should provide such options very carefully.

Unsure what the exact add_path logic would be. Perhaps a GUC would
need to assist with the decision there. Likewise, with
NestedLoopPaths which have a large number of join quals, the risk
factor could go up a bit with those so that we take a stronger
consideration for hash or merge joins instead.

I probably underestimated the impacts for a large number of join quals.

This looks like a weakness we can't ignore confomforably, so I checked

the code further, maybe we don't need a risk factor for this case.

For query WHERE a = 1 AND b = 2, both Selectivity(a = 1) and

Selectivity(b = 2) are greater than 0 even the statistics are stale enough,

so the IndexSelectivity is greater than 0 as well. Based on this,

IndexSelectivity(A, B) should be less than IndexSelectivity(A, C) for the

above query However they still generate the same cost because of the

below code. (genericcostestimate and btcostestimate)

numIndexTuples = indexSelectivity * index->rel->tuples;
numIndexTuples = rint(numIndexTuples / num_sa_scans);

if (numIndexTuples < 1.0)

numIndexTuples = 1.0;

later numIndexTuples is used to calculate cost. The above code eats the

difference of indexSelectivity.

Many places say we need to "round to integer" but I am still not figuring out

why it is a must. so If we can "round to integer" just after the IndexCost

is calculated, the issue can be fixed as well.

The attached is the patch in my mind, since this patch may lower the index

costs if the numIndexTuples < 1.0, so it makes the optimizer prefer to use

nest loop rather than a hash join if the loop is small, which is a common

case in our regression test where we don't have much data, so there are

several changes like that.

Another impact I found in the regression test is that optimizer choose an

IndexScan of a conditional index rather than IndexOnlyScan a normal index.

I checked the code and didn't find anything wrong, so I'd like to say that is

because the data is too small.

I also tested TPC-H workload, Query-12 changed hash join to nest loop,

The execution time changed from1605.118ms to 699.032ms.

> the root cause of this situation is because IndexSelectivity = 0.

This is wrong. I got the 0 by elog(INFO, "%f", IndexSelectivity).

that reported 0 while the value is pretty small.

Best Regards

Andy Fan

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 09 июня 2020 г., 06:31:55
Сообщение: Re: valgrind versus pg_atomic_init()

Следующее

От: Tom Lane
Дата: 09 июня 2020 г., 06:41:21
Сообщение: Re: Intermittent test plan change in "privileges" test on BF animal prion

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: A wrong index choose issue because of inaccurate statistics

Вложения

Предыдущее

Следующее