Re: Bad Estimate for multi tenant database queries

From Peter Grman
Subject Re: Bad Estimate for multi tenant database queries
Msg-id CACF7Wx0hwFVdd2bBTB6BF2h6bfLemGuHoDnHS0SOtz=oSF6mnA@mail.gmail.com
In reply to Re: Bad Estimate for multi tenant database queries  (Michael Lewis <mlewis@entrata.com>)
List pgsql-general
Hello Michael,

I dug a bit deeper and found an even simpler query that perfectly embodies the problem:

explain (ANALYZE, FORMAT text)
select *
from "Reservation"."Reservation" r
join "Rates"."TimeSliceDefinition" t on r."TimeSliceDefinitionId" = t."Id"
where r."AccountCode" = 'OXHV' and t."AccountCode" = 'OXHV'

Nested Loop  (cost=448.56..6918.67 rows=54 width=657) (actual time=1.207..26.874 rows=9322 loops=1)
  ->  Bitmap Heap Scan on "TimeSliceDefinition" t  (cost=4.39..39.99 rows=14 width=131) (actual time=0.013..0.029 rows=14 loops=1)
        Recheck Cond: ("AccountCode" = 'OXHV'::text)
        Heap Blocks: exact=7
        ->  Bitmap Index Scan on "IX_TimeSliceDefinition_AccountCode_EntityId"  (cost=0.00..4.39 rows=14 width=0) (actual time=0.009..0.009 rows=14 loops=1)
              Index Cond: ("AccountCode" = 'OXHV'::text)
  ->  Bitmap Heap Scan on "Reservation" r  (cost=444.17..491.21 rows=12 width=526) (actual time=1.021..1.755 rows=666 loops=14)
        Recheck Cond: (("TimeSliceDefinitionId" = t."Id") AND ("AccountCode" = 'OXHV'::text))
        Heap Blocks: exact=4858
        ->  BitmapAnd  (cost=444.17..444.17 rows=12 width=0) (actual time=0.980..0.980 rows=0 loops=14)
              ->  Bitmap Index Scan on "IX_Reservation_TimeSliceDefinitionId"  (cost=0.00..13.82 rows=187 width=0) (actual time=0.057..0.057 rows=692 loops=14)
                    Index Cond: ("TimeSliceDefinitionId" = t."Id")
              ->  Bitmap Index Scan on "IX_Reservation_AccountCode_EntityId"  (cost=0.00..427.72 rows=9507 width=0) (actual time=0.980..0.980 rows=9327 loops=13)
                    Index Cond: ("AccountCode" = 'OXHV'::text)
Planning Time: 0.353 ms
Execution Time: 27.311 ms

Above is the query with the wrong estimates (off by a factor of ~200), and below is the query with correct estimates:

explain (ANALYZE, FORMAT text)
select *
from "Reservation"."Reservation" r
join "Rates"."TimeSliceDefinition" t on r."TimeSliceDefinitionId" = t."Id"
where r."AccountCode" = 'OXHV'

Hash Join  (cost=557.83..12519.98 rows=9507 width=657) (actual time=3.290..15.174 rows=9322 loops=1)
  Hash Cond: (r."TimeSliceDefinitionId" = t."Id")
  ->  Bitmap Heap Scan on "Reservation" r  (cost=430.10..12367.25 rows=9507 width=526) (actual time=1.931..10.821 rows=9322 loops=1)
        Recheck Cond: ("AccountCode" = 'OXHV'::text)
        Heap Blocks: exact=4666
        ->  Bitmap Index Scan on "IX_Reservation_AccountCode_EntityId"  (cost=0.00..427.72 rows=9507 width=0) (actual time=1.398..1.398 rows=9327 loops=1)
              Index Cond: ("AccountCode" = 'OXHV'::text)
  ->  Hash  (cost=96.77..96.77 rows=2477 width=131) (actual time=1.312..1.313 rows=2511 loops=1)
        Buckets: 4096  Batches: 1  Memory Usage: 466kB
        ->  Seq Scan on "TimeSliceDefinition" t  (cost=0.00..96.77 rows=2477 width=131) (actual time=0.004..0.550 rows=2511 loops=1)
Planning Time: 1.394 ms
Execution Time: 15.641 ms

The AccountCode should actually be the same in all referenced rows anyway; the extra condition is really just there as a double check, in case we ever have cross-tenant (cross-account) references. The extra "and t."AccountCode" = 'OXHV'" is added by the ORM as a safety net, and we'd like to keep it that way, but Postgres shouldn't let it skew the row estimates.
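
For what it's worth, a rough back-of-envelope (assuming the planner simply treats the two AccountCode predicates as independent and multiplies their selectivities) reproduces the bad estimate almost exactly:

    rows estimated for r."AccountCode" = 'OXHV'   : 9507
    rows estimated for t."AccountCode" = 'OXHV'   : 14 (out of 2477 estimated in the table)
    join estimate under independence              : 9507 * 14 / 2477 ≈ 54

which matches the rows=54 in the first plan, instead of the ~9322 rows that actually come back.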

I've tried creating the following statistics:

CREATE STATISTICS MT_ReservationBucket on "AccountCode", "DepartureUtc", "ArrivalUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationDepartureUtc on "AccountCode", "DepartureUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationArrivalUtc on "AccountCode", "ArrivalUtc" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationNoShowFeeId on "AccountCode", "NoShowFeeId" from "Reservation"."Reservation";
CREATE STATISTICS MT_ReservationTimeSliceDefinitionId on "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";
CREATE STATISTICS MT_TimeSliceDefinition on "AccountCode", "Id" from "Rates"."TimeSliceDefinition";
CREATE STATISTICS MT_NoShowFee on "AccountCode", "Id" from "Reservation"."NoShowFee";
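
The statements above don't specify a statistics kind, so (as far as I understand, on PG 11) they should build both ndistinct and dependencies. For completeness, a minimal sketch of how I'd request dependencies explicitly and double-check what got built - the statistics name here is just illustrative:

CREATE STATISTICS mt_reservation_tsd_dep (dependencies)
    on "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";
ANALYZE "Reservation"."Reservation";

-- on PG 11 the computed values are stored directly in pg_statistic_ext
SELECT stxname, stxkind, stxndistinct, stxdependencies
FROM pg_statistic_ext
WHERE stxname LIKE 'mt_%';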

I tried creating indexes on Id+AccountCode and TimeSliceDefinitionId+AccountCode.

I tried setting up a foreign key using both columns instead of just one:

alter table "Reservation"."Reservation"
    add constraint "FK_Reservation_TimeSliceDefinition_TimeSliceDefinitionId_test"
        foreign key ("AccountCode", "TimeSliceDefinitionId") references "Rates"."TimeSliceDefinition" ("AccountCode", "Id")
            on delete restrict;

I also tried raising default_statistics_target to 10000 and running a full ANALYZE again afterwards, with those statistics and the extra indexes in place, but nothing had any effect. (At most, the row estimate grew from 54 to 55.)
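
For reference, the default_statistics_target part was roughly this (a sketch, not the exact commands I ran):

SET default_statistics_target = 10000;
ANALYZE "Reservation"."Reservation";
ANALYZE "Rates"."TimeSliceDefinition";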


If I understood that discussion correctly (and that's a big IF), the feature in question might be able to solve my problem. Am I correct, or did I completely misunderstand it? Is there anything I can do until that is released? (We're currently on Postgres 11 on RDS.)


Thanks for your time!
Peter


On Tue, Sep 3, 2019 at 8:40 PM Michael Lewis <mlewis@entrata.com> wrote:
> > CREATE STATISTICS MT_ReservationDepartureUtc (ndistinct) on "AccountCode", "DepartureUtc" from "Reservation"."Reservation";
> > CREATE STATISTICS MT_ReservationArrivalUtc (ndistinct) on "AccountCode", "ArrivalUtc" from "Reservation"."Reservation";
> > CREATE STATISTICS MT_ReservationNoShowFeeId (ndistinct) on "AccountCode", "NoShowFeeId" from "Reservation"."Reservation";
> > CREATE STATISTICS MT_ReservationTimeSliceDefinitionId (ndistinct) on "AccountCode", "TimeSliceDefinitionId" from "Reservation"."Reservation";
> >
> > but that didn't help either
>
> Did you try with 'dependencies' for the statistics_kind or only ndistinct? What was default_statistics_target set to at the time you created the extended statistics? I am not sure if that value is used, but I would assume so.
