need to repeat the same condition on joined tables in order to choose the proper plan

From: Svetlin Manavski
Subject: need to repeat the same condition on joined tables in order to choose the proper plan
Date:
Msg-id: BANLkTikSqgoucZkO0F3v8pHCopXjR=+NYg@mail.gmail.com
Responses: Re: need to repeat the same condition on joined tables in order to choose the proper plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
Hi performance gurus,

One of the reasons I prefer PostgreSQL is that it does not implement hints. However, I have run into a situation in which I seem to be forced to use a hint-like predicate.

I have 2 tables in PostgreSQL 9.0:

tcpsessions - about 4 Mrows in the lab, hundreds of Mrows in production
    primary key (detectorid, id)

tcpsessiondata - about 2-5 times bigger than tcpsessions
    foreign key (detectorid, sessionid) references tcpsessions (detectorid, id)
    index on (detectorid, sessionid)

For completeness, tcpsessiondata is actually partitioned following the official documentation, but I will spare you the details unless they are needed. For the purposes of this message, all the data lives in a single child table: tcpsessiondata_default.
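For reference, here is a minimal sketch of the schema. This is a hypothetical reconstruction from the query and plan output below, not the real DDL: only the columns that appear in this message are shown, and the column types are guesses.

```sql
-- Hypothetical reconstruction; the real tables (schema appqosdata)
-- have more columns than shown here.
CREATE TABLE tcpsessions (
    detectorid integer NOT NULL,
    id         bigint  NOT NULL,
    PRIMARY KEY (detectorid, id)
);

CREATE TABLE tcpsessiondata (
    detectorid             integer NOT NULL,
    sessionid              bigint  NOT NULL,
    createdtime            bigint  NOT NULL,  -- divided by 60e9 to form timegroup
    datafromsource         numeric,
    datafromdestination    numeric,
    packetsfromsource      numeric,
    packetsfromdestination numeric,
    FOREIGN KEY (detectorid, sessionid)
        REFERENCES tcpsessions (detectorid, id)
);

-- The child table actually scanned in the plans below.
CREATE TABLE tcpsessiondata_default () INHERITS (tcpsessiondata);
CREATE INDEX idx_tcpsessiondata_default_detectoridandsessionid
    ON tcpsessiondata_default (detectorid, sessionid);
```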

When I run the following simple query:

select
    (createdtime / 60000000000) as timegroup,
    (sum(datafromsource)+sum(datafromdestination)) as numbytes,
    (sum(packetsfromsource)+sum(packetsfromdestination)) as numpackets
from
    tcpsessiondata SD, tcpsessions SS
where
    SD.detectorid = SS.detectorid
    and SD.sessionid = SS.id
    and SD.detectorid = 1
    and SD.sessionid >= 1001000000000::INT8 and SD.sessionid <= 2001000000000::INT8
group by
    timegroup
order by
    timegroup asc

I get the following plan:
"Sort  (cost=259126.13..259126.63 rows=200 width=32) (actual time=32526.762..32526.781 rows=20 loops=1)"
"  Output: ((sd.createdtime / 60000000000::bigint)), (((sum(sd.datafromsource) + sum(sd.datafromdestination)) / 1048576::numeric)), ((sum(sd.packetsfromsource) + sum(sd.packetsfromdestination)))"
"  Sort Key: ((sd.createdtime / 60000000000::bigint))"
"  Sort Method:  quicksort  Memory: 26kB"
"  ->  HashAggregate  (cost=259112.49..259118.49 rows=200 width=32) (actual time=32526.657..32526.700 rows=20 loops=1)"
"        Output: ((sd.createdtime / 60000000000::bigint)), ((sum(sd.datafromsource) + sum(sd.datafromdestination)) / 1048576::numeric), (sum(sd.packetsfromsource) + sum(sd.packetsfromdestination))"
"        ->  Hash Join  (cost=126553.43..252603.29 rows=520736 width=32) (actual time=22400.430..31291.838 rows=570100 loops=1)"
"              Output: sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, (sd.createdtime / 60000000000::bigint)"
"              Hash Cond: (sd.sessionid = ss.id)"
"              ->  Append  (cost=0.00..100246.89 rows=520736 width=42) (actual time=2382.160..6226.906 rows=570100 loops=1)"
"                    ->  Seq Scan on appqosdata.tcpsessiondata sd  (cost=0.00..18.65 rows=1 width=42) (actual time=0.002..0.002 rows=0 loops=1)"
"                          Output: sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, sd.detectorid, sd.sessionid"
"                          Filter: ((sd.sessionid >= 1001000000000::bigint) AND (sd.sessionid <= 2001000000000::bigint) AND (sd.detectorid = 1))"
"                    ->  Bitmap Heap Scan on appqosdata.tcpsessiondata_default sd  (cost=11001.37..100228.24 rows=520735 width=42) (actual time=2382.154..5278.319 rows=570100 loops=1)"
"                          Output: sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, sd.detectorid, sd.sessionid"
"                          Recheck Cond: ((sd.detectorid = 1) AND (sd.sessionid >= 1001000000000::bigint) AND (sd.sessionid <= 2001000000000::bigint))"
"                          ->  Bitmap Index Scan on idx_tcpsessiondata_default_detectoridandsessionid  (cost=0.00..10871.19 rows=520735 width=0) (actual time=2351.865..2351.865 rows=574663 loops=1)"
"                                Index Cond: ((sd.detectorid = 1) AND (sd.sessionid >= 1001000000000::bigint) AND (sd.sessionid <= 2001000000000::bigint))"
"              ->  Hash  (cost=72340.48..72340.48 rows=3628356 width=10) (actual time=19878.891..19878.891 rows=3632586 loops=1)"
"                    Output: ss.detectorid, ss.id"
"                    Buckets: 8192  Batches: 64  Memory Usage: 2687kB"
"                    ->  Seq Scan on appqosdata.tcpsessions ss  (cost=0.00..72340.48 rows=3628356 width=10) (actual time=627.164..14586.202 rows=3632586 loops=1)"
"                          Output: ss.detectorid, ss.id"
"                          Filter: (ss.detectorid = 1)"
"Total runtime: 32543.224 ms"


As we can see, the planner chooses an index scan on tcpsessiondata_default (as expected) but a seq scan on tcpsessions. However, if I add the following ugly conditions to my query:
    and SS.detectorid = 1
    and SS.id >= 1001000000000::INT8 and SS.id <= 2001000000000::INT8
so that the full query now becomes:

select
    (createdtime / 60000000000) as timegroup,
    (sum(datafromsource)+sum(datafromdestination)) as numbytes,
    (sum(packetsfromsource)+sum(packetsfromdestination)) as numpackets
from
    tcpsessiondata SD, tcpsessions SS
where
    SD.detectorid = SS.detectorid
    and SD.sessionid = SS.id
    and SD.detectorid = 1
    and SD.sessionid >= 1001000000000::INT8 and SD.sessionid <= 2001000000000::INT8
    and SS.detectorid = 1
    and SS.id >= 1001000000000::INT8 and SS.id <= 2001000000000::INT8
group by
    timegroup
order by
    timegroup asc

then I get an index scan on tcpsessions as well, and the running time is about a third of the previous one:

"Sort  (cost=157312.59..157313.09 rows=200 width=32) (actual time=9682.748..9682.764 rows=20 loops=1)"
"  Output: ((sd.createdtime / 60000000000::bigint)), (((sum(sd.datafromsource) + sum(sd.datafromdestination)) / 1048576::numeric)), ((sum(sd.packetsfromsource) + sum(sd.packetsfromdestination)))"
"  Sort Key: ((sd.createdtime / 60000000000::bigint))"
"  Sort Method:  quicksort  Memory: 26kB"
"  ->  HashAggregate  (cost=157298.94..157304.94 rows=200 width=32) (actual time=9682.649..9682.692 rows=20 loops=1)"
"        Output: ((sd.createdtime / 60000000000::bigint)), ((sum(sd.datafromsource) + sum(sd.datafromdestination)) / 1048576::numeric), (sum(sd.packetsfromsource) + sum(sd.packetsfromdestination))"
"        ->  Hash Join  (cost=32934.67..150744.28 rows=524373 width=32) (actual time=3695.016..8370.629 rows=570100 loops=1)"
"              Output: sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, (sd.createdtime / 60000000000::bigint)"
"              Hash Cond: (sd.sessionid = ss.id)"
"              ->  Append  (cost=0.00..100948.71 rows=524373 width=42) (actual time=2318.568..4799.985 rows=570100 loops=1)"
"                    ->  Seq Scan on appqosdata.tcpsessiondata sd  (cost=0.00..18.65 rows=1 width=42) (actual time=0.001..0.001 rows=0 loops=1)"
"                          Output: sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, sd.detectorid, sd.sessionid"
"                          Filter: ((sd.sessionid >= 1001000000000::bigint) AND (sd.sessionid <= 2001000000000::bigint) AND (sd.detectorid = 1))"
"                    ->  Bitmap Heap Scan on appqosdata.tcpsessiondata_default sd  (cost=11080.05..100930.06 rows=524372 width=42) (actual time=2318.563..3789.844 rows=570100 loops=1)"
"                          Output: sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, sd.detectorid, sd.sessionid"
"                          Recheck Cond: ((sd.detectorid = 1) AND (sd.sessionid >= 1001000000000::bigint) AND (sd.sessionid <= 2001000000000::bigint))"
"                          ->  Bitmap Index Scan on idx_tcpsessiondata_default_detectoridandsessionid  (cost=0.00..10948.96 rows=524372 width=0) (actual time=2305.322..2305.322 rows=574663 loops=1)"
"                                Index Cond: ((sd.detectorid = 1) AND (sd.sessionid >= 1001000000000::bigint) AND (sd.sessionid <= 2001000000000::bigint))"
"              ->  Hash  (cost=30159.60..30159.60 rows=185726 width=10) (actual time=1345.307..1345.307 rows=194971 loops=1)"
"                    Output: ss.detectorid, ss.id"
"                    Buckets: 8192  Batches: 4  Memory Usage: 2297kB"
"                    ->  Bitmap Heap Scan on appqosdata.tcpsessions ss  (cost=3407.46..30159.60 rows=185726 width=10) (actual time=483.572..1069.292 rows=194971 loops=1)"
"                          Output: ss.detectorid, ss.id"
"                          Recheck Cond: ((ss.id >= 1001000000000::bigint) AND (ss.id <= 2001000000000::bigint))"
"                          Filter: (ss.detectorid = 1)"
"                          ->  Bitmap Index Scan on idx_tcpsessions_id  (cost=0.00..3361.02 rows=201751 width=0) (actual time=451.242..451.242 rows=219103 loops=1)"
"                                Index Cond: ((ss.id >= 1001000000000::bigint) AND (ss.id <= 2001000000000::bigint))"
"Total runtime: 9682.905 ms"


Let me also add that if I remove the conditions on SD but keep the conditions on SS, I get an index scan on tcpsessions BUT a seq scan on tcpsessiondata.
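To be explicit, that third variant is the same query with the constant conditions moved entirely onto SS:

```sql
-- Variant 3: range and detector conditions only on SS.
-- Result: index scan on tcpsessions, seq scan on tcpsessiondata.
select
    (createdtime / 60000000000) as timegroup,
    (sum(datafromsource)+sum(datafromdestination)) as numbytes,
    (sum(packetsfromsource)+sum(packetsfromdestination)) as numpackets
from
    tcpsessiondata SD, tcpsessions SS
where
    SD.detectorid = SS.detectorid
    and SD.sessionid = SS.id
    and SS.detectorid = 1
    and SS.id >= 1001000000000::INT8 and SS.id <= 2001000000000::INT8
group by
    timegroup
order by
    timegroup asc;
```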

Let's suppose that the index scan on both tables is indeed the best choice, since the planner itself selects it in one of the three cases. (It is also the faster plan, as we extract only about 200,000 rows out of 4 million in this example.) But I am really surprised that the planner needs me to explicitly specify the same conditions twice, like this:

    SD.detectorid = SS.detectorid
    and SD.sessionid = SS.id
    and SD.detectorid = 1
    and SD.sessionid >= 1001000000000::INT8 and SD.sessionid <= 2001000000000::INT8
    and SS.detectorid = 1
    and SS.id >= 1001000000000::INT8 and SS.id <= 2001000000000::INT8

in order to use the primary key on SS, even though the conditions on SS follow directly from "SD.detectorid = SS.detectorid and SD.sessionid = SS.id". I hope you agree that repeating the same conditions on SS looks very much like hinting the planner to use the index there. I feel very uncomfortable using such an ugly predicate, especially knowing that I am doing it to "force an index". On the other hand, I am terrified that in production we may end up with a seq scan over hundreds of millions of rows just to extract 200,000.

Would you please explain this behavior and suggest how to proceed?

Thanks for any comments,
Svetlin Manavski


