Thread: slow joins?
On 9.2.4, I'm running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans; the first is much slower than the second.
Any ideas on how I can speed this up? I have btree indexes for all the columns used in the query.
explain analyze
SELECT COUNT(*)
FROM purchased_items pi
inner join line_items li on li.id = pi.line_item_id
inner join products on products.id = li.product_id
WHERE products.drop_shipper_id = 221;
 Aggregate  (cost=193356.31..193356.32 rows=1 width=0) (actual time=2425.225..2425.225 rows=1 loops=1)
   ->  Hash Join  (cost=78864.43..193160.41 rows=78360 width=0) (actual time=726.612..2424.206 rows=8413 loops=1)
         Hash Cond: (pi.line_item_id = li.id)
         ->  Seq Scan on purchased_items pi  (cost=0.00..60912.39 rows=3724639 width=4) (actual time=0.008..616.812 rows=3724639 loops=1)
         ->  Hash  (cost=77937.19..77937.19 rows=56499 width=4) (actual time=726.231..726.231 rows=8178 loops=1)
               Buckets: 4096  Batches: 4  Memory Usage: 73kB
               ->  Hash Join  (cost=1684.33..77937.19 rows=56499 width=4) (actual time=1.270..723.222 rows=8178 loops=1)
                     Hash Cond: (li.product_id = products.id)
                     ->  Seq Scan on line_items li  (cost=0.00..65617.18 rows=2685518 width=8) (actual time=0.081..392.926 rows=2685499 loops=1)
                     ->  Hash  (cost=1676.60..1676.60 rows=618 width=4) (actual time=0.835..0.835 rows=618 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 22kB
                           ->  Bitmap Heap Scan on products  (cost=13.07..1676.60 rows=618 width=4) (actual time=0.185..0.752 rows=618 loops=1)
                                 Recheck Cond: (drop_shipper_id = 221)
                                 ->  Bitmap Index Scan on index_products_on_drop_shipper_id  (cost=0.00..12.92 rows=618 width=0) (actual time=0.125..0.125 rows=618 loops=1)
                                       Index Cond: (drop_shipper_id = 221)
 Total runtime: 2425.302 ms
explain analyze
SELECT COUNT(*)
FROM purchased_items pi
inner join line_items li on li.id = pi.line_item_id
inner join products on products.id = li.product_id
WHERE products.drop_shipper_id = 2;
 Aggregate  (cost=29260.40..29260.41 rows=1 width=0) (actual time=0.906..0.906 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..29254.38 rows=2409 width=0) (actual time=0.029..0.877 rows=172 loops=1)
         ->  Nested Loop  (cost=0.00..16011.70 rows=1737 width=4) (actual time=0.021..0.383 rows=167 loops=1)
               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..80.41 rows=19 width=4) (actual time=0.010..0.074 rows=70 loops=1)
                     Index Cond: (drop_shipper_id = 2)
               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..835.70 rows=279 width=8) (actual time=0.002..0.004 rows=2 loops=70)
                     Index Cond: (product_id = products.id)
         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.002..0.003 rows=1 loops=167)
               Index Cond: (line_item_id = li.id)
               Heap Fetches: 5
 Total runtime: 0.955 ms
(11 rows)
(https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txt shows a non-wrapped version of the queries and plan)
On Fri, Apr 5, 2013 at 6:38 PM, Joe Van Dyk <joe@tanga.com> wrote:
If I disable sequential scans, hash joins, and merge joins, the query plans become the same and performance on the first slow one is much improved.
Is there something else I can do to avoid this problem?
below also at https://gist.github.com/joevandyk/34e31b3ad5cccb730a50/raw/8081a4298ba50ac93a86df97c1d0aae482ee7d2d/gistfile1.txt
 Aggregate  (cost=869360.53..869360.54 rows=1 width=0) (actual time=103.102..103.102 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..869164.63 rows=78360 width=0) (actual time=0.253..101.708 rows=8413 loops=1)
         ->  Nested Loop  (cost=0.00..438422.95 rows=56499 width=4) (actual time=0.157..51.766 rows=8178 loops=1)
               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..2312.56 rows=618 width=4) (actual time=0.087..6.318 rows=618 loops=1)
                     Index Cond: (drop_shipper_id = 221)
               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..702.89 rows=279 width=8) (actual time=0.010..0.069 rows=13 loops=618)
                     Index Cond: (product_id = products.id)
         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.005..0.005 rows=1 loops=8178)
               Index Cond: (line_item_id = li.id)
               Heap Fetches: 144
 Total runtime: 103.442 ms
(11 rows)
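[The exact statements used to force this plan weren't posted; disabling those three planner methods for a session would presumably look something like the following.]

```sql
-- Discourage the three plan types mentioned above, for this session only.
-- These are planner cost hints, not hard disables: Postgres can still use
-- them if no other plan is possible.
SET enable_seqscan  = off;
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM purchased_items pi
INNER JOIN line_items li ON li.id = pi.line_item_id
INNER JOIN products ON products.id = li.product_id
WHERE products.drop_shipper_id = 221;

-- Revert for the rest of the session:
RESET enable_seqscan;
RESET enable_hashjoin;
RESET enable_mergejoin;
```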
Joe --

> From: Joe Van Dyk <joe@tanga.com>
> To: pgsql-performance@postgresql.org
> Sent: Friday, April 5, 2013 6:42 PM
> Subject: Re: [PERFORM] slow joins?
>
> (https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txt shows a non-wrapped version of the queries and plan)

Does drop_shipper_id have a much larger number of rows which is making the planner want to avoid an indexed scan, or otherwise prefer a sequential scan on products and on line_items?

What are the stats settings for these tables?

HTH,
Greg Williamson
On Fri, Apr 5, 2013 at 6:54 PM, Greg Williamson <gwilliamson39@yahoo.com> wrote:
> Does drop_shipper_id have a much larger number of rows which is making
> the planner want to avoid an indexed scan, or otherwise prefer a
> sequential scan on products and on line_items?
Assuming you mean products.drop_shipper_id? There are more rows matched for the first one vs the second one.
70 products rows match drop_shipper_id=2, 618 match drop_shipper_id=221.
> What are the stats settings for these tables?
Whatever the defaults are.
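[For reference -- not from the thread -- the effective per-column statistics targets can be checked like this; default_statistics_target defaults to 100 on 9.2.]

```sql
SHOW default_statistics_target;  -- 100 unless changed in postgresql.conf

-- Per-column overrides; attstattarget = -1 means "use the default":
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'line_items'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```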
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Joe --

> From: Joe Van Dyk <joe@tanga.com>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
> Sent: Friday, April 5, 2013 7:56 PM
> Subject: Re: [PERFORM] slow joins?

I mis-pasted the tables -- both line_items and purchased_items are getting sequential scans for the relevant rows; it is possible that there's enough difference to tip the planner to use sequential scans.

You might try increasing the stats being collected on those two tables, run ANALYZE on all the tables in the query, and try it again.

GW
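[Greg's suggestion would translate into something like the following; the choice of columns and the target value 1000 are illustrative, not from the thread.]

```sql
-- Raise the per-column statistics target (default 100 on 9.2) for the
-- join columns on the two tables currently getting sequential scans:
ALTER TABLE line_items      ALTER COLUMN product_id   SET STATISTICS 1000;
ALTER TABLE purchased_items ALTER COLUMN line_item_id SET STATISTICS 1000;

-- Re-sample every table involved in the query:
ANALYZE products;
ANALYZE line_items;
ANALYZE purchased_items;
```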
Try increasing cpu_tuple_cost to 0.1.
On 04/06/2013 03:50, Joe Van Dyk wrote:
Julien Cigar <jcigar@ulb.ac.be> wrote:

> try to increase cpu_tuple_cost to 0.1

I agree that's on the right track, but possibly an overly blunt tool for the job. The following settings are likely to need adjustment, IMO:

effective_cache_size: People often set this to somewhere in the range of 50% to 75% of the RAM on the machine. This setting does not allocate RAM, but tells the planner how likely it is to find things in cache for, say, repeated index access. A higher setting makes the random access involved in index scans seem like less of a problem.

random_page_cost: You seem to have a very high cache hit ratio, between shared_buffers and the OS cache. To model this you should decrease random_page_cost to something just above seq_page_cost or equal to it. To reflect the relatively low cost of reading a page from the OS cache (compared to actually reading from disk) you might want to reduce both of these below 1. 0.1 is a not-uncommon setting for instances with the active portion of the database well-cached.

cpu_tuple_cost: I always raise this; I think our default is just too low to accurately model the cost of reading a row, compared to the cost factors used for other things. In combination with the above changes I've never had to go beyond 0.03 to get a good plan. I've pushed it to 0.05 to see if that put me near a tipping point for a bad plan, and saw no ill effects. I've never tried higher than 0.05, so I can't speak to that.

In any event, your current cost settings aren't accurately modeling actual costs in your environment for your workload. You need to adjust them. One of the estimates was off, so increasing the statistics sample size might help, but I suspect that you need to make adjustments like the above in any event.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
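[Kevin's three suggestions, expressed as settings for a session-level trial. The specific values are illustrative; effective_cache_size in particular should be sized to this machine's RAM, which isn't given in the thread.]

```sql
-- Try per-session first; move to postgresql.conf once validated.
SET effective_cache_size = '8GB';  -- assume ~50-75% of RAM (machine size not stated)
SET random_page_cost = 1.1;        -- near seq_page_cost for a well-cached database
SET cpu_tuple_cost = 0.03;         -- raised from the 0.01 default

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM purchased_items pi
INNER JOIN line_items li ON li.id = pi.line_item_id
INNER JOIN products ON products.id = li.product_id
WHERE products.drop_shipper_id = 221;
```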
On 04/06/2013 16:22, Kevin Grittner wrote:

> effective_cache_size: People often set this to somewhere in the range
> of 50% to 75% of the RAM on the machine. A higher setting makes the
> random access involved in index scans seem like less of a problem.

I agree that the very first thing to check is effective_cache_size.

> random_page_cost: You seem to have a very high cache hit ratio,
> between shared_buffers and the OS cache. To model this you should
> decrease random_page_cost to something just above seq_page_cost or
> equal to it.

I would first raise cpu_tuple_cost rather than touch random_page_cost. Raising cpu_tuple_cost is a more "fine-grained" method for discouraging seqscans than random_page_cost is.

> cpu_tuple_cost: I always raise this; I think our default is just too
> low to accurately model the cost of reading a row, compared to the
> cost factors used for other things.

Yep, the default cpu_tuple_cost is just too low.