Thread: slow joins?
On 9.2.4, I'm running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans; the first is much slower than the second.
Any ideas on how I can speed this up? I have btree indexes for all the columns used in the query.
explain analyze
SELECT COUNT(*)
FROM purchased_items pi
inner join line_items li on li.id = pi.line_item_id
inner join products on products.id = li.product_id
WHERE products.drop_shipper_id = 221;
 Aggregate  (cost=193356.31..193356.32 rows=1 width=0) (actual time=2425.225..2425.225 rows=1 loops=1)
   ->  Hash Join  (cost=78864.43..193160.41 rows=78360 width=0) (actual time=726.612..2424.206 rows=8413 loops=1)
         Hash Cond: (pi.line_item_id = li.id)
         ->  Seq Scan on purchased_items pi  (cost=0.00..60912.39 rows=3724639 width=4) (actual time=0.008..616.812 rows=3724639 loops=1)
         ->  Hash  (cost=77937.19..77937.19 rows=56499 width=4) (actual time=726.231..726.231 rows=8178 loops=1)
               Buckets: 4096  Batches: 4  Memory Usage: 73kB
               ->  Hash Join  (cost=1684.33..77937.19 rows=56499 width=4) (actual time=1.270..723.222 rows=8178 loops=1)
                     Hash Cond: (li.product_id = products.id)
                     ->  Seq Scan on line_items li  (cost=0.00..65617.18 rows=2685518 width=8) (actual time=0.081..392.926 rows=2685499 loops=1)
                     ->  Hash  (cost=1676.60..1676.60 rows=618 width=4) (actual time=0.835..0.835 rows=618 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 22kB
                           ->  Bitmap Heap Scan on products  (cost=13.07..1676.60 rows=618 width=4) (actual time=0.185..0.752 rows=618 loops=1)
                                 Recheck Cond: (drop_shipper_id = 221)
                                 ->  Bitmap Index Scan on index_products_on_drop_shipper_id  (cost=0.00..12.92 rows=618 width=0) (actual time=0.125..0.125 rows=618 loops=1)
                                       Index Cond: (drop_shipper_id = 221)
 Total runtime: 2425.302 ms
explain analyze
SELECT COUNT(*)
FROM purchased_items pi
inner join line_items li on li.id = pi.line_item_id
inner join products on products.id = li.product_id
WHERE products.drop_shipper_id = 2;
 Aggregate  (cost=29260.40..29260.41 rows=1 width=0) (actual time=0.906..0.906 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..29254.38 rows=2409 width=0) (actual time=0.029..0.877 rows=172 loops=1)
         ->  Nested Loop  (cost=0.00..16011.70 rows=1737 width=4) (actual time=0.021..0.383 rows=167 loops=1)
               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..80.41 rows=19 width=4) (actual time=0.010..0.074 rows=70 loops=1)
                     Index Cond: (drop_shipper_id = 2)
               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..835.70 rows=279 width=8) (actual time=0.002..0.004 rows=2 loops=70)
                     Index Cond: (product_id = products.id)
         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.002..0.003 rows=1 loops=167)
               Index Cond: (line_item_id = li.id)
               Heap Fetches: 5
 Total runtime: 0.955 ms
(11 rows)
(https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txt shows a non-wrapped version of the queries and plan)
On Fri, Apr 5, 2013 at 6:38 PM, Joe Van Dyk <joe@tanga.com> wrote:
If I disable sequential scans, hash joins, and merge joins, the query plans become the same and performance on the first slow one is much improved.
Is there something else I can do to avoid this problem?
below also at https://gist.github.com/joevandyk/34e31b3ad5cccb730a50/raw/8081a4298ba50ac93a86df97c1d0aae482ee7d2d/gistfile1.txt
 Aggregate  (cost=869360.53..869360.54 rows=1 width=0) (actual time=103.102..103.102 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..869164.63 rows=78360 width=0) (actual time=0.253..101.708 rows=8413 loops=1)
         ->  Nested Loop  (cost=0.00..438422.95 rows=56499 width=4) (actual time=0.157..51.766 rows=8178 loops=1)
               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..2312.56 rows=618 width=4) (actual time=0.087..6.318 rows=618 loops=1)
                     Index Cond: (drop_shipper_id = 221)
               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..702.89 rows=279 width=8) (actual time=0.010..0.069 rows=13 loops=618)
                     Index Cond: (product_id = products.id)
         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.005..0.005 rows=1 loops=8178)
               Index Cond: (line_item_id = li.id)
               Heap Fetches: 144
 Total runtime: 103.442 ms
(11 rows)
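[The exact statements used to force this plan weren't posted; disabling those three planner methods for a session would presumably look something like the following.]

```sql
-- Discourage the three plan types mentioned above, for this session only.
-- These are planner cost hints, not hard disables: Postgres can still use
-- them if no other plan is possible.
SET enable_seqscan  = off;
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM purchased_items pi
INNER JOIN line_items li ON li.id = pi.line_item_id
INNER JOIN products ON products.id = li.product_id
WHERE products.drop_shipper_id = 221;

-- Revert for the rest of the session:
RESET enable_seqscan;
RESET enable_hashjoin;
RESET enable_mergejoin;
```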
Joe --

> From: Joe Van Dyk <joe@tanga.com>
> To: pgsql-performance@postgresql.org
> Sent: Friday, April 5, 2013 6:42 PM
> Subject: Re: [PERFORM] slow joins?
>
> (https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txt shows a non-wrapped version of the queries and plan)

Does drop_shipper_id have a much larger number of rows which is making the planner want to avoid an indexed scan, or otherwise prefer a sequential scan on products and on line_items?

What are the stats settings for these tables?

HTH,
Greg Williamson
On Fri, Apr 5, 2013 at 6:54 PM, Greg Williamson <gwilliamson39@yahoo.com> wrote:
> Does drop_shipper_id have a much larger number of rows which is making
> the planner want to avoid an indexed scan, or otherwise prefer a
> sequential scan on products and on line_items?
Assuming you mean products.drop_shipper_id? There are more rows matched for the first one vs the second one.
70 products rows match drop_shipper_id=2, 618 match drop_shipper_id=221.
> What are the stats settings for these tables?
Whatever the defaults are.
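[For reference -- not from the thread -- the effective per-column statistics targets can be checked like this; default_statistics_target defaults to 100 on 9.2.]

```sql
SHOW default_statistics_target;  -- 100 unless changed in postgresql.conf

-- Per-column overrides; attstattarget = -1 means "use the default":
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'line_items'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```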
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Joe --

> From: Joe Van Dyk <joe@tanga.com>
> To: Greg Williamson <gwilliamson39@yahoo.com>
> Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
> Sent: Friday, April 5, 2013 7:56 PM
> Subject: Re: [PERFORM] slow joins?

I mis-pasted the tables -- both line_items and purchased_items are getting sequential scans for the relevant rows; it is possible that there's enough difference to tip the planner to use sequential scans.

You might try increasing the stats being collected on those two tables, run ANALYZE on all the tables in the query, and try it again.

GW
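[Greg's suggestion would translate into something like the following; the choice of columns and the target value 1000 are illustrative, not from the thread.]

```sql
-- Raise the per-column statistics target (default 100 on 9.2) for the
-- join columns on the two tables currently getting sequential scans:
ALTER TABLE line_items      ALTER COLUMN product_id   SET STATISTICS 1000;
ALTER TABLE purchased_items ALTER COLUMN line_item_id SET STATISTICS 1000;

-- Re-sample every table involved in the query:
ANALYZE products;
ANALYZE line_items;
ANALYZE purchased_items;
```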
Try increasing cpu_tuple_cost to 0.1.
On 04/06/2013 03:50, Joe Van Dyk wrote:
Julien Cigar <jcigar@ulb.ac.be> wrote:

> try to increase cpu_tuple_cost to 0.1

I agree that's on the right track, but possibly an overly blunt tool for the job. The following settings are likely to need adjustment, IMO:

effective_cache_size: People often set this to somewhere in the range of 50% to 75% of the RAM on the machine. This setting does not allocate RAM, but tells the planner how likely it is to find things in cache for, say, repeated index access. A higher setting makes the random access involved in index scans seem like less of a problem.

random_page_cost: You seem to have a very high cache hit ratio, between shared_buffers and the OS cache. To model this you should decrease random_page_cost to something just above seq_page_cost or equal to it. To reflect the relatively low cost of reading a page from the OS cache (compared to actually reading from disk) you might want to reduce both of these below 1. 0.1 is a not-uncommon setting for instances with the active portion of the database well-cached.

cpu_tuple_cost: I always raise this; I think our default is just too low to accurately model the cost of reading a row, compared to the cost factors used for other things. In combination with the above changes I've never had to go beyond 0.03 to get a good plan. I've pushed it to 0.05 to see if that put me near a tipping point for a bad plan, and saw no ill effects. I've never tried higher than 0.05, so I can't speak to that.

In any event, your current cost settings aren't accurately modeling actual costs in your environment for your workload. You need to adjust them. One of the estimates was off, so increasing the statistics sample size might help, but I suspect that you need to make adjustments like the above in any event.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
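[Kevin's three suggestions, expressed as settings for a session-level trial. The specific values are illustrative; effective_cache_size in particular should be sized to this machine's RAM, which isn't given in the thread.]

```sql
-- Try per-session first; move to postgresql.conf once validated.
SET effective_cache_size = '8GB';  -- assume ~50-75% of RAM (machine size not stated)
SET random_page_cost = 1.1;        -- near seq_page_cost for a well-cached database
SET cpu_tuple_cost = 0.03;         -- raised from the 0.01 default

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM purchased_items pi
INNER JOIN line_items li ON li.id = pi.line_item_id
INNER JOIN products ON products.id = li.product_id
WHERE products.drop_shipper_id = 221;
```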
On 04/06/2013 16:22, Kevin Grittner wrote:

> effective_cache_size: People often set this to somewhere in the range
> of 50% to 75% of the RAM on the machine. A higher setting makes the
> random access involved in index scans seem like less of a problem.

I agree that the very first thing to check is effective_cache_size.

> random_page_cost: You seem to have a very high cache hit ratio,
> between shared_buffers and the OS cache. To model this you should
> decrease random_page_cost to something just above seq_page_cost or
> equal to it.

I would first raise cpu_tuple_cost rather than touch random_page_cost. Raising cpu_tuple_cost is a more "fine-grained" method for discouraging seqscans than random_page_cost is.

> cpu_tuple_cost: I always raise this; I think our default is just too
> low to accurately model the cost of reading a row, compared to the
> cost factors used for other things.

Yep, the default cpu_tuple_cost is just too low.