Thread: Why does the number of rows differ between actual and estimated?


Why does the number of rows differ between actual and estimated?

From:
AI Rumman
Date:
Why does the number of rows differ between actual and estimated?
The default_statistics_target is set to 100.


explain analyze
select *
FROM ( 
SELECT 
entity.id AS "con_s_id", entity.setype AS "con_s_setype" , 
con_details.salutation AS "con_s_salutationtype", con_details.firstname AS "con_s_firstname", 
con_details.phone AS "con_s_phone", con_details.lastname AS "con_s_lastname", 
con_details.accountid AS "con_s_account_id_entityid", con_details.mobile AS "con_s_mobile", 
con_details.title AS "con_s_title", con_details.donotcall AS "con_s_donotcall", 
con_details.fax AS "con_s_fax", con_details.department AS "con_s_department", 
con_details.email AS "con_s_email", con_details.yahooid AS "con_s_yahooid", 
con_details.emailoptout AS "con_s_emailoptout", con_details.reportsto AS "con_s_con__id_entityid", 
con_details.reference AS "con_s_reference", entity.smownerid AS "con_s_assigned_user_id_entityid", 
CASE WHEN entity.owner_type='U' THEN users.user_name ELSE groups.groupname END AS "con_s_assigned_user_id_name", 
CASE WHEN entity.owner_type='U' THEN users.first_name || ' ' || users.last_name ELSE groups.groupname END AS "con_s_assigned_user_id", 
CASE WHEN entity.owner_type='U' THEN 'Users' ELSE 'Groups' END AS "con_s_assigned_user_id_linkmodule", 
entity.modifiedtime AS "con_s_modifiedtime", con_details.notify_owner AS "con_s_notify_owner", 
entity.createdtime AS "con_s_createdtime", entity.description AS "con_s_description", 
con_details.imagename AS "con_s_imagename" 
FROM con_details 
INNER JOIN entity ON con_details.con_id=entity.id 
LEFT JOIN groups ON groups.groupid = entity.smownerid 
LEFT join users ON entity.smownerid= users.id 
WHERE entity.setype='con_s' AND entity.deleted=0 
AND (((con_details.email ILIKE '%@%'))) 
) con_s_base 
INNER JOIN con_scf ON con_s_base."con_s_id"=con_scf.con_id 
INNER JOIN con_subdetails ON con_s_base."con_s_id"=con_subdetails.con_subscriptionid 
INNER JOIN customerdetails ON con_s_base."con_s_id"=customerdetails.customerid 
INNER JOIN con_address ON con_s_base."con_s_id"=con_address.con_addressid 


Nested Loop  (cost=18560.97..26864.83 rows=24871 width=535) (actual time=1335.157..8492.414 rows=157953 loops=1)
   ->  Hash Left Join  (cost=18560.97..26518.91 rows=116 width=454) (actual time=1335.117..6996.585 rows=205418 loops=1)
         Hash Cond: (entity.smownerid = users.id)
         ->  Hash Left Join  (cost=18547.22..26503.57 rows=116 width=419) (actual time=1334.354..6671.442 rows=205418 loops=1)
               Hash Cond: (entity.smownerid = groups.groupid)
               ->  Nested Loop  (cost=18546.83..26502.72 rows=116 width=398) (actual time=1334.314..6385.664 rows=205418 loops=1)
                     ->  Nested Loop  (cost=18546.83..26273.40 rows=774 width=319) (actual time=1334.272..5025.175 rows=205418 loops=1)
                           ->  Hash Join  (cost=18546.83..24775.02 rows=5213 width=273) (actual time=1334.238..3666.748 rows=205420 loops=1)
                                 Hash Cond: (con_subdetails.con_subscriptionid = entity.id)
                                  ->  Index Scan using con_subdetails_pkey on con_subdetails  (cost=0.00..4953.41 rows=326040 width=29) (actual time=0.019..350.736 rows=327328 loops=1)
                                 ->  Hash  (cost=18115.71..18115.71 rows=34489 width=244) (actual time=1334.147..1334.147 rows=205420 loops=1)
                                       Buckets: 4096  Batches: 1  Memory Usage: 19417kB
                                       ->  Hash Join  (cost=9337.97..18115.71 rows=34489 width=244) (actual time=418.054..1156.453 rows=205420 loops=1)
                                             Hash Cond: (customerdetails.customerid = entity.id)
                                             ->  Seq Scan on customerdetails  (cost=0.00..4752.46 rows=327146 width=13) (actual time=0.021..176.389 rows=327328 loops=1)
                                             ->  Hash  (cost=6495.65..6495.65 rows=227386 width=231) (actual time=417.839..417.839 rows=205420 loops=1)
                                                   Buckets: 32768  Batches: 1  Memory Usage: 16056kB
                                                    ->  Index Scan using entity_setype_idx on entity  (cost=0.00..6495.65 rows=227386 width=231) (actual time=0.033..253.880 rows=205420 loops=1)
                                                         Index Cond: ((setype)::text = 'con_s'::text)
                            ->  Index Scan using con_address_pkey on con_address  (cost=0.00..0.27 rows=1 width=46) (actual time=0.003..0.004 rows=1 loops=205420)
                                 Index Cond: (con_addressid = entity.id)
                     ->  Index Scan using con_scf_pkey on con_scf  (cost=0.00..0.28 rows=1 width=79) (actual time=0.003..0.004 rows=1 loops=205418)
                           Index Cond: (con_id = entity.id)
               ->  Hash  (cost=0.34..0.34 rows=4 width=25) (actual time=0.016..0.016 rows=4 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Index Scan using groups_pkey on groups  (cost=0.00..0.34 rows=4 width=25) (actual time=0.008..0.012 rows=4 loops=1)
         ->  Hash  (cost=9.00..9.00 rows=380 width=39) (actual time=0.746..0.746 rows=380 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 28kB
               ->  Index Scan using users_pkey on users  (cost=0.00..9.00 rows=380 width=39) (actual time=0.014..0.440 rows=380 loops=1)
   ->  Index Scan using con_details_pkey on con_details  (cost=0.00..0.29 rows=1 width=85) (actual time=0.004..0.004 rows=1 loops=205418)
         Index Cond: (con_id = entity.id)
         Filter: ((email)::text ~~* '%@%'::text)
 Total runtime: 8573.237 ms
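For reference, the per-column statistics behind these row estimates can be inspected, and the sample size raised per column, with standard catalog views and commands. This is only a sketch: the table and column names are taken from the plan above, and the target value 500 is just an example.

```sql
-- Inspect the statistics the planner has for a join key
SELECT attname, n_distinct, null_frac, correlation
FROM pg_stats
WHERE tablename = 'entity' AND attname IN ('id', 'setype');

-- Raise the statistics target for one column (overrides
-- default_statistics_target) and refresh the statistics
ALTER TABLE entity ALTER COLUMN setype SET STATISTICS 500;
ANALYZE entity;
```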

Re: Why does the number of rows differ between actual and estimated?

From:
Andrew Dunstan
Date:
On 12/13/2012 05:12 PM, AI Rumman wrote:
> Why does the number of rows are different in actual and estimated?
>


Isn't that in the nature of estimates? An estimate is a heuristic guess
at the number of rows it will find for the given query or part of a
query. It's not uncommon for estimates to be out by several orders of
magnitude. Guaranteeing estimates within bounded accuracy and in a given
short amount of time (you don't want your planning time to overwhelm
your execution time) isn't possible.

cheers

andrew



Re: Why does the number of rows differ between actual and estimated?

From:
Evgeny Shishkin
Date:

On Dec 14, 2012, at 2:36 AM, Andrew Dunstan <andrew@dunslane.net> wrote:


> On 12/13/2012 05:12 PM, AI Rumman wrote:
>> Why does the number of rows are different in actual and estimated?

> Isn't that in the nature of estimates? An estimate is a heuristic guess at the number of rows it will find for the given query or part of a query. It's not uncommon for estimates to be out by several orders of magnitude. Guaranteeing estimates within bounded accuracy and in a given short amount of time (you don't want your planning time to overwhelm your execution time) isn't possible.


The main question, I think, is what to do with it.

The problem starts here:

  ->  Hash Join  (cost=9337.97..18115.71 rows=34489 width=244) (actual time=418.054..1156.453 rows=205420 loops=1)
                                             Hash Cond: (customerdetails.customerid = entity.id)
                                             ->  Seq Scan on customerdetails  (cost=0.00..4752.46 rows=327146 width=13) (actual time=0.021..176.389 rows=327328 loops=1)
                                             ->  Hash  (cost=6495.65..6495.65 rows=227386 width=231) (actual time=417.839..417.839 rows=205420 loops=1)
                                                   Buckets: 32768  Batches: 1  Memory Usage: 16056kB
                                                    ->  Index Scan using entity_setype_idx on entity  (cost=0.00..6495.65 rows=227386 width=231) (actual time=0.033..253.880 rows=205420 loops=1)
                                                         Index Cond: ((setype)::text = 'con_s'::text)
                            ->  Index Scan using con_address_pkey on con_address  (cost=0.00..0.27 rows=1 width=46) (actual time=0.003..0.004 rows=1 loops=205420)

As you can see, the access-method estimates are OK; it is the join result-set estimate that is wrong.

How to deal with it?

Maybe a hack with a CTE can help, but is there a way to improve statistics correlation?
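For example, the CTE version might look like the sketch below. It is based on the query from the start of the thread, simplified; in PostgreSQL versions of this era (before 12), a CTE is an optimization fence, so the planner materializes the selective part first and then joins against the actual row set instead of a misestimated one.

```sql
-- Sketch only: materialize the selective base join in a CTE,
-- then join the remaining tables against the computed row set.
WITH con_base AS (
    SELECT entity.id, entity.setype, entity.smownerid,
           entity.modifiedtime, entity.createdtime
    FROM entity
    JOIN con_details ON con_details.con_id = entity.id
    WHERE entity.setype = 'con_s'
      AND entity.deleted = 0
      AND con_details.email ILIKE '%@%'
)
SELECT *
FROM con_base
JOIN con_scf         ON con_base.id = con_scf.con_id
JOIN con_subdetails  ON con_base.id = con_subdetails.con_subscriptionid
JOIN customerdetails ON con_base.id = customerdetails.customerid
JOIN con_address     ON con_base.id = con_address.con_addressid;
```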




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Why does the number of rows differ between actual and estimated?

From:
Claudio Freire
Date:
On Thu, Dec 13, 2012 at 7:36 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 12/13/2012 05:12 PM, AI Rumman wrote:
>>
>> Why does the number of rows are different in actual and estimated?
>>
>
>
> Isn't that in the nature of estimates? An estimate is a heuristic guess at
> the number of rows it will find for the given query or part of a query. It's
> not uncommon for estimates to be out by several orders of magnitude.
> Guaranteeing estimates within bounded accuracy and in a given short amount
> of time (you don't want your planning time to overwhelm your execution time)
> isn't possible.

Although this kind of difference could indeed be a problem:
> Nested Loop  (cost=18560.97..26864.83 rows=24871 width=535) (actual time=1335.157..8492.414 rows=157953 loops=1)
>    ->  Hash Left Join  (cost=18560.97..26518.91 rows=116 width=454) (actual time=1335.117..6996.585 rows=205418 loops=1)

It usually is due to some unrecognized correlation between the joined tables.

And it looks like it all may be starting to go south here:
>                                        ->  Hash Join  (cost=9337.97..18115.71 rows=34489 width=244) (actual time=418.054..1156.453 rows=205420 loops=1)
>                                              Hash Cond: (customerdetails.customerid = entity.id)


Re: Why does the number of rows differ between actual and estimated?

From:
Andrew Dunstan
Date:
On 12/13/2012 05:42 PM, Claudio Freire wrote:
> And it looks like it all may be starting to go south here:
>>                                        ->  Hash Join  (cost=9337.97..18115.71 rows=34489 width=244) (actual time=418.054..1156.453 rows=205420 loops=1)
>>                                              Hash Cond: (customerdetails.customerid = entity.id)


Well, it looks like it's choosing a join order that's quite a bit
different from the way the query is expressed, so the OP might need to
play around with forcing the join order some.
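One way to experiment with forcing the join order is via the standard planner settings (a sketch; session-local, so it only affects the current connection):

```sql
-- With join_collapse_limit = 1 the planner keeps explicit JOINs in the
-- order they are written, so the query's own join order is used as-is.
SET join_collapse_limit = 1;
SET from_collapse_limit = 1;  -- likewise for subqueries pulled up into the FROM list

EXPLAIN ANALYZE
SELECT ...;  -- the original query, with joins written in the desired order

RESET join_collapse_limit;
RESET from_collapse_limit;
```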


cheers

andrew


Re: Why does the number of rows differ between actual and estimated?

From:
Evgeny Shishkin
Date:
On Dec 14, 2012, at 3:09 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

>
> On 12/13/2012 05:42 PM, Claudio Freire wrote:
>> And it looks like it all may be starting to go south here:
>>>                                        ->  Hash Join  (cost=9337.97..18115.71 rows=34489 width=244) (actual time=418.054..1156.453 rows=205420 loops=1)
>>>                                              Hash Cond: (customerdetails.customerid = entity.id)
>
>
> Well, it looks like it's choosing a join order that's quite a bit different from the way the query is expressed, so the OP might need to play around with forcing the join order some.
>
>

The OP joins 8 tables, and I suppose join_collapse_limit is set to the default of 8. I thought PostgreSQL's optimizer is not MySQL's.

> cheers
>
> andrew
>
>



Re: Why does the number of rows differ between actual and estimated?

From:
Tom Lane
Date:
Evgeny Shishkin <itparanoia@gmail.com> writes:
> On Dec 14, 2012, at 3:09 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Well, it looks like it's choosing a join order that's quite a bit different from the way the query is expressed, so the OP might need to play around with forcing the join order some.

> OP joins 8 tables, and i suppose join collapse limit is set to default 8. I thought postgresql's optimiser is not mysql's.

It's not obvious to me that there's anything very wrong with the plan.
An 8-way join that produces 150K rows is unlikely to run in milliseconds
no matter what the plan.  The planner would possibly have done the last
join step differently if it had had a better rowcount estimate, but even
if that were free the query would still have been 7 seconds (vs 8.5).

            regards, tom lane


Re: Why does the number of rows differ between actual and estimated?

From:
Evgeny Shishkin
Date:
On Dec 14, 2012, at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Evgeny Shishkin <itparanoia@gmail.com> writes:
>> On Dec 14, 2012, at 3:09 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> Well, it looks like it's choosing a join order that's quite a bit different from the way the query is expressed, so the OP might need to play around with forcing the join order some.
>
>> OP joins 8 tables, and i suppose join collapse limit is set to default 8. I thought postgresql's optimiser is not mysql's.
>
> It's not obvious to me that there's anything very wrong with the plan.
> An 8-way join that produces 150K rows is unlikely to run in milliseconds
> no matter what the plan.  The planner would possibly have done the last
> join step differently if it had had a better rowcount estimate, but even
> if that were free the query would still have been 7 seconds (vs 8.5).
>

Maybe in this case it is. I once wrote to this list regarding a similar problem - joining 4 tables, the result set estimate was off by 2257 times - 750ms vs less than 1ms. Unfortunately the question was not accepted to the list.

I spoke to Bruce Momjian about that problem at a local conference; he said shit happens :)

>             regards, tom lane



Re: Why does the number of rows differ between actual and estimated?

From:
Claudio Freire
Date:
On Thu, Dec 13, 2012 at 8:50 PM, Evgeny Shishkin <itparanoia@gmail.com> wrote:
>>> OP joins 8 tables, and i suppose join collapse limit is set to default 8. I thought postgresql's optimiser is not mysql's.
>>
>> It's not obvious to me that there's anything very wrong with the plan.
>> An 8-way join that produces 150K rows is unlikely to run in milliseconds
>> no matter what the plan.  The planner would possibly have done the last
>> join step differently if it had had a better rowcount estimate, but even
>> if that were free the query would still have been 7 seconds (vs 8.5).
>>
>
> May be in this case it is. I once wrote to this list regarding similar problem - joining 4 tables, result set are off by 2257 times - 750ms vs less then 1ms. Unfortunately the question was not accepted to the list.
>
> I spoke to Bruce Momjian about that problem on one local conference, he said shit happens :)

I think it's more likely a missing FK constraint.


Re: Why does the number of rows differ between actual and estimated?

From:
AI Rumman
Date:
Does an FK constraint help to improve performance, or is it only for maintaining data integrity?

On Thu, Dec 13, 2012 at 7:38 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Thu, Dec 13, 2012 at 8:50 PM, Evgeny Shishkin <itparanoia@gmail.com> wrote:
>>> OP joins 8 tables, and i suppose join collapse limit is set to default 8. I thought postgresql's optimiser is not mysql's.
>>
>> It's not obvious to me that there's anything very wrong with the plan.
>> An 8-way join that produces 150K rows is unlikely to run in milliseconds
>> no matter what the plan.  The planner would possibly have done the last
>> join step differently if it had had a better rowcount estimate, but even
>> if that were free the query would still have been 7 seconds (vs 8.5).
>>
>
> May be in this case it is. I once wrote to this list regarding similar problem - joining 4 tables, result set are off by 2257 times - 750ms vs less then 1ms. Unfortunately the question was not accepted to the list.
>
> I spoke to Bruce Momjian about that problem on one local conference, he said shit happens :)

I think it's more likely a missing FK constraint.

Re: Why does the number of rows differ between actual and estimated?

From:
Claudio Freire
Date:
On Fri, Dec 14, 2012 at 4:01 PM, AI Rumman <rummandba@gmail.com> wrote:
> Does FK Constraint help to improve performance? Or it is only for
> maintaining data integrity?

I'm not entirely sure it's taken into account (I think it is), but an FK would tell the planner that every non-null value will produce a row. Here it seems to think that a large portion of the non-null values don't.
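A sketch of the constraint being discussed, using the column names from the plan earlier in the thread (the constraint name is illustrative; adjust to the real schema). Note that the planner only started using foreign keys to improve join selectivity estimates in PostgreSQL 9.6; on earlier servers such as the one in this thread, the constraint enforces data integrity only.

```sql
-- Declare that every customerdetails row references an existing entity row.
ALTER TABLE customerdetails
    ADD CONSTRAINT customerdetails_customerid_fkey
    FOREIGN KEY (customerid) REFERENCES entity (id);
```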