From: Jeff Cole
Subject: Re:
Date:
Msg-id: 43756010-6EF6-4F91-B84E-E7D29599C3E8@gmail.com
In reply to: Re:  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re:  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On Mar 5, 2007, at 8:54 PM, Tom Lane wrote:

> Hm, the cost for the upper nestloop is way less than you would expect
> given that the HASH IN join is going to have to be repeated 100+
> times.
> I think this must be due to a very low "join_in_selectivity" estimate
> but I'm not sure why you are getting that, especially seeing that the
> rowcount estimates aren't far off.  Can you show us the pg_stats
> rows for symptoms.id and symptom_reports.symptom_id?
>

Hi Tom, thanks for the response.  Here are the pg_stats.  I think I
understand what the stats say, but I don't know what to conclude from
them.


plm_stage=# select * from pg_stats where tablename = 'symptoms' and attname = 'id';
 schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |          histogram_bounds           | correlation
------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+-------------------------------------+-------------
 public     | symptoms  | id      |         0 |         4 |         -1 |                  |                   | {1,11,24,34,46,57,71,85,95,106,117} |    0.451606


plm_stage=# select * from pg_stats where tablename = 'symptom_reports' and attname = 'symptom_id';
 schemaname |    tablename    |  attname   | null_frac | avg_width | n_distinct |    most_common_vals    |                                  most_common_freqs                                   |          histogram_bounds           | correlation
------------+-----------------+------------+-----------+-----------+------------+------------------------+--------------------------------------------------------------------------------------+-------------------------------------+-------------
 public     | symptom_reports | symptom_id |         0 |         4 |         80 | {3,2,4,1,5,8,9,7,10,6} | {0.094,0.0933333,0.0933333,0.092,0.0913333,0.0903333,0.0866667,0.0843333,0.084,0.08} | {12,18,24,30,38,44,51,57,91,91,114} |   0.0955925
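
As an aside (my suggestion, not something proposed in this thread): when per-column statistics like these look too coarse to drive good selectivity estimates, one standard knob is to raise the statistics target on the join column and re-analyze, so the planner gathers more most-common values and finer histogram buckets. A minimal sketch, assuming the tables from this thread:

```sql
-- Hypothetical tuning step: collect finer-grained statistics on the
-- join column (default target was 10 in this era of PostgreSQL),
-- then refresh the planner's view of the table.
ALTER TABLE symptom_reports ALTER COLUMN symptom_id SET STATISTICS 100;
ANALYZE symptom_reports;
```

Whether this actually changes the plan here depends on how sensitive the join_in_selectivity estimate is to the extra detail.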



And Ismo, I followed your suggestion to rewrite the SQL more cleanly, and you were right: it is faster, so that is certainly a solution.  I am still curious, though, why my original query slowed down after the VACUUM ANALYZE.  In any case, here is the EXPLAIN ANALYZE from the new query.  Compare that to the 3441.452 ms of the old query after the analyze (and 134.045 ms before the analyze):

plm_stage=# explain analyze SELECT count(distinct s.id) AS count_all FROM symptoms s, symptom_reports sr, users u WHERE s.id=sr.symptom_id and sr.user_id=u.id and u.disease_id in (1);
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=770.05..770.06 rows=1 width=4) (actual time=176.749..176.751 rows=1 loops=1)
   ->  Hash Join  (cost=89.43..737.50 rows=13020 width=4) (actual time=7.762..142.063 rows=13038 loops=1)
         Hash Cond: ("outer".symptom_id = "inner".id)
         ->  Hash Join  (cost=86.09..538.86 rows=13020 width=4) (actual time=7.277..89.293 rows=13038 loops=1)
               Hash Cond: ("outer".user_id = "inner".id)
               ->  Seq Scan on symptom_reports sr  (cost=0.00..257.38 rows=13038 width=8) (actual time=0.003..30.499 rows=13038 loops=1)
               ->  Hash  (cost=82.41..82.41 rows=1471 width=4) (actual time=7.261..7.261 rows=1471 loops=1)
                     ->  Seq Scan on users u  (cost=0.00..82.41 rows=1471 width=4) (actual time=0.006..4.133 rows=1471 loops=1)
                           Filter: (disease_id = 1)
         ->  Hash  (cost=3.07..3.07 rows=107 width=4) (actual time=0.469..0.469 rows=107 loops=1)
               ->  Seq Scan on symptoms s  (cost=0.00..3.07 rows=107 width=4) (actual time=0.007..0.247 rows=107 loops=1)
 Total runtime: 176.842 ms
(12 rows)
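
(A quick sanity check I would add here, not part of the original exchange: the pg_stats row above claims n_distinct = 80 for symptom_id, and symptoms has 107 rows, so it is cheap to confirm the estimate against reality.)

```sql
-- Hypothetical verification query: compare the actual distinct count
-- of symptom_id against the planner's n_distinct estimate of 80.
SELECT count(DISTINCT symptom_id) FROM symptom_reports;
```

If the real distinct count diverges badly from 80, that alone could explain a skewed join selectivity estimate.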


