Re: SeqScans on boolen values / How to speed this up?

From: Stephen Frost
Subject: Re: SeqScans on boolen values / How to speed this up?
Date:
Msg-id: 20100705134707.GK21875@tamriel.snowman.net
In reply to: SeqScans on boolen values / How to speed this up?  (Jens Hoffrichter)
List: pgsql-performance


SeqScans on boolen values / How to speed this up?  (Jens Hoffrichter)
 Re: SeqScans on boolen values / How to speed this up?  (Stephen Frost)
 Re: SeqScans on boolen values / How to speed this up?  (Craig Ringer)
 Re: SeqScans on boolen values / How to speed this up?  (Scott Marlowe)

Jens,

* Jens Hoffrichter () wrote:
> I'm just curious if there is any way to improve the performance of
> those queries. I'm seeing SeqScans in the EXPLAIN ANALYZE, but nothing
> I have done yet has removed those.

SeqScans aren't necessarily bad.  Also, providing your postgresql.conf
parameters would be useful for any kind of analysis work like this.

For starters, why are you using left joins for these queries?  When you
use a left join and then apply a filter on the right-hand table that
requires it to be non-NULL, you're effectively turning it into an inner
join anyway.  Fixing that might change/improve the plans you're getting.
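
For example, your first query below could be written with plain inner
joins (a sketch; the result set is identical because your WHERE clause
already throws away any NULL-extended rows from the left joins):

```sql
SELECT n.name_short, n.flag, n.nation_id, n.urlidentifier,
       count(p.person_id) AS athletes
  FROM nations n
  JOIN persons p     ON n.nation_id = p.nation_id
  JOIN efclicences e ON p.person_id = e.person_id
 WHERE n.continent = 'eu'
   AND p.deleted = false
   AND p.inactive = false
   AND e.fencer = true
 GROUP BY n.name_short, n.flag, n.nation_id, n.urlidentifier
 ORDER BY n.name_short;
```

Stating the join type explicitly also gives the planner more freedom in
choosing join order.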

> The statements and query plans are:
>
> ---- Query 1 -----
>
> explain analyze SELECT
> n.name_short,n.flag,n.nation_id,n.urlidentifier,count(p.person_id) as
> athletes from nations n left join persons p on n.nation_id =
> p.nation_id left join efclicences e on p.person_id = e.person_id where
> continent = 'eu' and  p.deleted = false and p.inactive = false and
> e.fencer = true group by
> n.name_short,n.flag,n.nation_id,n.urlidentifier order by n.name_short;

Alright, for this one, you're feeding 144k rows from persons up into the
aggregate; how big is the table?  If it's anything less than 1M rows,
seqscanning it is almost certainly the fastest way.  You could *test*
that theory by disabling seqscans and re-running the query to compare
timings.  If it's faster, then you probably need to adjust some PG
parameters (eg: effective_cache_size, maybe random_page_cost) for your
system.
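
The test is session-local and harmless, something like:

```sql
SET enable_seqscan = off;  -- discourage seqscans for this session only
EXPLAIN ANALYZE <your query here>;
SET enable_seqscan = on;   -- restore the default
```

If the index plan wins, raise effective_cache_size (and possibly lower
random_page_cost) in postgresql.conf rather than leaving enable_seqscan
off in production.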

>                                                                    QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=9997.21..9997.32 rows=44 width=33) (actual
> time=872.000..872.000 rows=44 loops=1)
>    Sort Key: n.name_short
>    Sort Method:  quicksort  Memory: 28kB
>    ->  HashAggregate  (cost=9995.45..9996.01 rows=44 width=33) (actual
> time=872.000..872.000 rows=44 loops=1)
>          ->  Hash Join  (cost=5669.49..9611.83 rows=30690 width=33)
> (actual time=332.000..720.000 rows=142240 loops=1)
>                Hash Cond: (e.person_id = p.person_id)
>                ->  Seq Scan on efclicences e  (cost=0.00..2917.29
> rows=143629 width=8) (actual time=0.000..80.000 rows=143629 loops=1)
>                      Filter: fencer
>                ->  Hash  (cost=5285.87..5285.87 rows=30690 width=33)
> (actual time=332.000..332.000 rows=142240 loops=1)
>                      ->  Hash Join  (cost=7.10..5285.87 rows=30690
> width=33) (actual time=0.000..256.000 rows=142240 loops=1)
>                            Hash Cond: (p.nation_id = n.nation_id)
>                            ->  Seq Scan on persons p
> (cost=0.00..4438.29 rows=142288 width=16) (actual time=0.000..112.000
> rows=142418 loops=1)
>                                  Filter: ((NOT deleted) AND (NOT inactive))
>                            ->  Hash  (cost=6.55..6.55 rows=44
> width=25) (actual time=0.000..0.000 rows=44 loops=1)
>                                  ->  Seq Scan on nations n
> (cost=0.00..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44
> loops=1)
>                                        Filter: ((continent)::text = 'eu'::text)
>  Total runtime: 880.000 ms
> (17 rows)
>
> --- Query 2 ---
> explain analyze SELECT persons.person_id AS persons_person_id FROM
> persons LEFT OUTER JOIN indexing_persons ON persons.person_id =
> indexing_persons.person_id WHERE  indexing_persons.person_id IS  NULL
> OR persons.modified > indexing_persons.indexed ORDER  BY
> persons.modified DESC LIMIT 1000;

For this one, you might try indexing persons.modified and
indexing_persons.indexed and see if that changes things.
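
Something like the following (index names are just my own picks, use
whatever fits your naming convention):

```sql
CREATE INDEX persons_modified_idx
    ON persons (modified);
CREATE INDEX indexing_persons_indexed_idx
    ON indexing_persons (indexed);
```

That gives the planner the option of an index scan ordered by modified
for the ORDER BY ... LIMIT, instead of sorting the whole join result.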

>
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=17755.23..17757.73 rows=1000 width=16) (actual
> time=372.000..372.000 rows=0 loops=1)
>    ->  Sort  (cost=17755.23..17994.61 rows=95753 width=16) (actual
> time=372.000..372.000 rows=0 loops=1)
>          Sort Key: persons.modified
>          Sort Method:  quicksort  Memory: 25kB
>          ->  Hash Left Join  (cost=4313.44..12505.20 rows=95753
> width=16) (actual time=372.000..372.000 rows=0 loops=1)
>                Hash Cond: (persons.person_id = indexing_persons.person_id)
>                Filter: ((indexing_persons.person_id IS NULL) OR
> (persons.modified > indexing_persons.indexed))
>                ->  Seq Scan on persons  (cost=0.00..4438.29
> rows=143629 width=16) (actual time=0.000..56.000 rows=143629 loops=1)
>                ->  Hash  (cost=2534.86..2534.86 rows=142286 width=16)
> (actual time=140.000..140.000 rows=143629 loops=1)
>                      ->  Seq Scan on indexing_persons
> (cost=0.00..2534.86 rows=142286 width=16) (actual time=0.000..72.000
> rows=143629 loops=1)
>  Total runtime: 372.000 ms
> (11 rows)

    Thanks,

        Stephen

