Discussion: SeqScans on boolean values / How to speed this up?

SeqScans on boolean values / How to speed this up?

From:
Jens Hoffrichter
Date:
Hello everyone,

We've recently finished developing a fairly large web application,
and we are about to put it online.

I ran some load tests yesterday, having configured 'slow query'
logging beforehand so I could see whether there might be a
performance bottleneck in PostgreSQL. While I discovered no real
problems, analyzing the log files with pgFouine revealed two queries
which are executed often and take quite a bit of time.

I'm just curious if there is any way to improve the performance of
those queries. I'm seeing SeqScans in the EXPLAIN ANALYZE output, but
nothing I have done so far has removed them.

The statements and query plans are:

---- Query 1 -----

explain analyze SELECT
n.name_short,n.flag,n.nation_id,n.urlidentifier,count(p.person_id) as
athletes from nations n left join persons p on n.nation_id =
p.nation_id left join efclicences e on p.person_id = e.person_id where
continent = 'eu' and  p.deleted = false and p.inactive = false and
e.fencer = true group by
n.name_short,n.flag,n.nation_id,n.urlidentifier order by n.name_short;
                                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9997.21..9997.32 rows=44 width=33) (actual
time=872.000..872.000 rows=44 loops=1)
   Sort Key: n.name_short
   Sort Method:  quicksort  Memory: 28kB
   ->  HashAggregate  (cost=9995.45..9996.01 rows=44 width=33) (actual
time=872.000..872.000 rows=44 loops=1)
         ->  Hash Join  (cost=5669.49..9611.83 rows=30690 width=33)
(actual time=332.000..720.000 rows=142240 loops=1)
               Hash Cond: (e.person_id = p.person_id)
               ->  Seq Scan on efclicences e  (cost=0.00..2917.29
rows=143629 width=8) (actual time=0.000..80.000 rows=143629 loops=1)
                     Filter: fencer
               ->  Hash  (cost=5285.87..5285.87 rows=30690 width=33)
(actual time=332.000..332.000 rows=142240 loops=1)
                     ->  Hash Join  (cost=7.10..5285.87 rows=30690
width=33) (actual time=0.000..256.000 rows=142240 loops=1)
                           Hash Cond: (p.nation_id = n.nation_id)
                           ->  Seq Scan on persons p
(cost=0.00..4438.29 rows=142288 width=16) (actual time=0.000..112.000
rows=142418 loops=1)
                                 Filter: ((NOT deleted) AND (NOT inactive))
                           ->  Hash  (cost=6.55..6.55 rows=44
width=25) (actual time=0.000..0.000 rows=44 loops=1)
                                 ->  Seq Scan on nations n
(cost=0.00..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44
loops=1)
                                       Filter: ((continent)::text = 'eu'::text)
 Total runtime: 880.000 ms
(17 rows)

--- Query 2 ---
explain analyze SELECT persons.person_id AS persons_person_id FROM
persons LEFT OUTER JOIN indexing_persons ON persons.person_id =
indexing_persons.person_id WHERE  indexing_persons.person_id IS  NULL
OR persons.modified > indexing_persons.indexed ORDER  BY
persons.modified DESC LIMIT 1000;
                                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17755.23..17757.73 rows=1000 width=16) (actual
time=372.000..372.000 rows=0 loops=1)
   ->  Sort  (cost=17755.23..17994.61 rows=95753 width=16) (actual
time=372.000..372.000 rows=0 loops=1)
         Sort Key: persons.modified
         Sort Method:  quicksort  Memory: 25kB
         ->  Hash Left Join  (cost=4313.44..12505.20 rows=95753
width=16) (actual time=372.000..372.000 rows=0 loops=1)
               Hash Cond: (persons.person_id = indexing_persons.person_id)
               Filter: ((indexing_persons.person_id IS NULL) OR
(persons.modified > indexing_persons.indexed))
               ->  Seq Scan on persons  (cost=0.00..4438.29
rows=143629 width=16) (actual time=0.000..56.000 rows=143629 loops=1)
               ->  Hash  (cost=2534.86..2534.86 rows=142286 width=16)
(actual time=140.000..140.000 rows=143629 loops=1)
                     ->  Seq Scan on indexing_persons
(cost=0.00..2534.86 rows=142286 width=16) (actual time=0.000..72.000
rows=143629 loops=1)
 Total runtime: 372.000 ms
(11 rows)

---- Table definitions ---

\d persons
                                            Table "public.persons"
       Column        |           Type           |
    Modifiers
---------------------+--------------------------+-------------------------------------------------------------
 person_id           | bigint                   | not null default
nextval('persons_person_id_seq'::regclass)
 givenname           | character varying(100)   | not null
 surname             | character varying(100)   | not null
 name_display_short  | character varying(20)    | not null
 name_display_long   | character varying(50)    | not null
 title               | character varying(50)    |
 postnominals        | character varying(10)    |
 gender              | character varying(1)     |
 dateofbirth         | date                     |
 nation_id           | bigint                   |
 club_id             | bigint                   |
 handed              | character varying(1)     |
 comment             | text                     |
 national_identifier | character varying(50)    |
 fie_identifier      | character varying(50)    |
 honorary_member_efc | boolean                  | not null
 honorary_member_fie | boolean                  | not null
 created             | timestamp with time zone | not null
 modified            | timestamp with time zone | not null
 dead                | boolean                  | not null
 inactive            | boolean                  | not null
 deleted             | boolean                  | not null
 urlidentifier       | character varying(50)    | not null
 profilepicture      | bigint                   |
 ophardt_identifier  | bigint                   |
 idtoken             | character varying(10)    |
 consolidated        | bigint                   |
Indexes:
    "persons_pkey" PRIMARY KEY, btree (person_id)
    "persons_urlidentifier_key" UNIQUE, btree (urlidentifier)
    "idx_persons_deleted" btree (deleted)
    "idx_persons_inactive" btree (inactive)
    "idx_persons_inactive_deleted" btree (inactive, deleted)
Foreign-key constraints:
    "persons_club_id_fkey" FOREIGN KEY (club_id) REFERENCES
clubs(club_id) ON UPDATE CASCADE ON DELETE SET NULL
    "persons_consolidated_fkey" FOREIGN KEY (consolidated) REFERENCES
persons(person_id) ON UPDATE CASCADE ON DELETE CASCADE
    "persons_nation_id_fkey" FOREIGN KEY (nation_id) REFERENCES
nations(nation_id)
Triggers:
    persons_modified BEFORE UPDATE ON persons FOR EACH ROW EXECUTE
PROCEDURE setmodified()


 \d nations
                                           Table "public.nations"
      Column       |           Type           |
  Modifiers
-------------------+--------------------------+-------------------------------------------------------------
 nation_id         | bigint                   | not null default
nextval('nations_nation_id_seq'::regclass)
 code              | character varying(3)     | not null
 name_short        | character varying(100)   | not null
 name_official     | character varying(200)   | not null
 name_official_en  | character varying(200)   | not null
 website           | character varying(255)   |
 flag              | character varying(255)   |
 comment           | text                     |
 geocode_longitude | double precision         |
 geocode_latitude  | double precision         |
 geocode_zoom      | double precision         |
 created           | timestamp with time zone | not null
 modified          | timestamp with time zone | not null
 inactive          | boolean                  | not null default false
 deleted           | boolean                  | not null default false
 efc               | boolean                  | not null
 subname           | character varying(255)   |
 street            | character varying(255)   |
 postcode          | character varying(255)   |
 city              | character varying(255)   |
 country           | character varying(255)   |
 fax               | character varying(255)   |
 mobile            | character varying(255)   |
 phone             | character varying(255)   |
 email             | character varying(255)   |
 urlidentifier     | character varying(50)    | not null
 continent         | character varying(2)     | not null default
'eu'::character varying
 logo_p2picture_id | bigint                   |
 idtoken           | character varying(10)    |
Indexes:
    "nations_pkey" PRIMARY KEY, btree (nation_id)
Foreign-key constraints:
    "nations_logo_p2picture_id_fkey" FOREIGN KEY (logo_p2picture_id)
REFERENCES p2picture(picture_id) ON UPDATE CASCADE ON DELETE CASCADE


 \d efclicences
                                           Table "public.efclicences"
    Column     |           Type           |
  Modifiers
---------------+--------------------------+---------------------------------------------------------------------
 efclicence_id | bigint                   | not null default
nextval('efclicences_efclicence_id_seq'::regclass)
 person_id     | bigint                   | not null
 valid_from    | date                     | not null
 valid_to      | date                     |
 created       | timestamp with time zone | not null
 modified      | timestamp with time zone | not null
 inactive      | boolean                  | not null
 fencer        | boolean                  | not null
 official      | boolean                  | not null
 referee       | boolean                  | not null
 member        | boolean                  | not null
Indexes:
    "efclicences_pkey" PRIMARY KEY, btree (efclicence_id)
Foreign-key constraints:
    "efclicences_person_id_fkey" FOREIGN KEY (person_id) REFERENCES
persons(person_id) ON UPDATE CASCADE ON DELETE CASCADE

 \d indexing_persons
                                       Table "public.indexing_persons"
  Column   |           Type           |                              Modifiers
-----------+--------------------------+----------------------------------------------------------------------
 person_id | bigint                   | not null default
nextval('indexing_persons_person_id_seq'::regclass)
 indexed   | timestamp with time zone |
Indexes:
    "indexing_persons_pkey" PRIMARY KEY, btree (person_id)
Foreign-key constraints:
    "indexing_persons_person_id_fkey" FOREIGN KEY (person_id)
REFERENCES persons(person_id) ON DELETE CASCADE


--- Additional info ---

These are mostly stock table definitions; not much has been done yet
to improve performance there.

Autovacuuming is turned on, I have increased the available memory
settings a bit (as the db server has 4 GB of RAM), and added logging
options to the stock Debian configuration, but nothing more.

The system is a Xen vServer running on 4 cores, with the aforementioned 4 GB of RAM.

The performance of those queries is nothing deal-breaking at the
moment, as we don't actually have a problem yet, but I'm curious to
learn more about query optimization so that I can analyze and correct
such problems myself in the future, so any help and remarks are
greatly appreciated.

Thanks in advance!

Jens

Re: SeqScans on boolean values / How to speed this up?

From:
Stephen Frost
Date:
Jens,

* Jens Hoffrichter (jens.hoffrichter@gmail.com) wrote:
> I'm just curious if there is any way to improve the performance of
> those queries. I'm seeing SeqScans in the EXPLAIN ANALYZE output, but
> nothing I have done so far has removed them.

SeqScans aren't necessarily bad.  Also, providing your postgresql.conf
parameters would be useful in doing any kind of analysis work like this.
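
For reference, a quick way to report everything that has been changed
away from the built-in defaults (a minimal sketch; pg_settings is a
standard system view):

-- List all settings whose value does not come from the built-in default
SELECT name, setting, unit, source
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;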

For starters, why are you using left joins for these queries?  When you
use a left-join and then have a filter on the right-hand table that
requires it to be non-null, you're causing it to be an inner join
anyway.  Fixing that might change/improve the plans you're getting.
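
For example, Query 1 can be written with plain inner joins; the WHERE
conditions on p and e already discard the NULL-extended rows, so the
result should be identical (untested sketch):

SELECT n.name_short, n.flag, n.nation_id, n.urlidentifier,
       count(p.person_id) AS athletes
FROM nations n
JOIN persons p     ON n.nation_id = p.nation_id
JOIN efclicences e ON p.person_id = e.person_id
WHERE n.continent = 'eu'
  AND p.deleted = false
  AND p.inactive = false
  AND e.fencer = true
GROUP BY n.name_short, n.flag, n.nation_id, n.urlidentifier
ORDER BY n.name_short;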

> The statements and query plans are:
>
> ---- Query 1 -----
>
> explain analyze SELECT
> n.name_short,n.flag,n.nation_id,n.urlidentifier,count(p.person_id) as
> athletes from nations n left join persons p on n.nation_id =
> p.nation_id left join efclicences e on p.person_id = e.person_id where
> continent = 'eu' and  p.deleted = false and p.inactive = false and
> e.fencer = true group by
> n.name_short,n.flag,n.nation_id,n.urlidentifier order by n.name_short;

Alright, for this one you're processing 144k rows from persons up
into the aggregate; how big is the table?  If it's anything less than
1M rows, seqscanning it is almost certainly the fastest way.  You
could *test* that theory by disabling seqscans and timing the query
again.  If it's faster, then you probably need to adjust some PG
parameters (eg: effective_cache_size, maybe random_page_cost) for
your system.
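
A sketch of that test (session-local only; enable_seqscan should
never be turned off globally):

BEGIN;
SET LOCAL enable_seqscan = off;  -- planner now avoids seq scans where it can
-- re-run the statement under test here, e.g.:
EXPLAIN ANALYZE
SELECT count(*) FROM persons p
WHERE p.deleted = false AND p.inactive = false;
ROLLBACK;  -- SET LOCAL only lasts until the end of the transaction anyway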

>                                                                    QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=9997.21..9997.32 rows=44 width=33) (actual
> time=872.000..872.000 rows=44 loops=1)
>    Sort Key: n.name_short
>    Sort Method:  quicksort  Memory: 28kB
>    ->  HashAggregate  (cost=9995.45..9996.01 rows=44 width=33) (actual
> time=872.000..872.000 rows=44 loops=1)
>          ->  Hash Join  (cost=5669.49..9611.83 rows=30690 width=33)
> (actual time=332.000..720.000 rows=142240 loops=1)
>                Hash Cond: (e.person_id = p.person_id)
>                ->  Seq Scan on efclicences e  (cost=0.00..2917.29
> rows=143629 width=8) (actual time=0.000..80.000 rows=143629 loops=1)
>                      Filter: fencer
>                ->  Hash  (cost=5285.87..5285.87 rows=30690 width=33)
> (actual time=332.000..332.000 rows=142240 loops=1)
>                      ->  Hash Join  (cost=7.10..5285.87 rows=30690
> width=33) (actual time=0.000..256.000 rows=142240 loops=1)
>                            Hash Cond: (p.nation_id = n.nation_id)
>                            ->  Seq Scan on persons p
> (cost=0.00..4438.29 rows=142288 width=16) (actual time=0.000..112.000
> rows=142418 loops=1)
>                                  Filter: ((NOT deleted) AND (NOT inactive))
>                            ->  Hash  (cost=6.55..6.55 rows=44
> width=25) (actual time=0.000..0.000 rows=44 loops=1)
>                                  ->  Seq Scan on nations n
> (cost=0.00..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44
> loops=1)
>                                        Filter: ((continent)::text = 'eu'::text)
>  Total runtime: 880.000 ms
> (17 rows)
>
> --- Query 2 ---
> explain analyze SELECT persons.person_id AS persons_person_id FROM
> persons LEFT OUTER JOIN indexing_persons ON persons.person_id =
> indexing_persons.person_id WHERE  indexing_persons.person_id IS  NULL
> OR persons.modified > indexing_persons.indexed ORDER  BY
> persons.modified DESC LIMIT 1000;

For this one, you might try indexing persons.modified and
indexing_persons.indexed and see if that changes things.
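
A sketch of those indexes (the names are just illustrative):

CREATE INDEX idx_persons_modified         ON persons (modified);
CREATE INDEX idx_indexing_persons_indexed ON indexing_persons (indexed);

Note, though, that the OR between the IS NULL test and the cross-table
comparison makes it hard for the planner to use either index directly;
rewriting the query as a UNION of the two cases is worth trying as
well.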

>
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=17755.23..17757.73 rows=1000 width=16) (actual
> time=372.000..372.000 rows=0 loops=1)
>    ->  Sort  (cost=17755.23..17994.61 rows=95753 width=16) (actual
> time=372.000..372.000 rows=0 loops=1)
>          Sort Key: persons.modified
>          Sort Method:  quicksort  Memory: 25kB
>          ->  Hash Left Join  (cost=4313.44..12505.20 rows=95753
> width=16) (actual time=372.000..372.000 rows=0 loops=1)
>                Hash Cond: (persons.person_id = indexing_persons.person_id)
>                Filter: ((indexing_persons.person_id IS NULL) OR
> (persons.modified > indexing_persons.indexed))
>                ->  Seq Scan on persons  (cost=0.00..4438.29
> rows=143629 width=16) (actual time=0.000..56.000 rows=143629 loops=1)
>                ->  Hash  (cost=2534.86..2534.86 rows=142286 width=16)
> (actual time=140.000..140.000 rows=143629 loops=1)
>                      ->  Seq Scan on indexing_persons
> (cost=0.00..2534.86 rows=142286 width=16) (actual time=0.000..72.000
> rows=143629 loops=1)
>  Total runtime: 372.000 ms
> (11 rows)

    Thanks,

        Stephen


Re: SeqScans on boolean values / How to speed this up?

From:
Craig Ringer
Date:
On 05/07/10 19:36, Jens Hoffrichter wrote:
> Hello everyone,
>
> We've recently finished developing a fairly large web application,
> and we are about to put it online.
>
> I ran some load tests yesterday, having configured 'slow query'
> logging beforehand so I could see whether there might be a
> performance bottleneck in PostgreSQL. While I discovered no real
> problems, analyzing the log files with pgFouine revealed two queries
> which are executed often and take quite a bit of time.

It might be worth looking at which queries have results that change
infrequently or don't have to be up-to-the-minute accurate, making
them candidates for caching. Memcached is an incredibly handy tool
for taking load off your database.
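
As a rough in-database variant of the same idea, the expensive
aggregate from Query 1 could be materialized into a cache table and
refreshed from cron or the application as often as freshness allows
(a sketch; the table name is made up):

-- Materialize the per-nation athlete counts once, re-read them cheaply.
CREATE TABLE nation_athlete_cache AS
SELECT n.nation_id, count(p.person_id) AS athletes
FROM nations n
JOIN persons p     ON n.nation_id = p.nation_id
JOIN efclicences e ON p.person_id = e.person_id
WHERE p.deleted = false AND p.inactive = false AND e.fencer = true
GROUP BY n.nation_id;
-- Refresh: TRUNCATE nation_athlete_cache, then re-run the same
-- SELECT as an INSERT INTO nation_athlete_cache.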

--
Craig Ringer

Re: SeqScans on boolean values / How to speed this up?

From:
Scott Marlowe
Date:
On Mon, Jul 5, 2010 at 5:36 AM, Jens Hoffrichter
<jens.hoffrichter@gmail.com> wrote:
> Hello everyone,
>
> We've recently finished developing a fairly large web application,
> and we are about to put it online.

If you're checking for bools, and 99.99% of the result is just true
or just false, look at creating partial indexes on the 0.01% part.

create index ... on <table> (boolfield) where boolfield is true

(or where boolfield is false)
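
For this thread's schema, a minimal sketch (assuming deleted = true
rows are the rare case, which the row counts in the plans suggest):

-- Small partial index covering only the rare rows; the planner will
-- only consider it for queries that filter WHERE deleted IS TRUE.
CREATE INDEX idx_persons_deleted_true
    ON persons (person_id)
    WHERE deleted IS TRUE;

Note that the queries above filter on the common (false) side, so such
an index only pays off for queries that target the rare rows.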