Re: optimizing query

Поиск
Список
Период
Сортировка
От Chantal Ackermann
Тема Re: optimizing query
Дата
Msg-id 3E2FB2D1.9020904@biomax.de
обсуждение исходный текст
Ответ на Re: optimizing query  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Ответы Re: optimizing query  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: [PERFORM] optimizing query  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
hi Stephan,

thank you for your reply.

I ran vacuum analyze before calling explain. As this is a newly built
database where no rows have been deleted, yet, I thought vacuum full
would have no effect. In fact, BEFORE running vacuum full, the cost of
the query is estimates by explain analyze as 33 secs, and AFTER running
it, the cost is estimate to be 43 secs??? (Hey, I want at least the 10
secs back ;-) )

I have just installed this database on a "bigger" (see the system info
further down) machine, and I expected the queries would run _really_
fast. especially, as there is a lot more data to be inserted in the
occurrences tables.

This is the row count of the tables and the output of explain analyze
before and after running vacuum full (after that, I listed some system
and postgresql information):

relate=# select count(*) from gene;
  count
--------
  218085
(1 row)

relate=# select count(*) from disease;
  count
--------
  164597
(1 row)

relate=# select count(*) from disease_occurrences_puid;
  count
--------
  471915
(1 row)

relate=# select count(*) from gene_occurrences_puid;
  count
--------
  339347
(1 row)

relate=# explain analyze SELECT DISTINCT gene.gene_name,
gene_occurrences_puid.puid FROM gene, disease_occurrences_puid,
gene_occurrences_puid WHERE
disease_occurrences_puid.puid=gene_occurrences_puid.puid AND
gene.gene_id=gene_occurrences_puid.gene_id;

               QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique  (cost=342175.89..352511.77 rows=137812 width=33) (actual
time=32112.66..33139.23 rows=219435 loops=1)
    ->  Sort  (cost=342175.89..345621.18 rows=1378118 width=33) (actual
time=32112.65..32616.14 rows=695158 loops=1)
          Sort Key: gene.gene_name, gene_occurrences_puid.puid
          ->  Merge Join  (cost=63671.50..98237.36 rows=1378118
width=33) (actual time=10061.83..17940.02 rows=695158 loops=1)
                Merge Cond: ("outer".puid = "inner".puid)
                ->  Index Scan using disease_occpd_puid_i on
disease_occurrences_puid  (cost=0.00..14538.05 rows=471915 width=4)
(actual time=0.03..3917.99 rows=471915 loops=1)
                ->  Sort  (cost=63671.50..64519.87 rows=339347 width=29)
(actual time=10061.69..10973.64 rows=815068 loops=1)
                      Sort Key: gene_occurrences_puid.puid
                      ->  Merge Join  (cost=0.00..22828.18 rows=339347
width=29) (actual time=0.21..3760.59 rows=339347 loops=1)
                            Merge Cond: ("outer".gene_id = "inner".gene_id)
                            ->  Index Scan using gene_pkey on gene
(cost=0.00..7668.59 rows=218085 width=21) (actual time=0.02..955.19
rows=218073 loops=1)
                            ->  Index Scan using gene_id_puid_uni on
gene_occurrences_puid  (cost=0.00..9525.57 rows=339347 width=8) (actual
time=0.02..1523.81 rows=339347 loops=1)
  Total runtime: 33244.81 msec
(13 rows)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
AFTER

relate=# vacuum full verbose analyze;

relate=# explain analyze SELECT DISTINCT gene.gene_name,
gene_occurrences_puid.puid FROM gene, disease_occurrences_puid,
gene_occurrences_puid WHERE
disease_occurrences_puid.puid=gene_occurrences_puid.puid AND
gene.gene_id=gene_occurrences_puid.gene_id;

               QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique  (cost=359069.64..369948.41 rows=145050 width=33) (actual
time=42195.60..43229.04 rows=219435 loops=1)
    ->  Sort  (cost=359069.64..362695.90 rows=1450503 width=33) (actual
time=42195.59..42694.70 rows=695158 loops=1)
          Sort Key: gene.gene_name, gene_occurrences_puid.puid
          ->  Merge Join  (cost=63732.51..99264.24 rows=1450503
width=33) (actual time=13172.40..27973.79 rows=695158 loops=1)
                Merge Cond: ("outer".puid = "inner".puid)
                ->  Index Scan using disease_occpd_puid_i on
disease_occurrences_puid  (cost=0.00..14543.06 rows=471915 width=4)
(actual time=36.50..10916.29 rows=471915 loops=1)
                ->  Sort  (cost=63732.51..64580.88 rows=339347 width=29)
(actual time=13126.56..14048.38 rows=815068 loops=1)
                      Sort Key: gene_occurrences_puid.puid
                      ->  Merge Join  (cost=0.00..22889.19 rows=339347
width=29) (actual time=58.00..6775.55 rows=339347 loops=1)
                            Merge Cond: ("outer".gene_id = "inner".gene_id)
                            ->  Index Scan using gene_pkey on gene
(cost=0.00..7739.91 rows=218085 width=21) (actual time=29.00..3416.01
rows=218073
loops=1)
                            ->  Index Scan using gene_id_puid_uni on
gene_occurrences_puid  (cost=0.00..9525.57 rows=339347 width=8) (actual
time=28.69..1936.83 rows=339347 loops=1)
  Total runtime: 43338.94 msec

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Postgres Version: 7.3.1
CPU: 1666.767 MHz
RAM: 2070492 kB
shmmax/shmall: 1048576000

postgresql.conf:
shared_buffers: 121600
max_connections: 64
max_fsm_relations = 200
max_fsm_pages = 40000
effective_cache_size = 8000

********************************************************************

Thank you again for your interest and help!

Chantal


Stephan Szabo wrote:
> (Replying to general and performance in a hope to move this
> to performance after a couple of replies).
>
> On Wed, 22 Jan 2003, Chantal Ackermann wrote:
>
>
>>I am getting the following output from EXPLAIN, concerning a query with
>>joins. The merge uses index scans but takes too long, in my opinion. The
>>query is in fact only a part (subquery) of another one, but it is the
>>bottle neck.
>>
>>As I am quite ignorant in optimizing queries, and I have no idea where
>>to find documentation on the net on how to learn optimizing my queries,
>>I am posting this here in hope someone will give me either tips how to
>>optimize, or where to find some tutorial that could help me get along on
>>my own.
>>
>>dropping the "DISTINCT" has some effect, but I can't really do without.
>
>
> The first thing is, have you done ANALYZE recently to make sure that the
> statistics are correct and what does EXPLAIN ANALYZE give you (that will
> run the query and give the estimate and actual).  Also, if you haven't
> vacuumed recently, you may want to vacuum full.
>
> How many rows are there on gene, disease and both occurrances tables?
> I'd wonder if perhaps using explicit sql join syntax (which postgres uses
> to constrain order) to join disease and disease_occurrences_puid before
> joining it to the other two would be better or worse in practice.
>
>


В списке pgsql-general по дате отправления:

Предыдущее
От: "Rick Gigger"
Дата:
Сообщение: embedded postgres
Следующее
От: "Bjorn Metzdorf"
Дата:
Сообщение: Re: tsearch comments