Обсуждение: Query much faster with enable_seqscan=0

Поиск
Список
Период
Сортировка

Query much faster with enable_seqscan=0

От
Ogden
Дата:
Hello,

I have received some help from the IRC channel, however, the problem still exists. When running the following query with enable_seqscan set to 0, it takes less than a second, whereas with it set to 1, the query returns in 14 seconds. The machine itself has 8GB Ram and is running PostgreSQL 9.0 on Debian Lenny. The database size is about 7GB.


Query:
SELECT tr.id, tr.sid FROM test_registration tr, INNER JOIN test_registration_result r on (tr.id = r.test_registration_id) WHERE. tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid GROUP BY tr.id, tr.sid



demo=# \d test_registration Table "public.test_registration" Column | Type | Modifiers ------------------------+-----------------------------+------------------------ id | uuid | not null sid | character varying(36) | not null created_date | timestamp without time zone | not null default now() modified_date | timestamp without time zone | not null test_administration_id | uuid | not null teacher_number | character varying(15) | test_version_id | uuid | Indexes: "test_registration_pkey" PRIMARY KEY, btree (id) "test_registration_sid_key" UNIQUE, btree (sid, test_administration_id) "test_registration_teacher" btree (teacher_number) "test_registration_test_id" btree (test_administration_id) demo=# \d test_registration_result Table "public.test_registration_result" Column | Type | Modifiers ----------------------+-----------------------+----------- answer | character varying(15) | question_id | uuid | not null score | double precision | test_registration_id | uuid | not null Indexes: "test_registration_result_pkey" PRIMARY KEY, btree (question_id, test_registration_id) "test_registration_result_answer" btree (test_registration_id, answer, score) "test_registration_result_test" btree (test_registration_id)

Explain Analyze: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=951169.97..951198.37 rows=2840 width=25) (actual time=14669.039..14669.843 rows=2972 loops=1) -> Hash Join (cost=2988.07..939924.85 rows=2249024 width=25) (actual time=551.464..14400.061 rows=638980 loops=1) Hash Cond: (r.test_registration_id = tr.id) -> Seq Scan on test_registration_result r (cost=0.00..681946.72 rows=37199972 width=16) (actual time=0.015..6073.101 rows=37198734 loops=1) -> Hash (cost=2952.57..2952.57 rows=2840 width=25) (actual time=2.516..2.516 rows=2972 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 160kB -> Bitmap Heap Scan on test_registration tr (cost=44.29..2952.57 rows=2840 width=25) (actual time=0.528..1.458 rows=2972 loops=1) Recheck Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid) -> Bitmap Index Scan on test_registration_test_administration_id (cost=0.00..43.58 rows=2840 width=0) (actual time=0.507..0.507 rows=2972 loops=1) Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid) Total runtime: 14670.337 ms (11 rows) real 0m14.698s user 0m0.000s sys 0m0.008s


With "set enable_seqscan=0;" SET QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=1225400.19..1225428.59 rows=2840 width=25) (actual time=748.397..749.160 rows=2972 loops=1) -> Nested Loop (cost=0.00..1214155.07 rows=2249024 width=25) (actual time=0.107..465.165 rows=638980 loops=1) -> Index Scan using test_registration_test_administration_id on test_registration tr (cost=0.00..4413.96 rows=2840 width=25) (actual time=0.050..1.610 rows=2972 loops=1) Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid) -> Index Scan using test_registration_result_answer on test_registration_result r (cost=0.00..416.07 rows=792 width=16) (actual time=0.019..0.106 rows=215 loops=2972) Index Cond: (r.test_registration_id = tr.id) Total runtime: 749.745 ms (7 rows) real 0m0.759s user 0m0.008s sys 0m0.000s


The following parameters are changed in postgresql.conf and I have routinely vacuum analyzed the tables and database:

shared_buffers = 2048MB
work_mem = 8MB
maintenance_work_mem = 256MB 
wal_buffers = 640kB
random_page_cost = 4.0
effective_cache_size = 7000MB
default_statistics_target = 200


free -m:
total used free shared buffers cached Mem: 8003 7849 153 0 25 7555 -/+ buffers/cache: 268 7735 Swap: 7640 0 7639


Any help would be appreciated. Thank you very much.

Ogden

Re: Query much faster with enable_seqscan=0

От
Kenneth Marshall
Дата:
You DB is more than likely cached. You should adjust your
page costs to better reflect reality and then the planner
can make more accurate estimates and then choose the proper
plan.

Cheers,
Ken

On Tue, Sep 21, 2010 at 12:32:01PM -0500, Ogden wrote:
> Hello,
>
> I have received some help from the IRC channel, however, the problem still exists. When running the following query
withenable_seqscan set to 0, it takes less than a second, whereas with it set to 1, the query returns in 14 seconds.
Themachine itself has 8GB Ram and is running PostgreSQL 9.0 on Debian Lenny. The database size is about 7GB.  
>
>
> Query:
> SELECT tr.id, tr.sid
>             FROM
>             test_registration tr,
>             INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
>             WHERE.
>             tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>             GROUP BY tr.id, tr.sid
>
>
>
> demo=# \d test_registration
>                        Table "public.test_registration"
>          Column         |            Type             |       Modifiers
> ------------------------+-----------------------------+------------------------
>  id                     | uuid                        | not null
>  sid                    | character varying(36)       | not null
>  created_date           | timestamp without time zone | not null default now()
>  modified_date          | timestamp without time zone | not null
>  test_administration_id | uuid                        | not null
>  teacher_number         | character varying(15)       |
>  test_version_id        | uuid                        |
> Indexes:
>     "test_registration_pkey" PRIMARY KEY, btree (id)
>     "test_registration_sid_key" UNIQUE, btree (sid, test_administration_id)
>     "test_registration_teacher" btree (teacher_number)
>     "test_registration_test_id" btree (test_administration_id)
>
> demo=# \d test_registration_result
>          Table "public.test_registration_result"
>         Column        |         Type          | Modifiers
> ----------------------+-----------------------+-----------
>  answer               | character varying(15) |
>  question_id          | uuid                  | not null
>  score                | double precision      |
>  test_registration_id | uuid                  | not null
> Indexes:
>     "test_registration_result_pkey" PRIMARY KEY, btree (question_id, test_registration_id)
>     "test_registration_result_answer" btree (test_registration_id, answer, score)
>     "test_registration_result_test" btree (test_registration_id)
>
>
> Explain Analyze:
>
>
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=951169.97..951198.37 rows=2840 width=25) (actual time=14669.039..14669.843 rows=2972 loops=1)
>    ->  Hash Join  (cost=2988.07..939924.85 rows=2249024 width=25) (actual time=551.464..14400.061 rows=638980
loops=1)
>          Hash Cond: (r.test_registration_id = tr.id)
>          ->  Seq Scan on test_registration_result r  (cost=0.00..681946.72 rows=37199972 width=16) (actual
time=0.015..6073.101rows=37198734 loops=1) 
>          ->  Hash  (cost=2952.57..2952.57 rows=2840 width=25) (actual time=2.516..2.516 rows=2972 loops=1)
>                Buckets: 1024  Batches: 1  Memory Usage: 160kB
>                ->  Bitmap Heap Scan on test_registration tr  (cost=44.29..2952.57 rows=2840 width=25) (actual
time=0.528..1.458rows=2972 loops=1) 
>                      Recheck Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>                      ->  Bitmap Index Scan on test_registration_test_administration_id  (cost=0.00..43.58 rows=2840
width=0)(actual time=0.507..0.507 rows=2972 loops=1) 
>                            Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>  Total runtime: 14670.337 ms
> (11 rows)
>
>
> real    0m14.698s
> user    0m0.000s
> sys    0m0.008s
>
>
> With "set enable_seqscan=0;"
>
>
> SET
>                                                                                      QUERY PLAN
                                                               
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=1225400.19..1225428.59 rows=2840 width=25) (actual time=748.397..749.160 rows=2972 loops=1)
>    ->  Nested Loop  (cost=0.00..1214155.07 rows=2249024 width=25) (actual time=0.107..465.165 rows=638980 loops=1)
>          ->  Index Scan using test_registration_test_administration_id on test_registration tr  (cost=0.00..4413.96
rows=2840width=25) (actual time=0.050..1.610 rows=2972 loops=1) 
>                Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>          ->  Index Scan using test_registration_result_answer on test_registration_result r  (cost=0.00..416.07
rows=792width=16) (actual time=0.019..0.106 rows=215 loops=2972) 
>                Index Cond: (r.test_registration_id = tr.id)
>  Total runtime: 749.745 ms
> (7 rows)
>
>
> real    0m0.759s
> user    0m0.008s
> sys    0m0.000s
>
>
> The following parameters are changed in postgresql.conf and I have routinely vacuum analyzed the tables and database:
>
> shared_buffers = 2048MB
> work_mem = 8MB
> maintenance_work_mem = 256MB
> wal_buffers = 640kB
> random_page_cost = 4.0
> effective_cache_size = 7000MB
> default_statistics_target = 200
>
>
> free -m:
>              total       used       free     shared    buffers     cached
> Mem:          8003       7849        153          0         25       7555
> -/+ buffers/cache:        268       7735
> Swap:         7640          0       7639
>
>
> Any help would be appreciated. Thank you very much.
>
> Ogden

Re: Query much faster with enable_seqscan=0

От
Ogden
Дата:
I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?

Thank you

Ogden


On Sep 21, 2010, at 1:06 PM, Kenneth Marshall wrote:

> You DB is more than likely cached. You should adjust your
> page costs to better reflect reality and then the planner
> can make more accurate estimates and then choose the proper
> plan.
>
> Cheers,
> Ken
>
> On Tue, Sep 21, 2010 at 12:32:01PM -0500, Ogden wrote:
>> Hello,
>>
>> I have received some help from the IRC channel, however, the problem still exists. When running the following query
withenable_seqscan set to 0, it takes less than a second, whereas with it set to 1, the query returns in 14 seconds.
Themachine itself has 8GB Ram and is running PostgreSQL 9.0 on Debian Lenny. The database size is about 7GB.  
>>
>>
>> Query:
>> SELECT tr.id, tr.sid
>>            FROM
>>            test_registration tr,
>>            INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
>>            WHERE.
>>            tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>>            GROUP BY tr.id, tr.sid
>>
>>
>>
>> demo=# \d test_registration
>>                       Table "public.test_registration"
>>         Column         |            Type             |       Modifiers
>> ------------------------+-----------------------------+------------------------
>> id                     | uuid                        | not null
>> sid                    | character varying(36)       | not null
>> created_date           | timestamp without time zone | not null default now()
>> modified_date          | timestamp without time zone | not null
>> test_administration_id | uuid                        | not null
>> teacher_number         | character varying(15)       |
>> test_version_id        | uuid                        |
>> Indexes:
>>    "test_registration_pkey" PRIMARY KEY, btree (id)
>>    "test_registration_sid_key" UNIQUE, btree (sid, test_administration_id)
>>    "test_registration_teacher" btree (teacher_number)
>>    "test_registration_test_id" btree (test_administration_id)
>>
>> demo=# \d test_registration_result
>>         Table "public.test_registration_result"
>>        Column        |         Type          | Modifiers
>> ----------------------+-----------------------+-----------
>> answer               | character varying(15) |
>> question_id          | uuid                  | not null
>> score                | double precision      |
>> test_registration_id | uuid                  | not null
>> Indexes:
>>    "test_registration_result_pkey" PRIMARY KEY, btree (question_id, test_registration_id)
>>    "test_registration_result_answer" btree (test_registration_id, answer, score)
>>    "test_registration_result_test" btree (test_registration_id)
>>
>>
>> Explain Analyze:
>>
>>
>>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> HashAggregate  (cost=951169.97..951198.37 rows=2840 width=25) (actual time=14669.039..14669.843 rows=2972 loops=1)
>>   ->  Hash Join  (cost=2988.07..939924.85 rows=2249024 width=25) (actual time=551.464..14400.061 rows=638980
loops=1)
>>         Hash Cond: (r.test_registration_id = tr.id)
>>         ->  Seq Scan on test_registration_result r  (cost=0.00..681946.72 rows=37199972 width=16) (actual
time=0.015..6073.101rows=37198734 loops=1) 
>>         ->  Hash  (cost=2952.57..2952.57 rows=2840 width=25) (actual time=2.516..2.516 rows=2972 loops=1)
>>               Buckets: 1024  Batches: 1  Memory Usage: 160kB
>>               ->  Bitmap Heap Scan on test_registration tr  (cost=44.29..2952.57 rows=2840 width=25) (actual
time=0.528..1.458rows=2972 loops=1) 
>>                     Recheck Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>>                     ->  Bitmap Index Scan on test_registration_test_administration_id  (cost=0.00..43.58 rows=2840
width=0)(actual time=0.507..0.507 rows=2972 loops=1) 
>>                           Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>> Total runtime: 14670.337 ms
>> (11 rows)
>>
>>
>> real    0m14.698s
>> user    0m0.000s
>> sys    0m0.008s
>>
>>
>> With "set enable_seqscan=0;"
>>
>>
>> SET
>>                                                                                     QUERY PLAN
                                                               
>>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> HashAggregate  (cost=1225400.19..1225428.59 rows=2840 width=25) (actual time=748.397..749.160 rows=2972 loops=1)
>>   ->  Nested Loop  (cost=0.00..1214155.07 rows=2249024 width=25) (actual time=0.107..465.165 rows=638980 loops=1)
>>         ->  Index Scan using test_registration_test_administration_id on test_registration tr  (cost=0.00..4413.96
rows=2840width=25) (actual time=0.050..1.610 rows=2972 loops=1) 
>>               Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>>         ->  Index Scan using test_registration_result_answer on test_registration_result r  (cost=0.00..416.07
rows=792width=16) (actual time=0.019..0.106 rows=215 loops=2972) 
>>               Index Cond: (r.test_registration_id = tr.id)
>> Total runtime: 749.745 ms
>> (7 rows)
>>
>>
>> real    0m0.759s
>> user    0m0.008s
>> sys    0m0.000s
>>
>>
>> The following parameters are changed in postgresql.conf and I have routinely vacuum analyzed the tables and
database:
>>
>> shared_buffers = 2048MB
>> work_mem = 8MB
>> maintenance_work_mem = 256MB
>> wal_buffers = 640kB
>> random_page_cost = 4.0
>> effective_cache_size = 7000MB
>> default_statistics_target = 200
>>
>>
>> free -m:
>>             total       used       free     shared    buffers     cached
>> Mem:          8003       7849        153          0         25       7555
>> -/+ buffers/cache:        268       7735
>> Swap:         7640          0       7639
>>
>>
>> Any help would be appreciated. Thank you very much.
>>
>> Ogden
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: Query much faster with enable_seqscan=0

От
Jesper Krogh
Дата:
On 2010-09-21 20:21, Ogden wrote:
> I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?
>

Should be lowered to a bit over seq_page_cost..  and more importantly..
you should
make sure that you have updated your statistics .. run "ANALYZE";

--
Jesper

Re: Query much faster with enable_seqscan=0

От
Ogden
Дата:
How odd, I set the following:

seq_page_cost = 1.0
random_page_cost = 2.0

And now the query runs in milliseconds as opposed to 14 seconds. Could this really be the change? I am running ANALYZE
now- how often is it recommended to do this? 

Thank you

Ogden


On Sep 21, 2010, at 1:51 PM, Jesper Krogh wrote:

> On 2010-09-21 20:21, Ogden wrote:
>> I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?
>>
>
> Should be lowered to a bit over seq_page_cost..  and more importantly.. you should
> make sure that you have updated your statistics .. run "ANALYZE";
>
> --
> Jesper
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: Query much faster with enable_seqscan=0

От
"Joshua D. Drake"
Дата:
On Tue, 2010-09-21 at 14:02 -0500, Ogden wrote:
> How odd, I set the following:
>
> seq_page_cost = 1.0
> random_page_cost = 2.0
>
> And now the query runs in milliseconds as opposed to 14 seconds. Could this really be the change? I am running
ANALYZEnow - how often is it recommended to do this? 

PostgreSQL's defaults are based on extremely small and some would say
(non production) size databases. As a matter of course I always
recommend bringing seq_page_cost and random_page_cost more in line.

However, you may want to try moving random_page_cost back to 4 and try
increasing cpu_tuple_cost instead.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: Query much faster with enable_seqscan=0

От
Greg Smith
Дата:
Joshua D. Drake wrote:
> PostgreSQL's defaults are based on extremely small and some would say
> (non production) size databases. As a matter of course I always
> recommend bringing seq_page_cost and random_page_cost more in line.
>

Also, they presume that not all of your data is going to be in memory,
and the query optimizer needs to be careful about what it does and
doesn't pull from disk.  If that's not the case, like here where there's
8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost
and random_page_cost can make sense.  Don't be afraid to think lowering
below 1.0 is going too far--something more like 0.01 for sequential and
0.02 for random may actually reflect reality here.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


Re: Query much faster with enable_seqscan=0

От
Ogden
Дата:
On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:

> Joshua D. Drake wrote:
>> PostgreSQL's defaults are based on extremely small and some would say
>> (non production) size databases. As a matter of course I always
>> recommend bringing seq_page_cost and random_page_cost more in line.
>>
>
> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful
aboutwhat it does and doesn't pull from disk.  If that's not the case, like here where there's 8GB of RAM and a 7GB
database,dramatic reductions to both seq_page_cost and random_page_cost can make sense.  Don't be afraid to think
loweringbelow 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect
realityhere. 
>

I have done just that, per your recommendations and now what took 14 seconds, only takes less than a second, so it was
certainlythese figures I messed around with. I have set: 

seq_page_cost = 0.01
random_page_cost = 0.02
cpu_tuple_cost = 0.01

Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.

I truly appreciate everyone's help.

Ogden


Re: Query much faster with enable_seqscan=0

От
Tom Lane
Дата:
Greg Smith <greg@2ndquadrant.com> writes:
> and the query optimizer needs to be careful about what it does and
> doesn't pull from disk.  If that's not the case, like here where there's
> 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost
> and random_page_cost can make sense.  Don't be afraid to think lowering
> below 1.0 is going too far--something more like 0.01 for sequential and
> 0.02 for random may actually reflect reality here.

If you are tuning for an all-in-RAM situation, you should set
random_page_cost equal to seq_page_cost (and usually set both smaller
than 1).  By definition, those costs are equal if you're fetching from
RAM.  If it's only mostly-in-RAM then keeping random_page_cost a bit
higher makes sense.

            regards, tom lane

Re: Query much faster with enable_seqscan=0

От
Tom Lane
Дата:
Ogden <lists@darkstatic.com> writes:
> SELECT tr.id, tr.sid
>             FROM
>             test_registration tr,
>             INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
>             WHERE.
>             tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>             GROUP BY tr.id, tr.sid

Seeing that tr.id is a primary key, I think you might be a lot better
off if you avoided the inner join and group by.  I think what you really
want here is something like

SELECT tr.id, tr.sid
            FROM
            test_registration tr
            WHERE
            tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
            AND EXISTS(SELECT 1 FROM test_registration_result r
                       WHERE tr.id = r.test_registration_id)

            regards, tom lane

Re: Query much faster with enable_seqscan=0

От
"Joshua D. Drake"
Дата:
On Tue, 2010-09-21 at 14:02 -0500, Ogden wrote:
> How odd, I set the following:
>
> seq_page_cost = 1.0
> random_page_cost = 2.0
>
> And now the query runs in milliseconds as opposed to 14 seconds. Could this really be the change? I am running
ANALYZEnow - how often is it recommended to do this? 

PostgreSQL's defaults are based on extremely small and some would say
(non production) size databases. As a matter of course I always
recommend bringing seq_page_cost and random_page_cost more in line.

However, you may want to try moving random_page_cost back to 4 and try
increasing cpu_tuple_cost instead.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: Query much faster with enable_seqscan=0

От
Ogden
Дата:
On Sep 21, 2010, at 2:34 PM, Ogden wrote:

>
> On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:
>
>> Joshua D. Drake wrote:
>>> PostgreSQL's defaults are based on extremely small and some would say
>>> (non production) size databases. As a matter of course I always
>>> recommend bringing seq_page_cost and random_page_cost more in line.
>>>
>>
>> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful
aboutwhat it does and doesn't pull from disk.  If that's not the case, like here where there's 8GB of RAM and a 7GB
database,dramatic reductions to both seq_page_cost and random_page_cost can make sense.  Don't be afraid to think
loweringbelow 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect
realityhere. 
>>
>
> I have done just that, per your recommendations and now what took 14 seconds, only takes less than a second, so it
wascertainly these figures I messed around with. I have set: 
>
> seq_page_cost = 0.01
> random_page_cost = 0.02
> cpu_tuple_cost = 0.01
>
> Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.
>
> I truly appreciate everyone's help.
>
> Ogden
>


I spoke too soon - well I came in this morning and reran the query that was speeded up yesterday by a lot after
tweakingthose numbers. This morning the first time I ran it, it took 16 seconds whereas every subsequent run was a
matterof 2 seconds. I assume there is OS caching going on for those results. Is this normal or could it also be the
speedof my disks which is causing a lag when I first run it (it's RAID 5 across 6 disks). Is there any explanation for
thisor what should those settings really be? Perhaps 0.01 is too low? 

Thank you

Ogden

Re: Query much faster with enable_seqscan=0

От
Scott Carey
Дата:
On Sep 22, 2010, at 6:36 AM, Ogden wrote:

>
> On Sep 21, 2010, at 2:34 PM, Ogden wrote:
>
>>
>> On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:
>>
>>> Joshua D. Drake wrote:
>>>> PostgreSQL's defaults are based on extremely small and some would say
>>>> (non production) size databases. As a matter of course I always
>>>> recommend bringing seq_page_cost and random_page_cost more in line.
>>>>
>>>
>>> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful
aboutwhat it does and doesn't pull from disk.  If that's not the case, like here where there's 8GB of RAM and a 7GB
database,dramatic reductions to both seq_page_cost and random_page_cost can make sense.  Don't be afraid to think
loweringbelow 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect
realityhere. 
>>>
>>
>> I have done just that, per your recommendations and now what took 14 seconds, only takes less than a second, so it
wascertainly these figures I messed around with. I have set: 
>>
>> seq_page_cost = 0.01
>> random_page_cost = 0.02
>> cpu_tuple_cost = 0.01
>>
>> Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.

>>
>> I truly appreciate everyone's help.
>>
>> Ogden
>>
>
>
> I spoke too soon - well I came in this morning and reran the query that was speeded up yesterday by a lot after
tweakingthose numbers. This morning the first time I ran it, it took 16 seconds whereas every subsequent run was a
matterof 2 seconds. I assume there is OS caching going on for those results. Is this normal or could it also be the
speedof my disks which is causing a lag when I first run it (it's RAID 5 across 6 disks). Is there any explanation for
thisor what should those settings really be? Perhaps 0.01 is too low? 
>
> Thank you
>
> Ogden

When not cached, the plan with sequential scans will almost always be much faster.

When cached in memory, the ones using indexes are almost always faster.

The tuning parameters are essentially telling postgres the likelihood of finding things on disk instead versus in
memory.  The default parameters are essentially "not likely in memory, with a somewhat slow disk". 




> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: Query much faster with enable_seqscan=0

От
Robert Haas
Дата:
On Wed, Sep 22, 2010 at 9:36 AM, Ogden <lists@darkstatic.com> wrote:
>
> On Sep 21, 2010, at 2:34 PM, Ogden wrote:
>
>>
>> On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:
>>
>>> Joshua D. Drake wrote:
>>>> PostgreSQL's defaults are based on extremely small and some would say
>>>> (non production) size databases. As a matter of course I always
>>>> recommend bringing seq_page_cost and random_page_cost more in line.
>>>>
>>>
>>> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful
aboutwhat it does and doesn't pull from disk.  If that's not the case, like here where there's 8GB of RAM and a 7GB
database,dramatic reductions to both seq_page_cost and random_page_cost can make sense.  Don't be afraid to think
loweringbelow 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect
realityhere. 
>>>
>>
>> I have done just that, per your recommendations and now what took 14 seconds, only takes less than a second, so it
wascertainly these figures I messed around with. I have set: 
>>
>> seq_page_cost = 0.01
>> random_page_cost = 0.02
>> cpu_tuple_cost = 0.01
>>
>> Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.
>>
>> I truly appreciate everyone's help.
>>
>> Ogden
>>
>
>
> I spoke too soon - well I came in this morning and reran the query that was speeded up yesterday by a lot after
tweakingthose numbers. This morning the first time I ran it, it took 16 seconds whereas every subsequent run was a
matterof 2 seconds. I assume there is OS caching going on for those results. Is this normal or could it also be the
speedof my disks which is causing a lag when I first run it (it's RAID 5 across 6 disks). Is there any explanation for
thisor what should those settings really be? Perhaps 0.01 is too low? 

Yeah, I think those numbers are a bit low.  Your database probably
isn't fully cached.  Keep in mind there's going to be some fluctuation
as to what is and is not in cache, and you can't expect whatever plan
the planner picks to be exactly perfect for both cases.  I might try
something more like 0.2 / 0.1.  If you really need the query to be
fast, though, you might need to do more than jigger the page costs.
Did you try Tom's suggested rewrite?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: Query much faster with enable_seqscan=0

От
Ogden
Дата:
On Sep 21, 2010, at 6:30 PM, Tom Lane wrote:

> Ogden <lists@darkstatic.com> writes:
>> SELECT tr.id, tr.sid
>>            FROM
>>            test_registration tr,
>>            INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
>>            WHERE.
>>            tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>>            GROUP BY tr.id, tr.sid
>
> Seeing that tr.id is a primary key, I think you might be a lot better
> off if you avoided the inner join and group by.  I think what you really
> want here is something like
>
> SELECT tr.id, tr.sid
>            FROM
>            test_registration tr
>            WHERE
>            tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>            AND EXISTS(SELECT 1 FROM test_registration_result r
>                       WHERE tr.id = r.test_registration_id)
>
>             regards, tom lane
>

Thank you for this suggestion, however, what if I wanted some columns from test_registration_result - this wouldn't
work,for example if I wanted test_registration_result.answer to be fetched. Hence, I had to have a JOIN with
test_registration_resultand a GROUP BY. I still am not happy with my query - the EXISTS executes in great speed however
Icannot retrieve any of the columns from that table.  

Thank you

Ogden


Re: Query much faster with enable_seqscan=0

От
Samuel Gendler
Дата:


On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ogden <lists@darkstatic.com> writes:
> SELECT tr.id, tr.sid
>             FROM
>             test_registration tr,
>             INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
>             WHERE.
>             tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>             GROUP BY tr.id, tr.sid

Seeing that tr.id is a primary key, I think you might be a lot better
off if you avoided the inner join and group by.  I think what you really
want here is something like

SELECT tr.id, tr.sid
           FROM
           test_registration tr
           WHERE
           tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
           AND EXISTS(SELECT 1 FROM test_registration_result r
                      WHERE tr.id = r.test_registration_id)

                       regards, tom lane


Could you explain the logic behind why this structure is better than the other? Is it always the case that one should just always use the 'exists(select 1 from x...)' structure when trying to strip rows that don't join or is it just the case when you know that the rows which do join are a fairly limited subset?  Does the same advantage exist if filtering rows in the joined table on some criteria, or is it better at that point to use an inner join and add a where clause to filter the joined rows.

select table1.columns
from  table1, table2
where table1.column = 'some_value'
   and table1.fk = table2.pk
  AND table2.column = 'some_other_value'

versus

select table1.columns
  from table1
where table1.column = 'some_value'
   and exists(select 1 from table2 where table1.fk = table2.pk
                      and table2.column ='some_other_value')


Re: Query much faster with enable_seqscan=0

От
Robert Haas
Дата:
On Tue, Oct 12, 2010 at 10:28 PM, Samuel Gendler
<sgendler@ideasculptor.com> wrote:
>
>
> On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Ogden <lists@darkstatic.com> writes:
>> > SELECT tr.id, tr.sid
>> >             FROM
>> >             test_registration tr,
>> >             INNER JOIN test_registration_result r on (tr.id =
>> > r.test_registration_id)
>> >             WHERE.
>> >
>> > tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>> >             GROUP BY tr.id, tr.sid
>>
>> Seeing that tr.id is a primary key, I think you might be a lot better
>> off if you avoided the inner join and group by.  I think what you really
>> want here is something like
>>
>> SELECT tr.id, tr.sid
>>            FROM
>>            test_registration tr
>>            WHERE
>>
>>  tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>>            AND EXISTS(SELECT 1 FROM test_registration_result r
>>                       WHERE tr.id = r.test_registration_id)
>>
>>                        regards, tom lane
>>
>
> Could you explain the logic behind why this structure is better than the
> other? Is it always the case that one should just always use the
> 'exists(select 1 from x...)' structure when trying to strip rows that don't
> join or is it just the case when you know that the rows which do join are a
> fairly limited subset?  Does the same advantage exist if filtering rows in
> the joined table on some criteria, or is it better at that point to use an
> inner join and add a where clause to filter the joined rows.
> select table1.columns
> from  table1, table2
> where table1.column = 'some_value'
>    and table1.fk = table2.pk
>   AND table2.column = 'some_other_value'
> versus
> select table1.columns
>   from table1
> where table1.column = 'some_value'
>    and exists(select 1 from table2 where table1.fk = table2.pk
>                       and table2.column ='some_other_value')

I don't think there's much difference between those two cases.  I
think Tom's point was that GROUP BY can be expensive - which it
certainly can.  It's absolutely necessary and unavoidable for certain
queries, of course, but don't include it unless you need it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company