Re: Abnormal performance difference between Postgres and MySQL

From: Farhan Husain
Subject: Re: Abnormal performance difference between Postgres and MySQL
Date:
Msg-id: 3df32b6d0902260945l57f761fdteedb29b2a5f28070@mail.gmail.com
In response to: Re: Abnormal performance difference between Postgres and MySQL  (Scott Carey <scott@richrelevance.com>)
Responses: Re: Abnormal performance difference between Postgres and MySQL
List: pgsql-performance
On Wed, Feb 25, 2009 at 6:07 PM, Scott Carey <scott@richrelevance.com> wrote:
I will second Kevin’s suggestion.  Unless you think you will have more than a few dozen concurrent queries, start with work_mem around 32MB.
For the query here, a very large work_mem might help it hash join depending on the data... But that’s not the real problem here.

The real problem is that it does a huge scan of the entire a1 table and sorts it.  It's pretty clear that this table has incorrect statistics: the planner thinks the scan will return about 1 million rows, but it actually returns 3 million.

Crank up the statistics target on that table from the default to at least 100, perhaps even 1000.  This is a large table, and the default statistics target of 10 is not good for large tables with skewed column data.  The columns to try increasing the target on are the ones filtered in the explain: graphid, prop, and obj.  Then run vacuum analyze on that table (a1).  The planner should then have better stats and will likely be able to choose a better plan for the join.

The other tables involved in the join also seem to have bad statistics.  You might just take the easiest route and change the global statistics target, then vacuum analyze the tables involved:

set default_statistics_target = 50;
vacuum analyze jena_g1t1_stmt ;

(test the query)

Repeat for several values of default_statistics_target.  You can run “explain” before running the actual query to see whether the plan changed; if it has not, the runtime is unlikely to change.
The maximum value for the statistics target is 1000, which makes analyzing and query planning slower but more accurate.  In most cases, dramatic differences show up between the default of 10 and values of 25 or 50.  Sometimes you have to go into the hundreds, and it is safer to do that on a per-column basis once you get to larger values.
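A per-column target can be set with ALTER TABLE rather than raising the global default.  A sketch for the columns filtered in this plan (the target value of 200 is illustrative, not a recommendation):

```sql
-- Raise the sample size for just the skewed columns,
-- instead of for every column in the database.
ALTER TABLE jena_g1t1_stmt ALTER COLUMN graphid SET STATISTICS 200;
ALTER TABLE jena_g1t1_stmt ALTER COLUMN prop    SET STATISTICS 200;
ALTER TABLE jena_g1t1_stmt ALTER COLUMN obj     SET STATISTICS 200;

-- Re-sample so the planner actually sees the new targets.
ANALYZE jena_g1t1_stmt;
```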

For larger databases, I recommend increasing the default to between 20 and 40 and re-analyzing all the tables.






On 2/25/09 3:11 PM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

>>> Farhan Husain <russoue@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> >>> Farhan Husain <russoue@gmail.com> wrote:
>> > The machine postgres is running on has 4 GB of RAM.
>>
>> In addition to the other suggestions, you should be sure that
>> effective_cache_size is set to a reasonable value, which would
>> probably be somewhere in the neighborhood of '3GB'.

> The execution time has not improved. I am going to increase the
> shared_buffers now keeping the work_mem same.

Increasing shared_buffers is good, but it has been mentioned that this
will not change the plan, which currently scans and sorts the whole
table for a1.  Nobody will be surprised when you report minimal
change, if any.  If you have not changed effective_cache_size (be sure
not to confuse this with any of the other configuration values) the
planner will think you only have 128MB of cache, which is off by a
factor of about 24 from reality.
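For a machine with 4 GB of RAM, the setting Kevin describes could be sketched like this (the value is illustrative; effective_cache_size is only a planner hint about available OS and shared cache, not a memory allocation):

```sql
-- Tell the planner roughly how much of the data set is likely cached.
-- Set in postgresql.conf for a permanent change, or per-session to test.
SET effective_cache_size = '3GB';
SHOW effective_cache_size;
```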

Also, I'm going to respectfully differ with some of the other posts on
the best setting for work_mem.  Most benchmarks I've run and can
remember seeing posted found best performance for this at somewhere
between 16MB and 32MB.  You do have to be careful if you have a large
number of concurrent queries, however, and keep it lower.  In most
such cases, though, you're better off using a connection pool to limit
concurrent queries instead.
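A work_mem in the 16MB to 32MB range can be tried per-session before committing it to postgresql.conf (the value below is illustrative):

```sql
-- work_mem is used per sort/hash operation, per backend, so the total
-- can be a multiple of this under many concurrent queries.
SET work_mem = '32MB';
```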

-Kevin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Thanks a lot Scott! I think that was the problem. I just changed the default statistics target to 50 and ran explain. The plan changed and I ran explain analyze. Now it takes a fraction of a second!

Thanks to all of you who wanted to help me. I would be happy if someone does me one last favor: I want to know how these query plans are generated and how the parameters you suggested changing affect them. If there is an article, paper, or book on this, please give me its name or URL.

Here is the output of my latest tasks:

ingentadb=# set default_statistics_target=50;
SET
ingentadb=# show default_statistics_target;
 default_statistics_target
---------------------------
 50
(1 row)

ingentadb=# vacuum analyze jena_g1t1_stmt;
VACUUM
ingentadb=# EXPLAIN select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
                                                                        QUERY PLAN                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..37838.46 rows=7568 width=134)
   ->  Nested Loop  (cost=0.00..7485.09 rows=495 width=148)
         ->  Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0  (cost=0.00..1160.62 rows=97 width=74)
               Index Cond: ((obj)::text = 'Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
               Filter: (((prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid = 1))
         ->  Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a1  (cost=0.00..65.15 rows=4 width=74)
               Index Cond: (((a1.subj)::text = (a0.subj)::text) AND ((a1.prop)::text = 'Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text))
               Filter: ((a1.graphid = 1) AND ((a1.obj)::text = 'Uv::http://metastore.ingenta.com/ns/structure/Article'::text))
   ->  Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2  (cost=0.00..61.17 rows=12 width=134)
         Index Cond: (((a2.subj)::text = (a0.subj)::text) AND ((a2.prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
         Filter: (a2.graphid = 1)
(11 rows)

ingentadb=# EXPLAIN ANALYZE select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
                                                                        QUERY PLAN                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..37838.46 rows=7568 width=134) (actual time=6.535..126.791 rows=30 loops=1)
   ->  Nested Loop  (cost=0.00..7485.09 rows=495 width=148) (actual time=4.404..64.078 rows=30 loops=1)
         ->  Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0  (cost=0.00..1160.62 rows=97 width=74) (actual time=2.127..2.270 rows=30 loops=1)
               Index Cond: ((obj)::text = 'Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
               Filter: (((prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid = 1))
         ->  Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a1  (cost=0.00..65.15 rows=4 width=74) (actual time=2.054..2.056 rows=1 loops=30)
               Index Cond: (((a1.subj)::text = (a0.subj)::text) AND ((a1.prop)::text = 'Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text))
               Filter: ((a1.graphid = 1) AND ((a1.obj)::text = 'Uv::http://metastore.ingenta.com/ns/structure/Article'::text))
   ->  Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2  (cost=0.00..61.17 rows=12 width=134) (actual time=2.083..2.086 rows=1 loops=30)
         Index Cond: (((a2.subj)::text = (a0.subj)::text) AND ((a2.prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
         Filter: (a2.graphid = 1)
 Total runtime: 127.065 ms
(12 rows)


Thanks and regards,

--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
