PostgreSQL performance enhancement when query planner fails to guess the right plan

Поиск
Список
Период
Сортировка
От Constantin Teodorescu
Тема PostgreSQL performance enhancement when query planner fails to guess the right plan
Дата
Msg-id 44D79C62.2050303@flex.ro
обсуждение исходный текст
Ответы Re: PostgreSQL performance enhancement when query planner fails to
Re: PostgreSQL performance enhancement when query
Список pgsql-hackers
Hello all, hope you are remembering me, some years ago I've designed the 
PgAccess , the Tcl/Tk visual interface to PostgreSQL.

Thought you haven't received any news from me, I continued working with 
PostgreSQL, being involved in very big projects in Romania.
Right now, the national identification of the cows, sheep, goats and 
pigs in Romania runs on PostgreSQL on a very big database.
Once again , I had to thank you all for keeping up maintaining and 
improving PostgreSQL.

My message to all of you is related to this big project (a government 
sustained project) and some performance issues.

Very few words about the database: approx. 60 tables, 30 of them 
containing 10 millions to 50 millions records , the whole database is 
approx 40 Gb size !

In order to get a good performance, the database is operated on a dual 
XEON with 6 Gb RAM IBM x235 server, the database with tables and indexes 
carefully distributed on 6 different SCSI disks, in different 
tablespaces in such a manner to allow parallelizing reads and HDD head 
movements on different devices when joining those big tables.

We have tuned every possible parameter in config file, we have 
reorganized queries, analyzing explains in order to get the best results 
for all big queries and we succeeded most of the time.
But we have encountered some problems. Due to constant updates and 
inserts into the database, it's size is growing continuously.
Of course we are doing DAILY the needed maintaince, vacuums, analyzes 
and backups.
Due to permanent changes in database size and statistics there are 
queries that sometimes change their execution plan, badly choosing 
another plan and executing those queries in 2,3 minutes instead of 10 
seconds, the usual execution time since the query plan is "switched". We 
have done any effort in changing subselects and the query sentence in 
order to "force" using some indexes, continuously watching the explain 
results.

We have faced yesterday with such a problem with a query that "switched" 
the query plan to a very bad one, almost putting the whole system down.
The only way that we have succeeded to make it work again was by using 
the "SET ENABLE_MERGE_JOIN to OFF".
For the moment it works but in our opinion this is NOT the best approach 
to guide the planner to a better query-plan variant.

Our suggestion would be : extending the EXPLAIN and SELECT commands like 
that:

EXPLAIN VARIANTS SELECT ...... (and so on) that will display the 
different query plans analyzed by the planner and their "estimated time 
values" , not just the "best guess" .

assuming that the EXPLAIN VARIANTS will show 3 or 4 different query 
plans, the database manager will be able to experiment, to test, and to 
decide by himself what is "THE BEST PLAN FOR ME", instead of letting 
postgresql planner to to that. Doing this, we would be able to clearly 
specify then in the SELECT statement the "version" of the query-plan 
that would be used in execution like in the following example:

SELECT .... (very big and complex query) ... USING PLAN 3;

Specifying the desired plan could be of course, different.
I realise that it would be probably better that the query-plan will 
guess the right and optimal plan. I agree that this can be done be 
tweaking parameters and costs BUT THIS CAN TAKE A LOT OF TIME, much more 
than a couple of tests on the real database. An experimented database 
admin can detect much easier the appropriate plan and "force" the 
executor to select that one that he desires.

In our opinion, this would be the simplest and the most non-intrusive 
method of "manual choosing" another query plan rather than indirectly 
setting ON or OFFS various parameters that could affect badly other 
queries.
First of all, it's assumed that the query planner HAS ALREADY evaluated 
different variants and it decides to use one based upon the statistics 
informations of the involved tables and "costs" for various types of 
access.
Unfortunately, due to a very difficult adjustment of those costs and 
timings of the HDD performance, IO transfer speeds, PostgreSQL is 
choosing sometimes a wrong plan.
If we would have the power of choosing and experimenting different plans 
with "SELECT .... USING PLAN <that-one>" we can select than the right 
one in our real world.

The "... USING PLAN 9" extension to the language I hope that it's the 
most delicate and innocent  :-) that I hope that it can be accepted and 
it will give a extremely powerful way of controlling the execution 
performance.

Hope that you know the old joke with someone who is receiving an email 
message : "Hi , I'm the Albanian virus. Due to our poor technologies, I 
cannot do much so please delete some of your files and pretend to be 
scared" :-)
I admit that this approach might be called "the Albanian way of choosing 
the best query plan" :-) but you must admit also that it does not place 
a big burden on the developers, it does not change anything in what have 
been done since now and it allows the developers and database 
administrators to dive into the query plan ocean and to get out of there 
the best of the quickest of the fastest query plan. :-)

Hope that this long message have not disturb you so much ... so I'm 
waiting for your comments and suggestions.

Best regards,
Constantin Teodorescu
Braila, ROMANIA


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: CSStorm occurred again by postgreSQL8.2
Следующее
От: Lukas Smith
Дата:
Сообщение: Re: PostgreSQL performance enhancement when query planner fails to