Re: Experimental evaluation of PostgreSQL's query optimizer

Поиск
Список
Период
Сортировка
От Albe Laurenz
Тема Re: Experimental evaluation of PostgreSQL's query optimizer
Дата
Msg-id A737B7A37273E048B164557ADEF4A58B5378A976@ntex2010i.host.magwien.gv.at
обсуждение исходный текст
Ответ на Experimental evaluation of PostgreSQL's query optimizer  (Viktor Leis <leis@in.tum.de>)
Ответы Re: Experimental evaluation of PostgreSQL's query optimizer  (Viktor Leis <leis@in.tum.de>)
Re: Experimental evaluation of PostgreSQL's query optimizer  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Viktor Leis wrote:
> We have recently performed an experimental evaluation of PostgreSQL's
> query optimizer. For example, we measured the contributions of
> cardinality estimation and the cost model on the overall query
> performance. You can download the resulting paper here:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> 
> Some findings:
> 1. Perhaps unsurprisingly, we found that cardinality
> estimation is the biggest problem in query optimization.
> 2. The quality of Postgres' cardinality estimates is not generally worse
> than that of the major commerical systems.
> 3. It seems to me that one obvious way to avoid many bad situations
> would be to disable nested loop joins when the inner relation is NOT
> an index scan.
> 
> I hope this will be of interest to some of you.

I have read the paper with great interest, and I have some comments.

- The paper mentions that the "Join Order Benchmark" has high cross-table correlation, and this correlation is
responsiblefor bad cardinality estimates that cause bad plans with all RDBMS. Wouldn't it be interesting to do the same
experimentwith a different real-word data sets to see if that is indeed typical and not an idiosyncrasy of that
specificbenchmark?
 

- The paper suggests that sampling the base tables is preferable to using statistics because it gives better estimates,
butI think that that is only a win with long running, complicated, data warehouse style queries. For the typical OLTP
queryit would incur intolerable planning times. Any ideas on that?
 

- From my experience in tuning SQL queries I can confirm your one finding, namely that bad cardinality estimates are
theprime source for bad plan choices. Perhaps it would be valuable to start thinking about statistics for inter-table
correlation.What about something as "simple" as a factor per (joinable) attribute pair that approximates the total row
countof a join on these attributes, divided by the planner's estimate?
 

- I also can corroborate your finding that nested loop joins are often harmful, particularly when the inner loop is a
sequentialscan. One of the first things I do when investigating bad performance of a query whose plan has a nestend
loopjoin is to set enable_nestloop to "off" and see if that makes a difference, and it often does. Maybe it would be a
winto bias the planner against nested loop joins. This is dreaming, but it might be nice to have some number as to how
reliablea certain estimate is, which is high if the estimate is, say, derived from a single filter on a base table and
sinksas more conditions are involved or numbers pulled out of thin air.
 

Yours,
Laurenz Albe

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jesper Pedersen
Дата:
Сообщение: Re: Additional LWLOCK_STATS statistics
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Freeze avoidance of very large table.