Обсуждение: Suboptimal plan choice problem with 8.3RC2

Поиск
Список
Период
Сортировка

Suboptimal plan choice problem with 8.3RC2

От
"Guillaume Smet"
Дата:
Hi -hackers,

While testing RC2 on the new servers of one of our customers, I found
a query really slow on the new server which is quite fast on the old
box currently in production (production is 8.1 at the moment).
If I set enable_nestloop to off, the query is fast (the plan is
different from the 8.1 plan though).

I attached:
- the plan with regular configuration;
- the plan after disabling nested loops;
- the plan obtained with 8.1 on the current production box;
- the relevant configuration and schema of the concerned tables.

The plans are really too different to find a narrower test case so
it's the real test case.

Feel free to ask any additional information or tests.

Regards,

--
Guillaume

Вложения

Re: Suboptimal plan choice problem with 8.3RC2

От
Tom Lane
Дата:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> While testing RC2 on the new servers of one of our customers, I found
> a query really slow on the new server which is quite fast on the old
> box currently in production (production is 8.1 at the moment).

Have you ANALYZEd these tables lately?  It looks like 8.3 likes the plan
it likes because it thinks that the ranges of el.numasso and a.numasso
are fairly distinct:

>                ->  Merge Join  (cost=7.55..2905.50 rows=65 width=68) (actual time=5138.556..8106.465 rows=36
loops=1)
>                      Merge Cond: (el.numasso = a.numasso)
>                      ->  Nested Loop  (cost=0.00..254537.64 rows=90 width=37) (actual time=5137.405..8104.863 rows=36
loops=1)

The only way the merge join could have an estimated cost that's barely
1% of the estimate for one of its inputs is if the planner thinks the
merge will stop after reading only 1% of that input, ie, the largest
a.numasso value is only about 1% of the way through the range of
el.numasso.  If the a.numasso distribution has a long tail, you might
need to raise the statistics target to fix this estimate.

I'd expect 8.1 to make about the same estimate given the same stats,
so I think it's not looking at the same stats.
        regards, tom lane


Re: Suboptimal plan choice problem with 8.3RC2

От
"Guillaume Smet"
Дата:
On Jan 22, 2008 8:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The only way the merge join could have an estimated cost that's barely
> 1% of the estimate for one of its inputs is if the planner thinks the
> merge will stop after reading only 1% of that input, ie, the largest
> a.numasso value is only about 1% of the way through the range of
> el.numasso.  If the a.numasso distribution has a long tail, you might
> need to raise the statistics target to fix this estimate.

The statistics target was fine (I set it to 30 by default). But...

> I'd expect 8.1 to make about the same estimate given the same stats,
> so I think it's not looking at the same stats.

Yep, the statistics were the problem, sorry for the noise. The query
performs in 50ms after an ANALYZE so far better than with 8.1.

The 8.3RC2 box is using the default configuration of autovacuum
though. Shouldn't it take care of keeping the statistics up to date?
That's what I thought from what I've read on autovacuum so far (it's
the first time I use it in "production" though, it was a manual
process until now) - and that's why I didn't check it. Or should we
still run the first ANALYZE manually?

Andrew from Supernews also pointed the lack of an index on
evelieu(numasso). It's even better with it (less than a ms).

--
Guillaume


Re: Suboptimal plan choice problem with 8.3RC2

От
Alvaro Herrera
Дата:
Guillaume Smet escribió:
> On Jan 22, 2008 8:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > I'd expect 8.1 to make about the same estimate given the same stats,
> > so I think it's not looking at the same stats.
> 
> Yep, the statistics were the problem, sorry for the noise. The query
> performs in 50ms after an ANALYZE so far better than with 8.1.
> 
> The 8.3RC2 box is using the default configuration of autovacuum
> though. Shouldn't it take care of keeping the statistics up to date?
> That's what I thought from what I've read on autovacuum so far (it's
> the first time I use it in "production" though, it was a manual
> process until now) - and that's why I didn't check it. Or should we
> still run the first ANALYZE manually?

No, autovacuum should have taken care of it.  I would be interesting in
knowing why it didn't.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Suboptimal plan choice problem with 8.3RC2

От
"Guillaume Smet"
Дата:
On Jan 22, 2008 9:52 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> No, autovacuum should have taken care of it.  I would be interesting in
> knowing why it didn't.

I just dropped the database on monday morning and import it again.
Nothing really fancy.

It seems that autovacuum took care of them just after the import which
is what I expected:

cityvox_prod=# select relname, last_vacuum, last_autovacuum,
last_analyze, last_autoanalyze from pg_stat_all_tables where
schemaname = 'cityvox' AND relname IN('association', 'evelieu',
'assovil', 'lieu', 'vilquartier') ORDER BY relname;  relname   | last_vacuum | last_autovacuum |         last_analyze
 |       last_autoanalyze
 

-------------+-------------+-----------------+-------------------------------+-------------------------------association
|            |                 | 2008-01-22
 
20:34:26.813283+01 | 2008-01-21 12:10:50.30652+01assovil     |             |                 | 2008-01-22
20:34:46.548442+01 | 2008-01-21 12:10:50.573546+01evelieu     |             |                 | 2008-01-22
20:34:33.193569+01 | 2008-01-21 12:11:06.237325+01lieu        |             |                 | 2008-01-22
20:34:35.936066+01 | 2008-01-21 12:11:52.085856+01vilquartier |             |                 | 2008-01-22
20:34:43.409459+01 | 2008-01-21 12:12:08.391397+01

So I wonder why the stats were so bad... I didn't update the data at
all after the initial import. Any idea?

I'll check the stats of these tables next time I drop/recreate the database.

--
Guillaume


Re: Suboptimal plan choice problem with 8.3RC2

От
Alvaro Herrera
Дата:
Guillaume Smet escribió:

> It seems that autovacuum took care of them just after the import which
> is what I expected:

[...]

> So I wonder why the stats were so bad... I didn't update the data at
> all after the initial import. Any idea?

Hmm, perhaps the analyze was done with the default statistic target (10)?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Suboptimal plan choice problem with 8.3RC2

От
"Guillaume Smet"
Дата:
On Jan 22, 2008 11:22 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > So I wonder why the stats were so bad... I didn't update the data at
> > all after the initial import. Any idea?
>
> Hmm, perhaps the analyze was done with the default statistic target (10)?

It's set to 30 in the postgresql.conf from the beginning (= at least 3
weeks) and PostgreSQL has been restarted a few times (at least for the
upgrade to RC2)  so I don't think it's a configuration problem.

I just set default_statistics_target to 10 and run a few
ANALYZE+EXPLAIN ANALYZE query and the plan is the good one for all the
runs. It's set to 10 on the 8.1 box by the way.

I can't drop/recreate the database at will because the customer is
also testing it but I'll try to find a moment to do it tomorrow.

--
Guillaume


Re: Suboptimal plan choice problem with 8.3RC2

От
"Guillaume Smet"
Дата:
On Jan 22, 2008 11:37 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> I can't drop/recreate the database at will because the customer is
> also testing it but I'll try to find a moment to do it tomorrow.

I didn't reproduce the problem this time. I'll check after each
drop/recreate to see if I got something or if it was a false alarm.

--
Guillaume