Re: strange query plans

From: Stephan Szabo
Subject: Re: strange query plans
Date:
Msg-id: Pine.BSF.4.21.0012041517020.67964-100000@megazone23.bigpanda.com
In reply to: Re: strange query plans  (Chris Jones <chris@mt.sri.com>)
List: pgsql-general
On 4 Dec 2000, Chris Jones wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
>
> > On 30 Nov 2000, Chris Jones wrote:
> >
> > > PG seems to be choosing a sub-optimal query plan.  It's doing a
> > > sequential scan of a 120000-tuple table, instead of an index scan for
> > > the 16 matching rows.  Running PG 7.0.2:
> > >
> > > fastfacts=> vacuum analyze event;
> > > VACUUM
> > > fastfacts=> explain select type from event where type = 'IPOETC_EVENT';
> > > NOTICE:  QUERY PLAN:
> > >
> > > Seq Scan on event  (cost=0.00..6664.25 rows=6224 width=12)
> >
> > > I know that PG is frequently smarter than I am, but this doesn't seem
> > > like a case where it's made a good decision.  What am I missing?
> >
> > postgres is expecting 6224 rows to match rather than the 16 that are
> > actually there.  Usual questions are has this table been vacuum analyzed
> > recently and is there a very common value that is much more common
> > than other data in the column (this throws off the estimates).
>
> It was vacuum analyzed immediately before, and the key distribution is
> anything but uniform.  The frequencies range from 16 to 64337.  I'm
> assuming that PG's query optimizer isn't well-informed enough to
> correctly make this kind of decision.  Is there any way I could force
> it?  I have two types which account for about 90% of the data in the
> table; they won't ever have this particular query run on them.  The
> other 10% of the table could really benefit from an Index Scan.

Yeah, that'd do it.  I believe it takes a fraction of the most common
value's frequency as the expected row count (except perhaps for that value
itself, I'm not sure) and so thinks there are a lot of rows to fetch.  A
usual temporary hack is to do something like you mentioned (using nulls),
because nulls are handled separately in the statistics.  :(  There's been
a lot of talk about keeping better statistics.

There is a SET command that will discourage the optimizer from choosing a
certain plan type.  I don't remember the exact option names offhand, but
it has come up in the mailing list archives.  The problem is that you have
to remember to issue the SET before the query and then reset it afterwards
so you don't bias the optimizer for other cases.
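
If I remember right, the option you want is enable_seqscan; the exact
spelling and quoting may differ in 7.0, so check the archives, but roughly:

    SET enable_seqscan TO off;   -- discourage sequential scans for this session
    SELECT type FROM event WHERE type = 'IPOETC_EVENT';
    SET enable_seqscan TO on;    -- reset it so other queries aren't affected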

> The best workaround I've found so far is to create a column called
> "sillytype", which is the same as type, except that it's null for
> those two values.  Now it uses the index, but I'm not entirely
> comfortable with this solution.
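
For reference, the workaround you describe would look roughly like this
(the column type, index name, and the two common type values are just
placeholders, not taken from your schema):

    ALTER TABLE event ADD COLUMN sillytype text;
    UPDATE event SET sillytype = type
      WHERE type NOT IN ('COMMON_TYPE_1', 'COMMON_TYPE_2');
    CREATE INDEX event_sillytype_idx ON event (sillytype);
    -- then query the sparse column so the index gets used:
    SELECT type FROM event WHERE sillytype = 'IPOETC_EVENT';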

