Обсуждение: Index problem

Поиск
Список
Период
Сортировка

Index problem

От
"Rigmor Ukuhe"
Дата:
Hi,

I have a table containing columns:

  "END_DATE" timestamptz NOT NULL
  "REO_ID" int4 NOT NULL

and i am indexed "REO_ID" coulumn.
I have a query:

select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN
('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
,'113737' ,'113812'  ,'113828'  ,'113762'  ,'113842'  ,'113869'  ,'113925'
,'113976'  ,'114035'  ,'114044'  ,'114057'  ,'114070'  ,'114084'  ,'114094'
,'114119' )

and it is _not_ using that index

But following query (notice there are less id-s in WHERE clause, but rest is
same)

select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where  "REO_ID" IN
('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
,'113737' )

will _is_ using index:

Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
rows=102 width=12)

What causes this behaviour? is there any workaround? Suggestions?

best,
Rigmor Ukuhe
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003


Re: Index problem

От
Tomasz Myrta
Дата:
> Hi,
>
> I have a table containing columns:
>
>   "END_DATE" timestamptz NOT NULL
>   "REO_ID" int4 NOT NULL
>
> and i am indexed "REO_ID" coulumn.
> I have a query:
>
> select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN
> ('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
> ,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
> ,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
> ,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
> ,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
> ,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
> ,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
> ,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
> ,'113737' ,'113812'  ,'113828'  ,'113762'  ,'113842'  ,'113869'  ,'113925'
> ,'113976'  ,'114035'  ,'114044'  ,'114057'  ,'114070'  ,'114084'  ,'114094'
> ,'114119' )
>
> and it is _not_ using that index
>
> But following query (notice there are less id-s in WHERE clause, but rest is
> same)
>
> select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where  "REO_ID" IN
> ('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
> ,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
> ,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
> ,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
> ,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
> ,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
> ,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
> ,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
> ,'113737' )
>
> will _is_ using index:

Why not. It's just because the second query is more selective. Probably
you don't have too many rows in your table and Postgres thinks it's
better (faster) to use sequential scan than index one.

Regards,
Tomasz Myrta


Re: Index problem

От
"Matt Clark"
Дата:
> What causes this behaviour? is there any workaround? Suggestions?
>

How many rows are there in the table, and can you post the 'explain analyze' for both queries after doing a 'vacuum
verboseanalyze 
[tablename]'?

Cheers

Matt



Re: Index problem

От
"Rigmor Ukuhe"
Дата:
> > What causes this behaviour? is there any workaround? Suggestions?
> >
>
> How many rows are there in the table, and can you post the
> 'explain analyze' for both queries after doing a 'vacuum verbose analyze
> [tablename]'?

There are about 2500 rows in that table.

1st query explain analyze: Seq Scan on PRIORITY_STATISTICS
(cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0
loops=1)
Total runtime: 98.74 msec

2nd query explain analyze: NOTICE:  QUERY PLAN:

Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
Total runtime: 21.59 msec

Any help?

Rigmor


>
> Cheers
>
> Matt
>
>
>
> ---
> Incoming mail is certified Virus Free.
> Checked by AVG anti-virus system (http://www.grisoft.com).
> Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003
>
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003


Re: Index problem

От
"Matt Clark"
Дата:
> There are about 2500 rows in that table.
>
> 1st query explain analyze: Seq Scan on PRIORITY_STATISTICS
> (cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0
> loops=1)
> Total runtime: 98.74 msec
>
> 2nd query explain analyze: NOTICE:  QUERY PLAN:
>
> Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
[snip]
> PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
> rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
> Total runtime: 21.59 msec

With only 2500 rows the planner could be deciding that it's going to have to read every disk block to do an index scan
anyway,so it 
might as well do a sequential scan.  If the pages are in fact in the kernel cache then the compute time will dominate,
notthe IO 
time, so it ends up looking like a bad plan, but it's probably not really such a bad plan...

Is your effective_cache_size set to something sensibly large?

You could also try decreasing cpu_index_tuple_cost and cpu_tuple_cost.  These will affect all your queries though, so
whatyou gain 
on one might be lost on another.

Matt



Re: Index problem

От
Tom Lane
Дата:
"Rigmor Ukuhe" <rigmor.ukuhe@finestmedia.com> writes:
>>> What causes this behaviour? is there any workaround? Suggestions?

At some point the planner is going to decide that one seqscan is cheaper
than repeated indexscans.  At some point it'll be right ... but in this
case it seems its relative cost estimates are off a bit.  You might try
reducing random_page_cost to bring them more into line with reality.
(But keep in mind that the reality you are measuring appears to be
small-table-already-fully-cached reality.  On a large table you might
find that small random_page_cost isn't such a hot idea after all.)

            regards, tom lane