Обсуждение: Increasing select max(datecol) from bilkaib where datecol<=date'2008-11-01' and (cr='00' or db='00') speed

От:
"Andrus"
Дата:

There are columns
 kuupaev date,  cr char(10), db char(10)
 and regular indexes  for all those fields.
bilkaib table contains large number of rows.

The following query takes too much time.
How to make it faster ?
I think PostgreSql should use multiple indexes as bitmaps to speed it.

I can re-write this query in any way or split to multiple statements if this
makes it faster.

Andrus.

explain analyze select max(kuupaev) from bilkaib where
kuupaev<=date'2008-11-01' and (cr='00' or db='00')

"Result  (cost=339.75..339.76 rows=1 width=0) (actual
time=52432.256..52432.260 rows=1 loops=1)"
"  InitPlan"
"    ->  Limit  (cost=0.00..339.75 rows=1 width=4) (actual
time=52432.232..52432.236 rows=1 loops=1)"
"          ->  Index Scan Backward using bilkaib_kuupaev_idx on bilkaib
(cost=0.00..1294464.73 rows=3810 width=4) (actual time=52432.222..52432.222
rows=1 loops=1)"
"                Index Cond: (kuupaev <= '2008-11-01'::date)"
"                Filter: ((kuupaev IS NOT NULL) AND ((cr = '00'::bpchar) OR
(db = '00'::bpchar)))"
"Total runtime: 52432.923 ms"

"PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC
i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0,
pie-8.7.9)"


От:
Matthew Wakeling
Дата:

Firstly, please upgrade to Postgres 8.3 if possible.

On Wed, 12 Nov 2008, Andrus wrote:
> There are columns
> kuupaev date,  cr char(10), db char(10)
> and regular indexes  for all those fields.

Create a single index on (cr, db, datecol).

Matthew

--
Those who do not understand Unix are condemned to reinvent it, poorly.
                -- Henry Spencer

От:
"Vladimir Sitnikov"
Дата:



On Wed, Nov 12, 2008 at 9:02 AM, Andrus <> wrote:
There are columns
kuupaev date,  cr char(10), db char(10)
and regular indexes  for all those fields.
bilkaib table contains large number of rows.

The following query takes too much time.
How to make it faster ?
I think PostgreSql should use multiple indexes as bitmaps to speed it.
I am afraid I do not see a way to use bitmaps to get any improvement here: the server will still need to read the whole indices to figure out the answer.

I suggest you to create two more indices:

create index date_with_zero_cr on bilkaib(date) where cr='00';
create index date_with_zero_db on bilkaib(date) where db='00';

And rewrite query as follows:
select greatest(
   (select max(date) from bilkaib where datecol<=date'2008-11-01' and cr='00'),
   (select max(date) from bilkaib where datecol<=date'2008-11-01' and db='00'))


Regards,
Vladimir Sitnikov

От:
Matthew Wakeling
Дата:

On Wed, 12 Nov 2008, Vladimir Sitnikov wrote:
> And rewrite query as follows:
> select greatest(
>    (select max(date) from bilkaib where datecol<=date'2008-11-01' and cr='00'),
>    (select max(date) from bilkaib where datecol<=date'2008-11-01' and db='00'))

Oops, yes, I missed the "OR" in the query. This rewrite is good - my
suggested index would not have helped.

> I suggest you to create two more indices:
>
> create index date_with_zero_cr on bilkaib(date) where cr='00';
> create index date_with_zero_db on bilkaib(date) where db='00';

Alternatively if you create an index on (cr, bilkaib) and one on (db,
bilkaib) then you will be able to use other values in the query too.

Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are.                 -- Kyle Hearn

От:
"Andrus"
Дата:

Matthew,

Thank you.

bilkaib table contains GL transactions for every day.
00 records are initial balance records and they appear only in start of year
or start of month.
They may present or may be not present for some month if initial balance is
not calculated yet.
If 00 records are present, usuallly there are lot of them for single date
for db and cr columns.
This query finds initial balance date befeore given date.
bilkaib table contains several year transactions so it is large.

>Alternatively if you create an index on (cr, bilkaib) and one on (db,
>bilkaib) then you will be able to use other values in the query too.

I'm sorry I do'nt understand this.
What does the (cr, bilkaib) syntax mean?
Should I create two functions indexes and re-write query as Vladimir
suggests or is there better appoach ?

Andrus.


От:
"Vladimir Sitnikov"
Дата:



This query finds initial balance date befeore given date.
If you are not interested in other balances except initial ones (the ones that have '00') the best way is to create partial indices that I have suggested.
That will keep size of indices small, while providing good performance (constant response time)
 
bilkaib table contains several year transactions so it is large.
That is not a problem for the particular case. However, when you evaluate query performance, it really makes sense giving number of rows in each table (is 100K rows a "large" table? what about 10M rows?)  and other properties of the data stored in the table (like number of rows that have cr='00')
 
Alternatively if you create an index on (cr, bilkaib) and one on (db, bilkaib) then you will be able to use other values in the query too.
That means if you create one index on biklaib (cr, datecol) and another index on (db, datecol) you will be able to improve queries like
select greatest(
   (select max(date) from bilkaib where datecol<=date'2008-11-01' and cr=XXX),
   (select max(date) from bilkaib where datecol<=date'2008-11-01' and db=YYY)).
with arbitrary XXX and YYY. I am not sure if you really want this.


I'm sorry I do'nt understand this.
What does the (cr, bilkaib) syntax mean?
I believe that should be read as (cr, datecol).

 
Should I create two functions indexes and re-write query as Vladimir suggests or is there better appoach ?
I am afraid PostgreSQL is not smart enough to rewrite query with "or" into two separate index scans. There is no way to improve the query significantly without rewriting it.

Note:  for this case indices on (datecol), (cr) and (db) are not very helpful.
 
Regards,
Vladimir Sitnikov

От:
hubert depesz lubaczewski
Дата:

On Wed, Nov 12, 2008 at 07:02:10PM +0200, Andrus wrote:
> explain analyze select max(kuupaev) from bilkaib where
> kuupaev<=date'2008-11-01' and (cr='00' or db='00')

do you always have this: "(cr='00' or db='00')"? or do the values (00)
change?
if they don't change, or *most* of the queries have "(cr='00' or
db='00')", than the biggest time difference you will get after creating
this index:
create index test on bilkaib (kuupaev) where cr='00' or db='00';

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk:  / aim:depeszhdl / skype:depesz_hdl / gg:6749007

От:
"Andrus"
Дата:

Vladimir,

>I am afraid PostgreSQL is not smart enough to rewrite query with "or" into
>two separate index scans. There is no way to improve the query
>significantly without rewriting it.
>Note:  for this case indices on (datecol), (cr) and (db) are not very
>helpful.

Thank you very much.
I added you indexes to db and re-write query.
Now it runs fast.

Andrus.


От:
"Andrus"
Дата:

Depesz,

> do you always have this: "(cr='00' or db='00')"? or do the values (00)
> change?
> if they don't change, or *most* of the queries have "(cr='00' or
> db='00')", than the biggest time difference you will get after creating
> this index:
> create index test on bilkaib (kuupaev) where cr='00' or db='00';

I have always  cr='00' or db='00'  clause. Separate values are never tested.
I changed by queries back to old values and created this single index.
This seems to be even better that Vladimir suggestion.
Thank you very much.

Andrus.