Обсуждение: Postgres not willing to use an index?

Поиск
Список
Период
Сортировка

Postgres not willing to use an index?

От
Mario Splivalo
Дата:
I have a table, like this:

CREATE TABLE transactions
(
   transaction_id integer NOT NULL DEFAULT
nextval('transactions_seq'::regclass),
   transaction_type integer NOT NULL,
   transaction_client_id integer NOT NULL,
   transaction_destination_id integer NOT NULL,
   transaction_operator_id integer NOT NULL,
   transaction_application_id integer NOT NULL,
   transaction_application_service character varying NOT NULL,
   transaction_quantity integer NOT NULL,
   transaction_time_commit timestamp with time zone NOT NULL,
   transaction_time_received timestamp with time zone NOT NULL,
   transaction_gateway_id character(36) NOT NULL,
   transaction_payment_amount integer NOT NULL DEFAULT 0,
   CONSTRAINT transactions_pk PRIMARY KEY (transaction_id),
   CONSTRAINT transactions_uq__gateway_id UNIQUE (transaction_gateway_id)
)
WITH (OIDS=FALSE);

Now, all the _type, client_id, destination_id, operator_id, and
application_id are foreigen-keyed to coresponding tables. There are no
indices on those columns.

Besides PK and uq-constraint indices I have this index:

CREATE INDEX transactions_idx__client_data ON transactions
USING btree (transaction_client_id, transaction_destination_id,
transaction_operator_id, transaction_application_id,
transaction_time_commit)

The table_count is like this:

jura=# select count(*) from transactions;
   count
----------
  13751457
(1 row)

There are roughly 500.000 - 600.000 transactions for each month. There
are also transactions from past two years in the table.

I often SELECT data from the table for specified time period - usualy
from begining to the end of the month, like this:

SELECT <some-columns> FROM transactions WHERE transaction_time_commit
BETWEEN '2009-01-01' AND '2009-01-31 23:59:59';

The problem is that postgres is never using an index:

jura=# explain analyze select * from transactions where
transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59';

             QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on transactions  (cost=0.00..416865.85 rows=593713 width=91)
(actual time=4.067..3918.629 rows=525051 loops=1)
    Filter: ((transaction_time_commit >= '2009-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2009-01-31 23:59:59+01'::timestamp with time zone))
  Total runtime: 4026.404 ms
(3 rows)

Time: 4068.521 ms


If I force it not to use sequential scans, it is using index, with
benefits of shorter execution time:
jura=# set enable_seqscan to false;
SET
Time: 0.103 ms
jura=# explain analyze select * from transactions where
transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59';

                  QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on transactions  (cost=410369.98..629869.67
rows=593713 width=91) (actual time=1060.569..1280.500 rows=525051 loops=1)
    Recheck Cond: ((transaction_time_commit >= '2009-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2009-01-31 23:59:59+01'::timestamp with time zone))
    ->  Bitmap Index Scan on transactions_idx__client_data
(cost=0.00..410221.55 rows=593713 width=0) (actual
time=1058.992..1058.992 rows=525051 loops=1)
          Index Cond: ((transaction_time_commit >= '2009-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2009-01-31 23:59:59+01'::timestamp with time zone))
  Total runtime: 1388.882 ms
(5 rows)

Time: 1396.737 ms


Now, I found interesting is that if I create index just on
transaction_time_commit column (and I leave
transactions_idx__client_data index), then postgres is using that new index.

Also, if I change idx__client_data index like this (first I drop it, and
then I create new one):

CREATE INDEX transactions_idx__client_data ON transactions
USING btree (transaction_client_id, transaction_destination_id,
transaction_time_commit);

then postgres is using that index:

jura=# explain analyze select * from transactions where
transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59';

                  QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on transactions  (cost=349473.37..568973.06
rows=593713 width=91) (actual time=949.224..1128.848 rows=525051 loops=1)
    Recheck Cond: ((transaction_time_commit >= '2009-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2009-01-31 23:59:59+01'::timestamp with time zone))
    ->  Bitmap Index Scan on transactions_idx__client_data
(cost=0.00..349324.94 rows=593713 width=0) (actual time=947.678..947.678
rows=525051 loops=1)
          Index Cond: ((transaction_time_commit >= '2009-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2009-01-31 23:59:59+01'::timestamp with time zone))
  Total runtime: 1234.989 ms
(5 rows)

Time: 1235.727 ms



Now, I have many 'selects' on the transactions table (still, not as many
as inserts), mostly filtered on transaction_time, client_id,
destination_id and application_id, but there is fair amount of 'selects'
filtered only on transaction_time.

Now, shall I keep the original index and add another one on just
transaction_time (there is, I guess, overhead of maintaining two
indices), or shall I remove transaction_time from original index, and
create another one?

And, is it normal for postgres to 'ignore' the transaction_time column
in original index?


This is the postgres version I'm using:
jura=# select version();
                                                     version

----------------------------------------------------------------------------------------------------------------
  PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)
(1 row)


    Mike

Re: Postgres not willing to use an index?

От
Grzegorz Jaśkiewicz
Дата:
On Fri, Feb 6, 2009 at 3:43 PM, Mario Splivalo
<mario.splivalo@megafon.hr> wrote:

> Besides PK and uq-constraint indices I have this index:
>
> CREATE INDEX transactions_idx__client_data ON transactions
> USING btree (transaction_client_id, transaction_destination_id,
> transaction_operator_id, transaction_application_id,
> transaction_time_commit)

I think it is because it isn't just a simple index, and for some
reason planner decides - that going through every
transaction_application_id, etc, etc just to find right
transaction_time_commit isn't worth the hassle.
Try using few more indexes, on less columns.
Also, if I may - I don't think it is quite usefull to have column
names that include table name in every single one of them, makes
things so much less readable.

--
GJ

Re: Postgres not willing to use an index?

От
Tom Lane
Дата:
Mario Splivalo <mario.splivalo@megafon.hr> writes:
> Besides PK and uq-constraint indices I have this index:

> CREATE INDEX transactions_idx__client_data ON transactions
> USING btree (transaction_client_id, transaction_destination_id,
> transaction_operator_id, transaction_application_id,
> transaction_time_commit)

> SELECT <some-columns> FROM transactions WHERE transaction_time_commit
> BETWEEN '2009-01-01' AND '2009-01-31 23:59:59';
> The problem is that postgres is never using an index:

Hardly surprising --- a search on the index's lowest-order column would
require scanning practically all of the index.  (If you think about the
ordering of the index entries you'll see why.)  If this is a typical
query then you need a separate index on transaction_time_commit.

The fine manual goes into some detail about how to design indexes;
http://www.postgresql.org/docs/8.3/static/indexes.html
particularly 11.3, 11.5.

            regards, tom lane

Re: Postgres not willing to use an index?

От
Robert Haas
Дата:
On Fri, Feb 6, 2009 at 11:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mario Splivalo <mario.splivalo@megafon.hr> writes:
>> Besides PK and uq-constraint indices I have this index:
>
>> CREATE INDEX transactions_idx__client_data ON transactions
>> USING btree (transaction_client_id, transaction_destination_id,
>> transaction_operator_id, transaction_application_id,
>> transaction_time_commit)
>
>> SELECT <some-columns> FROM transactions WHERE transaction_time_commit
>> BETWEEN '2009-01-01' AND '2009-01-31 23:59:59';
>> The problem is that postgres is never using an index:
>
> Hardly surprising --- a search on the index's lowest-order column would
> require scanning practically all of the index.  (If you think about the
> ordering of the index entries you'll see why.)  If this is a typical
> query then you need a separate index on transaction_time_commit.
>
> The fine manual goes into some detail about how to design indexes;
> http://www.postgresql.org/docs/8.3/static/indexes.html
> particularly 11.3, 11.5.

What's weird about this example is that when he sets enable_seqscan to
off, the bitmap index scan plan is actually substantially faster, even
though it in fact does scan nearly the entire heap.  I don't
understand how it can be faster to scan the index and the heap than to
just scan the heap.

...Robert

Re: Postgres not willing to use an index?

От
"Kevin Grittner"
Дата:
>>> Robert Haas <robertmhaas@gmail.com> wrote:
> What's weird about this example is that when he sets enable_seqscan
to
> off, the bitmap index scan plan is actually substantially faster,
even
> though it in fact does scan nearly the entire heap.  I don't
> understand how it can be faster to scan the index and the heap than
to
> just scan the heap.

It's cached in the second test, maybe?

-Kevin

Re: Postgres not willing to use an index?

От
Robert Haas
Дата:
On Fri, Feb 6, 2009 at 12:41 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>>>> Robert Haas <robertmhaas@gmail.com> wrote:
>> What's weird about this example is that when he sets enable_seqscan to
>> off, the bitmap index scan plan is actually substantially faster, even
>> though it in fact does scan nearly the entire heap.  I don't
>> understand how it can be faster to scan the index and the heap than to
>> just scan the heap.
>
> It's cached in the second test, maybe?

I gather that the results were repeatable, but perhaps Mario could
double-check that?

...Robert

Re: Postgres not willing to use an index?

От
Mario Splivalo
Дата:
Tom Lane wrote:
>
> Hardly surprising --- a search on the index's lowest-order column would
> require scanning practically all of the index.  (If you think about the
> ordering of the index entries you'll see why.)  If this is a typical
> query then you need a separate index on transaction_time_commit.

Yes, actually I just moved transaction_time_commit column to the
begining of the index, since, most of the time I run queries based on
transaction_time_commit and then transaction_client_id and
transaction_destination_id.

> The fine manual goes into some detail about how to design indexes;
> http://www.postgresql.org/docs/8.3/static/indexes.html
> particularly 11.3, 11.5.

I see it now. I read the manual concerning CREATE INDEX command, and
there is no mention of multicolumn indices, did not notice Note that
points to Chapter 11.

    Mike


Re: Postgres not willing to use an index?

От
Mario Splivalo
Дата:
Robert Haas wrote:
> On Fri, Feb 6, 2009 at 12:41 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>>>>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> What's weird about this example is that when he sets enable_seqscan to
>>> off, the bitmap index scan plan is actually substantially faster, even
>>> though it in fact does scan nearly the entire heap.  I don't
>>> understand how it can be faster to scan the index and the heap than to
>>> just scan the heap.
>> It's cached in the second test, maybe?
>
> I gather that the results were repeatable, but perhaps Mario could
> double-check that?

I think that it is always cached - the machine has 4GB of RAM, and i'm
just using it for testing. Now, I think that the cache is used because
there is no I/O wait when I run the queries (if you have any suggestion
on how to check cache usage, since I have no idea):

jura=# set enable_seqscan to true;
SET
jura=# explain analyze select * from transactions where
transaction_time_commit between '2008-01-01' and '2008-01-31 23:59:59';

             QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on transactions  (cost=0.00..418365.68 rows=759775 width=91)
(actual time=928.342..3788.232 rows=722176 loops=1)
    Filter: ((transaction_time_commit >= '2008-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2008-01-31 23:59:59+01'::timestamp with time zone))
  Total runtime: 3936.744 ms
(3 rows)

jura=# set enable_seqscan to false;
SET
jura=# explain analyze select * from transactions where
transaction_time_commit between '2008-01-01' and '2008-01-31 23:59:59';

                  QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on transactions  (cost=428882.89..651630.52
rows=759775 width=91) (actual time=1358.040..1633.867 rows=722176 loops=1)
    Recheck Cond: ((transaction_time_commit >= '2008-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2008-01-31 23:59:59+01'::timestamp with time zone))
    ->  Bitmap Index Scan on transactions_idx__client_data
(cost=0.00..428692.95 rows=759775 width=0) (actual
time=1354.485..1354.485 rows=722176 loops=1)
          Index Cond: ((transaction_time_commit >= '2008-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2008-01-31 23:59:59+01'::timestamp with time zone))
  Total runtime: 1778.938 ms
(5 rows)


Now, transactions_idx__client_data index has transaction_time_commit as
the last column in index.

When I 'recreate' the database, and run the queries again, first run
which uses sequential scan is around 10 seconds, heavy I/O, any
subsequent query run takes cca 3900 msecs, as shown above.

When I say 'disable seqscan', planner uses Bitmap Index Scan, as shown
above, just that the first query takes around 25 seconds to run, with
heavy I/O. Any subsequent query runs take somewhat less than 2 seconds,
as shown above.

I'm not sure on what to do to minimize the impact of the OS-cache, apart
from taking RAM modules out of the machine - if you have any suggestions
I'll try to apply them.

On production database I changed the index so that
'transaction_time_commit' is the first column, and now I don't get any
sequential scans on transactions table when only WHERE condition is on
transaction_time_commit.

    Mike

Re: Postgres not willing to use an index?

От
Mario Splivalo
Дата:
Mario Splivalo wrote:
> Robert Haas wrote:
> jura=# set enable_seqscan to false;
> SET
> jura=# explain analyze select * from transactions where
> transaction_time_commit between '2008-01-01' and '2008-01-31 23:59:59';
>
>                  QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
>  Bitmap Heap Scan on transactions  (cost=428882.89..651630.52
> rows=759775 width=91) (actual time=1358.040..1633.867 rows=722176 loops=1)
>    Recheck Cond: ((transaction_time_commit >= '2008-01-01
> 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
> '2008-01-31 23:59:59+01'::timestamp with time zone))
>    ->  Bitmap Index Scan on transactions_idx__client_data
> (cost=0.00..428692.95 rows=759775 width=0) (actual
> time=1354.485..1354.485 rows=722176 loops=1)
>          Index Cond: ((transaction_time_commit >= '2008-01-01
> 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
> '2008-01-31 23:59:59+01'::timestamp with time zone))
>  Total runtime: 1778.938 ms
> (5 rows)
>
>

I neglected to paste this 'explain analyze', when I changed the index so
that 'transaction_time_commit' is first column in the index:

jura=# explain analyze select * from transactions where
transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59';

                  QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on transactions  (cost=7550.51..233419.58 rows=250880
width=91) (actual time=95.139..280.008 rows=525051 loops=1)
    Recheck Cond: ((transaction_time_commit >= '2009-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2009-01-31 23:59:59+01'::timestamp with time zone))
    ->  Bitmap Index Scan on transactions_idx__client_data
(cost=0.00..7487.79 rows=250880 width=0) (actual time=93.382..93.382
rows=525051 loops=1)
          Index Cond: ((transaction_time_commit >= '2009-01-01
00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <=
'2009-01-31 23:59:59+01'::timestamp with time zone))
  Total runtime: 386.665 ms
(5 rows)

Thank you, Tom!

    Mike