Обсуждение: Use of sequence rather than index scan for one text column on one instance of a database

Поиск
Список
Период
Сортировка

Use of sequence rather than index scan for one text column on one instance of a database

От
Bill Kirtley
Дата:
Hello-

I've discovered that lookups on one column in one instance of my
database performs badly.

The table has columns 'email' and 'key', both of type 'character
varying(255)', and both with btree indices.  The table has ~ 500k
rows, and no rows of either column are blank or null, and all values
are different.

\d users (abbreviated)
                                           Table "public.users"
         Column        |            Type
|                     Modifiers
----------------------+-----------------------------
+----------------------------------------------------
  id                   | integer                     | not null
default nextval('users_id_seq'::regclass)
  password             | character varying(40)       | not null
  email                | character varying(255)      | not null
  key                  | character varying(255)      |
...
Indexes:
     "users_pkey" PRIMARY KEY, btree (id)
     "index_users_on_email" UNIQUE, btree (email)
     "users_key_index" btree (key)
     "xxx" btree (email)

On the main production database, a select looking at the email column
winds up scanning the whole table:

EXPLAIN ANALYZE SELECT * FROM users WHERE (users.email = 'example.com');
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
  Seq Scan on users  (cost=0.00..21097.90 rows=1 width=793) (actual
time=186.692..186.692 rows=0 loops=1)
    Filter: ((email)::text = 'example.com'::text)
  Total runtime: 186.735 ms
(3 rows)

... where on that same database selecting on the 'key' column uses the
index as expected:

EXPLAIN ANALYZE SELECT * FROM users WHERE (users.key = 'example.com');
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Index Scan using users_key_index on users  (cost=0.00..6.38 rows=1
width=793) (actual time=0.021..0.021 rows=0 loops=1)
    Index Cond: ((key)::text = 'example.com'::text)
  Total runtime: 0.061 ms
(3 rows)

We're running postgresql 8.3 on solaris with 8G of RAM on a sun X4100
connected to a battery-backed sun disk shelf.

select version(); reports: PostgreSQL 8.3.3 64-bit on i386-pc-
solaris2.11, compiled by /opt/SUNWspro.40/SS11/bin/cc -Xa

We have test databases which are restored (pg_dump/pg_restore) backups
of this data,  and on these the select on 'email' uses the index as
expected.

Dropping and re-adding that 'index_users_on_email' had no effect.

Spelunking through our logs we seem to have had this problem as far
back as I can practically go, so I can't look at any changes that
might be suspicious.

We did try adding a new column (cleverly named email2) and copying the
data (update users set email2=email) and adding the appropriate index
and the query performed quickly.  So we can fix the immediate problem,
but I'd feel more comfortable understanding it.

Do folks on this list have suggestions for how to further diagnose this?

Thanks in advance,
-Bill Kirtley

Re: Use of sequence rather than index scan for one text column on one instance of a database

От
Tom Lane
Дата:
Bill Kirtley <bill@actbluetech.com> writes:
> On the main production database, a select looking at the email column
> winds up scanning the whole table:
> ... where on that same database selecting on the 'key' column uses the
> index as expected:

That's just bizarre.  I assume that setting enable_seqscan = off
doesn't persuade it to use the index either?

> Dropping and re-adding that 'index_users_on_email' had no effect.

How did you do that exactly?  A regular CREATE INDEX, or did you
use CREATE INDEX CONCURRENTLY?  If the latter, please show the output
from
select xmin,* from pg_index where indexrelid = 'index_users_on_email'::regclass;

I notice you have two indexes on email:

> Indexes:
>      "users_pkey" PRIMARY KEY, btree (id)
>      "index_users_on_email" UNIQUE, btree (email)
>      "users_key_index" btree (key)
>      "xxx" btree (email)

I can't think why that would be a problem, but does getting rid of
the "xxx" one make a difference?

> We have test databases which are restored (pg_dump/pg_restore) backups
> of this data,  and on these the select on 'email' uses the index as
> expected.

Are the test machines using the exact same Postgres executables?

            regards, tom lane

Re: Use of sequence rather than index scan for one text column on one instance of a database

От
Tom Lane
Дата:
Bill Kirtley <bill@actbluetech.com> writes:
> For what it's worth:

> select xmin,* from pg_index where indexrelid =
> 'users_key_index'::regclass;
>   xmin | indexrelid | indrelid | indnatts | indisunique | indisprimary
> | indisclustered | indisvalid | indcheckxmin | indisready | indkey |
> indclass | indoption | indexprs | indpred
> ------+------------+----------+----------+-------------+--------------
> +----------------+------------+--------------+------------+--------
> +----------+-----------+----------+---------
>   1006 |   15076176 |    17516 |        1 | f           | f
> | f              | t          | f            | t          | 10
> |    10042 | 0         |          |

Uh ... 'users_key_index'?  What's that?  What would actually be the most
useful is to compare the pg_index entries for the working and
non-working indexes (the ones on email and key).

            regards, tom lane

Re: Use of sequence rather than index scan for one text column on one instance of a database

От
Tom Lane
Дата:
Bill Kirtley <bill@actbluetech.com> writes:
> select xmin,* from pg_index where indexrelid =
> 'index_users_on_email'::regclass;
>     xmin   | indexrelid | indrelid | indnatts | indisunique |
> indisprimary | indisclustered | indisvalid | indcheckxmin | indisready
> | indkey | indclass | indoption | indexprs | indpred
> ----------+------------+----------+----------+-------------
> +--------------+----------------+------------+--------------
> +------------+--------+----------+-----------+----------+---------
>   12651453 |   24483560 |    17516 |        1 | t           |
> f            | f              | t          | t            | t
> | 6      |    10042 | 0         |          |
> (1 row)

Okay, the basic cause of the issue is now clear: the index has
indcheckxmin true, which means it's not usable until local
TransactionXmin exceeds the tuple's xmin (12651453 here).  This
is all a pretty unsurprising consequence of the HOT optimizations
added in 8.3.  The question is why that state persisted long
enough to be a problem.  Perhaps you have long-running background
transactions?  TransactionXmin is normally the oldest XID that was
running when your own transaction started, so basically the index
isn't usable until all transactions that were running while it
was built complete.  I had been thinking that this only happened
for concurrent index builds, but actually regular builds can be
subject to it as well.

We've seen some complaints about this behavior before.  I wonder if
there's a way to work a bit harder to avoid the indcheckxmin labeling
--- right now the code is pretty conservative about setting that bit
if there's any chance at all of an invalid HOT chain.

            regards, tom lane

Re: Use of sequence rather than index scan for one text column on one instance of a database

От
Bill Kirtley
Дата:
Hi Tom-

Thanks for the response, but I'm not sure what to do with it.

Are you suggesting we might have some transaction (or residue) that's
hanging around and causing this problem?

We do have transactions that run on the order of  a couple minutes at
times.  In the past, under heavy db load they have piled up on top of
each other, but as far as I can tell they've finished.

Is there something more I can look at to try and diagnose this?  As I
mentioned we do have a workaround of copying the column and building
an index on the new column ... is it time to take that step?  And if
so, should we be monitoring for this sort of thing on an ongoing basis?

Are there different options we can pass to CREATE INDEX to build and
index that would be usable?  Should we be stopping our application
while applying indices (we don't always) to ensure the db is quiescent
at the time?

Regards,
-Bill Kirtley

On Sep 24, 2009, at 12:26 PM, Tom Lane wrote:

> Bill Kirtley <bill@actbluetech.com> writes:
>> select xmin,* from pg_index where indexrelid =
>> 'index_users_on_email'::regclass;
>>    xmin   | indexrelid | indrelid | indnatts | indisunique |
>> indisprimary | indisclustered | indisvalid | indcheckxmin |
>> indisready
>> | indkey | indclass | indoption | indexprs | indpred
>> ----------+------------+----------+----------+-------------
>> +--------------+----------------+------------+--------------
>> +------------+--------+----------+-----------+----------+---------
>>  12651453 |   24483560 |    17516 |        1 | t           |
>> f            | f              | t          | t            | t
>> | 6      |    10042 | 0         |          |
>> (1 row)
>
> Okay, the basic cause of the issue is now clear: the index has
> indcheckxmin true, which means it's not usable until local
> TransactionXmin exceeds the tuple's xmin (12651453 here).  This
> is all a pretty unsurprising consequence of the HOT optimizations
> added in 8.3.  The question is why that state persisted long
> enough to be a problem.  Perhaps you have long-running background
> transactions?  TransactionXmin is normally the oldest XID that was
> running when your own transaction started, so basically the index
> isn't usable until all transactions that were running while it
> was built complete.  I had been thinking that this only happened
> for concurrent index builds, but actually regular builds can be
> subject to it as well.
>
> We've seen some complaints about this behavior before.  I wonder if
> there's a way to work a bit harder to avoid the indcheckxmin labeling
> --- right now the code is pretty conservative about setting that bit
> if there's any chance at all of an invalid HOT chain.
>
>             regards, tom lane