Thread: Forcing seq_scan off for large table joined with tiny table yields improved performance


Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Mario Splivalo
Date:
I have two tables, like this:

Big table:

CREATE TABLE photo_info_data
(
   photo_id integer NOT NULL,
   field_name character varying NOT NULL,
   field_value character varying,
   CONSTRAINT photo_info_data_pk PRIMARY KEY (photo_id, field_name)
)
WITH (OIDS=FALSE);

CREATE INDEX user_info_data_ix_field_value
   ON user_info_data
   USING btree
   (field_value);


Small table:

CREATE TABLE t_query_data
(
   i integer,
   "key" character varying,
   op character varying,
   "value" character varying
)
WITH (OIDS=FALSE);

I have around 2400000 rows in photo_info_data, and just two rows in
t_query_data:
 i |   key   | op | value
---+---------+----+-------
 1 | f-stop  | eq | 2.6
 2 | shutter | gt | 1/100


This is the query I'm executing:

SELECT
    *
FROM
    photo_info_data u
    JOIN t_query_data t on u.field_name = key

This query takes around 900ms to execute. It returns 6 rows.

When I do 'explain analyze' for some reason it takes around 7 seconds,
and this is what I get:

phototest=# explain analyze select * from photo_info_data u join
t_query_data t on u.field_name = key;
                                                              QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.04..58676.31 rows=218048 width=68) (actual
time=2381.895..7087.225 rows=6 loops=1)
    Hash Cond: ((u.field_name)::text = (t.key)::text)
    ->  Seq Scan on photo_info_data u  (cost=0.00..47500.30 rows=2398530
width=50) (actual time=0.042..3454.112 rows=2398446 loops=1)
    ->  Hash  (cost=1.02..1.02 rows=2 width=18) (actual
time=0.016..0.016 rows=2 loops=1)
          ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2
width=18) (actual time=0.003..0.007 rows=2 loops=1)
  Total runtime: 7087.291 ms
(6 rows)

Time: 7088.663 ms


I can rerun this query many times; it's always around 7 seconds. The I/O
wait during the query is nonexistent, it just takes 100% of CPU time (I
have a DualCore Opteron server).

If I force the planner not to use sequential_scan, here is what I get:

phototest=# explain analyze select * from photo_info_data u join
t_query_data t on u.field_name = key;
                                                                  QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=100039134.84..100130206.79 rows=218048 width=68)
(actual time=271.138..540.998 rows=6 loops=1)
    ->  Seq Scan on t_query_data t  (cost=100000000.00..100000001.02
rows=2 width=18) (actual time=0.008..0.015 rows=2 loops=1)
    ->  Bitmap Heap Scan on photo_info_data u  (cost=39134.84..63740.08
rows=109024 width=50) (actual time=270.464..270.469 rows=3 loops=2)
          Recheck Cond: ((u.field_name)::text = (t.key)::text)
          ->  Bitmap Index Scan on photo_info_data_pk
(cost=0.00..39107.59 rows=109024 width=0) (actual time=270.435..270.435
rows=3 loops=2)
                Index Cond: ((u.field_name)::text = (t.key)::text)
  Total runtime: 541.065 ms
(7 rows)

Time: 542.147 ms
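
For reference, the usual way to disable sequential scans for the current
session only (and the source of the huge 100000000.00 cost offsets in
the plan above, which are the penalty the planner adds while it is off)
is:

```sql
-- Disable seq scans for this session only; the planner then adds a huge
-- cost penalty (the 100000000.00 offsets above) to any remaining seq scan.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT *
FROM photo_info_data u
JOIN t_query_data t ON u.field_name = t.key;

-- Restore the default afterwards:
SET enable_seqscan = on;
```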


The database currently has only those two tables. I vacuumed them prior
to running the above queries.

I thought this information might also be important:
phototest=# select key, count(*) from photo_info_data u join
t_query_data t on u.field_name = key group by key;
   key    | count
----------+-------
 f-stop   |     3
 shutter  |     3
(2 rows)


Am I doing something wrong here? The photo_info_data table will hold
around 10,000,000 records. Should I be doing 'set enable_seqscan to
false' each time I want to run this query? (Since I'm accessing Postgres
through JDBC, I'll have the same situation I had weeks ago, which I also
described here.)

    Mike

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Tom Lane
Date:
Mario Splivalo <mario.splivalo@megafon.hr> writes:
>     ->  Bitmap Heap Scan on photo_info_data u  (cost=39134.84..63740.08
> rows=109024 width=50) (actual time=270.464..270.469 rows=3 loops=2)
>           Recheck Cond: ((u.field_name)::text = (t.key)::text)
>           ->  Bitmap Index Scan on photo_info_data_pk
> (cost=0.00..39107.59 rows=109024 width=0) (actual time=270.435..270.435
> rows=3 loops=2)
>                 Index Cond: ((u.field_name)::text = (t.key)::text)

You need to figure out why that rowcount estimate is off by more than
four orders of magnitude :-(
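
A common starting point is to look at what ANALYZE has recorded for the
join column, for example via the standard pg_stats view:

```sql
-- What the planner believes about field_name: n_distinct and the
-- most-common-values list drive the join selectivity estimate.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'photo_info_data'
  AND attname = 'field_name';

-- Then refresh the statistics and re-run EXPLAIN:
ANALYZE photo_info_data;
```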

            regards, tom lane

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Mario Splivalo
Date:
Tom Lane wrote:
> Mario Splivalo <mario.splivalo@megafon.hr> writes:
>>     ->  Bitmap Heap Scan on photo_info_data u  (cost=39134.84..63740.08
>> rows=109024 width=50) (actual time=270.464..270.469 rows=3 loops=2)
>>           Recheck Cond: ((u.field_name)::text = (t.key)::text)
>>           ->  Bitmap Index Scan on photo_info_data_pk
>> (cost=0.00..39107.59 rows=109024 width=0) (actual time=270.435..270.435
>> rows=3 loops=2)
>>                 Index Cond: ((u.field_name)::text = (t.key)::text)
>
> You need to figure out why that rowcount estimate is off by more than
> four orders of magnitude :-(

Huh, thanks! :) Could you give me some starting points? What do I do?

Could it be because the table is quite large, and only 3 rows match the
join condition?

Now, just after I finished writing the above lines, the index creation
on photo_info_data(field_name) completed. When I rerun the above query,
here is what I get:

phototest=# explain analyze select field_name, count(*) from
t_query_data t join photo_info_data u on t.key = u.field_name group by
field_name;

  QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=57414.33..57414.61 rows=22 width=9) (actual
time=0.135..0.139 rows=2 loops=1)
    ->  Nested Loop  (cost=2193.50..56324.09 rows=218048 width=9)
(actual time=0.063..0.114 rows=6 loops=1)
          ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2
width=6) (actual time=0.019..0.022 rows=2 loops=1)
          ->  Bitmap Heap Scan on photo_info_data u
(cost=2193.50..26798.74 rows=109024 width=9) (actual time=0.025..0.030
rows=3 loops=2)
                Recheck Cond: ((u.field_name)::text = (t.key)::text)
                ->  Bitmap Index Scan on photo_info_data_ix__field_name
  (cost=0.00..2166.24 rows=109024 width=0) (actual time=0.019..0.019
rows=3 loops=2)
                      Index Cond: ((u.field_name)::text = (t.key)::text)
  Total runtime: 0.200 ms
(8 rows)


So, I guess I solved my problem! :) The explain analyze still shows that
the row estimate is quite off (109024 estimated vs. only 3 actual), but
the query is lightning fast :)

I thought that having the primary key (and the auto-index that comes
with it) on (photo_id, field_name) would be enough. Now I have two
indexes on field_name, but that seems to do the trick.

    Mike

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Scott Marlowe
Date:
On Mon, Mar 30, 2009 at 9:34 AM, Mario Splivalo
<mario.splivalo@megafon.hr> wrote:

>         ->  Bitmap Heap Scan on photo_info_data u (cost=2193.50..26798.74
> rows=109024 width=9) (actual time=0.025..0.030 rows=3 loops=2)
>               Recheck Cond: ((u.field_name)::text = (t.key)::text)
>               ->  Bitmap Index Scan on photo_info_data_ix__field_name
>  (cost=0.00..2166.24 rows=109024 width=0) (actual time=0.019..0.019 rows=3
> loops=2)

> So, I guess I solved my problem! :) The explain analyze still shows that
> the row estimate is quite off (109024 estimated vs. only 3 actual), but
> the query is lightning fast :)

It's not really solved, it's just a happy coincidence that the current
plan runs well.  In order to keep the query planner making good
choices you need to increase stats target for the field in the index
above.  The easiest way to do so is to do this:

alter database mydb set default_statistics_target=100;

and run analyze again:

analyze;

> I thought that having the primary key (and the auto-index that comes with
> it) on (photo_id, field_name) would be enough. Now I have two indexes on
> field_name, but that seems to do the trick.

Nope, it's about the stats collected that let the planner make the right choice.
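
A more targeted alternative to the database-wide setting, sketched here,
is to raise the statistics target for just that one column and
re-analyze the table:

```sql
-- Increase the sample size for field_name only (1000 is the maximum
-- in 8.x), then rebuild its statistics:
ALTER TABLE photo_info_data ALTER COLUMN field_name SET STATISTICS 1000;
ANALYZE photo_info_data;
```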

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Mario Splivalo
Date:
Scott Marlowe wrote:
>
> It's not really solved, it's just a happy coincidence that the current
> plan runs well.  In order to keep the query planner making good
> choices you need to increase stats target for the field in the index
> above.  The easiest way to do so is to do this:
>
> alter database mydb set default_statistics_target=100;
>
> and run analyze again:
>
> analyze;

So, I removed the index on field_name, set default_statistics_target to
100, analyzed, and the results are the same:

                                                              QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.04..58676.31 rows=218048 width=68) (actual
time=0.067..12268.394 rows=6 loops=1)
    Hash Cond: ((u.field_name)::text = (t.key)::text)
    ->  Seq Scan on photo_info_data u  (cost=0.00..47500.30 rows=2398530
width=50) (actual time=0.013..6426.611 rows=2398446 loops=1)
    ->  Hash  (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
          ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.006 rows=2 loops=1)
  Total runtime: 12268.459 ms
(6 rows)

I even changed default_statistics_target to 1000:


------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=1.04..58580.29 rows=208561 width=67) (actual
time=0.054..12434.283 rows=6 loops=1)
    Hash Cond: ((u.field_name)::text = (t.key)::text)
    ->  Seq Scan on photo_info_data u  (cost=0.00..47499.46 rows=2398446
width=49) (actual time=0.012..6129.923 rows=2398446 loops=1)
    ->  Hash  (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
          ->  Seq Scan on t_query_data t  (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.004 rows=2 loops=1)
  Total runtime: 12434.338 ms
(6 rows)


Even when I run this query, I get a sequential scan:

explain analyze select * from photo_info_data where field_name =
'f-spot' or field_name = 'shutter';

                                                     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
  Seq Scan on photo_info_data  (cost=0.00..59491.69 rows=1705 width=49)
(actual time=0.018..1535.963 rows=6 loops=1)
    Filter: (((field_name)::text = 'f-spot'::text) OR
((field_name)::text = 'shutter'::text))
  Total runtime: 1536.010 ms
(3 rows)

These are the counts of the values 'f-spot' and 'shutter' for the
field_name column in the photo_info_data table:

phototest=# select field_name, count(*) from photo_info_data where
field_name in ('f-spot', 'shutter') group by field_name;
 field_name | count
------------+-------
 f-spot     |     3
 shutter    |     3
(2 rows)


Maybe my test data is poor? As I've mentioned, photo_info_data has a
little over 2,300,000 rows. And this is the complete 'distribution' of
the data:

phototest=# select field_name, count(*) from photo_info_data group by
field_name order by count(*) desc;
    field_name   | count
----------------+--------
  field_Xx1      | 350000
  field_Xx2      | 332447
  field_Xx3      | 297414
  field_Xx4      | 262394
  field_Xx5      | 227396
  field_Xx6      | 192547
  field_Xx7      | 157612
  field_Xx8      | 122543
  field_Xx9      |  87442
  field_Xx10     |  52296
  field_1        |  50000
  field_2        |  47389
  field_3        |  42412
  field_4        |  37390
  field_5        |  32366
  field_6        |  27238
  field_7        |  22360
  field_Xx11     |  17589
  field_8        |  17412
  field_9        |  12383
  field_10       |   7386
  field_11       |   2410
  f-spot         |      3
  shutter        |      3
  focal          |      3
  flash          |      3
  m_city         |      3
  person         |      3
  iso            |      2
(29 rows)

No matter what field_name value I enter in the WHERE condition, the
planner chooses a sequential scan. Only when I add a separate index on
field_name does the planner choose an index scan or bitmap index scan.

    Mike

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Scott Marlowe
Date:
On Mon, Apr 6, 2009 at 6:20 AM, Mario Splivalo
<mario.splivalo@megafon.hr> wrote:
> Scott Marlowe wrote:
>>
>> It's not really solved, it's just a happy coincidence that the current
>> plan runs well.  In order to keep the query planner making good
>> choices you need to increase stats target for the field in the index
>> above.  The easiest way to do so is to do this:
>>
>> alter database mydb set default_statistics_target=100;
>>
>> and run analyze again:
>>
>> analyze;
>
> So, I removed the index on field_name, set default_statistics_target
> to 100, analyzed, and the results are the same:

Why did you remove the index?

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Mario Splivalo
Date:
Scott Marlowe wrote:
> On Mon, Apr 6, 2009 at 6:20 AM, Mario Splivalo
> <mario.splivalo@megafon.hr> wrote:
>> Scott Marlowe wrote:
>>> It's not really solved, it's just a happy coincidence that the current
>>> plan runs well.  In order to keep the query planner making good
>>> choices you need to increase stats target for the field in the index
>>> above.  The easiest way to do so is to do this:
>>>
>>> alter database mydb set default_statistics_target=100;
>>>
>>> and run analyze again:
>>>
>>> analyze;
>> So, I removed the index on field_name, set default_statistics_target
>> to 100, analyzed, and the results are the same:
>
> Why did you remove the index?
>

Because I already have an index on that column, the index needed to
enforce the PK constraint. Here is the original DDL for the table:

CREATE TABLE photo_info_data
(
   photo_id integer NOT NULL,
   field_name character varying NOT NULL,
   field_value character varying,
   CONSTRAINT photo_info_data_pk PRIMARY KEY (user_id, field_name)
)

CREATE INDEX photo_info_data_ix_field_value
   ON user_info_data USING btree (field_value);

So, there is index on (user_id, field_name). Postgres is using index for
user_id (...WHERE user_id = 12345) but not on field-name (...WHERE
field_name = 'f-spot'). When I add extra index on field name:

CREATE INDEX photo_info_data_ix__field_name
   ON user_info_data USING btree (field_name);

Then that index is used.

    Mike

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Scott Marlowe
Date:
On Mon, Apr 6, 2009 at 8:37 AM, Mario Splivalo
<mario.splivalo@megafon.hr> wrote:
> Scott Marlowe wrote:
>>
>> On Mon, Apr 6, 2009 at 6:20 AM, Mario Splivalo
>> <mario.splivalo@megafon.hr> wrote:
>>>
>>> Scott Marlowe wrote:
>>>>
>>>> It's not really solved, it's just a happy coincidence that the current
>>>> plan runs well.  In order to keep the query planner making good
>>>> choices you need to increase stats target for the field in the index
>>>> above.  The easiest way to do so is to do this:
>>>>
>>>> alter database mydb set default_statistics_target=100;
>>>>
>>>> and run analyze again:
>>>>
>>>> analyze;
>>>
>>> So, I removed the index on field_name, set default_statistics_target
>>> to 100, analyzed, and the results are the same:
>>
>> Why did you remove the index?
>>
>
> Because I already have index on that column, index needed to enforce PK
> constraint. Here is the original DDL for the table:
>
> CREATE TABLE photo_info_data
> (
>  photo_id integer NOT NULL,
>  field_name character varying NOT NULL,
>  field_value character varying,
>  CONSTRAINT photo_info_data_pk PRIMARY KEY (user_id, field_name)
> )
>
> CREATE INDEX photo_info_data_ix_field_value
>  ON user_info_data USING btree (field_value);
>
> So, there is index on (user_id, field_name). Postgres is using index for
> user_id (...WHERE user_id = 12345) but not on field-name (...WHERE
> field_name = 'f-spot'). When I add extra index on field name:
>
> CREATE INDEX photo_info_data_ix__field_name
>  ON user_info_data USING btree (field_name);
>
> Then that index is used.

On older versions of pgsql, the second of two columns in a multicolumn
index can't be used alone.  On newer versions it can, but it is much
less efficient than a single-column index, or than having that column
first rather than second.
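
A sketch of the rule being described, using the tables from this thread:
a btree on (photo_id, field_name) serves predicates on the leading
column well, but a predicate on field_name alone has to walk the whole
index, so a dedicated index wins:

```sql
-- Served efficiently by PRIMARY KEY (photo_id, field_name),
-- because photo_id is the leading column:
SELECT * FROM photo_info_data WHERE photo_id = 12345;

-- 8.1+ can use the composite index for this too, but only by scanning
-- all of it, so the planner usually prefers a seq scan instead:
SELECT * FROM photo_info_data WHERE field_name = 'f-spot';

-- A single-column index serves the second query directly:
CREATE INDEX photo_info_data_ix__field_name
    ON photo_info_data USING btree (field_name);
```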

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Mario Splivalo
Date:
Scott Marlowe wrote:
>> CREATE INDEX photo_info_data_ix_field_value
>>  ON user_info_data USING btree (field_value);
>>
>> So, there is index on (user_id, field_name). Postgres is using index for
>> user_id (...WHERE user_id = 12345) but not on field-name (...WHERE
>> field_name = 'f-spot'). When I add extra index on field name:
>>
>> CREATE INDEX photo_info_data_ix__field_name
>>  ON user_info_data USING btree (field_name);
>>
>> Then that index is used.
>
> On older versions of pgsql, the second of two terms in a multicolumn
> index can't be used alone.  On newer versions it can, but it is much
> less efficient than if it's a single column index or if the term is
> the first one not the second.

I'm using 8.3.7. So, you'd also suggest keeping that extra (somewhat
redundant) index on field_name, since I need the PK on (photo_id,
field_name)?

    Mike

Re: Forcing seq_scan off for large table joined with tiny table yields improved performance

From
Scott Marlowe
Date:
On Mon, Apr 6, 2009 at 8:50 AM, Mario Splivalo
<mario.splivalo@megafon.hr> wrote:
> Scott Marlowe wrote:
>>>
>>> CREATE INDEX photo_info_data_ix_field_value
>>>  ON user_info_data USING btree (field_value);
>>>
>>> So, there is index on (user_id, field_name). Postgres is using index for
>>> user_id (...WHERE user_id = 12345) but not on field-name (...WHERE
>>> field_name = 'f-spot'). When I add extra index on field name:
>>>
>>> CREATE INDEX photo_info_data_ix__field_name
>>>  ON user_info_data USING btree (field_name);
>>>
>>> Then that index is used.
>>
>> On older versions of pgsql, the second of two terms in a multicolumn
>> index can't be used alone.  On newer versions it can, but it is much
>> less efficient than if it's a single column index or if the term is
>> the first one not the second.
>
> I'm using 8.3.7. So, you'd also suggest to keep that extra (in a way
> redundant) index on field_name, since I need PK on (photo_id, field_name) ?

Either that or reverse the column order in the PK.

Also, you might want to look at lowering random_page_cost to something
around 1.5 to 2.0.
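
Both suggestions as statements, sketched under the assumption that the
database is named phototest as in the prompts above; the right
random_page_cost value depends on how much of the data stays cached:

```sql
-- Reverse the PK column order so field_name leads and lookups on it
-- alone can use the PK index (requires rebuilding the constraint):
ALTER TABLE photo_info_data DROP CONSTRAINT photo_info_data_pk;
ALTER TABLE photo_info_data
    ADD CONSTRAINT photo_info_data_pk PRIMARY KEY (field_name, photo_id);

-- Lower random_page_cost from the default 4.0; 1.5-2.0 is a common
-- choice when most of the working set fits in cache:
ALTER DATABASE phototest SET random_page_cost = 1.5;
```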