Обсуждение: Possible infinite loop in query using bitmap scans

Поиск
Список
Период
Сортировка

Possible infinite loop in query using bitmap scans

От
Casey Duncan
Дата:
I have this report query that runs daily on a table with several
hundred million rows total using pg 8.1.3 on Debian Linux on hw with
dual opteron processors:

  SELECT count(*) FROM webhits
        WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
            AND date_recorded >= '3/10/2006'::TIMESTAMP
            AND date_recorded < '3/11/2006'::TIMESTAMP;

Here is the explain output:


             QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------
  Aggregate  (cost=794775.08..794775.09 rows=1 width=0)
    ->  Bitmap Heap Scan on webhits  (cost=315820.45..794771.74
rows=1337 width=0)
          Recheck Cond: ((date_recorded >= '2006-03-10
00:00:00'::timestamp without time zone) AND (date_recorded <
'2006-03-11 00:00:00'::timestamp without time zone))
          Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND
(status = 200))
          ->  BitmapAnd  (cost=315820.45..315820.45 rows=249152 width=0)
                ->  Bitmap Index Scan on webhits_date_idx1
(cost=0.00..140407.45 rows=15379741 width=0)
                      Index Cond: ((date_recorded >= '2006-03-10
00:00:00'::timestamp without time zone) AND (date_recorded <
'2006-03-11 00:00:00'::timestamp without time zone))
                ->  Bitmap Index Scan on webhits_path_idx2
(cost=0.00..175412.76 rows=15343959 width=0)
                      Index Cond: (((path)::text >=
'/radio/tuner'::character varying) AND ((path)::text <
'/radio/tunes'::character varying))

According to the planner it should take <15 minutes which is typical in
practice. About half the times it runs, however, it never terminates
(even after days) and just spins consuming 99+% of CPU with no disk
activity. This query was never a problem in postgres versions < 8.1.2,
however the data has grown substantially since that time. I notice it
uses the recent in-memory bitmap feature, so I wondered if it was
exposing a bug.

If I restart the postmaster, the query will complete in the expected
time.

-Casey


Re: Possible infinite loop in query using bitmap scans

От
Michael Fuhr
Дата:
On Sun, Mar 12, 2006 at 11:36:23PM -0800, Casey Duncan wrote:
>  SELECT count(*) FROM webhits
>        WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
>            AND date_recorded >= '3/10/2006'::TIMESTAMP
>            AND date_recorded < '3/11/2006'::TIMESTAMP;
[...]
>  Aggregate  (cost=794775.08..794775.09 rows=1 width=0)
[...]
> According to the planner it should take <15 minutes which is typical in
> practice.

The planner's cost estimate is in units of disk page fetches, not
time.  The above estimate isn't 794775.09 ms (~13.25 min) but rather
794775.09 times the cost of a single page fetch, however much that
is.  See "Using EXPLAIN" in the "Performance Tips" chapter of the
documentation.

http://www.postgresql.org/docs/8.1/interactive/performance-tips.html#USING-EXPLAIN

> About half the times it runs, however, it never terminates
> (even after days) and just spins consuming 99+% of CPU with no disk
> activity. This query was never a problem in postgres versions < 8.1.2,
> however the data has grown substantially since that time. I notice it
> uses the recent in-memory bitmap feature, so I wondered if it was
> exposing a bug.

If the problem happens half the time then you have a somewhat
repeatable test case.  Do you get more consistent performance if
you set enable_bitmapscan to off?  What's the query plan if you do
that?

If you narrow the search criteria so the query returns fewer rows,
do you still see the problem?  Can you identify a "sour spot" where
the problem starts to happen?

> If I restart the postmaster, the query will complete in the expected
> time.

Does the problem eventually start happening again?  If so, after
how long?  How did you determine that the restart is relevant?  Do
you consistently see different (presumably better) performance after
a restart than if you don't restart?

--
Michael Fuhr

Re: Possible infinite loop in query using bitmap scans

От
Casey Duncan
Дата:
On Mar 13, 2006, at 9:50 AM, Michael Fuhr wrote:

> On Sun, Mar 12, 2006 at 11:36:23PM -0800, Casey Duncan wrote:
>>  SELECT count(*) FROM webhits
>>        WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
>>            AND date_recorded >= '3/10/2006'::TIMESTAMP
>>            AND date_recorded < '3/11/2006'::TIMESTAMP;
> [...]
>>  Aggregate  (cost=794775.08..794775.09 rows=1 width=0)
> [...]
>> According to the planner it should take <15 minutes which is typical
>> in
>> practice.
>
> The planner's cost estimate is in units of disk page fetches, not
> time.  The above estimate isn't 794775.09 ms (~13.25 min) but rather
> 794775.09 times the cost of a single page fetch, however much that
> is.  See "Using EXPLAIN" in the "Performance Tips" chapter of the
> documentation.
>
> http://www.postgresql.org/docs/8.1/interactive/performance-
> tips.html#USING-EXPLAIN

Doh! I should've known that.

>> About half the times it runs, however, it never terminates
>> (even after days) and just spins consuming 99+% of CPU with no disk
>> activity. This query was never a problem in postgres versions < 8.1.2,
>> however the data has grown substantially since that time. I notice it
>> uses the recent in-memory bitmap feature, so I wondered if it was
>> exposing a bug.
>
> If the problem happens half the time then you have a somewhat
> repeatable test case.  Do you get more consistent performance if
> you set enable_bitmapscan to off?  What's the query plan if you do
> that?

Here's the plan with bitmap scans off:


       QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------
  Aggregate  (cost=891363.71..891363.72 rows=1 width=0)
    ->  Index Scan using webhits_date_idx1 on webhits
(cost=0.00..891360.30 rows=1362 width=0)
          Index Cond: ((date_recorded >= '2006-03-10
00:00:00'::timestamp without time zone) AND (date_recorded <
'2006-03-11 00:00:00'::timestamp without time zone))
          Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND
(status = 200))
(4 rows)

The query runs to completion this way in about 40 minutes. I turned
bitmap scans back on and it hangs again (I ran it for about 5 hours).

> If you narrow the search criteria so the query returns fewer rows,
> do you still see the problem?  Can you identify a "sour spot" where
> the problem starts to happen?

I'll do that tomorrow, and let you know.

>> If I restart the postmaster, the query will complete in the expected
>> time.
>
> Does the problem eventually start happening again?  If so, after
> how long?  How did you determine that the restart is relevant?  Do
> you consistently see different (presumably better) performance after
> a restart than if you don't restart?

This is a production box, so I can't restart it whenever I want and I
haven't yet reproduced it elsewhere -- the data base size makes that
cumbersome at best -- but once after it hung up, I restarted postgres
and the report ran to completion for a couple of days then started
hanging again. Today it seems to pretty consistently hang, I'll see if
I can restart it overnight and test it again.

-Casey


Re: Possible infinite loop in query using bitmap scans

От
Casey Duncan
Дата:
On Mar 13, 2006, at 5:25 PM, Casey Duncan wrote:
[..]
>>> If I restart the postmaster, the query will complete in the expected
>>> time.
>>
>> Does the problem eventually start happening again?  If so, after
>> how long?  How did you determine that the restart is relevant?  Do
>> you consistently see different (presumably better) performance after
>> a restart than if you don't restart?

I restarted postgres this morning and this time it didn't seem to help.
That query has been running for several hours now. I'm going to let it
go a while longer to see if it eventually completes, but I suspect it
won't. So perhaps the prior restart was just dumb luck.

I'll try some narrower queries as well.

-Casey


Re: Possible infinite loop in query using bitmap scans

От
Casey Duncan
Дата:
On Mar 14, 2006, at 1:31 PM, Casey Duncan wrote:
> On Mar 13, 2006, at 5:25 PM, Casey Duncan wrote:
> [..]
>>>> If I restart the postmaster, the query will complete in the expected
>>>> time.
>>>
>>> Does the problem eventually start happening again?  If so, after
>>> how long?  How did you determine that the restart is relevant?  Do
>>> you consistently see different (presumably better) performance after
>>> a restart than if you don't restart?
>
> I restarted postgres this morning and this time it didn't seem to
> help. That query has been running for several hours now. I'm going to
> let it go a while longer to see if it eventually completes, but I
> suspect it won't. So perhaps the prior restart was just dumb luck.
>
> I'll try some narrower queries as well.

Interestingly, I can only narrow the query by about an hour (i.e., a 23
hour span) before the plan changes to a simple index scan. That may
explain why this only started hanging recently since it probably only
started using the bitmap scan for a day interval recently.

Let me know if there is anything I should do or any info you need to
further pin this down.

In the mean time I will disable bitmap scans for this query.

Thanks.

-Casey