Обсуждение: 7.1.3 w/ Perl/DBI application hangs

Поиск
Список
Период
Сортировка

7.1.3 w/ Perl/DBI application hangs

От
"Doran L. Barton"
Дата:
I've got a fairly mission-critical Perl/mod_perl application built on top of
PostgreSQL that is causing me grief. I've been battling it for a week or so
and have determined I think the problem is a PostgreSQL bug.

The application is a web-based event-scheduling system for an office of
accountants. The server is an Athlon 1.0Ghz CPU running Red Hat 7.2 and Red
Hat's PostgreSQL 7.1.3 build. The hard disks are UW SCSI disks connected
via software RAID level 1 (complements of Red Hat Linux). There is 256M of
RAM available. I've tweaked the max_connections, shared_buffers, and
sort_mem variables with no change in behavior resulting.

I recently revised the application to make better use of complex queries
in PostgreSQL instead of doing most of the work in Perl - which was making
things quite slow.

The problem manifests itself when these more-complex queries seem to "hang" -
never returning any kind of result whatsoever. When this happens, a 'ps' on
the server shows SELECT queries spinning - taking up CPU and memory. When
frustrated users refresh their browsers, this creates more of these
processes - until someone pages me and I log in and the load average is
something like 12.2.

I do VACUUM ANALYZE daily - even though the data in the database doesn't
change that much. There are approximately 16,000 records in the events
database which comprises about three years of events data (this application
has been running for a couple years). Doing a VACUUM ANALYZE when the
database is "hanging" doesn't change anything. The VACUUM ANALYZE does
return a successful result.

Stopping the postgresql service and restarting also does not fix the
proble. It does kill off the "hanging" processes, but once the database
server is running again, new queries hang just as before.

The ONLY (temporary) solution I've found is:

  (1) shut down Apache
  (2) restart PostgreSQL
  (3) pg_dump the database
  (4) stop Postgresql
  (5) rename the data/ directory and create a new data/ directory
  (6) Initalize the database in the new data/ directory
  (7) recreate users and my database
  (8) restore the data from the previous dump
  (9) restart Apache

After I do the above steps, the application works fine for about 24 hours.

Then the problem begins manifesting itself again.  Another thing I've
noticed is that this problem devlops independently of actual application
usage. For example, I will go through the above steps on a Saturday evening
and users will complain of not being able to use the application on Monday
morning. The application is not used at all on Sundays.

Now, I'm not familiar with some more the more down-and-dirty tools of
debugging PostgreSQL (although I've been using PG for various web-based
application projects since 1995), so please bear with me. I'll run
whatever commands I am instructed. If a developer needs ssh access to
the machine to witness first-hand what is going on, that can be arranged.

This is a serious bug and it is affecting a potentially cool application.
Please help me. :) I'm a PostgreSQL poster child.

BTW: The queries I'm submitting look like this:

    -- Find weekly/biweekly recurring events for user
    SELECT event.* FROM
      event, u_participant WHERE
      event.event_id = u_participant.event_id AND
      u_participant.username = 'rld' AND
      (
        event.recur = 'Bi-Weekly' OR
        event.recur = 'Weekly' ) AND
      event.t_begin < '2001-08-05' AND
      event.recur_end >= '2001-07-29'
    UNION
    -- Find weekly/biweekly recurring events for groups user is in
    SELECT event.* FROM
       event, g_participant, group_user WHERE
       event.event_id = g_participant.event_id AND
       group_user.group_id = g_participant.group_id AND
       group_user.username = 'rld' AND
    (
        event.recur = 'Bi-Weekly' OR
        event.recur = 'Weekly' ) AND
      event.t_begin < '2001-08-05' AND
      event.recur_end >= '2001-07-29'
    UNION
    -- Find monthly recurring events for user
    SELECT event.* FROM
      event, u_participant WHERE
      event.event_id = u_participant.event_id AND
      u_participant.username = 'rld' AND
      event.recur = 'Monthly' AND
      (
         (
           date_part('month', DATE '2001-07-29') =
             date_part('month', DATE '2001-08-05') AND
           (
             date_part('day', event.t_begin) >=
               date_part('day', DATE '2001-07-29')  AND
             date_part('day', event.t_begin) <
               date_part('day', DATE '2001-08-05'))) OR
         (
           (
             date_part('month', DATE '2001-07-29') =
               date_part('month', event.t_begin) AND
             date_part('day', event.t_begin) >=
               date_part('day', DATE '2001-07-29')) OR
           (
             date_part('month', DATE '2001-08-05') =
               date_part('month', event.t_begin) AND
             date_part('day', event.t_begin) <
               date_part('day', DATE '2001-08-05')))) AND

       event.t_begin < '2001-08-05' AND
       event.recur_end >= '2001-07-29'
       UNION ...

... You get the picture.

-=Fozz

--
-------------------------------------------------------------------------
Doran L. Barton <fozz@iodynamics.com> - Chief Super Hero - Iodynamics LLC
< http://www.iodynamics.com/ > - Linux solutions and dynamic websites
 "Cars will not have intercourse on this bridge."
    -- Seen in a Tokyo traffic handbook

Re: 7.1.3 w/ Perl/DBI application hangs

От
"Doran L. Barton"
Дата:
I have made an interesting observation that SHOULD have been made before
(yeah, stupid of me not to have made this connection before):

The problem I'm having with the queries "hanging" is apparently happening
as soon as a VACUUM ANALYZE is done on the database. Until a VACUUM ANALYZE
is done, the database is speedy and responsive. As soon as a VACUUM ANALYZE
is performed, the same queries executed before spin indefinitely.

A VACUUM by itself doesn't bring on this behavior- so it must be the
ANALYZE portion.

Again, this is 7.1.3 built by Red Hat for Red Hat 7.2 running on an Athlon
1.0Ghz system. Please let me know what information could be helpful in
tracking down this "bug." Is it a bug?

-=Fozz

--
-------------------------------------------------------------------------
Doran L. Barton <fozz@iodynamics.com> - Chief Super Hero - Iodynamics LLC
< http://www.iodynamics.com/ > - Linux solutions and dynamic websites
 "We take your bags and send them in all directions."
    -- Seen in a Copenhagen airline ticket office

Re: 7.1.3 w/ Perl/DBI application hangs

От
Tom Lane
Дата:
"Doran L. Barton" <fozz@iodynamics.com> writes:
> The problem I'm having with the queries "hanging" is apparently happening
> as soon as a VACUUM ANALYZE is done on the database. Until a VACUUM ANALYZE
> is done, the database is speedy and responsive. As soon as a VACUUM ANALYZE
> is performed, the same queries executed before spin indefinitely.

> A VACUUM by itself doesn't bring on this behavior- so it must be the
> ANALYZE portion.

Interesting.  VACUUM ANALYZE essentially does the following:

        begin transaction;
        vacuum table;        -- slow
        commit transaction;
        [ stop here if no ANALYZE requested ]
        begin transaction;
        delete rows for table in pg_statistic;
        scan table to construct new statistics;    -- slow
        add rows for table to pg_statistic;
        commit transaction;

I am wondering if queries started while the ANALYZE scan is in progress
somehow fail to see any statistics available for the table, and end up
choosing stupid query plans.  That shouldn't happen, since the deletion
of the old rows is part of a not-yet-committed transaction --- other
transactions should see the old rows as still valid.  But it'd be worth
your while to launch a few EXPLAINs to see if the plans for the problem
queries change while the ANALYZE phase is in progress.

            regards, tom lane