Re: [HACKERS] Heh, the disappearing problem!

Поиск
Список
Период
Сортировка
От Karl Denninger
Тема Re: [HACKERS] Heh, the disappearing problem!
Дата
Msg-id 19980309213350.59454@mcs.net
обсуждение исходный текст
Ответ на Re: [HACKERS] Heh, the disappearing problem!  (Bruce Momjian <maillist@candle.pha.pa.us>)
Ответы Re: [HACKERS] Heh, the disappearing problem!  (Bruce Momjian <maillist@candle.pha.pa.us>)
Список pgsql-hackers
On Mon, Mar 09, 1998 at 10:01:17PM -0500, Bruce Momjian wrote:
> > 1)    Select a set of records from a join on a couple of tables.
> >
> > 2)    Select from a different table (using the results from the first
> >     select), looking specifically for certain data which we then use
> >     programmatically to perform a set of computations.
> >
> > Now, the first select is just peachy.  It returns ~1500 or so records.
> >
> > The iterative loop on the second select used to run through the entire 1500
> > records or so (doing a select for each) in ~20-30 seconds.  Now it takes
> > roughly 2-3 minutes to do the same job.  I have checked with "explain" that
> > the indices are being used for all queries - they are.
> >
> > I've seen this also with a few other processes that do the same kind of
> > thing, and get the same results.
> >
> > I'm wondering if what we're seeing here is a severe degredation of locking
> > performance.  That is, could this be related to the new deadlock detection
> > code?  We're doing selects/updates within several tables  in these jobs, but
> > all within one transaction block (begin/commit or begin/rollback).
>
> The deadlock code only runs after a minute of waiting for a lock, and
> every minute thereafter, so if you are not waiting on a lock, you don't
> run that code.
>
> Now the queue behavior has changed.  We used to give readers preference,
> but now we do some queue management that is fairer to readers and
> writers.

Well, the other problem that comes with this however is CPU load on the
server.  I find that the system load goes through the ceiling - there's
not a lot of disk I/O going on during this time, but the CPU is damn busy,
almost all of it in user time.

I don't think this explains all of it.  Something is going on in the
interaction of the different processes being handled.

Basically, we have the following:

1)    One big "select" to get candidates.

2)    A second select on another table using values from the candidate to
    check two algorythmic values (which can't really be pushed up into
    the first select as a join - if it could, that would make things
    easy).

3)    If either of the two conditions are met, then we take a process
    detour and do the appropriate system-level things, and update the
    candidate record. If we do take either of these, we open a
    transaction to make sure we either commit all changes or none of
    them.

4)    If the candidate was modified (ie: a "begin" was executed along
    with one or more "updates") we send a "commit", assuming we were
    able to do the system-level things ok.

While this is running, no other instance can be (we use another table with a
single "flag" value as a means of preventing this), because if two of these
come along at once all hell would break loose.

This is "set off" by an async trigger when there could be candidates.

With 6.2.1 this worked fine - we got through the entire process in a few
seconds, and all was well.

With 6.3 I have had to schedule this to run on 30 minute intervals, because
its 10x+ slower than it was under 6.2.1.  I'm giving serious consideration
to a table reconstruction so I can use a join to get all of this in one
pass, which will get rid of the need to loop over the first select's
returned values.

I've seen this same kind of behavior in a few other places as well; in
places where you are doing reads and writes in a mixed environment (ie: read
something, write something (typically in a different table) based on what
you read) performance of 6.3 has gone in the toilet.  Update jobs that used
to run in tens of seconds are requiring several minutes to complete now.

And again, we're not seeing much disk I/O during this period - but we *ARE*
seeing a hell of a lot of CPU activity, almost all in user mode.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly to FULL DS-3 Service
                 | NEW! K56Flex support on ALL modems
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "박귀태"
Дата:
Сообщение: ...
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: Why no Oracle, Sybase, Informix etc.