Обсуждение: problems with new vacuum (??)

Поиск
Список
Период
Сортировка

problems with new vacuum (??)

От
Barry Lind
Дата:
Over the last two days I have been struggling with running vacuum on a 
7.2b4 database that is running in a production environment.  This was 
essentially the first time I ran vacuum on this database since it was 
upgraded to 7.2.  This database is characterized by one large table that 
has many inserts and deletes, however generally contains zero rows.  So 
over the course of the last few weeks this table had grown in size to 
about 2.5G (or more correctly the corresponding toast table grew that 
large).

So the first problem I had was that the vaccum (regular vacuum not full 
vacuum) took a very long time on this table (2+ hours).  Now I would 
expect it to take a while, so that in and of itself isn't a problem. 
But while this vacuum was running the rest of the system was performing 
very poorly.  Opperations that usually are subsecond, where taking 
minutes to complete.  At first I thought there was some sort of locking 
problem, but these opperations did complete, but after a very long time.

In looking at the log files from this time, I noticed that while the 
vacuum process was running, there were a lot of the following messages 
in the log file:

2001-12-31 22:16:40 [20655]  DEBUG:  recycled transaction log file 
000000010000009A

The interesting thing (at least in my mind) is that these messages were 
produced by all of the other postgres processes, not by the vacuum 
process.  (And by the way what do they mean?)



The second issue I noticed was that the vacuum process later just hung.  Since I didn't think the new vacuum was
supposedto hang (since I 
 
thought it tried a best effort and if it couldn't lock something it 
would just skip it).

2001-12-31 22:18:04 [19945]  NOTICE:  --Relation xyf_files--
2001-12-31 22:21:51 [20673]  DEBUG:  recycled transaction log file 
000000010000009C
2001-12-31 22:21:51 [20673]  DEBUG:  recycled transaction log file 
000000010000009D
2001-12-31 22:21:51 [20673]  DEBUG:  recycled transaction log file 
000000010000009B
2001-12-31 22:31:54 [20711]  DEBUG:  recycled transaction log file 
000000010000009F
2001-12-31 22:31:54 [20711]  DEBUG:  recycled transaction log file 
000000010000009E
2002-01-01 07:30:58 [19945]  ERROR:  Query was cancelled.

It hung until I cancelled the vacuum with a ^c.  So then I tried to 
rerun the vacuum and it hung in the same spot, this time for 1.5 hours 
before I killed it.

Thinking that maybe there was some sort of problem with this table I ran 
a vacuum full (after restarting the database to make sure no other 
processes would be locking the full vacuum) and it ran to completion.

Now after the full vacuum the the regular vacuum runs almost instantly 
with no further problems.

Is there a bug here, I don't know, but I thought it was interesting 
enough to post what I just saw.

thanks,
--Barry





Re: problems with new vacuum (??)

От
Tom Lane
Дата:
Barry Lind <barry@xythos.com> writes:
> But while this vacuum was running the rest of the system was performing 
> very poorly.  Opperations that usually are subsecond, where taking 
> minutes to complete.

Is this any different from the behavior of 7.1 vacuum?  Also, what
platform are you on?

I've noticed on a Linux 2.4 box (RH 7.2, typical commodity-grade PC
hardware) that vacuum, pgbench, or almost any I/O intensive operation
drives interactive performance into the ground.  I have not had an
opportunity to try to characterize the problem, but I suspect Linux's
disk I/O scheduler is not bright enough to prioritize interactive
operations.

> 2001-12-31 22:16:40 [20655]  DEBUG:  recycled transaction log file 
> 000000010000009A

> The interesting thing (at least in my mind) is that these messages were 
> produced by all of the other postgres processes, not by the vacuum 
> process.

No surprise, as they're coming from the checkpoint process(es).

> The second issue I noticed was that the vacuum process later just hung. 

You sure you just didn't wait long enough?

There was a deadlock condition found in 7.2b4 recently, but I am not
convinced that it could affect VACUUM.  Anyway, if you can replicate
the problem then please attach to the stuck process with gdb and provide
a stack backtrace.
        regards, tom lane


Re: problems with new vacuum (??)

От
Barry Lind
Дата:
Tom,

The platform is Redhat 7.0 with a 2.2.19 kernal.

The behavior is different from the 7.1 vacuum, in 7.1, processes would 
just hang since they needed to access the table being vacuumed.  So they 
would hang as long as the vacuum took on a particular table.  In 7.2 
they proceed, and don't hang, but take a long time.  So you could say 
that 7.2 is better than 7.1, but my expectations where higher of 7.2.  I 
was expecting vacuum to be a benign background process that could run in 
parallel with other transactions.  That doesn't yet seem to be the case.  I will continue to monitor my system and see
ifthis is reproducable.  I will then try to get a backtrace if I find I have a reproducable case.
 

thanks,
--Barry

Tom Lane wrote:

> Barry Lind <barry@xythos.com> writes:
> 
>>But while this vacuum was running the rest of the system was performing 
>>very poorly.  Opperations that usually are subsecond, where taking 
>>minutes to complete.
>>
> 
> Is this any different from the behavior of 7.1 vacuum?  Also, what
> platform are you on?
> 
> I've noticed on a Linux 2.4 box (RH 7.2, typical commodity-grade PC
> hardware) that vacuum, pgbench, or almost any I/O intensive operation
> drives interactive performance into the ground.  I have not had an
> opportunity to try to characterize the problem, but I suspect Linux's
> disk I/O scheduler is not bright enough to prioritize interactive
> operations.
> 
> 
>>2001-12-31 22:16:40 [20655]  DEBUG:  recycled transaction log file 
>>000000010000009A
>>
> 
>>The interesting thing (at least in my mind) is that these messages were 
>>produced by all of the other postgres processes, not by the vacuum 
>>process.
>>
> 
> No surprise, as they're coming from the checkpoint process(es).
> 
> 
>>The second issue I noticed was that the vacuum process later just hung. 
>>
> 
> You sure you just didn't wait long enough?
> 
> There was a deadlock condition found in 7.2b4 recently, but I am not
> convinced that it could affect VACUUM.  Anyway, if you can replicate
> the problem then please attach to the stuck process with gdb and provide
> a stack backtrace.
> 
>             regards, tom lane
> 
> 




Re: problems with new vacuum (??)

От
Andrew McMillan
Дата:
On Wed, 2002-01-02 at 13:31, Barry Lind wrote:
> Over the last two days I have been struggling with running vacuum on a 
> 7.2b4 database that is running in a production environment.  This was 
> essentially the first time I ran vacuum on this database since it was 
> upgraded to 7.2.  This database is characterized by one large table that 
> has many inserts and deletes, however generally contains zero rows.  So 
> over the course of the last few weeks this table had grown in size to 
> about 2.5G (or more correctly the corresponding toast table grew that 
> large).
> 
> So the first problem I had was that the vaccum (regular vacuum not full 
> vacuum) took a very long time on this table (2+ hours).  Now I would 
> expect it to take a while, so that in and of itself isn't a problem. 
> But while this vacuum was running the rest of the system was performing 
> very poorly.  Opperations that usually are subsecond, where taking 
> minutes to complete.  At first I thought there was some sort of locking 
> problem, but these opperations did complete, but after a very long time.

Is it possible that you waited until a point when the work that vacuum
has to do is being undone faster by the new transactions coming
through?  This might be complicated by the fact that (from your vague
description) the table is heavily toasted.

Also, as a suggestion, if you can know there are zero records in the
table very often, why not TRUNCATE it at those times?  That should be a
_lot_ quicker than vacuuming it!

Cheers,                Andrew.
-- 
--------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/        PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201    MOB: +64(21)635-694    OFFICE: +64(4)499-2267      Are you enrolled at
http://schoolreunions.co.nz/yet?
 



Re: problems with new vacuum (??)

От
Hannu Krosing
Дата:
Tom Lane wrote:
> 
> Barry Lind <barry@xythos.com> writes:
> > But while this vacuum was running the rest of the system was performing
> > very poorly.  Opperations that usually are subsecond, where taking
> > minutes to complete.
> 
> Is this any different from the behavior of 7.1 vacuum?  Also, what
> platform are you on?
> 
> I've noticed on a Linux 2.4 box (RH 7.2, typical commodity-grade PC
> hardware) that vacuum, pgbench, or almost any I/O intensive operation
> drives interactive performance into the ground.

They drive each other to the ground too ;(

When I tried to run the new vacuum concurrently with a pgbench in hope 
to make it perform better for large number of updates (via removing the 
need to scan large number of dead tuples) 1 concurrent vacuum was able
to 
make 128 pgbench backends more than twice as slow as they were without
vacuum. 
And this is an extra slowdown from another 2-3X slowdown due to dead
tuples 
(got from comparing speed on VACUUM FULL db and db aftre doing ~10k 
pgbench transactions)

> I have not had an
> opportunity to try to characterize the problem, but I suspect Linux's
> disk I/O scheduler is not bright enough to prioritize interactive
> operations.

Have you any ideas how to distinguish between interactive and
non-interactive
disk I/O coming from postgresql backends ?

Can I for example nice the vacuum'ing backend without getting the 
"reverse priority" effects ?

> > 2001-12-31 22:16:40 [20655]  DEBUG:  recycled transaction log file
> > 000000010000009A
> 
> > The interesting thing (at least in my mind) is that these messages were
> > produced by all of the other postgres processes, not by the vacuum
> > process.
> 
> No surprise, as they're coming from the checkpoint process(es).
> 
> > The second issue I noticed was that the vacuum process later just hung.
> 
> You sure you just didn't wait long enough?
> 
> There was a deadlock condition found in 7.2b4 recently, but I am not
> convinced that it could affect VACUUM.  Anyway, if you can replicate
> the problem then please attach to the stuck process with gdb and provide
> a stack backtrace.
> 
>                         regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly


Re: problems with new vacuum (??)

От
"Matthew T. O'Connor"
Дата:
On Tuesday 01 January 2002 11:23 pm, Barry Lind wrote:
> Tom,
>
> The platform is Redhat 7.0 with a 2.2.19 kernal.

Is this and IDE based system?  If so do you have the drives running in DMA 
mode?

What are the results of "/sbin/hdparm /dev/hd(?)" (a,b,c,d ... which ever 
drive you are running the database on.)

The 2.2 linux kernel defaults to DMA off.  You can try to enable dma by 
issuing /sbin/hdparm -d1 /dev/hd(?)  You can also test the disk speed with 
/sbin/hdparm -tT /dev/hd(?).

In my experience enabling this feature can make a huge improvement in I/O 
intensive applications.  Other options can help also, but I find dma to have 
the largest impact.  I find linux almost unusable without it.


Re: problems with new vacuum (??)

От
Tom Lane
Дата:
Hannu Krosing <hannu@tm.ee> writes:
> Have you any ideas how to distinguish between interactive and
> non-interactive disk I/O coming from postgresql backends ?

I don't see how.  For one thing, the backend that originally dirtied
a buffer is not necessarily the one that writes it out.  Even assuming
that we could assign a useful priority to different I/O requests,
how do we tell the kernel about it?  There's no portable API for that
AFAIK.

One thing that would likely help a great deal is to have the WAL files
on a separate disk spindle, but since what I've got is a one-disk
system, I can't test that on this PC.
        regards, tom lane


Re: problems with new vacuum (??)

От
Bruce Momjian
Дата:
> In my experience enabling this feature can make a huge improvement in I/O 
> intensive applications.  Other options can help also, but I find dma to have 
> the largest impact.  I find linux almost unusable without it.

Oh, I should mention my BSD/OS data point is with one SCSI disk, soft
updates and tagged queuing enabled.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: problems with new vacuum (??)

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Barry Lind <barry@xythos.com> writes:
> > But while this vacuum was running the rest of the system was performing 
> > very poorly.  Opperations that usually are subsecond, where taking 
> > minutes to complete.
> 
> Is this any different from the behavior of 7.1 vacuum?  Also, what
> platform are you on?
> 
> I've noticed on a Linux 2.4 box (RH 7.2, typical commodity-grade PC
> hardware) that vacuum, pgbench, or almost any I/O intensive operation
> drives interactive performance into the ground.  I have not had an
> opportunity to try to characterize the problem, but I suspect Linux's
> disk I/O scheduler is not bright enough to prioritize interactive
> operations.

Just as a data point, I have not seen pgbench dramatically affect
performance on BSD/OS.  Interactive sessions are just slightly slower
when then need to access the disk.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: problems with new vacuum (??)

От
Don Baccus
Дата:
Bruce Momjian wrote:

>>In my experience enabling this feature can make a huge improvement in I/O 
>>intensive applications.  Other options can help also, but I find dma to have 
>>the largest impact.  I find linux almost unusable without it.
>>
> 
> Oh, I should mention my BSD/OS data point is with one SCSI disk, soft
> updates and tagged queuing enabled.


If Tom's system is IDE-based and he's not explicitly enabled DMA then 
this alone would explain the difference you two are seeing, just as the 
poster above is implying.  I have one system with an older 15GB disk 
that causes a kernel panic if I try to enable DMA, and I see the kind of 
system performance issues described by Tom on that system.

On my main server downtown (SCSI) and my normal desktop (two IDE drives 
that do work properly with DMA enabled) things run much, much better 
when there's a lot of disk I/O going on.  These are all Linux systems, 
not BSD...


-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org



Re: problems with new vacuum (??)

От
Tom Lane
Дата:
Don Baccus <dhogaza@pacifier.com> writes:
> If Tom's system is IDE-based and he's not explicitly enabled DMA then 
> this alone would explain the difference you two are seeing,

It is IDE, but DMA is on:

[root@rh1 root]# hdparm -v /dev/hda

/dev/hda:multcount    = 16 (on)I/O support  =  0 (default 16-bit)unmaskirq    =  0 (off)using_dma    =  1
(on)keepsettings=  0 (off)nowerr       =  0 (off)readonly     =  0 (off)readahead    =  8 (on)geometry     =
9729/255/63,sectors = 156301488, start = 0
 

[root@rh1 root]# hdparm -i /dev/hda

/dev/hda:
Model=ST380021A, FwRev=3.10, SerialNo=3HV0CZ2LConfig={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5%
}RawCHS=16383/16/63,TrkSize=0, SectSize=0, ECCbytes=4BuffType=unknown, BuffSize=2048kB, MaxMultSect=16,
MultSect=16CurCHS=16383/16/63,CurSects=-66060037, LBA=yes, LBAsects=156301488IORDY=on/off, tPIO={min:240,w/IORDY:120},
tDMA={min:120,rec:120}PIOmodes: pio0 pio1 pio2 pio3 pio4DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4
*udma5AdvancedPM=noDriveSupports : Reserved : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5
 

This is an out-of-the-box RH 7.2 install (kernel 2.4.7-10) on recent
Dell hardware.  If anyone can suggest further tuning of the hdparm
settings, I'm all ears.  Don't know a darn thing about disk tuning
for Linux.
        regards, tom lane


Re: problems with new vacuum (??)

От
Hannu Krosing
Дата:

Tom Lane wrote:

>Hannu Krosing <hannu@tm.ee> writes:
>
>>Have you any ideas how to distinguish between interactive and
>>non-interactive disk I/O coming from postgresql backends ?
>>
>
>I don't see how.  For one thing, the backend that originally dirtied
>a buffer is not necessarily the one that writes it out.  Even assuming
>that we could assign a useful priority to different I/O requests,
>how do we tell the kernel about it?  There's no portable API for that
>AFAIK.
>
>One thing that would likely help a great deal is to have the WAL files
>on a separate disk spindle, but since what I've got is a one-disk
>system, I can't test that on this PC.
>
If you have enough memory you can put WAL files on a RAM disk for testing :)

It is totally to the countrary of their intended use, but could reveal 
something
interesting while testing

----------------
Hannu