Обсуждение: Best practice - Vacuum. Replication suggestions and pg vs mysql question

Поиск
Список
Период
Сортировка

Best practice - Vacuum. Replication suggestions and pg vs mysql question

От
David B
Дата:
Environment. PG v8. (Opteron 2CPU. Raid 5 disks 1TB. 12GB Ram)
Environment would be one master feeding 3 slaves. Similar configs.
New transactions coming into master. Cust Service Reps using that box.
Analysis being done on slave boxes.


Hi Folks,

I'm looking at PG vs MySql for a high volume site.
Few 10's of millions inserts per day.
Also some deletes (1% - say 250,000 per day).
Also updates (perhaps 10% say 2.5m per day)
Lots of indexes on master table.

When I test vacuum it seems very slow.
I understand I need to do vacuums on a regular basis.
Two reasons right...1. space and 2. better performance.

#1
I'm thinking with that much activity I should do it nightly and not
wait for Saturday nights since the #rows affected will be very large
by that time. Any thoughts?


#2
How does vacuum work in a replication environment.
If I implement replication so that users can get to 'copy' for their
analysis I presume the same overhead will apply to replicated box?
Does the same lock on table on slave box apply?
Will delay hit other tables not being vacuumed...I'd suspect it would.
E.g. I vacuum main table and it takes 15minutes. Are inserts to other
tables delayed by the 15minutes the vacuum takes on slave?

#3
15minute lock is a long time.
I don't want to tell the CSR's they will have to put customers on hold
each night for 15mins if customer updates are the basis of their call.
Any typical workarounds for such a problem?
I can imagine writing to a temp table and applying the changes after
the lock is released but tnose changes would notbe visiable to other
parts of the app...and we don't have access to the source code to
include looking at both tables.


#4
Does Mysql have a similar issue. Does it use something like 'vacuum'.
And if so does it suffer the same delays. Any thoughts?


#5
Speaking of replication...I see lots of technologies I can use.
Slony appears very active group. Which can be a good thing or mean its
still a work in progress!!
Any pro's con's or suggestions on the most popular.


Thanks in advance.
I'll loop back with results of what I find as I get around to actually
doing some tests.

David

Re: Best practice - Vacuum. Replication suggestions and pg vs mysql question

От
Tom Lane
Дата:
David B <davidb999@gmail.com> writes:
> I'm looking at PG vs MySql for a high volume site.
> Few 10's of millions inserts per day.
> Also some deletes (1% - say 250,000 per day).
> Also updates (perhaps 10% say 2.5m per day)
> Lots of indexes on master table.

> When I test vacuum it seems very slow.
> I understand I need to do vacuums on a regular basis.

> I'm thinking with that much activity I should do it nightly

At least.  But you are missing the point of vacuum being "slow".  It's a
background activity and you needn't care how long it takes, as long as
it doesn't put too much load on the system.  In that light, you probably
want to play around with the vacuum cost delay settings to find
something that doesn't hurt response time too much while it's running.
See
http://www.postgresql.org/docs/8.0/static/runtime-config.html#RUNTIME-CONFIG-RESOURCE
section 16.4.3.4.

> How does vacuum work in a replication environment.

You still have to vacuum on every machine.

> 15minute lock is a long time.

There's no lock, unless you are using VACUUM FULL which you shouldn't.

> Does Mysql have a similar issue. Does it use something like 'vacuum'.

AFAIK they don't have vacuum, but mysql doesn't have a great reputation
for being fast in concurrent-write scenarios, so you're probably not
going to find it to be the winner for your problem.

> Slony appears very active group. Which can be a good thing or mean its
> still a work in progress!!

Slony-I is production grade now, Slony-II is just starting ...

            regards, tom lane

Re: Best practice - Vacuum. Replication suggestions and pg vs mysql question

От
Christopher Browne
Дата:
Martha Stewart called it a Good Thing when davidb999@gmail.com (David B) wrote:
> Environment. PG v8. (Opteron 2CPU. Raid 5 disks 1TB. 12GB Ram)
> Environment would be one master feeding 3 slaves. Similar configs.
> New transactions coming into master. Cust Service Reps using that box.
> Analysis being done on slave boxes.
>
>
> Hi Folks,
>
> I'm looking at PG vs MySql for a high volume site.
> Few 10's of millions inserts per day.
> Also some deletes (1% - say 250,000 per day).
> Also updates (perhaps 10% say 2.5m per day)
> Lots of indexes on master table.
>
> When I test vacuum it seems very slow.
> I understand I need to do vacuums on a regular basis.
> Two reasons right...1. space and 2. better performance.
>
> #1
> I'm thinking with that much activity I should do it nightly and not
> wait for Saturday nights since the #rows affected will be very large
> by that time. Any thoughts?

We have cron jobs that run every few minutes to vacuum certain tables
that are highly updated.

I'd certainly suggest vacuuming more than once a week, particularly in
view that this doesn't lock usual operations.

> #2
> How does vacuum work in a replication environment.
> If I implement replication so that users can get to 'copy' for their
> analysis I presume the same overhead will apply to replicated box?
> Does the same lock on table on slave box apply?

Yes, "slave" servers will need similar vacuum regimens.


> Will delay hit other tables not being vacuumed...I'd suspect it would.
> E.g. I vacuum main table and it takes 15minutes. Are inserts to other
> tables delayed by the 15minutes the vacuum takes on slave?

That's only the case if you're abusively using VACUUM FULL all the
time, which is unnecessary.

You normally only need to use VACUUM, which doesn't block normal
database operations.

> #3
> 15minute lock is a long time.
> I don't want to tell the CSR's they will have to put customers on hold
> each night for 15mins if customer updates are the basis of their call.
> Any typical workarounds for such a problem?

Um, we run systems 24x7, and VACUUMs _don't_ lead to any such
problems.

If you are having problems of "everything hanging up" for 15 minutes,
you're presumably doing a VACUUM FULL which is the WRONG thing to do.

> #4
> Does Mysql have a similar issue. Does it use something like 'vacuum'.
> And if so does it suffer the same delays. Any thoughts?

MySQL(tm) doesn't have similar functionality, so it doesn't have
similar issues.  It tends to suffer much worse if you hit it with a
lot of concurrent load, but that's a long story...

> #5
> Speaking of replication...I see lots of technologies I can use.
> Slony appears very active group. Which can be a good thing or mean its
> still a work in progress!!
> Any pro's con's or suggestions on the most popular.

Slony-I is certainly seeing active efforts; we're preparing to get a
1.1 release ready, which will be, in some ways, a fair bit better than
1.0.5, which people were already pretty happy with.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com').
http://linuxdatabases.info/info/linuxdistributions.html
Rules of the Evil Overlord  #142. "If I have children and subsequently
grandchildren, I will keep  my three-year-old granddaughter near me at
all times. When  the hero enters to  kill me, I will ask  him to first
explain to her  why it is necessary to kill  her beloved grandpa. When
the hero launches  into an explanation of morality  way over her head,
that will be  her cue to pull the  lever and send him into  the pit of
crocodiles. After  all, small children like crocodiles  almost as much
as Evil Overlords  and it's important to spend  quality time with your
grandkids. <http://www.eviloverlord.com/>

Re: Best practice - Vacuum. Replication suggestions and pg vs mysql

От
Ron Mayer
Дата:
Tom Lane wrote:
> David B <davidb999@gmail.com> writes:
>>15minute lock is a long time.
> There's no lock, unless you are using VACUUM FULL which you shouldn't.

Or, I believe, if he has any GIST indexes (such as tsearch or
postgis ones).  At least it seems normal vacuum locks GIST indexes
for quite some time here.

I ended up stopping using vacuum_cost_delay to minimize the
time those GIST indexes seemed locked.

    Ron

PS: If I'm right, I wonder if VACUUM or at least autovacuum
should automatically force vacuum_cost_delay to zero while
holding locks (i.e. while doing GIST indexes) to reduce the
time those locks are held.

Re: Best practice - Vacuum. Replication suggestions and pg vs mysql

От
Tom Lane
Дата:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Tom Lane wrote:
>> There's no lock, unless you are using VACUUM FULL which you shouldn't.

> Or, I believe, if he has any GIST indexes (such as tsearch or
> postgis ones).  At least it seems normal vacuum locks GIST indexes
> for quite some time here.

Good point --- gist (and rtree) indexes still don't have any support for
concurrent updates, so they have to be locked while VACUUM scans them.

> PS: If I'm right, I wonder if VACUUM or at least autovacuum
> should automatically force vacuum_cost_delay to zero while
> holding locks (i.e. while doing GIST indexes) to reduce the
> time those locks are held.

Perhaps.  VACUUM going at full tilt would cause a system-wide
performance impact, though, while the lock only affects processes
trying to use or update that index.  So you could argue it either
way.

The best solution would be to fix GIST ;-)

            regards, tom lane