Re: 24x7x365 high-volume ops ideas

From: Dann Corbit
Subject: Re: 24x7x365 high-volume ops ideas
Date:
Msg-id: D425483C2C5C9F49B5B7A41F89441547055634@postal.corporate.connx.com
In reply to: 24x7x365 high-volume ops ideas  ("Ed L." <pgsql@bluepolka.net>)
List: pgsql-general
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Ed L.
> Sent: Wednesday, November 03, 2004 5:10 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] 24x7x365 high-volume ops ideas
>
> I have a few high-volume, fairly large clusters that I'm struggling
> to keep up 24x7x365.  I want to ask for advice from anyone with
> similar experience or hard-won wisdom.
>
> Generally these are clusters with 100-200 queries/second, maybe
> 10GB-30GB of data (always increasing), and maybe 10% writes.  A
> little regular routine downtime for maintenance would do wonders for
> these systems, but unfortunately, the requirement is 100% uptime all
> the time, and any downtime at all is a liability.  Here are some of
> the issues:
>
> 1)  Big tables.  When the tables grow large enough, it takes too
> long to vacuum them.  In some cases there's just too much data.  In
> other cases, it's dead space, but both reindex and vacuum full block
> production queries (a lesser version of downtime).  In the past, we
> have taken a PR hit for downtime to dump/reload (we've found it to
> be faster than vacuum full).  Async replication helps with cluster
> moves from one server to another, but we still don't have a
> low-to-zero downtime solution for regular maintenance.

This is a tough problem.  It is related to the issues below, of course.
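
One partial mitigation worth noting: plain VACUUM (without FULL) does
not take the blocking exclusive lock that VACUUM FULL and REINDEX do,
so it can run alongside production queries.  A minimal sketch, using a
hypothetical hot table named "orders":

    -- lazy vacuum: reclaims dead space into the free space map and
    -- refreshes planner statistics without blocking readers/writers
    VACUUM ANALYZE orders;

Frequent lazy vacuums on the hottest tables keep dead space from
accumulating and can postpone the need for a blocking vacuum full.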

> 2)  Big tables, part 2.  Of course, customers want all data that ever
> existed online and quickly available via sub-second queries.
> I assume at
> some point this data is going to be too much for one table
> (how much is too
> much?).  This is a little vague, I know, but what sorts of segmenting
> strategies to folks employ to deal with data that cannot be
> retired but
> gets too expensive to vacuum, etc.

This is an opportunity for savings.  Create a history table that
contains everything more than about 2 years old.  For people who want
access to everything, create a view with UNION ALL against current
and history.  Once they see how fast the current data is and how
rarely they really need the old stuff, they will shy away from using
it.

E.g.
Table inventory holds the current data.
Table inventory_hist holds the old data.
View inventory_all is "SELECT * FROM inventory UNION ALL
SELECT * FROM inventory_hist".
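
Spelled out as SQL, that layout might look like the sketch below (the
created_at column is an assumption; substitute whatever timestamp the
real schema carries):

    -- history table with the same shape as the current table
    CREATE TABLE inventory_hist AS
        SELECT * FROM inventory WHERE false;

    -- one view over both, for the rare "give me everything" query
    CREATE VIEW inventory_all AS
        SELECT * FROM inventory
        UNION ALL
        SELECT * FROM inventory_hist;

    -- periodic archival: move rows older than about 2 years
    BEGIN;
    INSERT INTO inventory_hist
        SELECT * FROM inventory
        WHERE created_at < now() - interval '2 years';
    DELETE FROM inventory
        WHERE created_at < now() - interval '2 years';
    COMMIT;

The archival step keeps the live table (and its vacuum times) small,
while inventory_all preserves the "all data stays online" contract.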

> 3)  Simple restarts for configuration changes (shared_buffers,
> max_connections, etc).  When we have to break client
> connections, we have
> to notify the customer and take a PR hit.  Maybe pgpool is a possible
> solution?

There's the time-tested method of throwing more hardware at it.  Would
an 8-CPU machine with 64 GB of RAM and a striped Ultra320 RAID array
help?  The hardware is always less expensive than the software and the
data -- usually by orders of magnitude.  If a $25,000-$50,000 machine
will save one man-year of effort over its lifetime, then it is a
splendid idea.

> Are these issues for Oracle, DB2, etc as well?

Yes, but not as pronounced.  For instance, with SQL Server you run
"UPDATE STATISTICS" to refresh the optimizer statistics, which is
closer to PostgreSQL's ANALYZE than to a vacuum full, since SQL
Server's storage does not accumulate dead tuples the way PostgreSQL's
MVCC does.
