Re: database split

From: Jim Nasby
Subject: Re: database split
Date:
Msg-id: 5541699C.4070704@BlueTreble.com
In reply to: database split  (Dave Owens <dave@teamunify.com>)
Responses: Re: database split  (Dave Owens <dave@teamunify.com>)
List: pgsql-general
On 4/29/15 6:04 PM, Dave Owens wrote:
>
> A. Text-processing a dump of the original database, filtering only the
> rows in which I am interested, while loading the dump into the new database.
>
> B. Copying the original database (CREATE DATABASE smalldb WITH TEMPLATE
> bigdb), then modifying the schema so that each FK constraint now
> includes ON DELETE CASCADE, then deleting rows from the few core tables
> while letting postgresql cascade the deletes down the hierarchy.  Then,
> of course, restoring the schema to its original state.

This would be quite slow, but it would work.
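
For reference, the constraint swap in B might look roughly like this; the
table, column, and constraint names here are made up, you'd pull the real
ones from pg_constraint or \d output:

    ALTER TABLE orders
        DROP CONSTRAINT orders_customer_id_fkey,
        ADD CONSTRAINT orders_customer_id_fkey
            FOREIGN KEY (customer_id) REFERENCES customers (id)
            ON DELETE CASCADE;

    -- then delete from the core table(s) and let the cascades fan out
    DELETE FROM customers WHERE id NOT IN (1, 2, 3);  -- hypothetical keep-list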

> C. Generating the myriad SELECTs necessary to export only the required
> data from every table, then importing those results (either using dblink
> or COPY FROMs) into the new database that has been populated with schema
> from the original database.  Carefully ordering the imports to avoid
> missing FKs.

D. Combine A and B. FKs are added AFTER the data is loaded in a SQL dump, so
once the COPY commands are done you could run the relevant DELETEs. Brute
force, but it'd do the job and be faster than cascade deletes.
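
A rough sketch of D, assuming you restore the dump in sections (pg_dump's
--section=pre-data/data/post-data split maps onto this); the table names and
the filter are hypothetical:

    -- 1. load the pre-data section (table definitions, no FKs or indexes yet)
    -- 2. load the data section (the COPY commands)
    -- 3. trim the copied data while no FKs exist:
    DELETE FROM orders    WHERE customer_id NOT IN (1, 2, 3);
    DELETE FROM customers WHERE id          NOT IN (1, 2, 3);
    -- 4. load the post-data section (FK constraints, indexes, triggers)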

E. A spin on C would be to put triggers on the tables to silently drop
data that you don't want.
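
A minimal sketch of E, with invented names: a BEFORE INSERT trigger that
returns NULL silently discards the row, and since COPY fires row triggers
this also filters a dump load:

    CREATE FUNCTION discard_unwanted() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        IF NEW.customer_id NOT IN (1, 2, 3) THEN  -- hypothetical filter
            RETURN NULL;                          -- row is silently dropped
        END IF;
        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER filter_orders
        BEFORE INSERT ON orders
        FOR EACH ROW EXECUTE PROCEDURE discard_unwanted();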

All that said, my guess is you're doing this to support horizontal
scale-out, which means you'll probably need to do this more than once,
and it'd presumably be nice for you and your customers if this didn't
require downtime. I would look at having a way to create a partial
replica using londiste (or BDR if it'd support it). The trick there is
having a way to identify whether you want each individual row on a
replica. If you add some kind of cluster_id field to every table that
makes doing that filtering pretty easy; IIRC londiste supports that out
of the box.
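
To make that concrete, a hedged sketch of the cluster_id idea (table names
are hypothetical, and exactly how you hand the filter to londiste depends on
its configuration):

    ALTER TABLE customers ADD COLUMN cluster_id int NOT NULL DEFAULT 1;
    ALTER TABLE orders    ADD COLUMN cluster_id int NOT NULL DEFAULT 1;

    -- a replica meant to carry only cluster 42 would then filter on
    --   cluster_id = 42
    -- for every table it subscribes to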

The other option is having functions that let you determine whether you want
a row. I know you could get londiste to do this, but you'd have to implement
some of the filtering in Python (Skype actually does, or at least did, this).
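
Something along these lines, purely as a sketch with invented names; the
filtering layer (londiste's Python side, or a trigger) would call it per row:

    CREATE TABLE local_clusters (cluster_id int PRIMARY KEY);

    CREATE FUNCTION keep_row(p_cluster_id int) RETURNS boolean
    LANGUAGE sql STABLE AS $$
        SELECT EXISTS (SELECT 1 FROM local_clusters
                       WHERE cluster_id = p_cluster_id);
    $$;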

One more thing to consider: you can scale PG to some serious database sizes
if you're willing to spend money on hardware. Given what kind of server you
can get for $20k nowadays, I'd seriously investigate that if you haven't
already.

BTW, there's also https://github.com/citusdata/pg_shard. I've never
really looked at it, but it might make all of this a lot easier.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

