Обсуждение: Partitions implementation with views

Поиск
Список
Период
Сортировка

Partitions implementation with views

От
Jonathan Gardner
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've seen a lot about partitions recently, and I had a "bright idea". I am
by no means a database expert, so perhaps this is utter nonsense.

A partition, as I understand it, contains only a select subset of a table.
Usually, it is data that is related to each other somehow. I guess an
example would be for an internet host who wants to provide a common
shopping cart functionality for all of its customers, but put their
specific data on a specific partition for ease of management and query
speed. They can't put the data into seperate databases because they also
need to examine the complete data set occasionally.

The common response is "Use partial indexes". But I imagine that they want
the ability to move partitions onto seperate OS partitions (hence the name,
"partition").

So here is a possible implementation.

Say we want to divide up the data in a table into N partitions.

Step 1: We create N identical tables. These tables are the "partitions".
They have the exact same columns in the exact same sequence as the
partitioned table.

Step 2: We will create a function that will tell us which partition a
specific row should belong in based on the data in that row.

Step 3: We create a view:

SELECT * FROM partition_1 UNION ALL SELECT * FROM partition_2 UNION ALL ...
SELECT * FROM partition_N;

Step 4: On that view, we create a rule for insert, update, and delete so
that the operation is applied to the appropriate partition, using the
function mentioned in Step 2.

Now that view is the partitioned table for all intents and purposes. The
partition tables are the partitions themselves.

Is this what they are looking for, or is it something completely different?

- --
Jonathan Gardner
jgardner@jonathangardner.net
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/yuSXWgwF3QvpWNwRAmblAJwKS0Lgk/wSC+AmH5fgX7yoicvfOACfYXXx
Hfk/9R84NCKJyAkuXCuG8Ak=
=Ifsm
-----END PGP SIGNATURE-----



Re: Partitions implementation with views

От
Greg Stark
Дата:
Jonathan Gardner <jgardner@jonathangardner.net> writes:

> Usually, it is data that is related to each other somehow. I guess an 
> example would be for an internet host who wants to provide a common 
> shopping cart functionality for all of its customers, but put their 
> specific data on a specific partition for ease of management and query 
> speed. They can't put the data into seperate databases because they also 
> need to examine the complete data set occasionally.

The most commonly cited example is date-based partitions. Separating financial
records by fiscal year for example. So for example the current year may be on
the super-fast raid 0+1 15k rpm SCSI disks, but the previous 7 years may be on
your slow but big raid5 farm of IDE drives. And at year-end you want to create
a new year, drop all the hundreds of gigabytes of data from the 7th year from
the archive quickly, and move the current year to the archive.

> The common response is "Use partial indexes". But I imagine that they want 
> the ability to move partitions onto seperate OS partitions (hence the name, 
> "partition").


Separate OS partitions is a reasonable use of partitioned tables, but the
biggest advantage is being able to drop and load partitions very quickly, and
without impacting performance at all. loading or dropping millions of records
becomes a simple DDL operation instead of a huge DML operation.

...

> Now that view is the partitioned table for all intents and purposes. The 
> partition tables are the partitions themselves.
> 
> Is this what they are looking for, or is it something completely different?

That's the idea, though the purpose of having it as a built-in feature is to
hide all the details you're describing from the user. The user can already do
all the above if they wanted to.

And there's more work to do:

1) The optimizer needs to know about the partitions to be able to check the  query to see if it needs all the
partitionsor only a small subset. Often  partitioned tables are used when most queries only need a single partition
whichdrastically affects the costs for plans.
 

2) You want to be able to manage indexes across all the partitions in a single  operation. Any index on the table where
theleading columns of the index  are the partition key can automatically create a separate index on every  table.
 

3) You want DDL commands to move around the partitions in various ways.  Creating new partitions, moving tables into
thepartitioned table and  moving partitions out into separate tables of their own. splitting a  partition into two
partitions,merging two into one, etc.
 

4) I always managed to avoid them, and they don't seem very useful to me, but  Oracle also supports "global indexes"
whichare indexes that span all the  partitions without having the partition key as the leading columns.
 


-- 
greg



Re: Partitions implementation with views

От
Hannu Krosing
Дата:
Greg Stark kirjutas E, 01.12.2003 kell 18:15:
> Separate OS partitions is a reasonable use of partitioned tables, but the
> biggest advantage is being able to drop and load partitions very quickly, and
> without impacting performance at all. loading or dropping millions of records
> becomes a simple DDL operation instead of a huge DML operation.

How does that mix with foreign key checks ?

------------------
Hannu



Re: Partitions implementation with views

От
Greg Stark
Дата:
Hannu Krosing <hannu@tm.ee> writes:

> Greg Stark kirjutas E, 01.12.2003 kell 18:15:
> > Separate OS partitions is a reasonable use of partitioned tables, but the
> > biggest advantage is being able to drop and load partitions very quickly, and
> > without impacting performance at all. loading or dropping millions of records
> > becomes a simple DDL operation instead of a huge DML operation.
> 
> How does that mix with foreign key checks ?

That's a good question. I don't know how it mixes in Oracle as the partitioned
tables were always the ones at the end of the chain of references. That is,
there were foreign key constraints on them but no other tables referenced
them. Perhaps that may be common as partitioning is useful on BIG tables and
that makes them likely to be on the "many" side of all the "one-to-many"
relationships.

Or perhaps one often has lots of tables partitioned on the same key (like
fiscal year) and design things so you never have references across years. Then
you would be able to disable constraints and drop the old year without risking
any broken references.

As I say I'm not sure, but I from what I'm reading now It seems they don't mix
at all well in Oracle. It looks like if you have any foreign key references
from any non-empty tables to your partitioned table then you're basically
barred from removing any partitions. I guess you're expected to disable the
constraints while you do the operation.

That does make a lot of sense if you think of a partitioned table as just a
bunch of syntactic sugar over a view with a big union of all the partition
tables. You can't expect the database to recheck everything whenever you want
to redefine the view.

Alternatively you can think of partitioned tables as a bunch of syntactic
sugar over inherited tables. In which case it's no coincidence that foreign
keys and inherited tables don't mix very well either.

I do think it would be possible to design something better than just ruling
them incompatible. Presumably you would want an index on the foreign key
target columns to make the constraints fast. Therefore presumably the
partition key is the leading columns of the foreign key target columns.
Therefore all you really need to verify the partition drop is safe is an index
on the partition key columns in the referencing table and you can do a simple
index lookup to see if any records with the same leading columns exist to
verify the foreign key reference.

So for example:

Table: invoice
--------------
invoice_FY integer
invoice_id integer
...
"invoice_pkey" primary key, btree (invoice_FY,invoice_id)


Table payment
-------------
payment_FY integer
payment_id integer
...
invoice_FY integer
invoice_id integer
...   "invoice_idx" btree (invoice_FY,invoice_id)   "$1" FOREIGN KEY (invoice_FY,invoice_id) REFERENCES
invoice(invoice_FY,invoice_id)ON DELETE CASCADE
 

In this situation when you drop a partition from invoice for FY 2002 only one
fast lookup to check for "EXISTS (select 1 WHERE invoice_FY = 2002)" would be
necessary.


-- 
greg