Обсуждение: Partitions implementation with views
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've seen a lot about partitions recently, and I had a "bright idea". I am by no means a database expert, so perhaps this is utter nonsense. A partition, as I understand it, contains only a select subset of a table. Usually, it is data that is related to each other somehow. I guess an example would be for an internet host who wants to provide a common shopping cart functionality for all of its customers, but put their specific data on a specific partition for ease of management and query speed. They can't put the data into seperate databases because they also need to examine the complete data set occasionally. The common response is "Use partial indexes". But I imagine that they want the ability to move partitions onto seperate OS partitions (hence the name, "partition"). So here is a possible implementation. Say we want to divide up the data in a table into N partitions. Step 1: We create N identical tables. These tables are the "partitions". They have the exact same columns in the exact same sequence as the partitioned table. Step 2: We will create a function that will tell us which partition a specific row should belong in based on the data in that row. Step 3: We create a view: SELECT * FROM partition_1 UNION ALL SELECT * FROM partition_2 UNION ALL ... SELECT * FROM partition_N; Step 4: On that view, we create a rule for insert, update, and delete so that the operation is applied to the appropriate partition, using the function mentioned in Step 2. Now that view is the partitioned table for all intents and purposes. The partition tables are the partitions themselves. Is this what they are looking for, or is it something completely different? - -- Jonathan Gardner jgardner@jonathangardner.net Live Free, Use Linux! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/yuSXWgwF3QvpWNwRAmblAJwKS0Lgk/wSC+AmH5fgX7yoicvfOACfYXXx Hfk/9R84NCKJyAkuXCuG8Ak= =Ifsm -----END PGP SIGNATURE-----
Jonathan Gardner <jgardner@jonathangardner.net> writes: > Usually, it is data that is related to each other somehow. I guess an > example would be for an internet host who wants to provide a common > shopping cart functionality for all of its customers, but put their > specific data on a specific partition for ease of management and query > speed. They can't put the data into seperate databases because they also > need to examine the complete data set occasionally. The most commonly cited example is date-based partitions. Separating financial records by fiscal year for example. So for example the current year may be on the super-fast raid 0+1 15k rpm SCSI disks, but the previous 7 years may be on your slow but big raid5 farm of IDE drives. And at year-end you want to create a new year, drop all the hundreds of gigabytes of data from the 7th year from the archive quickly, and move the current year to the archive. > The common response is "Use partial indexes". But I imagine that they want > the ability to move partitions onto seperate OS partitions (hence the name, > "partition"). Separate OS partitions is a reasonable use of partitioned tables, but the biggest advantage is being able to drop and load partitions very quickly, and without impacting performance at all. loading or dropping millions of records becomes a simple DDL operation instead of a huge DML operation. ... > Now that view is the partitioned table for all intents and purposes. The > partition tables are the partitions themselves. > > Is this what they are looking for, or is it something completely different? That's the idea, though the purpose of having it as a built-in feature is to hide all the details you're describing from the user. The user can already do all the above if they wanted to. And there's more work to do: 1) The optimizer needs to know about the partitions to be able to check the query to see if it needs all the partitionsor only a small subset. Often partitioned tables are used when most queries only need a single partition whichdrastically affects the costs for plans. 2) You want to be able to manage indexes across all the partitions in a single operation. Any index on the table where theleading columns of the index are the partition key can automatically create a separate index on every table. 3) You want DDL commands to move around the partitions in various ways. Creating new partitions, moving tables into thepartitioned table and moving partitions out into separate tables of their own. splitting a partition into two partitions,merging two into one, etc. 4) I always managed to avoid them, and they don't seem very useful to me, but Oracle also supports "global indexes" whichare indexes that span all the partitions without having the partition key as the leading columns. -- greg
Greg Stark kirjutas E, 01.12.2003 kell 18:15: > Separate OS partitions is a reasonable use of partitioned tables, but the > biggest advantage is being able to drop and load partitions very quickly, and > without impacting performance at all. loading or dropping millions of records > becomes a simple DDL operation instead of a huge DML operation. How does that mix with foreign key checks ? ------------------ Hannu
Hannu Krosing <hannu@tm.ee> writes: > Greg Stark kirjutas E, 01.12.2003 kell 18:15: > > Separate OS partitions is a reasonable use of partitioned tables, but the > > biggest advantage is being able to drop and load partitions very quickly, and > > without impacting performance at all. loading or dropping millions of records > > becomes a simple DDL operation instead of a huge DML operation. > > How does that mix with foreign key checks ? That's a good question. I don't know how it mixes in Oracle as the partitioned tables were always the ones at the end of the chain of references. That is, there were foreign key constraints on them but no other tables referenced them. Perhaps that may be common as partitioning is useful on BIG tables and that makes them likely to be on the "many" side of all the "one-to-many" relationships. Or perhaps one often has lots of tables partitioned on the same key (like fiscal year) and design things so you never have references across years. Then you would be able to disable constraints and drop the old year without risking any broken references. As I say I'm not sure, but I from what I'm reading now It seems they don't mix at all well in Oracle. It looks like if you have any foreign key references from any non-empty tables to your partitioned table then you're basically barred from removing any partitions. I guess you're expected to disable the constraints while you do the operation. That does make a lot of sense if you think of a partitioned table as just a bunch of syntactic sugar over a view with a big union of all the partition tables. You can't expect the database to recheck everything whenever you want to redefine the view. Alternatively you can think of partitioned tables as a bunch of syntactic sugar over inherited tables. In which case it's no coincidence that foreign keys and inherited tables don't mix very well either. I do think it would be possible to design something better than just ruling them incompatible. Presumably you would want an index on the foreign key target columns to make the constraints fast. Therefore presumably the partition key is the leading columns of the foreign key target columns. Therefore all you really need to verify the partition drop is safe is an index on the partition key columns in the referencing table and you can do a simple index lookup to see if any records with the same leading columns exist to verify the foreign key reference. So for example: Table: invoice -------------- invoice_FY integer invoice_id integer ... "invoice_pkey" primary key, btree (invoice_FY,invoice_id) Table payment ------------- payment_FY integer payment_id integer ... invoice_FY integer invoice_id integer ... "invoice_idx" btree (invoice_FY,invoice_id) "$1" FOREIGN KEY (invoice_FY,invoice_id) REFERENCES invoice(invoice_FY,invoice_id)ON DELETE CASCADE In this situation when you drop a partition from invoice for FY 2002 only one fast lookup to check for "EXISTS (select 1 WHERE invoice_FY = 2002)" would be necessary. -- greg