Обсуждение: Query plan and Inheritance. Weird behavior
I have a database that makes fairly extensive use of table Inheritance. Structure is one parent table and 5 child tables as follows: tbl_objects (parent table) -> tbl_viewers -> tbl_documents -> tbl_icons -> tbl_massemails -> tbl_formats I have two questions: First, if I create an index on the parent table will queries to the child tables use that index? Secondly, I tried to use explain to find out but I got very strange results. It appears to read all the child tables even when you specify only the parent table. In this case this appears to make the select do 6 queries instead of only 1. Obviously a huge performance hit. And none of them uses the index though the table only has 420 rows at the moment so that might be why its just doing a scan (though IMHO 'explain' should explain that it isn't using the available index and why). I can't say that I'm reading these results properly but here they are: "EXPLAIN select * from tbl_objects where id = 1;" Gives: NOTICE: QUERY PLAN: Result (cost=0.00..27.25 rows=6 width=138) -> Append (cost=0.00..27.25 rows=6 width=138) -> Seq Scan on tbl_objects (cost=0.00..12.24 rows=1 width=73) -> Seq Scan on tbl_viewers tbl_objects (cost=0.00..1.07 rows=1 width=83) -> Seq Scan on tbl_documents tbl_objects (cost=0.00..11.56 rows=1 width=78) -> Seq Scan on tbl_massemails tbl_objects (cost=0.00..0.00 rows=1 width=138) -> Seq Scan on tbl_formats tbl_objects (cost=0.00..1.12 rows=1 width=80) -> Seq Scan on tbl_icons tbl_objects (cost=0.00..1.25 rows=1 width=89) Can anyone tell me if these results are making any sense and why postgres is doing 6 reads when I only need one? John Lange
On 22 Jan 2003, John Lange wrote: > I have a database that makes fairly extensive use of table Inheritance. > > Structure is one parent table and 5 child tables as follows: > > tbl_objects (parent table) > -> tbl_viewers > -> tbl_documents > -> tbl_icons > -> tbl_massemails > -> tbl_formats > > I have two questions: > > First, if I create an index on the parent table will queries to the > child tables use that index? AFAIK no since indices aren't inherited. > Secondly, I tried to use explain to find out but I got very strange > results. It appears to read all the child tables even when you specify > only the parent table. In this case this appears to make the select do 6 > queries instead of only 1. Obviously a huge performance hit. And none of > them uses the index though the table only has 420 rows at the moment so > that might be why its just doing a scan (though IMHO 'explain' should > explain that it isn't using the available index and why). It seems reasonable to me since given the # of rows and the estimated row width the table is probably only like 5 or 6 pages. Reading the index is unlikely to make life much better given an index read, seek in heap file, read heap file page. > I can't say that I'm reading these results properly but here they are: > > "EXPLAIN select * from tbl_objects where id = 1;" This gets any rows in tbl_objects that have id=1 and any rows in any subtables that have id=1. Is that the intended effect?
> This gets any rows in tbl_objects that have id=1 and any rows in any > subtables that have id=1. Is that the intended effect? It is the intended result, but not the expected implementation. Doing more investigation I think I figured out why Postgres does what it does. Creating child tables by inheriting from another table doesn't really do what I consider to be 'true' inheritance, at least not in the way I expected as a programmer. Postgres seems to create "child" tables by first fully duplicating the parent table and then adding the new columns to it. It then links the tables internally some how so that a query on a parent table also queries the child tables. IHO this seems like inheritance by 'brute force' and a parent table that has many children will cause a significant performance hit. When I say "as a programmer" what I mean is I had expected it to be done entirely the opposite way. In other words, child tables would simply be linked internally to the parent table and a new table created which only contains the new columns. In this way the parent table would not need to know, nor would it care about child tables in any way (just like inheritance in most programming languages). If done this way a select on a parent table would only require the retrieval of a single row and a select on a child table would only require the retrieval of two rows (one in the child table and one in the parent table). I don't pretend to know the intricacies of Postgres performance but this is the way I'm interpreting the data from the explains. At this time, now that I (think I) understand how the inheritance is implemented I'm considering abandoning it in Postgres and solving the issue entirely pragmatically. I hoping someone on the list will tell me where I'm going wrong here or what wrong assumptions I'm making. John Lange On Wed, 2003-01-22 at 18:59, Stephan Szabo wrote: > > On 22 Jan 2003, John Lange wrote: > > > I have a database that makes fairly extensive use of table Inheritance. > > > > Structure is one parent table and 5 child tables as follows: > > > > tbl_objects (parent table) > > -> tbl_viewers > > -> tbl_documents > > -> tbl_icons > > -> tbl_massemails > > -> tbl_formats > > > > I have two questions: > > > > First, if I create an index on the parent table will queries to the > > child tables use that index? > > AFAIK no since indices aren't inherited. > > > Secondly, I tried to use explain to find out but I got very strange > > results. It appears to read all the child tables even when you specify > > only the parent table. In this case this appears to make the select do 6 > > queries instead of only 1. Obviously a huge performance hit. And none of > > them uses the index though the table only has 420 rows at the moment so > > that might be why its just doing a scan (though IMHO 'explain' should > > explain that it isn't using the available index and why). > > It seems reasonable to me since given the # of rows and the estimated > row width the table is probably only like 5 or 6 pages. Reading the index > is unlikely to make life much better given an index read, seek in heap > file, read heap file page. > > > I can't say that I'm reading these results properly but here they are: > > > > "EXPLAIN select * from tbl_objects where id = 1;" > > This gets any rows in tbl_objects that have id=1 and any rows in any > subtables that have id=1. Is that the intended effect? > >
On 22 Jan 2003, John Lange wrote: > Creating child tables by inheriting from another table doesn't really do > what I consider to be 'true' inheritance, at least not in the way I > expected as a programmer. > > Postgres seems to create "child" tables by first fully duplicating the > parent table and then adding the new columns to it. It then links the > tables internally some how so that a query on a parent table also > queries the child tables. That pretty much sums up my understanding of it. [snip] > In this way the parent table would not need to know, nor would it care > about child tables in any way (just like inheritance in most programming > languages). If done this way a select on a parent table would only > require the retrieval of a single row and a select on a child table > would only require the retrieval of two rows (one in the child table and > one in the parent table). As opposed to needing one row from a select on a child table and effectively a union all when selecting from the parent. There are up and down sides of both implementations, and I haven't played with it enough to speak meaningfully on it. > I don't pretend to know the intricacies of Postgres performance but this > is the way I'm interpreting the data from the explains. As a side note, for a better understanding of timings, explain analyze is much better than plain explain which only gives the plan and estimates.
> On 22 Jan 2003, John Lange wrote: >> In this way the parent table would not need to know, nor would it care >> about child tables in any way (just like inheritance in most programming >> languages). If done this way a select on a parent table would only >> require the retrieval of a single row and a select on a child table >> would only require the retrieval of two rows (one in the child table and >> one in the parent table). No, it'd require the retrieval of N rows: you're failing to think about multiple levels of inheritance or multi-parent inheritance, both of which are supported reasonably effectively by the current model. My guess is that this scheme would crash and burn just on locking considerations. (When you want to update a child row, what locks do you have to get in what order? With pieces of the row scattered through many tables, it'd be pretty messy.) You may care to look in the pghackers archives for prior discussions. The variant scheme that's sounded most interesting to me so far is to store *all* rows of an inheritance hierarchy in a single physical table. This'd require giving up multiple inheritance, but few people seem to use that, and the other benefits (like being able to enforce uniqueness constraints over the whole hierarchy with just a standard unique index) seem worth it. No one's stepped up to bat to do the legwork on the idea yet, though. One bit that looks pretty tricky is ALTER TABLE ADD COLUMN. regards, tom lane
On Thu, 2003-01-23 at 00:07, Tom Lane wrote: > > On 22 Jan 2003, John Lange wrote: > >> In this way the parent table would not need to know, nor would it care > >> about child tables in any way (just like inheritance in most programming > >> languages). If done this way a select on a parent table would only > >> require the retrieval of a single row and a select on a child table > >> would only require the retrieval of two rows (one in the child table and > >> one in the parent table). > > No, it'd require the retrieval of N rows: you're failing to think about > multiple levels of inheritance or multi-parent inheritance, both of > which are supported reasonably effectively by the current model. Lets not be too nit-picky here. In the case of multiple layers of inheritance you are still only selecting two rows (at a time), one child, one parent. However if the parent also has a parent, then the process repeats, once for every layer. This is entirely reasonable and efficient compared to the current model where a select on a parent table requires the same select to be executed on EVERY child table. If it's a large expensive JOIN of some kind then this is verging on un-workable. > My guess is that this scheme would crash and burn just on locking > considerations. (When you want to update a child row, what locks do you > have to get in what order? With pieces of the row scattered through > many tables, it'd be pretty messy.) You lock the parent on down to the last child. I'm not a database developer but that seems fairly straight forward? The choice between the schema I've suggested and the way it is currently implemented is a trade off between more efficient selects vs. more efficient updates. If you are selecting on the parent table more than updating then my idea is vastly more efficient. If you INSERT a lot then the current way is marginally better. With apologies to the developers, I don't feel the current implementation is really usable for the simple fact that expensive operations performed on the parent table causes them to be repeated for every child table. And, as an added penalty, indexes on parent tables are NOT inherited to the children so the child operations can be even more expensive. This solution is not that large and I've already got 6 child tables. It just so happens that I do a LOT of selects on the parent so I'm going to have to make a decision on where to go from here. Solving this programmatically is not really that hard but I've gone a ways down this path now so I'm not anxious to redo the entire database schema since we do have customers already using this. > You may care to look in the pghackers archives for prior discussions. I will, thanks. > The variant scheme that's sounded most interesting to me so far is to > store *all* rows of an inheritance hierarchy in a single physical table. Unless I'm not understanding I don't think that works. In my case for example, a single parent has 4-5 children so the only columns they have in common are the ones in the parent. Combining them all into a single table means a big de-normalized table (loads of empty columns). If you are going to go this route then you might as well just do it. It doesn't need to be implemented on the DBMS. Regards, John Lange > This'd require giving up multiple inheritance, but few people seem to > use that, and the other benefits (like being able to enforce uniqueness > constraints over the whole hierarchy with just a standard unique index) > seem worth it. No one's stepped up to bat to do the legwork on the idea > yet, though. One bit that looks pretty tricky is ALTER TABLE ADD > COLUMN. > > regards, tom lane
> I don't see how performance would be significantly better if you stored > the common columns of all rows (parent and children) in the parent > table, in contrast with how it is done now, storing entire rows of child > tables in the child table and omitting them from the parent table. Well there are a couple of points. Firstly, from the simple standpoint of database normalization you shouldn't have tables that have the same columns. The way it is implemented, child tables are copies of parent tables. But more importantly it is bad for performance because selecting from a parent table causes the same select to be done on all the child tables. In my case selecting from the parent causes six selects to be done (one for every child table). I would have assumed that child tables only contained the new columns unique to it, not duplicates of the columns already in the parent table. An insert to a child table would actually cause two inserts to be done (assuming only one level of inheritance), one to the parent, and then one to the child. Therefore, selects from the parent table would only require a single select (because the common data is all stored in the parent table). Selects to a child would require two selects to get the entire row (one to the parent, one to the child). Similar to a view. As I said previously, performance would depend on what operation you were mostly doing. I think I have more or less covered this in my previous postings. John Lange On Tue, 2003-01-28 at 17:52, Andras Kadinger wrote: > I see. > > I don't see how performance would be significantly better if you stored > the common columns of all rows (parent and children) in the parent > table, in contrast with how it is done now, storing entire rows of child > tables in the child table and omitting them from the parent table. > > Hmm, reviewing your posts to pgsql-performance, I must admit I cannot > really see what you feel you are losing performance-wise. > > As the discussion on pgsql-performance seems to have died off, would you > be willing to explain to me? > > Regards, > Andras > > John Lange wrote: > > > > No, the keyword ONLY will limit selects to that table ONLY. I need to > > return the rows which are common to all tables. Postgres is doing the > > work in the correct way, however, the issue is the underlaying design > > which is terribly inefficient requiring a separate table scan for every > > child table. > > > > Thanks for the suggestion. > > > > John Lange > > > > On Fri, 2003-01-24 at 14:30, Andras Kadinger wrote: > > > Hi John, > > > > > > Isn't the keyword ONLY is what you are after? > > > > > > "EXPLAIN select * from tbl_objects where id = 1;" - this will select > > > from table tbl_objects and all it's descendant tables. > > > > > > "EXPLAIN select * from tbl_objects ONLY where id = 1;" - this will > > > select from table tbl_objects only. > > > > > > Regards, > > > Andras Kadinger
John Lange <lists@darkcore.net> writes: > Firstly, from the simple standpoint of database normalization you > shouldn't have tables that have the same columns. The way it is > implemented, child tables are copies of parent tables. There is no copied data though. Or are you saying that if any table in the database has, say, a timestamp column, then it's a failure of normalization for any other one to have a timestamp column? Don't think I buy that. > But more importantly it is bad for performance because selecting from a > parent table causes the same select to be done on all the child tables. So? The same amount of data gets scanned either way. To the extent that the planner fails to generate an optimal plan in such cases, we have a performance problem --- but that's just an implementation shortcoming, not a fundamental limitation AFAICS. The only real disadvantage I can see to the current storage scheme is that we can't easily make an index that covers both a parent and all its children; the index would have to include a table pointer as well as a row pointer. This creates problems for foreign keys and unique constraints. But there is more than one way to attack that. regards, tom lane
John Lange wrote: > > > I don't see how performance would be significantly better if you stored > > the common columns of all rows (parent and children) in the parent > > table, in contrast with how it is done now, storing entire rows of child > > tables in the child table and omitting them from the parent table. > > Well there are a couple of points. > > Firstly, from the simple standpoint of database normalization you > shouldn't have tables that have the same columns. The way it is > implemented, child tables are copies of parent tables. As Tom pointed out, only the schema is copied, but not the data. This has the following advantages: - if you select from child tables, PostgreSQL will only have to scan rows that belong to that child (and further down), and not all rows in all tables of the inheritance hierarchy; so if you have 100 million rows in the whole hierarchy, but only have say 1 million in the child you are currently interested in, you only have to scan those 1 million rows, and not the whole 100 million. - all columns of rows are stored together, so to read a row only one disk access is needed (your way it would probably need roughly one random disk access per each inheritance level upwards, both for reads/selects and writes/inserts/updates; with a sizable inheritance hierarchy this would be a considerable performance hit) - it doesn't really cost much in terms of disk space, only some bookkeeping information is needed I don't think inheritance really fits into 'database normalization' itself, but still there are cases where it is more convenient/efficient than with traditional database normalization, where you would have to either go create completely separate tables for each type (and do UNIONs of SELECTs if you are interested in more than one child only), or what's even more cumbersome, create a table with common columns ('parent' here) and then go create children and children of children that each link upwards to their respective parents through some kind of key: in a select, you would have to explicitly specify all tables upwards the inheritance hierarchy, and specify the respective joins for them. So I think whether you should choose more traditional database normalization or use inheritance depends on what you want to do. > But more importantly it is bad for performance because selecting from a > parent table causes the same select to be done on all the child tables. > In my case selecting from the parent causes six selects to be done (one > for every child table). 'causes the same select to be done on all the child tables' - I don't agree with that, and I hope this is where the misunderstanding lies. Consider this: CREATE TABLE parent ( id integer NOT NULL, text text); CREATE TABLE child1 ( child1field text) INHERITS (parent); CREATE TABLE child2 ( child2field text) INHERITS (parent); CREATE TABLE child3 ( child3field text) INHERITS (parent); CREATE TABLE child4 ( child4field text) INHERITS (parent); CREATE TABLE othertable ( id integer NOT NULL, othertext text); ALTER TABLE ONLY parent ADD CONSTRAINT parent_pkey PRIMARY KEY (id); ALTER TABLE ONLY child1 ADD CONSTRAINT child1_pkey PRIMARY KEY (id); ALTER TABLE ONLY child2 ADD CONSTRAINT child2_pkey PRIMARY KEY (id); ALTER TABLE ONLY child3 ADD CONSTRAINT child3_pkey PRIMARY KEY (id); ALTER TABLE ONLY child4 ADD CONSTRAINT child4_pkey PRIMARY KEY (id); ALTER TABLE ONLY othertable ADD CONSTRAINT othertable_pkey PRIMARY KEY (id); Then I filled all tables with 10000 rows of synthetic data and ANALYZEd just to make sure the optimizer considers indexes to be valuable. First I tried this: johnlange=# explain select * from parent where id=13; QUERY PLAN ---------------------------------------------------------------------------------------------- Result (cost=0.00..15.07 rows=5 width=36) -> Append (cost=0.00..15.07 rows=5 width=36) -> Index Scan using parent_pkey on parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child1_pkey on child1 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child2_pkey on child2 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child3_pkey on child3 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child4_pkey on child4 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) (12 rows) The planner has rightly chosen to use indexes, and as a result the query will be pretty fast. Also, at first sight this might look like the multiple selects you mention, but actually it isn't; here's another example to show that: inh=# explain select * from parent natural join othertable where parent.id=13; QUERY PLAN ---------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..30.20 rows=5 width=72) -> Append (cost=0.00..15.07 rows=5 width=36) -> Index Scan using parent_pkey on parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child1_pkey on child1 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child2_pkey on child2 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child3_pkey on child3 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using child4_pkey on child4 parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using othertable_pkey on othertable (cost=0.00..3.01 rows=1 width=36) Index Cond: ("outer".id = othertable.id) (14 rows) As you can see, the planner decided to use the indexes on parent and children here too, retrieved and then collated the resulting rows first and only then performed the join against othertable. In other words, it is not peforming 5 separate selects with their respective joins; it collects all qualifying rows first from the inheritance hierarchy, and only then performs the join; so the extra cost compared to the non-inheriting case is pretty much only the added cost of consulting five indexes instead of just one - unless you have inheritance hierarchies consisting of several dozen tables or more (and even then) I don't think this added cost would be significant. > This is entirely reasonable and efficient compared to the current model > where a select on a parent table requires the same select to be executed > on EVERY child table. If it's a large expensive JOIN of some kind then > this is verging on un-workable. Please show us a join that you would like to use and let us see how well the planner handles it. Regards, Andras PS (John, don't look here :) ): I have found some queries with plans that are less efficient than I think they could be. Changing the where clause in the above query to refer to othertable gives: johnlange=# explain select * from parent natural join othertable where othertable.id=13; QUERY PLAN ----------------------------------------------------------------------------------------------- Hash Join (cost=3.02..978.08 rows=5 width=72) Hash Cond: ("outer".id = "inner".id) -> Append (cost=0.00..725.00 rows=50000 width=36) -> Seq Scan on parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child1 parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child2 parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child3 parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child4 parent (cost=0.00..145.00 rows=10000 width=36) -> Hash (cost=3.01..3.01 rows=1 width=36) -> Index Scan using othertable_pkey on othertable (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) (11 rows) While: johnlange=# explain select * from ONLY parent natural join othertable where othertable.id=13; QUERY PLAN ----------------------------------------------------------------------------------------- Nested Loop (cost=0.00..6.04 rows=1 width=72) -> Index Scan using othertable_pkey on othertable (cost=0.00..3.01 rows=1 width=36) Index Cond: (id = 13) -> Index Scan using parent_pkey on parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (parent.id = "outer".id) (5 rows) Similarly, as a somewhat more real-life example: johnlange=# explain select * from parent natural join othertable where othertable.othertext='apple'; QUERY PLAN -------------------------------------------------------------------------------------------------- Hash Join (cost=131.37..1234.50 rows=250 width=72) Hash Cond: ("outer".id = "inner".id) -> Append (cost=0.00..725.00 rows=50000 width=36) -> Seq Scan on parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child1 parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child2 parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child3 parent (cost=0.00..145.00 rows=10000 width=36) -> Seq Scan on child4 parent (cost=0.00..145.00 rows=10000 width=36) -> Hash (cost=131.25..131.25 rows=50 width=36) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=36) Index Cond: (othertext = 'alma'::text) (11 rows) What's more strange, that it still does it with enable_seqscan set to off: johnlange=# explain select * from parent natural join othertable where othertable.othertext='apple'; QUERY PLAN -------------------------------------------------------------------------------------------------- Hash Join (cost=100000131.37..500001234.50 rows=250 width=72) Hash Cond: ("outer".id = "inner".id) -> Append (cost=100000000.00..500000725.00 rows=50000 width=36) -> Seq Scan on parent (cost=100000000.00..100000145.00 rows=10000 width=36) -> Seq Scan on child1 parent (cost=100000000.00..100000145.00 rows=10000 width=36) -> Seq Scan on child2 parent (cost=100000000.00..100000145.00 rows=10000 width=36) -> Seq Scan on child3 parent (cost=100000000.00..100000145.00 rows=10000 width=36) -> Seq Scan on child4 parent (cost=100000000.00..100000145.00 rows=10000 width=36) -> Hash (cost=131.25..131.25 rows=50 width=36) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=36) Index Cond: (othertext = 'apple'::text) (11 rows) While: johnlange=# explain select * from ONLY parent natural join othertable where othertable.othertext='apple'; QUERY PLAN -------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..282.55 rows=50 width=72) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=36) Index Cond: (othertext = 'apple'::text) -> Index Scan using parent_pkey on parent (cost=0.00..3.01 rows=1 width=36) Index Cond: (parent.id = "outer".id) (5 rows) If I try to make it more efficient and get rid of the seq scans by pushing the condition into a subselect, I get an even more interesting plan: johnlange=# explain select * from parent where id in (select id from othertable where othertext='alma'); QUERY PLAN --------------------------------------------------------------------------------------------------------------- Result (cost=0.00..6563171.43 rows=25000 width=36) -> Append (cost=0.00..6563171.43 rows=25000 width=36) -> Seq Scan on parent (cost=0.00..1312634.29 rows=5000 width=36) Filter: (subplan) SubPlan -> Materialize (cost=131.25..131.25 rows=50 width=4) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=4) Index Cond: (othertext = 'alma'::text) -> Seq Scan on child1 parent (cost=0.00..1312634.29 rows=5000 width=36) Filter: (subplan) SubPlan -> Materialize (cost=131.25..131.25 rows=50 width=4) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=4) Index Cond: (othertext = 'alma'::text) -> Seq Scan on child2 parent (cost=0.00..1312634.29 rows=5000 width=36) Filter: (subplan) SubPlan -> Materialize (cost=131.25..131.25 rows=50 width=4) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=4) Index Cond: (othertext = 'alma'::text) -> Seq Scan on child3 parent (cost=0.00..1312634.29 rows=5000 width=36) Filter: (subplan) SubPlan -> Materialize (cost=131.25..131.25 rows=50 width=4) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=4) Index Cond: (othertext = 'alma'::text) -> Seq Scan on child4 parent (cost=0.00..1312634.29 rows=5000 width=36) Filter: (subplan) SubPlan -> Materialize (cost=131.25..131.25 rows=50 width=4) -> Index Scan using othertable_text on othertable (cost=0.00..131.25 rows=50 width=4) Index Cond: (othertext = 'alma'::text) (32 rows) johnlange=# select version(); version -------------------------------------------------------------------------------------------------------- PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.1 (Mandrake Linux 9.1 3.2.1-2mdk) (1 row)
On Wed, 2003-01-29 at 01:52, Andras Kadinger wrote: > John Lange wrote: > > > > > I don't see how performance would be significantly better if you stored > > > the common columns of all rows (parent and children) in the parent > > > table, in contrast with how it is done now, storing entire rows of child > > > tables in the child table and omitting them from the parent table. > > > > Well there are a couple of points. > > > > Firstly, from the simple standpoint of database normalization you > > shouldn't have tables that have the same columns. The way it is > > implemented, child tables are copies of parent tables. > > As Tom pointed out, only the schema is copied, but not the data. I guess you are right, strictly speaking this isn't a violation of normalization since no data is duplicated. > This has the following advantages: > - if you select from child tables, PostgreSQL will only have to scan > rows that belong to that child (and further down), and not all rows in > all tables of the inheritance hierarchy; so if you have 100 million rows > in the whole hierarchy, but only have say 1 million in the child you are > currently interested in, you only have to scan those 1 million rows, and > not the whole 100 million. > - all columns of rows are stored together, so to read a row only one > disk access is needed (your way it would probably need roughly one > random disk access per each inheritance level upwards, both for > reads/selects and writes/inserts/updates; with a sizable inheritance > hierarchy this would be a considerable performance hit) > - it doesn't really cost much in terms of disk space, only some > bookkeeping information is needed > > I don't think inheritance really fits into 'database normalization' > itself, but still there are cases where it is more convenient/efficient > than with traditional database normalization, where you would have to > either go create completely separate tables for each type (and do UNIONs > of SELECTs if you are interested in more than one child only), or what's > even more cumbersome, create a table with common columns ('parent' here) > and then go create children and children of children that each link > upwards to their respective parents through some kind of key: in a > select, you would have to explicitly specify all tables upwards the > inheritance hierarchy, and specify the respective joins for them. > > So I think whether you should choose more traditional database > normalization or use inheritance depends on what you want to do. > > > But more importantly it is bad for performance because selecting from a > > parent table causes the same select to be done on all the child tables. > > In my case selecting from the parent causes six selects to be done (one > > for every child table). > > 'causes the same select to be done on all the child tables' - I don't > agree with that, and I hope this is where the misunderstanding lies. > > Consider this: > > CREATE TABLE parent ( id integer NOT NULL, text text); > CREATE TABLE child1 ( child1field text) INHERITS (parent); > CREATE TABLE child2 ( child2field text) INHERITS (parent); > CREATE TABLE child3 ( child3field text) INHERITS (parent); > CREATE TABLE child4 ( child4field text) INHERITS (parent); > > CREATE TABLE othertable ( id integer NOT NULL, othertext text); > > ALTER TABLE ONLY parent ADD CONSTRAINT parent_pkey PRIMARY KEY (id); > ALTER TABLE ONLY child1 ADD CONSTRAINT child1_pkey PRIMARY KEY (id); > ALTER TABLE ONLY child2 ADD CONSTRAINT child2_pkey PRIMARY KEY (id); > ALTER TABLE ONLY child3 ADD CONSTRAINT child3_pkey PRIMARY KEY (id); > ALTER TABLE ONLY child4 ADD CONSTRAINT child4_pkey PRIMARY KEY (id); > > ALTER TABLE ONLY othertable ADD CONSTRAINT othertable_pkey PRIMARY KEY > (id); > > Then I filled all tables with 10000 rows of synthetic data and ANALYZEd > just to make sure the optimizer considers indexes to be valuable. > > First I tried this: > > johnlange=# explain select * from parent where id=13; > QUERY > PLAN > ---------------------------------------------------------------------------------------------- > Result (cost=0.00..15.07 rows=5 width=36) > -> Append (cost=0.00..15.07 rows=5 width=36) > -> Index Scan using parent_pkey on parent (cost=0.00..3.01 > rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child1_pkey on child1 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child2_pkey on child2 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child3_pkey on child3 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child4_pkey on child4 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > (12 rows) > > The planner has rightly chosen to use indexes, and as a result the query > will be pretty fast. > > Also, at first sight this might look like the multiple selects you > mention, but actually it isn't; here's another example to show that: > > inh=# explain select * from parent natural join othertable where > parent.id=13; > QUERY > PLAN > ---------------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..30.20 rows=5 width=72) > -> Append (cost=0.00..15.07 rows=5 width=36) > -> Index Scan using parent_pkey on parent (cost=0.00..3.01 > rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child1_pkey on child1 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child2_pkey on child2 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child3_pkey on child3 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using child4_pkey on child4 parent > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using othertable_pkey on othertable (cost=0.00..3.01 > rows=1 width=36) > Index Cond: ("outer".id = othertable.id) > (14 rows) > > As you can see, the planner decided to use the indexes on parent and > children here too, retrieved and then collated the resulting rows first > and only then performed the join against othertable. > > In other words, it is not peforming 5 separate selects with their > respective joins; it collects all qualifying rows first from the > inheritance hierarchy, and only then performs the join; so the extra > cost compared to the non-inheriting case is pretty much only the added > cost of consulting five indexes instead of just one - unless you have > inheritance hierarchies consisting of several dozen tables or more (and > even then) I don't think this added cost would be significant. > > > This is entirely reasonable and efficient compared to the current model > > where a select on a parent table requires the same select to be executed > > on EVERY child table. If it's a large expensive JOIN of some kind then > > this is verging on un-workable. > > Please show us a join that you would like to use and let us see how well > the planner handles it. Ok, your reply here is very informative. Firstly, I can see from your example that I likely don't have my keys and constraints implemented properly. However, the issue of indexes is not necessarily that relevant since you may not be selecting rows based on columns that have indexes. So the issue of indexes aside, I think some of the misunderstanding is related to my assumption that the appended operations are relatively more expensive than scanning the same number of rows in a single select. Here is the way it looks on my system when I select a single object. db_drs0001=> explain select * from tbl_objects where id = 1; NOTICE: QUERY PLAN: Result (cost=0.00..153.70 rows=6 width=111) -> Append (cost=0.00..153.70 rows=6 width=111) -> Seq Scan on tbl_objects (cost=0.00..144.35 rows=1 width=107) -> Seq Scan on tbl_viewers tbl_objects (cost=0.00..1.06 rows=1 width=97) -> Seq Scan on tbl_documents tbl_objects (cost=0.00..4.91 rows=1 width=111) -> Seq Scan on tbl_formats tbl_objects (cost=0.00..1.11 rows=1 width=100) -> Seq Scan on tbl_massemails tbl_objects (cost=0.00..1.01 rows=1 width=90) -> Seq Scan on tbl_icons tbl_objects (cost=0.00..1.25 rows=1 width=110) db_drs0001=> select version(); version --------------------------------------------------------------- PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.3 (1 row) The only question here is, if a select requires the scanning of all rows to return a result set, is it dramatically less efficient to have it scanning 100,000 rows spread over 6 tables or to scan 100,000 rows in a single table? (At the moment I have no where near that amount of data. Side question, what technique do you use to generate data to fill your tables for testing?) I'm now starting to see that it likely isn't that much different either way so the benefits of the way it's implemented probably out weigh the negatives. Your end up scanning the same number of rows either way. On the topic of proper indexes, if you would indulge me, can you show me where I have gone wrong in that regard? My biggest point of confusion here is with regards to the sequences that are used in the parent table. Here is the schema as produced by pg_dump. The original create used the keyword "serial" or "bigserial" as the case may be. I've edited some of the columns out just to keep the example shorter: CREATE SEQUENCE "tbl_objects_id_seq" start 1 increment 1 maxvalue 9223372036854775807 minvalue 1 cache 1; CREATE TABLE "tbl_objects" ( "id" bigint DEFAULT nextval('"tbl_objects_id_seq"'::text) NOT NULL, "name" text DEFAULT '' NOT NULL, "description" text DEFAULT '' NOT NULL, "status" smallint DEFAULT '1' NOT NULL, "class" text ); CREATE TABLE "tbl_viewers" ( "exec" text DEFAULT '' NOT NULL ) INHERITS ("tbl_objects"); CREATE TABLE "tbl_documents" ( "filename" text DEFAULT '' NOT NULL ) INHERITS ("tbl_objects"); CREATE TABLE "tbl_massemails" ( "from" text DEFAULT '' NOT NULL, "subject" text DEFAULT '' NOT NULL, "message" text DEFAULT '' NOT NULL ) INHERITS ("tbl_objects"); CREATE TABLE "tbl_icons" ( "format_id" bigint DEFAULT '0' NOT NULL ) INHERITS ("tbl_documents"); CREATE TABLE "tbl_formats" ( "viewer_id" bigint DEFAULT '0' NOT NULL, "extension" text DEFAULT '' NOT NULL, "contenttype" text DEFAULT '' NOT NULL, "upload_class" text ) INHERITS ("tbl_objects"); CREATE UNIQUE INDEX tbl_objects_id_key ON tbl_objects USING btree (id); Thanks very much for taking the time to look into this with me. It has been most informative. John Lange > > Regards, > Andras > > PS (John, don't look here :) ): I have found some queries with plans > that are less efficient than I think they could be. > > Changing the where clause in the above query to refer to othertable > gives: > > johnlange=# explain select * from parent natural join othertable where > othertable.id=13; > QUERY > PLAN > ----------------------------------------------------------------------------------------------- > Hash Join (cost=3.02..978.08 rows=5 width=72) > Hash Cond: ("outer".id = "inner".id) > -> Append (cost=0.00..725.00 rows=50000 width=36) > -> Seq Scan on parent (cost=0.00..145.00 rows=10000 width=36) > -> Seq Scan on child1 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Seq Scan on child2 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Seq Scan on child3 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Seq Scan on child4 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Hash (cost=3.01..3.01 rows=1 width=36) > -> Index Scan using othertable_pkey on othertable > (cost=0.00..3.01 rows=1 width=36) > Index Cond: (id = 13) > (11 rows) > > While: > > johnlange=# explain select * from ONLY parent natural join othertable > where othertable.id=13; > QUERY > PLAN > ----------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..6.04 rows=1 width=72) > -> Index Scan using othertable_pkey on othertable (cost=0.00..3.01 > rows=1 width=36) > Index Cond: (id = 13) > -> Index Scan using parent_pkey on parent (cost=0.00..3.01 rows=1 > width=36) > Index Cond: (parent.id = "outer".id) > (5 rows) > > Similarly, as a somewhat more real-life example: > > johnlange=# explain select * from parent natural join othertable where > othertable.othertext='apple'; > QUERY > PLAN > -------------------------------------------------------------------------------------------------- > Hash Join (cost=131.37..1234.50 rows=250 width=72) > Hash Cond: ("outer".id = "inner".id) > -> Append (cost=0.00..725.00 rows=50000 width=36) > -> Seq Scan on parent (cost=0.00..145.00 rows=10000 width=36) > -> Seq Scan on child1 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Seq Scan on child2 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Seq Scan on child3 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Seq Scan on child4 parent (cost=0.00..145.00 rows=10000 > width=36) > -> Hash (cost=131.25..131.25 rows=50 width=36) > -> Index Scan using othertable_text on othertable > (cost=0.00..131.25 rows=50 width=36) > Index Cond: (othertext = 'alma'::text) > (11 rows) > > What's more strange, that it still does it with enable_seqscan set to > off: > > johnlange=# explain select * from parent natural join othertable where > othertable.othertext='apple'; > QUERY > PLAN > -------------------------------------------------------------------------------------------------- > Hash Join (cost=100000131.37..500001234.50 rows=250 width=72) > Hash Cond: ("outer".id = "inner".id) > -> Append (cost=100000000.00..500000725.00 rows=50000 width=36) > -> Seq Scan on parent (cost=100000000.00..100000145.00 > rows=10000 width=36) > -> Seq Scan on child1 parent (cost=100000000.00..100000145.00 > rows=10000 width=36) > -> Seq Scan on child2 parent (cost=100000000.00..100000145.00 > rows=10000 width=36) > -> Seq Scan on child3 parent (cost=100000000.00..100000145.00 > rows=10000 width=36) > -> Seq Scan on child4 parent (cost=100000000.00..100000145.00 > rows=10000 width=36) > -> Hash (cost=131.25..131.25 rows=50 width=36) > -> Index Scan using othertable_text on othertable > (cost=0.00..131.25 rows=50 width=36) > Index Cond: (othertext = 'apple'::text) > (11 rows) > > While: > > johnlange=# explain select * from ONLY parent natural join othertable > where othertable.othertext='apple'; > QUERY > PLAN > -------------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..282.55 rows=50 width=72) > -> Index Scan using othertable_text on othertable > (cost=0.00..131.25 rows=50 width=36) > Index Cond: (othertext = 'apple'::text) > -> Index Scan using parent_pkey on parent (cost=0.00..3.01 rows=1 > width=36) > Index Cond: (parent.id = "outer".id) > (5 rows) > > If I try to make it more efficient and get rid of the seq scans by > pushing the condition into a subselect, I get an even more interesting > plan: > > johnlange=# explain select * from parent where id in (select id from > othertable where othertext='alma'); > QUERY > PLAN > --------------------------------------------------------------------------------------------------------------- > Result (cost=0.00..6563171.43 rows=25000 width=36) > -> Append (cost=0.00..6563171.43 rows=25000 width=36) > -> Seq Scan on parent (cost=0.00..1312634.29 rows=5000 > width=36) > Filter: (subplan) > SubPlan > -> Materialize (cost=131.25..131.25 rows=50 width=4) > -> Index Scan using othertable_text on > othertable (cost=0.00..131.25 rows=50 width=4) > Index Cond: (othertext = 'alma'::text) > -> Seq Scan on child1 parent (cost=0.00..1312634.29 rows=5000 > width=36) > Filter: (subplan) > SubPlan > -> Materialize (cost=131.25..131.25 rows=50 width=4) > -> Index Scan using othertable_text on > othertable (cost=0.00..131.25 rows=50 width=4) > Index Cond: (othertext = 'alma'::text) > -> Seq Scan on child2 parent (cost=0.00..1312634.29 rows=5000 > width=36) > Filter: (subplan) > SubPlan > -> Materialize (cost=131.25..131.25 rows=50 width=4) > -> Index Scan using othertable_text on > othertable (cost=0.00..131.25 rows=50 width=4) > Index Cond: (othertext = 'alma'::text) > -> Seq Scan on child3 parent (cost=0.00..1312634.29 rows=5000 > width=36) > Filter: (subplan) > SubPlan > -> Materialize (cost=131.25..131.25 rows=50 width=4) > -> Index Scan using othertable_text on > othertable (cost=0.00..131.25 rows=50 width=4) > Index Cond: (othertext = 'alma'::text) > -> Seq Scan on child4 parent (cost=0.00..1312634.29 rows=5000 > width=36) > Filter: (subplan) > SubPlan > -> Materialize (cost=131.25..131.25 rows=50 width=4) > -> Index Scan using othertable_text on > othertable (cost=0.00..131.25 rows=50 width=4) > Index Cond: (othertext = 'alma'::text) > (32 rows) > > johnlange=# select version(); > > version > -------------------------------------------------------------------------------------------------------- > PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.1 > (Mandrake Linux 9.1 3.2.1-2mdk) > (1 row)
John Lange wrote: > > On Wed, 2003-01-29 at 01:52, Andras Kadinger wrote: > > As Tom pointed out, only the schema is copied, but not the data. > > I guess you are right, strictly speaking this isn't a violation of > normalization since no data is duplicated. [...] > Ok, your reply here is very informative. Firstly, I can see from your > example that I likely don't have my keys and constraints implemented > properly. > > However, the issue of indexes is not necessarily that relevant since you > may not be selecting rows based on columns that have indexes. Granted, now I see that was not strictly related to your point, I just wanted to avoid most avoidable objections against performance of inheritance, and I wasn't 100% sure you seeing seq scans was not part of you thinking performance of this method would be suboptimal so just to be sure, I explicitly went for an example with indexes. > So the issue of indexes aside, I think some of the misunderstanding is > related to my assumption that the appended operations are relatively > more expensive than scanning the same number of rows in a single select. I see. Well, the Append step itself I suppose is not doing much else than iterates over its subnodes and asks each of them to return their rows, and forwards the rows upwards to the rest of the plan as it receives them - it doesn't buffer them, or collect them all before forwarding them upwards (I think the Materialize step that were to be seen in an example in my last PS is the one that does that). So I don't think the Append incurs any significant costs much more than a few CPU cycles for that iteration and row forwarding (pass of one pointer to in-memory row I guess) steps - I think these are minuscule compared to the cost of any disk I/O, and in most non-CPU-bound queries are hidden by disk throughput/latency anyway. > Here is the way it looks on my system when I select a single object. > > db_drs0001=> explain select * from tbl_objects where id = 1; > NOTICE: QUERY PLAN: > > Result (cost=0.00..153.70 rows=6 width=111) > -> Append (cost=0.00..153.70 rows=6 width=111) > -> Seq Scan on tbl_objects (cost=0.00..144.35 rows=1 width=107) > -> Seq Scan on tbl_viewers tbl_objects (cost=0.00..1.06 rows=1 > width=97) > -> Seq Scan on tbl_documents tbl_objects (cost=0.00..4.91 rows=1 > width=111) > -> Seq Scan on tbl_formats tbl_objects (cost=0.00..1.11 rows=1 > width=100) > -> Seq Scan on tbl_massemails tbl_objects (cost=0.00..1.01 rows=1 > width=90) > -> Seq Scan on tbl_icons tbl_objects (cost=0.00..1.25 rows=1 > width=110) > > db_drs0001=> select version(); > version > --------------------------------------------------------------- > PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.3 > (1 row) > > The only question here is, if a select requires the scanning of all rows > to return a result set, is it dramatically less efficient to have it > scanning 100,000 rows spread over 6 tables or to scan 100,000 rows in a > single table? I see. Well, the Append step itself I suppose isn't doing much more than iterates over its subnodes (the seq scans in the case above) and asks each of them to return rows, and forwards the rows upwards to the rest of the plan as it receives them - it doesn't buffer them, or collect them all before forwarding them upwards (I think the Materialize step that were to be seen in an example in my last PS does that). So aside for any costs of consulting indexes, I don't think the Append step - which is the added step when scanning multiple tables versus scanning one table - incurs any significant costs much more than a few CPU cycles for those iteration and row forwarding (pass of one pointer to in-memory row I guess) steps - I think these are minuscule compared to the cost of any disk I/O, and in most non-CPU-bound queries are hidden by disk throughput/latency anyway. > (At the moment I have no where near that amount of data. Side question, > what technique do you use to generate data to fill your tables for > testing?) For this occasion I just went and created a dozen-line PHP script that simply inserted 10000 rows with consecutive ids into each table. I suggest you to try to populate your test database with test data on the order of your expected working data set and use EXPLAIN ANALYZE to make estimates of expected performance of the database. > I'm now starting to see that it likely isn't that much different either > way so the benefits of the way it's implemented probably out weigh the > negatives. Your end up scanning the same number of rows either way. Aside from extreme cases where child rows are considerably much more wider than parent rows and thus result in considerably more data needed to be read in case of a sequential scan, yes. > On the topic of proper indexes, if you would indulge me, can you show me > where I have gone wrong in that regard? Hmm, I think you should only have gone and created indexes for child tables by hand, as indexes are not inherited. Also, don't forget, PostgreSQL has an advanced query cost estimation subsystem, which decides for or against using an index based on, among others, statistics collected on distribution of values in the index to determine its selectivity (so don't forget to ANALYZE/VACUUM ANALYZE after inserting/changing a lot of rows that significantly change distribution of values - this includes initial table fillup), and also it accounts for costs of accessing index pages, so with less than say a couple of thousand rows or with not very selective indexes it will (rightly) decide not to use the index but do a seq scan instead - probably the reason for why you don't see an index scan on tbl_objects above despite there being an index on the primary key. > My biggest point of confusion > here is with regards to the sequences that are used in the parent table. Child tables inherit the "nextval('...')" default value, so as a result they will all draw from the same one sequence, which sequence exists outside of the tables; as a result as long as you use that default value, it is guaranteed that the column in question will have unique values among all tables parent and children; they won't be consecutive - but that's not a drawback of inheritance either, as a sequence is not guaranteed to provide consecutive numbers with single tables either due to transaction concurrency (rolled back transactions don't 'put back' numbers into the sequence, so in the case of rolled back transactions there will be numbers drawn from the sequence that never actually get into any table - this is nicely documented with sequences and transactions). > Here is the schema as produced by pg_dump. The original create used the > keyword "serial" or "bigserial" as the case may be. I've edited some of > the columns out just to keep the example shorter: > > CREATE SEQUENCE "tbl_objects_id_seq" start 1 increment 1 maxvalue > 9223372036854775807 minvalue 1 cache 1; > > CREATE TABLE "tbl_objects" ( > "id" bigint DEFAULT nextval('"tbl_objects_id_seq"'::text) NOT NULL, > "name" text DEFAULT '' NOT NULL, > "description" text DEFAULT '' NOT NULL, > "status" smallint DEFAULT '1' NOT NULL, > "class" text > ); > > CREATE TABLE "tbl_viewers" ( > "exec" text DEFAULT '' NOT NULL ) > INHERITS ("tbl_objects"); > > CREATE TABLE "tbl_documents" ( > "filename" text DEFAULT '' NOT NULL ) > INHERITS ("tbl_objects"); > > CREATE TABLE "tbl_massemails" ( > "from" text DEFAULT '' NOT NULL, > "subject" text DEFAULT '' NOT NULL, > "message" text DEFAULT '' NOT NULL ) > INHERITS ("tbl_objects"); > > CREATE TABLE "tbl_icons" ( > "format_id" bigint DEFAULT '0' NOT NULL ) > INHERITS ("tbl_documents"); > > CREATE TABLE "tbl_formats" ( > "viewer_id" bigint DEFAULT '0' NOT NULL, > "extension" text DEFAULT '' NOT NULL, > "contenttype" text DEFAULT '' NOT NULL, > "upload_class" text ) > INHERITS ("tbl_objects"); > > CREATE UNIQUE INDEX tbl_objects_id_key ON tbl_objects USING btree (id); Hmm, I wonder whether you have a specific goal with or reason for explicitly specifying NOT NULL and empty string ('') as default value for all these text fields? If it's just because your frontend makes it inconvenient for you to treat a NULL as empty string, you might want to consider allowing NULLs and using the coalesce() function in your select - this would incur a few CPU cycles per returned result row, but will spare you a few bytes in storage - I think 4 or 8 per column - for each NULL value. Whether this is worth it or not depends on the percentage of empty/NULL values in your data though. > Thanks very much for taking the time to look into this with me. It has > been most informative. You're welcome! Regards, Andras