Обсуждение: Shall I apply normalization in the following case?
For example, for the following table, measurement (without normalization) =========== id | value | measurement_unit | measurement_type ------------------------------------------------ 1 0.23 mm width 2 0.38 mm width 2 0.72 mm width If I normalize to the following format, I will encounter several problem compared to table without normalization measurement (normalization) =========== id | value | measurement_unit_id | measurement_type_id ------------------------------------------------------ 1 0.23 1 1 2 0.38 1 1 2 0.72 1 1 measurement_unit_id =================== id | value ---------- 1 | mm measurement_type_id =================== id | value ---------- 1 | width (1) When rows grow to few millions in table measurement, the join operation on normalization table, is *much* slower comparedto non-normalization table. One of the most frequent used query, To retrieve "value", "measurement_unit" and "measurement_type", I need to join measurement+ measurement_unit_id + measurement_type_id. For non-normalization table, I need NOT to join. Right now, I cannot justify why I need to use normalization. I afraid I miss out several important points when I turn intoun-normalize solution. Please advice if you realize I miss out several important points. Thanks Yan Cheng CHEOK
You can always normalize and not use an artificial key.
You'd end up with:
measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
And so on. You'd benefit from a normalized structure, you'd have constraints checking for valid units and types and you wouldn't need join to get the resulting information.
--
Jorge Godoy <jgodoy@gmail.com>
You'd end up with:
measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width
measurement_unit_id
===================
value
----------
mm
m
cm
in
cm²
m³
cm
in
cm²
m³
measurement_type_id
===================
value
----------
width
area
depth
volumedepth
And so on. You'd benefit from a normalized structure, you'd have constraints checking for valid units and types and you wouldn't need join to get the resulting information.
--
Jorge Godoy <jgodoy@gmail.com>
On Wed, Feb 3, 2010 at 23:20, Yan Cheng Cheok <yccheok@yahoo.com> wrote:
For example, for the following table,
measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width
If I normalize to the following format, I will encounter several problem compared to table without normalization
measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1
measurement_unit_id
===================
id | value
----------
1 | mm
measurement_type_id
===================
id | value
----------
1 | width
(1) When rows grow to few millions in table measurement, the join operation on normalization table, is *much* slower compared to non-normalization table.
One of the most frequent used query, To retrieve "value", "measurement_unit" and "measurement_type", I need to join measurement + measurement_unit_id + measurement_type_id.
For non-normalization table, I need NOT to join.
Right now, I cannot justify why I need to use normalization. I afraid I miss out several important points when I turn into un-normalize solution. Please advice if you realize I miss out several important points.
Thanks
Yan Cheng CHEOK
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
> For example, for the following table, > > > measurement (without normalization) > =========== > id | value | measurement_unit | measurement_type > ------------------------------------------------ > 1 0.23 mm width > 2 0.38 mm width > 2 0.72 mm width > > > If I normalize to the following format, I will encounter several problem compared to table without normalization > > > > measurement (normalization) > =========== > id | value | measurement_unit_id | measurement_type_id > ------------------------------------------------------ > 1 0.23 1 1 > 2 0.38 1 1 > 2 0.72 1 1 > > > measurement_unit_id > =================== > id | value > ---------- > 1 | mm > > > measurement_type_id > =================== > id | value > ---------- > 1 | width > 1) foreign key constraints are important, so you don't have things misspelled or spelled differently and to define the "official" value. 2) querying on an int is quicker then querying on a string, so if you query on the values without the join you will have better performance. 3) You might want to have more information in the other tables one day, such as unit conversion information or descriptions, etc.. 4) depending on the size of the string, it might take less space for an int. Though a varchar with mm only takes 3 bytes, width takes 6 bytes, while a regular int takes 4. 5) As Jorge mentioned you can make the value your pk instead of a serial int and then you have it normalized and readable. For the specific design that you are showing, there is no real benefit to normalization, other then it would make it more scalable. Sim
Sim Zacks wrote: > 2) querying on an int is quicker then querying on a string, so if you > query on the values without the join you will have better performance. The point of Jorge's approach is that it allows a query without a join. With the OP's normalization, using integer keys, a join was required, obviating any advantage to the integer key. The assertion that a query involving integer keys is always faster than one with string keys is unsafe at best. The first rule of performance optimization is that optimizations don't. You have to test and measure to find out when and if they do. Misuse of surrogate keys to obtain optimization is a prime example. I've seen situations in the field several times when such abuse forces extra joins in queries and an increase in the number of indexes to maintain. Without measurement under representative workloads, especially concurrent activity, it's impossible to know whether the cost outweighs the benefit. -- Lew
On Wed, 2010-02-03 at 17:20 -0800, Yan Cheng Cheok wrote: > For example, for the following table, > > > measurement (without normalization) > =========== > id | value | measurement_unit | measurement_type > ------------------------------------------------ > 1 0.23 mm width > 2 0.38 mm width > 2 0.72 mm width That looks normalized to me. Can you describe the redundancy that I don't see? Do units of "mm" always imply "width" (I would think they might also be used for "length"). Regards, Jeff Davis
On Wed, 2010-02-03 at 23:56 -0200, Jorge Godoy wrote: > You can always normalize and not use an artificial key. > > You'd end up with: > > > measurement (normalization) > =========== > id | value | measurement_unit_id | measurement_type_id > ------------------------------------------------------ > 1 0.23 mm width > 2 0.38 mm width > 2 0.72 mm width How is this different? It looks like you just added some FKs, which don't change the normal form. Regards, Jeff Davis