Re: Query to return normalized floats
От | Kip Warner |
---|---|
Тема | Re: Query to return normalized floats |
Дата | |
Msg-id | 1460420220.13062.32.camel@thevertigo.com обсуждение исходный текст |
Ответ на | Re: Query to return normalized floats (Andreas Kretschmer <akretschmer@spamfence.net>) |
Ответы |
Re: Query to return normalized floats
(Kip Warner <kip@thevertigo.com>)
|
Список | pgsql-novice |
On Sun, 2016-02-28 at 09:21 +0100, Andreas Kretschmer wrote: > Kip Warner <kip@thevertigo.com> wrote: > > > The some_value_* fields contain floating point data and are at > > least a > > dozen in number. > > maybe a wrong table design, but that's another question. Hey Andreas. I am very sorry for taking so long to acknowledge and respond to your thoughtful answer. The software I am working on had another aspect of it that sucked me away for the last few weeks before I could return to the database schema. > > I would like to be able to perform queries on the table on the > > normalized versions of these values. By normalized I don't mean in > > the database nomenclature, but where all some_value_i's are in the > > range of [0,1]. > > > > To do this, I must find the min() and max() of each row's > > some_value_i within the table and divide each some_value_i by the > > absolute difference of these two values. > > > > As an example, if a row contained the lowest some_value_3 of -4.0 > > and the largest row 1.5, then the normalized version of any > > some_value_3 field is some_value_3 / (1.5 - -4.0) or some_value_3 / > > 5.5. > > > > I am having difficulty expressing this as a query to just list > > every row in the table for starters (e.g. SELECT * FROM my_table;). > > I considered creating a VIEW, my_table_normalized, but I'm not sure > > if that is the appropriate strategy here. > > simple example: > > test=*# select * from bla; > id | col1 | col2 > ----+-------------------+-------------------- > 1 | -27.3061781190336 | 9.23637737520039 > 2 | -34.9188138730824 | 4.02728125452995 > 3 | 27.7425193693489 | -1.71850152313709 > 4 | 18.2173402048647 | 1.78571328520775 > 5 | -49.3932147044688 | 3.25902994722128 > 6 | -21.3868645019829 | 0.0395399890840054 > 7 | 48.6888256389648 | -0.219368590041995 > 8 | -26.943267416209 | -2.84633947536349 > 9 | -47.2060812171549 | 1.46993971429765 > 10 | -16.008263733238 | 5.15772333368659 > 11 | 12.651920504868 | 4.91552650928497 > 12 | 38.5760291945189 | 6.94333815015852 > 13 | -47.87487979047 | -4.18941779062152 > 14 | -2.24363747984171 | 0.686697596684098 > 15 | -2.6916132774204 | 7.83255377784371 > 16 | -4.97196828946471 | 4.0004417207092 > 17 | 35.3446557652205 | 2.0218435768038 > 18 | -9.86138512380421 | 2.61743502691388 > 19 | -48.4832897316664 | -8.32880898378789 > 20 | -27.9842584393919 | -9.57530088722706 > (20 rows) > > test=*# create view my_factor as select max(col1) - min(col1) as c1, > max(col2) - min(col2) as c2 from bla; > CREATE VIEW > test=*# select id, col1 / f.c1, col2 / f.c2 from bla cross join > my_factor f; > id | ?column? | ?column? > ----+---------------------+--------------------- > 1 | -0.278401407876724 | 0.490991672638172 > 2 | -0.356016389451263 | 0.214084102351125 > 3 | 0.282850145370229 | -0.0913529085052159 > 4 | 0.185735738582485 | 0.094925782819407 > 5 | -0.503590815724457 | 0.173245039690613 > 6 | -0.218050770835282 | 0.00210188525087517 > 7 | 0.496409184275543 | -0.0116612982096414 > 8 | -0.274701334942333 | -0.151307046381315 > 9 | -0.481291794622778 | 0.078139743503564 > 10 | -0.163212996764598 | 0.274176671625737 > 11 | 0.128993243417117 | 0.261301859446679 > 12 | 0.393303698204536 | 0.369097219997987 > 13 | -0.48811056155476 | -0.222703032242957 > 14 | -0.0228751101831245 | 0.0365037923307268 > 15 | -0.0274424682438878 | 0.416366560632055 > 16 | -0.0506919337327751 | 0.212657353847013 > 17 | 0.360358080250589 | 0.107478107407462 > 18 | -0.100542210268818 | 0.139138836546109 > 19 | -0.494313633381836 | -0.442746727197808 > 20 | -0.285314807291989 | -0.509008327361828 > (20 rows) > > test=*# > > is that okay? Almost. I realized that the normalization is not calculated correctly and should be calculated as... norm(x) = (x - min)/(max - min) This also ensures the values are in the range of [0.0, 1.0]. It would also be nice if the view contained the fully normalized results. I've started with this... CREATE VIEW my_view AS SELECT id, (col1 - MIN(col1) / (MAX(col1) - MIN(col1)) AS col1_norm, (col2 - MIN(col2) / (MAX(col2) - MIN(col2)) AS col2_norm FROM my_table GROUP_BY id; The problem is when I try to see what it's populated with via... SELECT * FROM my_view; I get an ERROR: division by zero. I can see how that would happen if MAX(x) - MIN(x) was zero, but I've made sure it wasn't from the sample rows in my_table. I suspect my_view schema is not declared correctly. > (you should define aliases for the columns...) Yes. Hopefully I'm using the AS keyword correctly here. > create indexes on the columns, so it should use indexes for the > min/max. Done. -- Kip Warner -- Senior Software Engineer OpenPGP encrypted/signed mail preferred http://www.thevertigo.com
Вложения
В списке pgsql-novice по дате отправления: