Re: type design guidance needed
From | Evgeni E. Selkov |
---|---|
Subject | Re: type design guidance needed |
Date | |
Msg-id | 200009230441.XAA09037@juju.mcs.anl.gov |
In reply to | type design guidance needed (Brook Milligan <brook@biology.nmsu.edu>) |
List | pgsql-hackers |
Brook,

I have been contemplating such a data type for years. I believe I have assembled the most important parts, but I have not had time to complete the whole thing.

The idea is that units of measurement can be treated as arithmetic expressions. One can assign each of the few existing base units a fixed position in a bit vector, parse the expression, then evaluate it to obtain three things: a scale factor, a numerator and a denominator, the latter two being bit vectors. So, if you assign the base units as 'm' => 1, 'kg' => 2, 's' => 4, 'K' => 8, 'mol' => 16, 'A' => 32, 'cd' => 64, the unit umol/min/mg will be represented as

umol/min/mg => (0.01667, 00010000, 00000110)

Here umol = 10^-6 mol, min = 60 s and mg = 10^-6 kg, so the scale factor is 10^-6 / (60 × 10^-6) = 1/60. The numerator has the mol bit (16) set, and the denominator has the s (4) and kg (2) bits set. Such a structure is compact enough to be stashed into an atomic type.

In fact, one needs more than a plain bit vector to represent exponents:

umol/min/ml => (0.01667, '00010000', '00000103')

(because ml is 10^-6 m^3). Here I use a whole character per position for clarity, but one does not need more than two or three bits per exponent: you normally don't have kg^4 or m^7 in your units.

I considered other alternatives, but none seemed as good as an atomic type. I bet you will run into performance problems and an indexing nightmare with non-atomic solutions well before you hit the space constraints of the atomic type. You are even likely to see space problems with non-atomic storage: pointers can easily cost more than compacted units.

There are numerous benefits to the atomic type. The units can be reassembled on output, the operators can be written to work on non-normalized units and reject incompatible ones, and there is no chance of screwing up unit integrity.

So, if that makes sense, I will be willing to funnel more energy into this project, and I would appreciate any cooperation. In the meantime, you might want to check out what I have done so far.

1. A Perl parser for units of measurement that computes units as algebraic expressions. I did it in Perl for ease of prototyping, but it is generated by flex and bison and can be ported to C and included in the data type. Get it from http://wit.mcs.anl.gov/~selkovjr/Unit.tgz

   This is a regular Perl extension (the usual perl Makefile.PL; make; make install routine), but first you need to build and install my version of bison, http://wit.mcs.anl.gov/~selkovjr/camel-1.24.tar.gz

   There is a demo script that you can run as follows: perl browse.pl units

2. The postgres extension, seg, to which I was planning to add the units of measurement. It already has its own uses, and it exemplifies the use of a yacc parser in an extension. Please see the README in http://wit.mcs.anl.gov/~selkovjr/pg_extensions/ as well as a brief description in http://wit.mcs.anl.gov/EMP/seg-type.html and a running demo in http://wit.mcs.anl.gov/EMP/indexing.html (search for seg).

Food for thought.

--Gene
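The (scale, numerator, denominator) encoding described above can be prototyped in a few lines. What follows is a minimal Python sketch. It is not the C type proposed in the message, and it is not Gene's flex/bison parser: the parser is replaced by a hard-coded table of a few units. All names (Unit, add_packed, digits, dimension, convert) are hypothetical. The sketch assumes 3 bits per exponent field, the upper end of the "two or three bits" suggested above, so eight fields fit in 24 bits. It also shows one possible compatibility check that an operator could use to reject incompatible units.

```python
from dataclasses import dataclass

# Sketch only: hypothetical names, not the proposed C implementation.
# Base units in the order given in the message (m => 1, kg => 2, s => 4, ...);
# list index = field position. Position 7 is unused, as in the 8-digit strings.
BASE = ["m", "kg", "s", "K", "mol", "A", "cd"]
EXP_BITS = 3                      # assumption: 3 bits per field, exponents 0..7
FIELD_MASK = (1 << EXP_BITS) - 1
N_FIELDS = 8                      # 8 fields x 3 bits = 24 bits per vector


def get_exp(packed: int, pos: int) -> int:
    """Exponent stored in field `pos` of a packed vector."""
    return (packed >> (pos * EXP_BITS)) & FIELD_MASK


def add_packed(a: int, b: int) -> int:
    """Field-wise sum of two packed exponent vectors (no carry between fields)."""
    out = 0
    for pos in range(N_FIELDS):
        e = get_exp(a, pos) + get_exp(b, pos)
        if e > FIELD_MASK:
            raise OverflowError(f"exponent at position {pos} exceeds {FIELD_MASK}")
        out |= e << (pos * EXP_BITS)
    return out


def digits(packed: int) -> str:
    """One character per field, highest position first, as written in the message."""
    return "".join(str(get_exp(packed, pos)) for pos in reversed(range(N_FIELDS)))


@dataclass(frozen=True)
class Unit:
    scale: float  # factor relative to the SI base units
    num: int      # packed numerator exponents
    den: int      # packed denominator exponents

    def __truediv__(self, other: "Unit") -> "Unit":
        # a / b: scales divide; b's numerator joins a's denominator and vice versa.
        # No cancellation is done, so the result may be non-normalized.
        return Unit(self.scale / other.scale,
                    add_packed(self.num, other.den),
                    add_packed(self.den, other.num))


def base_unit(scale: float, name: str, exp: int = 1) -> Unit:
    return Unit(scale, exp << (BASE.index(name) * EXP_BITS), 0)


def dimension(u: Unit) -> tuple:
    """Net exponent per field (numerator minus denominator)."""
    return tuple(get_exp(u.num, p) - get_exp(u.den, p) for p in range(N_FIELDS))


def convert(value: float, src: Unit, dst: Unit) -> float:
    """Re-express `value` from `src` units in `dst` units; reject incompatible units."""
    if dimension(src) != dimension(dst):
        raise ValueError("incompatible units")
    return value * src.scale / dst.scale


# Hypothetical hard-coded unit table standing in for the flex/bison parser.
umol, mmol = base_unit(1e-6, "mol"), base_unit(1e-3, "mol")
minute, hour = base_unit(60.0, "s"), base_unit(3600.0, "s")
mg, g = base_unit(1e-6, "kg"), base_unit(1e-3, "kg")
ml = base_unit(1e-6, "m", 3)  # 1 ml = 1e-6 m^3

u1 = umol / minute / mg
u2 = umol / minute / ml
print(f"({u1.scale:.5f}, {digits(u1.num)}, {digits(u1.den)})")  # (0.01667, 00010000, 00000110)
print(f"({u2.scale:.5f}, {digits(u2.num)}, {digits(u2.den)})")  # (0.01667, 00010000, 00000103)
print(f"{convert(1.0, u1, mmol / hour / g):.6g}")                # 60
```

The two printed tuples reproduce the representations given in the message. Since mg and ml produce different denominators, convert(1.0, u1, u2) raises ValueError. Comparing net exponents rather than the raw vectors is one assumed way to let operators accept non-normalized units such as m^3/m while still rejecting incompatible ones.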