Discussion: (another ;-)) PostgreSQL-derived project ...
~
I have been searching for a PostgreSQL-derived project with a "less-is-best" philosophy. Even though I have read about quite a few PG forks out there, what I have in mind is more like a baseline than a fork.
~
My intention is not wrapping the same thing in a different package or coding add-ons/value-added features on top of PG, but ridding PG of quite a bit of its internal capabilities and just using its very baseline.
~
All I would need PG for is raw data warehousing, memory, I/O-subsystem management, MVCC/transaction management ... No fanciness whatsoever. Why do you need to, say, format dates in the database if formatting/pretty-printing and internationalization can be taken care of more appropriately in the calling environment, say Python or Java? All that is needed is to store a long representing the date. Why are arrays needed in the DB proper when serialization and marshaling/casting can be taken care of in the calling environment? If you are using, say, Java, all you need PG to do is faithfully store a sequence of bytes, and you would do the (de)serialization very naturally indeed.
~
There used to be a postgresql-base-<version> package with the bare minimum of source code to build and run PostgreSQL, which I think would be a good starting point, but I don't find it on the mirror sites anymore:
~
http://wwwmaster.postgresql.org/download/mirrors-ftp
~
Where can I find it?
~
I know the result will not be a SQL-compliant DBMS anymore, yet I wonder how much faster SQL+client code doing such things as formatting "on-the-fly" would work.
~
Do you know of such tests even in a regular PG installation?
~
Do you see any usefulness in such a project?
~
Do you know of such a project? Anyone interested? Any suggestions for someone embarking on it?
~
It would be great if PG developers see any good in it and do it themselves ;-)
~
lbrtchx
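For what it's worth, the "store a long, format in the client" scheme the post describes can be sketched in a few lines of client-side code (a hypothetical sketch in Python; the helper names are invented for illustration):

```python
from datetime import datetime, timezone

# Client-side encoding: the database would only ever see a plain integer.
def to_epoch_millis(dt: datetime) -> int:
    """Flatten a timezone-aware datetime to a long for storage."""
    return int(dt.timestamp() * 1000)

def from_epoch_millis(millis: int, tz=timezone.utc) -> datetime:
    """Rebuild a datetime in the caller's timezone/locale on the way out."""
    return datetime.fromtimestamp(millis / 1000, tz)

stored = to_epoch_millis(datetime(2011, 9, 25, 6, 11, 36, tzinfo=timezone.utc))
assert from_epoch_millis(stored).year == 2011
```

All formatting, internationalization, and parsing would then live in the client; the DB column is just a BIGINT.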
Albretch Mueller wrote:
> [snip]

Doesn't Yahoo! have some super modified mega-high-performant version of Postgres? Last I heard (which was like 2008) they were planning on putting it online. I think it involved a columnar-oriented table format or something. Did this ever happen?
On Sep 24, 2011, at 22:54, Albretch Mueller <lbrtchx@gmail.com> wrote:
> Do you see any usefulness in such a project?
> ~
> Do you know of such a project? Anyone interested? Any suggestions to
> someone embarking in it?
> ~
> It would be great if PG developers see any good in it and do it themselves ;-)
> ~
> lbrtchx
I can't tell if you mean this as a humorous post or if you just have something in your eye ;-)
I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
As a developer I still have to deal with dates and arrays, so while PostgreSQL could do less, the work still has to be done. I'm happier having a group of programmers more skilled than myself doing it instead of me. Plus, by having it in the DB I avoid considerable overhead and can now use those features within my SQL statements/queries.
I can see where adding complexity could weigh into a do/ignore decision, but once it's in and tested, why would you want to remove a feature? If it had serious performance implications, maybe, but even then arguing for a runtime enable/disable flag would be better, so everyone could decide based upon their unique circumstances.
I am not a project developer, but unless and until you can identify meaningful areas of performance degradation, you are simply guessing that, merely by being feature-rich, PostgreSQL has sub-optimal performance. And if most/all of the overhead is in areas that you deem critical/core, then nothing you would remove would have a meaningful impact, while improving the core areas would improve not only your own situation but the core project as well.
"Premature optimization is the root of all evil." (someone not me)
David J.
Based on your description, I suggest you might want to look at SQLite. It provides a number of compile-time options where you can exclude various features you don't want from the binary, when simply ignoring the extra features isn't good enough.

-- Darren Duncan
Maybe you could let us in a little more on what you're trying to accomplish. What it looks like to me right now is that you're looking for a non-SQL-compliant SQL database where a lot of the data integrity is actually coded in the application :-) I bet there are database systems out there that do exactly or nearly what you want. Maybe an object-oriented one may suit you better than a relational system? For me, I sure don't use all that PostgreSQL has to offer, but I like that it does a lot of things for me, and I code most of what my application does inside the database using views, stored procedures and triggers. That approach strips down on application complexity. My apps don't have to do any post-processing of the data - I query the records I need and the app merely displays them.
Yes, sometimes I wish postgresql was more "high performance" - but then, I drive an old, paid for, practical car and not a formula one racer without a boot or spare tire. My point being: postgresql does what it does very reliably and although not the best performer on the market, it is a database I would trust my payroll with - and there are few where I'd make that statement. Never had any data loss ever, never had it crash on me. Give good hardware to postgresql and you will get good performance with exceptional stability and integrity. Uwe
~
Well, at least I thought you would tell me where the postgresql-base is to be found. The last version I found is:
~
http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
~
and I wondered what that is and why there is no postgresql-base after "8.3beta2"
~
> I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
> Plus, by having it in the DB I avoid considerable overhead
~
Can you, or do you know of anyone who has, made those kinds of imaginations falsifiable?
~
> ... and can now use those features within my SQL statements/queries.
~
For what exactly? Isn't a comparison of 4 numeric bytes (1 (or 1/2) word on modern hardware) more efficient than comparing sequences of string characters?
~
> simply guessing that in simply being feature rich PostgreSQL has sub-optimal performance
~
I never said that
~
> ... you might want to look at SQLite. It provides a number of compile-time options where you can exclude various features you don't want from the binary
~
I couldn't find the compile options you mentioned: sqlite.org/ {faq.html, custombuild.html, docs.html}
~
> ... you're looking for a non-sql compliant SQL database where a lot of the data integrity is actually coded in the application :-)
~
The first part of your statement I acknowledge, but how is it exactly that a "lot of the data integrity is actually coded in the application"?
~
> That approach strips down on application complexity. My apps don't have to do any post-processing of the data - I query the records I need and the app merely displays them.
~
Again, have you actually tested those assumptions?
~
> My point being: postgresql does what it does very reliably
~
I never said otherwise
~
lbrtchx
On Sep 25, 2011, at 2:11, Albretch Mueller <lbrtchx@gmail.com> wrote:
> ~
> Well, at least I thought you would tell me where the postgresql-base
> is to be found. The last version I found is:
> ~
> http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
> ~
> and I wondered what that is and why there are no postgresql-base
> after "8.3beta2"
> ~
>> I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
>> Plus, by having it in the DB I avoid considerable overhead
> ~
> Can you or do you know of anyone who has made those kinds of
> imaginations falsifiable?

No; not worth my effort.

>> ... and can now use those features within my SQL statements/queries.
> ~
> For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
> word in modern hardware) more efficient than comparing sequences of
> string characters?

The ARRAY_AGG() function in particular has been very useful in queries I write.

>> simply guessing that in simply being feature rich PostgreSQL has sub-optimal performance
> ~
> I never said that

Your whole post implies this, otherwise there is no meaningful reason to look for something excluding features (assuming proper and correct implementation).

>> That approach strips down on application complexity. My apps don't have to do any post-processing of the data - I query the records I need and the app merely displays them.
> ~
> Again have you actually tested those assumptions?

Is this the best response you can come up with? The crux of the counter-argument is that by having PostgreSQL handle 'advanced' features application code avoids the need to do so. The principle of code-reuse and the fact the features are executed by the same program holding the data make this a de-facto truth (and yes, one that we are probably taking for granted).
But, if you really feel a bare-bones implementation of PostgreSQL is worthwhile, you are the one that needs to test (and state explicitly) your own underlying assumptions to see whether they hold and thus make such an endeavor worthwhile.

David J.
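The ARRAY_AGG() point above can be illustrated even with SQLite's roughly analogous GROUP_CONCAT(): the database hands back already-aggregated rows in one round trip instead of the client re-assembling them (an illustrative sketch, not PostgreSQL syntax; table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (doc_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(1, "pg"), (1, "sql"), (2, "java")])

# One query: aggregation happens next to the data, the way
# ARRAY_AGG() would in a PostgreSQL statement.
rows = conn.execute(
    "SELECT doc_id, GROUP_CONCAT(tag, ',') FROM tags "
    "GROUP BY doc_id ORDER BY doc_id").fetchall()

assert rows[0][0] == 1 and sorted(rows[0][1].split(",")) == ["pg", "sql"]
assert rows[1] == (2, "java")
```

Without the aggregate, the client would have to fetch every tag row and group them itself.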
On 25 Sep 2011, at 8:11, Albretch Mueller wrote:
>> ... and can now use those features within my SQL statements/queries.
> ~
> For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
> word in modern hardware) more efficient than comparing sequences of
> string characters?

Data types aren't stored in the database as character strings (unless you define your columns as text, of course). When data in the database gets compared to data in a query (for example, when you use a WHERE clause that compares a date column to a given date), the data in the query gets transformed to the appropriate type (text to date, in this case) - just once. That's efficient enough that the difference in performance between a numerical value and the string representation doesn't matter. I don't know what you're trying to say in the above, but you seem to base your hypothesis on wrong assumptions.

Alban Hertroys
--
The scale of a problem often equals the size of an ego.
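Alban's "transformed ... just once" point can be mimicked in plain client code (a hypothetical sketch; the engine's actual implementation is in C, this just shows the shape of the work):

```python
from datetime import date

# Rows as the engine sees them: native, already-typed dates per row,
# not character strings.
rows = [date(2011, 9, d) for d in range(1, 31)]

# The query literal '2011-09-25' is parsed exactly once ...
cutoff = date.fromisoformat("2011-09-25")

# ... and every per-row comparison is then a cheap typed comparison,
# with no string parsing in the loop.
matches = [r for r in rows if r >= cutoff]
assert len(matches) == 6
```

The per-row cost is the typed comparison; the one-time text-to-date conversion is noise by comparison.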
On Sep 25, 2011, at 2:11 AM, Albretch Mueller wrote: > For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2) > word in modern hardware) more efficient than comparing sequences of > string characters? What on earth makes you think the db engine compares numbers as strings??? -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
On Sun, Sep 25, 2011 at 06:11:36AM +0000, Albretch Mueller wrote:
> ~
> Well, at least I thought you would tell me where the postgresql-base
> is to be found. The last version I found is:
> ~
> http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2

Notwithstanding the rest of your post, I'm surprised you missed the website:

http://www.postgresql.org/download/

There's a source code link, as well as several others.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts. -- Arthur Schopenhauer
Albretch Mueller <lbrtchx@gmail.com> writes: > Well, at least I thought you would tell me where the postgresql-base > is to be found. The last version I found is: > http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2 > and I wondered what that is and why there are no postgresql-base > after "8.3beta2" We stopped bothering because the split tarballs weren't really good for anything separately. They were never independently buildable pieces, and were only meant to ease downloading the distribution over unreliable internet connections. That concern was obsolete some years ago. The only part of the PG distribution that's ever been meant to be separately buildable is libpq and some of the client-side tools. If you want to start stripping down the server, you're on your own. Now, having said that, there has been some interest in pushing lesser-used chunks like the geometric datatypes out into extensions. I don't see how that's going to result in any significant performance gain, though. regards, tom lane
>> ... you're looking for a non-sql compliant SQL database where a lot of
>> the data integrity is actually coded in the application :-)
> ~
> First part of your statement I acknowledged, but how is it exactly
> that "lot of the data integrity is actually coded in the application"
> ~

Take dates for example: you'd have to code very carefully to catch all the different ways dates are represented on this planet. Your application has to handle this, since all the database knows at this point is an absolute time (i.e. seconds since epoch or something like that), and your app has to convert every occurrence of a date to this format, or the database won't find anything or, even worse, will store something wrong. Same goes for numbers: if everything is stored in a byte sequence, how does the database know you're trying to compare the number 1 with the character "1"? Again, your app would have to handle that. To me, that's moving data integrity into the application.

>> That approach strips down on application complexity. My apps don't have
>> to do any post-processing of the data - I query the records I need and the app merely displays them.
> ~
> Again have you actually tested those assumptions?

Yes, I have, as have many others. Simple example: program a website like, say, Facebook. So you have thousands of users from all over the world. Your website code handles all the data conversions. Now Apple comes along and sells an iPhone which, silly enough, a lot of people like and try to use to access your website. You now face the problem that you need a second website doing the same thing as the first website, except solely made for touch-screen devices. You will be forced to rewrite a lot of your code because all the data conversion is in the code. Even worse, if you had to make an iPhone or Android app in lieu of the second website, you'd have to recode everything you did in a different language - i.e. Objective-C.
If you leave these things to the database, you "simply" write a second client for a different platform, and you don't have to fuzz around to get the conversions correct because the application receives the data already converted. Sure, this all depends on what application you need this specialized database engine for. If it's an application for a very defined environment, you can dictate how data is to be input and train users. If it's an application for the big wild world, you will have problems with users doing stupid things beyond your control, like writing "P.O. Box 1" into the zipcode field where you expected a 10-digit number. I'd rather have the database catch those cases and reject storing the bad input. That saves me a lot of validation code in my app.
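Uwe's zipcode example amounts to a declarative constraint — sketched here with SQLite so it is self-contained (illustrative schema; PostgreSQL CHECK constraints express the same idea, typically with a regex match):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The database refuses bad input outright, so no client needs its
# own copy of this rule. (Invented table; 5-digit zip for brevity.)
conn.execute("""
    CREATE TABLE addresses (
        street  TEXT,
        zipcode TEXT CHECK (zipcode GLOB '[0-9][0-9][0-9][0-9][0-9]')
    )""")

conn.execute("INSERT INTO addresses VALUES ('Main St 1', '80203')")
try:
    conn.execute("INSERT INTO addresses VALUES ('Main St 2', 'P.O. Box 1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
assert rejected
```

Every client, in every language, gets the same rejection for free; stripping this out of the engine means re-implementing it in each client.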
On Sat, Sep 24, 2011 at 11:11 PM, Albretch Mueller <lbrtchx@gmail.com> wrote:
>> I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
>> Plus, by having it in the DB I avoid considerable overhead
> ~
> Can you or do you know of anyone who has made those kinds of
> imaginations falsifiable?
> ~

My own experience here is that while it is generally possible to create additional overhead by mis-use of advanced features, *in general* you save more overhead and get clearer code by pushing what you can into the database, within reason.

>> ... and can now use those features within my SQL statements/queries.
> ~
> For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2)
> word in modern hardware) more efficient than comparing sequences of
> string characters?
> ~
>> simply guessing that in simply being feature rich PostgreSQL has sub-optimal performance
> ~
> I never said that
> ~
>> ... you might want to look at SQLite. It provides a number of compile-time options where you can exclude various features you don't want from the binary
> ~
> I couldn't find the compile options you mentioned: sqlite.org/
> {faq.html, custombuild.html, docs.html}
> ~
>> ... you're looking for a non-sql compliant SQL database where a lot of the data integrity is actually coded in the application :-)
> ~
> First part of your statement I acknowledged, but how is it exactly
> that "lot of the data integrity is actually coded in the application"

I can give you a good example. Some years ago, I was working on an accounting application someone else wrote which stored all monetary values as double-precision floats and then handled arbitrary precision math in the front-end of the application. This meant:

1) To detect if an invoice was closed, it would retrieve all gl lines associated with the invoice and an AR/AP account and see if these totalled to 0 in the middleware.
This performed ok for a small database, but for a large one it didn't work so well. Had the application used NUMERIC types, this could have been done more easily with a HAVING clause, and far more efficiently on the db server.

2) It made the application relatively sensitive to rounding errors - sum() with group by would return different numbers with different groupings in sufficiently large databases.

So here you get a case where the application was made less robust and performed quite a bit worse by not using the arbitrary precision math capabilities of PostgreSQL.

> ~
>> That approach strips down on application complexity. My apps don't have to do any post-processing
>> of the data - I query the records I need and the app merely displays them.
> ~
> Again have you actually tested those assumptions?

In general my experience is that it is far easier to tune the performance of an app as described here (where all presentation is done in the db) than of an app where a lot of it is done in middleware or front-end. For example, consider the following: I need to determine all of the years that have dates in a database table with, say, 50M records. If I have a database query which does this all at once, when it performs badly, I can tune it, and there are fewer tradeoffs I have to make.

> ~
>> My point being: postgresql does what it does very reliably
> ~
> I never said otherwise

I'd add it performs remarkably well IMHO as well as reliably.

Best Wishes,
Chris Travers
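The float-versus-NUMERIC failure mode Chris describes is easy to reproduce (a minimal sketch; the amounts are invented):

```python
from decimal import Decimal

# Three GL lines that should cancel out exactly for a closed invoice.
lines = ["0.10", "0.20", "-0.30"]

float_total = sum(float(x) for x in lines)
exact_total = sum(Decimal(x) for x in lines)

# Binary floats accumulate representation error ...
assert float_total != 0.0
# ... while arbitrary-precision math (what NUMERIC gives you
# inside the database) closes the invoice cleanly.
assert exact_total == Decimal("0")
```

An "is this invoice closed?" test built on the float total would misclassify invoices like this one.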
On 9/25/11, David Johnston <polobo@yahoo.com> wrote:
> On Sep 25, 2011, at 2:11, Albretch Mueller <lbrtchx@gmail.com> wrote:
>> Can you or do you know of anyone who has made those kinds of
>> imaginations falsifiable?
>
> No; not worth my effort.
~
;-)
~
>>> That approach strips down on application complexity. My apps don't have
>>> to do any post-processing of the data - I query the records I need and
>>> the app merely displays them.
>> ~
>> Again have you actually tested those assumptions?
>
> Is this the best response you can come up with? The crux of the
> counter-argument is that by having PostgreSQL handle 'advanced' features
> application code avoids the need to do so. The principle of code-reuse and
> the fact the features are executed by the same program holding the data make
> this a de-facto truth (and yes, one that we are probably taking for
> granted).
~
First, I wonder what kind of technical person would say there are "de-facto truth(s)". I thought only politicians would talk like that. Now, in a sense you are right, I am talking from the background of my own experiences (and so are you). When I have developed relatively complicated and heavily accessed websites I only use a DBMS when I need to actually persist any data. For example, there are options in Tomcat (the Java-based web server) to offload session handling to a DBMS (which is great when you need to statistically trace and infer users' navigation) and establish transactions offloading a timed-out session to an actual database hitting the hard drive (some user got distracted ...) ...
and that sounds great, but anything dealing with I/O would invariably slow your apps, so what I do is use in-mem (lite) DBMS such as Hypersonic SQL (hsqldb.org) to keep sessions off the I/O subsystem and the speed increase is --very-- noticeable ~ Since any piece of code engaging the I/O such as database access code should be as fast and simple as possible; yes, I would design, say, java code wrappers doing anything that is not strictly INSERT and SELECT raw data ... let me deal with the "semantics" and "business intelligence" of the data myself ~ > But, if you really feel a bare-bones implementation of PostgreSQL > is worthwhile you are the one that needs to test (and state explicitly) your > own underlying assumptions to see whether they hold and thus make such an > endeavor worthwhile. ~ you are right and I am up to it ~ On 9/25/11, Alban Hertroys <haramrae@gmail.com> wrote: >>> ... and can now use those features within my SQL statements/queries. >> ~ >> For what exactly? Isn't a comparison on 4 numeric bytes (1 (or 1/2) >> word in modern hardware) more efficient than comparing sequences of >> string characters? > > Data types aren't stored in the database as character strings (unless you > define your columns as text, of course). ~ I was talking about text and any formatting option in NUMERIC or DATE data ~ On 9/25/11, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Albretch Mueller <lbrtchx@gmail.com> writes: >> Well, at least I thought you would tell me where the postgresql-base >> is to be found. The last version I found is: >> >> http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2 >> and I wondered what that is and why there are no postgresql-base >> after "8.3beta2" > > We stopped bothering because the split tarballs weren't really good for > anything separately ... 
~
Thank you for answering my question
~
> Now, having said that, there has been some interest in pushing
> lesser-used chunks like the geometric datatypes out into extensions ...
~
Yes, that is it, extensions! It would be great if PG could be built in a more modular way, with just the features you need. Clusters of dependencies will have to be checked ... I think that would enrich PG development
~
On 9/25/11, Uwe Schroeder <uwe@oss4u.com> wrote:
>
>> > ... you're looking for a non-sql compliant SQL database where a lot of
>> > the data integrity is actually coded in the application :-)
>>
>> ~
>> First part of your statement I acknowledged, but how is it exactly
>> that "lot of the data integrity is actually coded in the application"
>> ~
>
> Take dates for example: you'd have to code very carefully to catch all the
> different ways dates are represented on this planet.
~
Yeah! And Java does an exceptionally good job at that (including internationalization)
~
http://download.oracle.com/javase/7/docs/api/java/ {util/Date.html, text/DateFormat.html}
~
So, you would ultimately just have to store a long value in the DBMS
~
I am amazed to read that you/the PG community were still running regression tests in ASCII: http://www.postgresql.org/docs/9.1/static/release-9-1.html (E.1.3.12. Source Code) * Run regression tests (postgresql.org/docs/9.1/static/install-procedure.html#BUILD) in the default encoding (Peter Eisentraut) Regression tests were previously always run with SQL_ASCII encoding.
~
> Same goes for numbers: if everything is stored in a byte sequence, how does
> the database know you try to compare the number 1 with the character "1"?
~
This is something I would do with wrapping code using input- and output-bound command objects which are indexed after the same column index the DBMS uses
~
> To me, that's moving data integrity into the application.
~
Not exactly!
Integrity is still a matter for the DBMS, which can now handle it in an easier way: if someone writes a date in Bangla and somebody else in Ukrainian, it is still the same date/time value, ultimately determined by the motion of our planet around the sun ... and all we need for that is a long value. Now, aren't we easing things for the DBMS?
~
>> > That approach strips down on application complexity. My apps don't have
>> > to do any post-processing
>> of the data - I query the records I need and the app merely displays them.
>> ~
>> Again have you actually tested those assumptions?
>
> Yes, I have, as have many others. Simple example: program a website like,
> say, Facebook. So you have thousands of users from all over the world. Your
> website code handles all the data conversions. Now Apple comes along and sells an
> iPhone which silly enough a lot of people like and try to use to access your
> website. You now face the problem that you need a second website doing the
> same thing as the first website except solely made for touch-screen devices.
> You will be forced to rewrite a lot of your code because all the data
> conversion is in the code.
~
Well, the code you will have to write either way, regardless of where you keep it, and in order to not even have to restart the application server cold, I would code command objects (like function pointers in C) to handle those cases. It is ultimately strings of characters you are dealing with
~
> Sure this all depends on what application you need this specialized database
> engine for. If it's an application for a very defined environment you can
> dictate how data is to be input and train users. If it's an application for
> the big wild world you will have problems with users doing stupid things
> beyond your control like writing "P.O. Box 1" into the zipcode field where
> you expected a 10 digit number. I rather have the database catch those cases and
> reject storing the bad input.
~
You see, this is just an implementation issue. I would rather try to do validation as close to the client as possible, and do it in an incremental way if necessary
~
On 9/25/11, Chris Travers <chris.travers@gmail.com> wrote:
> ...
> So here you get a case where the application was made less robust and
> performed quite a bit worse by not using arbitrary math capabilities
> of PostgreSQL.
~
this example does not represent what we have been discussing here, and rounding/decimal representation is something base-2 arithmetic was not designed for anyway
~
thank you, and I think I will go for it anyway. Now (and this is a question for developers), since refactoring code is something that has crossed your collective/cultural mind, is there a way I could do my own perusing of the code that you would find useful?
~
lbrtchx
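The "same date in Bangla or Ukrainian" argument above boils down to this (a hypothetical sketch using two Latin-script renderings for simplicity; the point is that every rendering round-trips to one stored long):

```python
from datetime import datetime, timezone

instant = datetime(2011, 9, 25, 12, 0, tzinfo=timezone.utc)
stored = int(instant.timestamp())   # the single long the DB would keep

# Two clients render the same stored value in their own conventions ...
us_style = instant.strftime("%m/%d/%Y %H:%M")
iso_style = instant.isoformat()

# ... but both parse back to the identical stored long.
assert int(datetime.strptime(us_style, "%m/%d/%Y %H:%M")
           .replace(tzinfo=timezone.utc).timestamp()) == stored
assert int(datetime.fromisoformat(iso_style).timestamp()) == stored
```

The counter-argument in the thread is about the *input* side: the database can no longer tell a valid long from a garbage long, so every client must re-implement the validation.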
On Sun, Sep 25, 2011 at 09:41:19PM +0000, Albretch Mueller wrote: > I am amazed to read that you/the PC community were still running regression tests > > *in ASCII*: > > http://www.postgresql.org/docs/9.1/static/release-9-1.html (E.1.3.12. Source Code) > * Run regression tests (postgresql.org/docs/9.1/static/install-procedure.html#BUILD) in the default encoding (Peter Eisentraut) > Regression tests were previously always run > > *with SQL_ASCII* encoding. Quite obviously you have got no clue and didn't bother checking either. Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
> First, I wonder what kind of technical person would say there are
> "de-facto truth(s)". I thought only politicians would talk like that.

Well, politicians and Microsoft, Oracle etc. :-)

> Now, in a sense you are right, I am talking from the background of my
> own experiences (and so are you). When I have developed relatively
> complicated and heavily accessed websites I only use a DBMS when I need
> to actually persist data. For example, there are options in Tomcat
> (the Java-based web server) to offload session handling to a DBMS
> (which is great when you need to statistically trace and infer users'
> navigation) and establish transactions offloading a timed-out session
> to an actual database hitting the hard drive (some user got
> distracted ...), and that sounds great, but anything dealing with
> I/O will invariably slow your apps, so what I do is use an in-memory
> (lite) DBMS such as Hypersonic SQL (hsqldb.org) to keep sessions off
> the I/O subsystem, and the speed increase is --very-- noticeable
> ~
> Since any piece of code engaging the I/O, such as database access code,
> should be as fast and simple as possible, yes, I would design, say,
> Java code wrappers doing anything that is not strictly INSERT and
> SELECT raw data ... let me deal with the "semantics" and "business
> intelligence" of the data myself

So you're keeping a lot in memory, which to me suggests plenty of hardware
is available. One of my current apps chews up 8Gb of memory just for the
app and I can't afford to get a 64Gb or more server. If I wanted to keep
permanently accessed data in memory, I'd need somewhere around 1/2 a
terabyte of memory - so obviously not an option (or maybe really bad
database design :-) )

That said, just considering the cost/effort it takes to strip PostgreSQL
down, why don't you go with a server that has 1TB of solid state discs?
That strips down the I/O bottleneck considerably without any effort.
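The session-offloading pattern described above (sessions live in memory; only a timed-out session ever touches the I/O subsystem) can be sketched in plain Java. This is a minimal sketch under stated assumptions: the timeout value is arbitrary, and `persistToDatabase` is a hypothetical hook where a JDBC INSERT would go:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class SessionCache {
    static final long TIMEOUT_MS = 30 * 60 * 1000L; // 30 minutes, arbitrary

    static final class Session {
        final String id;
        long lastAccess;
        Session(String id) {
            this.id = id;
            this.lastAccess = System.currentTimeMillis();
        }
    }

    private final Map<String, Session> live = new HashMap<String, Session>();

    // Look a session up (creating it if needed) and mark it recently used.
    Session touch(String id) {
        Session s = live.get(id);
        if (s == null) {
            s = new Session(id);
            live.put(id, s);
        }
        s.lastAccess = System.currentTimeMillis();
        return s;
    }

    boolean isLive(String id) {
        return live.containsKey(id);
    }

    // Called periodically: idle sessions leave memory and go to disk.
    void evictIdle(long now) {
        for (Iterator<Session> it = live.values().iterator(); it.hasNext();) {
            Session s = it.next();
            if (now - s.lastAccess > TIMEOUT_MS) {
                persistToDatabase(s);
                it.remove();
            }
        }
    }

    // Hypothetical hook: a single JDBC INSERT would go here.
    void persistToDatabase(Session s) { }
}
```

An in-memory engine like HSQLDB plays the same role as the `HashMap` here, with SQL semantics on top; either way the hot path never blocks on disk.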
> > Data types aren't stored in the database as character strings (unless
> > you define your columns as text, of course).
> ~
> I was talking about text and any formatting option in NUMERIC or DATE data

In my experience "data formatting" goes both ways, in and out. Out is
obviously not a major issue because errors don't cause data corruption.
In, however, is a different issue. Errors in "inwards" conversion will
cause data corruption. So unless you have an application that does very
little "in" and a lot of "out", you still have to code a lot of data
conversion which otherwise someone else (the PostgreSQL developers) has
already done for you.

> > Take dates for example: you'd have to code very carefully to catch all
> > the different ways dates are represented on this planet.
> ~
> Yeah! And Java does an exceptionally good job at that (including
> internationalization)

Maybe it does. I never coded Java because I don't like to use technology
where Oracle can come sue me :-) I do know however that a lot of languages
have quirks with dates and internationalization (Python, which you
mentioned earlier, being one of them)

> http://download.oracle.com/javase/7/docs/api/java/ {util/Date.html,
> text/DateFormat.html}
> ~
> So, you would ultimately just have to store a long value into the DBMS

Yes, a long value - which can represent pretty much any valid and invalid
date ever devised, so again you don't really know what's in the database
when you leave the validation to the application.

> This is something I would do with wrapping code using input and
> output bound command objects which are indexed after the same column
> index the DBMS uses

Which still depends on your use case. Your assumption is that every piece
of code is coded in Java - which is fine if that's what your application
calls for. It's going to be a major hassle when you ever have to re-code
in a different language though.

> > To me, that's moving data integrity into the application.
> ~
> Not exactly!
> Integrity is still a matter of the DBMS, which can now
> handle it in an easier way: if someone writes a date in Bangla and
> somebody else in Ukrainian, it is still the same date/time value,
> ultimately determined by the rotation of our planet around the sun ...
> and all we need for that is a long value. Now, aren't we easing
> things for the DBMS?

I agree to disagree on this one. The date value the database stores in
this case is a long. Any "long" can be converted into a valid date - but
is it really the date that was entered in the first place?

If I give a date representation, i.e. 12/31/2010, to the database, I
personally don't really care how the database stores the date underneath.
All that interests me is that the next time I ask for that field I get
12/31/2010 back. There is no error that can be made other than user error
if you ask the database to store a specific date representation. There
are errors you can make in your own conversion code which can lead to a
different "long" stored than intended. So again, data integrity is at
least partially in the application and not the database.

> > Yes, I have, as have many others. Simple example: program a website
> > like, say, Facebook. So you have thousands of users from all over the
> > world. Your website code handles all the data conversions. Now Apple
> > comes along and sells an iPhone which silly enough a lot of people
> > like and try to use to access your website. You now face the problem
> > that you need a second website doing the same thing as the first
> > website except solely made for touch-screen devices. You will be
> > forced to rewrite a lot of your code because all the data conversion
> > is in the code.
> ~
> Well, that code you will have to write either way, regardless of where
> you keep it, and in order to not even have to restart the application
> server cold I would code command objects (like function pointers in C)
> to handle those cases.
> It is ultimately strings of characters you are dealing with

With the right design, you will have to rewrite the visual layer, not the
application logic. Errors in the visual layer are of little consequence
(except disgruntled users). So yes, if you use some kind of middleware
that does all the converting and validating for you, the difference is
negligible. But then, why write your own when the database already
provides that functionality?

> You see, this is just an implementation issue. I would rather try to do
> validation as close to the client as possible and do it in an
> incremental way if necessary

This one I can agree on. My background is government and the financial
industry, and neither uses a system with client-side validation - at
least not that I have seen. Actually I've seen military systems which
handle 98% of the "application" inside the database and just
import/export text files without any GUI whatsoever.

As I've said earlier, it all depends on what you're trying to do, what
the requirements are and how sensitive the data is. In my way of thinking
- being a bit paranoid as it is - I'd rather have nothing in memory and
everything on at least 10 different computer systems spread out over the
planet, just so that one of the systems survives the next asteroid impact
with all my data intact :-) (i.e. one of my webservers doesn't even have
the session id in memory - everything is in PostgreSQL and replicated to
different servers. Granted, a big I/O hog because there are multiple
records for every page ... but any cleaning woman can pull the plug and
nothing will happen because the hot standbys someplace else will simply
take over - people don't even have to log in again, the session is still
valid on the standby)
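The "one long, many representations" idea argued over above can be shown with nothing but the JDK classes the thread already links to (java.util.Date, java.text.DateFormat). A minimal sketch; the stored value is an arbitrary example instant in epoch milliseconds, and the exact strings printed depend on the JVM's locale data and timezone:

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

public class LongDateDemo {
    public static void main(String[] args) {
        // Pretend this long came back from a bigint column in the database.
        long stored = 1293753600000L;
        Date d = new Date(stored);

        // One stored value, several client-side renderings per user locale:
        System.out.println(DateFormat.getDateInstance(DateFormat.LONG, Locale.US).format(d));
        System.out.println(DateFormat.getDateInstance(DateFormat.LONG, Locale.GERMANY).format(d));
        System.out.println(DateFormat.getDateInstance(DateFormat.LONG, Locale.JAPAN).format(d));
    }
}
```

This is the "out" direction, which both sides of the argument agree is harmless; the contested "in" direction is where a bad conversion would store the wrong long.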
On 09/25/11 7:48 PM, Uwe Schroeder wrote:
> Which still depends on your use case. Your assumption is that every
> piece of code is coded in Java - which is fine if that's what your
> application calls for. It's going to be a major hassle when you ever
> have to re-code in a different language though.

It's the old hammer and nail thing [1]. A pure Java programmer wants to
see everything in Java as it's the tool he knows.

[1] - If your only tool is a hammer, the whole world looks like a nail.

--
john r pierce                            N 37, W 122
santa cruz ca                             mid-left coast
On 9/25/11, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
> On Sun, Sep 25, 2011 at 09:41:19PM +0000, Albretch Mueller wrote:
>
>> I am amazed to read that you/the PG community were still running
>> regression tests
>>
>> *in ASCII*:
>>
>> http://www.postgresql.org/docs/9.1/static/release-9-1.html (E.1.3.12.
>> Source Code)
>> * Run regression tests
>> (postgresql.org/docs/9.1/static/install-procedure.html#BUILD) in the
>> default encoding (Peter Eisentraut)
>> Regression tests were previously always run
>>
>> *with SQL_ASCII* encoding.
>
> Quite obviously you have got no clue and didn't bother
> checking either.
~
Karsten, you are right to some extent, but this is what it plainly says,
and I can't keep a mental map of PG's code base and culture.
~
Now, a more insufferable one. When I said:
~
> ... if someone writes a date in Bangla and somebody else in Ukrainian
> this is still the same date/time value ultimately determined by the
> rotation of our planet around the sun
~
I meant the rotation of our planet around its own axis, and even if that
is primary school knowledge, as a physicist I know the difference more
than well enough, even though the revolution of our planet around the sun
does influence what we nominally consider to be the wall clock time
(daylight saving time ...)
~
lbrtchx
On 9/26/11, Uwe Schroeder <uwe@oss4u.com> wrote:
> In my experience "data formatting" goes both ways, in and out. Out is
> obviously not a major issue because errors don't cause data corruption.
> In, however, is a different issue. Errors in "inwards" conversion will
> cause data corruption. So unless you have an application that does very
> little "in" and a lot of "out", you still have to code a lot of data
> conversion which otherwise someone else (the PostgreSQL developers) has
> already done for you.
~
Well, yes Uwe, but any code in the DBMS would slow down a bit its main
job, which (to me) is storing data and keeping it consistent as soon as
possible.
~
>> > Take dates for example: you'd have to code very carefully to catch
>> > all the different ways dates are represented on this planet.
>> ~
>> Yeah! And Java does an exceptionally good job at that (including
>> internationalization)
>
> Maybe it does. I never coded Java because I don't like to use
> technology where Oracle can come sue me :-)
~
Yeah! I felt like sh!t when I heard that Oracle had bought Sun, a
-slightly- more open company; at least they like to keep that perception
of themselves ...
~
>> http://download.oracle.com/javase/7/docs/api/java/ {util/Date.html,
>> text/DateFormat.html}
>> ~
>> So, you would ultimately just have to store a long value into the DBMS
>
> Yes, a long value - which can represent pretty much any valid and
> invalid date ever devised, so again you don't really know what's in the
> database when you leave the validation to the application.
~
Not if you get that long value through Java ;-)
~
>> This is something I would do with wrapping code using input and
>> output bound command objects which are indexed after the same column
>> index the DBMS uses
>
> Which still depends on your use case. Your assumption is that every
> piece of code is coded in Java - which is fine if that's what your
> application calls for.
> It's going to be a major hassle when you ever have to re-code in a
> different language though.
~
Well, yes, you are right.
~
>> > To me, that's moving data integrity into the application.
>> ~
>> Not exactly! Integrity is still a matter of the DBMS, which can now
>> handle it in an easier way: if someone writes a date in Bangla and
>> somebody else in Ukrainian, it is still the same date/time value,
>> ultimately determined by the rotation of our planet around the sun ...
>> and all we need for that is a long value. Now, aren't we easing
>> things for the DBMS?
>
> I agree to disagree on this one. The date value the database stores in
> this case is a long. Any "long" can be converted into a valid date -
> but is it really the date that was entered in the first place?
~
Again, not if you get that long value through Java. It takes care of
doing those checks for you. You could, for example, not enter 02/30/2011
as a date object in Java and try to get a long out of it.
~
> If I give a date representation, i.e. 12/31/2010, to the database, I
> personally don't really care how the database stores the date
> underneath. All that interests me is that the next time I ask for that
> field I get 12/31/2010 back.
~
But "12/31/2010" is ultimately just one representation of that long you
would have inserted, which can be rendered in many different ways,
depending on user preferences.
~
> There is no error that can be made other than user error if you ask the
> database to store a specific date representation. There are errors you
> can make in your own conversion code which can lead to a different
> "long" stored than intended.
~
The conversion code will not be mine; it is part of Java and, I would
guess, Python or any decent programming language.
~
>> ~
>> Well, that code you will have to write either way, regardless of where
>> you keep it, and in order to not even have to restart the application
>> server cold I would code command objects (like function pointers in C)
>> to handle those cases. It is ultimately strings of characters you
>> are dealing with
>
> With the right design, you will have to rewrite the visual layer, not
> the application logic. Errors in the visual layer are of little
> consequence (except disgruntled users). So yes, if you use some kind of
> middleware that does all the converting and validating for you, the
> difference is negligible. But then, why write your own when the
> database already provides that functionality?
~
Because DBMSs are I/O beasts and are way more likely to enter into
delayed states and concurrency conflicts, so to me, the less you hassle
them the better. For example, I never handle HTTP sessions with a DBMS
because they are very volatile, temporary and user specific. And also,
the hardware DBMSs sit on mechanically and magnetically wears off with
usage.
~
On 9/26/11, John R Pierce <pierce@hogranch.com> wrote:
> It's the old hammer and nail thing [1]. A pure Java programmer wants to
> see everything in Java as it's the tool he knows.
~
Well, not exactly. I am not religious about code. The most sophisticated
code I have written in my life was in FORTRAN, but the most code I have
written has been in ANSI C and C++ and later Java, which I like because
its HotSpot technology enables it to do runtime optimizations (something
you can't do with statically compiled code), which makes it faster than C
on web servers.
~
Question: which kinds of code analysis tools do you guys use?
~
lbrtchx
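One caveat on the 02/30/2011 example argued over above: java.util date parsing is lenient by default and would silently roll 02/30/2011 over to 03/02/2011, so the "Java catches it for you" claim only holds if leniency is switched off explicitly. A minimal sketch of doing that (`toStoredLong` is a hypothetical helper name, and the `MM/dd/yyyy` pattern is just one assumed input format):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class StrictDateCheck {
    // Turn user input into the long that would be handed to the database,
    // or throw on impossible dates.
    static long toStoredLong(String input) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yyyy");
        // Lenient parsing (the default) would accept 02/30/2011 and roll
        // it over to 03/02/2011; disable it to reject impossible dates.
        fmt.setLenient(false);
        return fmt.parse(input).getTime();
    }

    public static void main(String[] args) {
        try {
            toStoredLong("12/31/2010"); // a valid date, parses fine
            toStoredLong("02/30/2011"); // impossible date, throws
        } catch (ParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

So the integrity check does live in the application here, exactly as the other side of the argument points out; the difference is only who wrote and maintains that code.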