Обсуждение: datatype conversion thoughts
I've been looking at a number of driver todo items that are all centered around converting between datatypes. Specifically adding COPY support, array handling, composite types (SQLInput/Ouptut), and binary data transfer. All of these center around converting pg datatypes to Java types and back. The current driver currently operates on everything (except bytea) as a String which is indeed easy to transform to and from with simple operations like Integer.parseInt, but this only works if you're really given an integer to begin with. The jdbc spec allows you to call getInt on a variety of types, like floats, so it needs a fallback to handle that. Also the driver allows you to call getInt on the deprecated pg money type. This means that every time you call getInt() it has to perform operations on the String it's given to determine if it's actually a money value "-$12.11" and strip away the dollars sign. This can't be good for performance. Basically what I'd like to see is a datatype parsed only as the type it actually is. We know what type the underlying data is, but at the moment it's not used. For the money example above it would be parsed as money and then converted to an int. This means you only pay the cost of parsing the money type when you actually have money. This also cleans up the other fallback parsing where say if we had a float column and called getInt on it we wouldn't bother trying Integer.parseInt on it first and then falling back to a float parsing method. One way of doing this would be provide a PGDataType abstract class that provided every getXXX method (getInt, getFloat, getBoolean, ...) which had a default implementation of saying "Invalid conversion...". Then for each datatype an implementation would be provided that could parse the underlying data and perform whatever conversions that are applicable on it. This would work the same way on the reverse side (converting from java -> pg) as the implementation would provide a method for converting to the format pg is expecting. There are some isses about when and how this conversion happens regarding excessive object creation or protecting a fixed set of these objects from multithreaded access, but I just wanted to throw this basic concept out there for any feedback. Kris Jurka
Kris, How would PGDataType relate to PGObject? I have always thought that PGObject should be extended and used for all datatypes (not just extensions). --Barry -----Original Message----- From: Kris Jurka [mailto:books@ejurka.com] Sent: Saturday, October 09, 2004 12:20 AM To: pgsql-jdbc@postgresql.org Subject: [JDBC] datatype conversion thoughts I've been looking at a number of driver todo items that are all centered around converting between datatypes. Specifically adding COPY support, array handling, composite types (SQLInput/Ouptut), and binary data transfer. All of these center around converting pg datatypes to Java types and back. The current driver currently operates on everything (except bytea) as a String which is indeed easy to transform to and from with simple operations like Integer.parseInt, but this only works if you're really given an integer to begin with. The jdbc spec allows you to call getInt on a variety of types, like floats, so it needs a fallback to handle that. Also the driver allows you to call getInt on the deprecated pg money type. This means that every time you call getInt() it has to perform operations on the String it's given to determine if it's actually a money value "-$12.11" and strip away the dollars sign. This can't be good for performance. Basically what I'd like to see is a datatype parsed only as the type it actually is. We know what type the underlying data is, but at the moment it's not used. For the money example above it would be parsed as money and then converted to an int. This means you only pay the cost of parsing the money type when you actually have money. This also cleans up the other fallback parsing where say if we had a float column and called getInt on it we wouldn't bother trying Integer.parseInt on it first and then falling back to a float parsing method. One way of doing this would be provide a PGDataType abstract class that provided every getXXX method (getInt, getFloat, getBoolean, ...) which had a default implementation of saying "Invalid conversion...". Then for each datatype an implementation would be provided that could parse the underlying data and perform whatever conversions that are applicable on it. This would work the same way on the reverse side (converting from java -> pg) as the implementation would provide a method for converting to the format pg is expecting. There are some isses about when and how this conversion happens regarding excessive object creation or protecting a fixed set of these objects from multithreaded access, but I just wanted to throw this basic concept out there for any feedback. Kris Jurka ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Barry Lind wrote: > Kris, > > How would PGDataType relate to PGObject? I have always thought that > PGObject should be extended and used for all datatypes (not just > extensions). We have the problem that PGobject is a public interface, so we probably need an adaption layer at a minimum to handle older PGobject implementations that don't know about binary formats, streaming, etc when they are introduced. -O
Kris Jurka wrote: > I've been looking at a number of driver todo items that are all centered > around converting between datatypes. Specifically adding COPY support, > array handling, composite types (SQLInput/Ouptut), and binary data > transfer. All of these center around converting pg datatypes to Java > types and back. > There are some isses about when and how this conversion happens regarding > excessive object creation or protecting a fixed set of these objects from > multithreaded access, but I just wanted to throw this basic concept out > there for any feedback. The basic concept is good. I was going in roughly the same direction with ParameterList and friends. To avoid threading/object creation issues you might want to look at a strategy/flyweight implementation where the per-type codec objects are stateless and are passed all the context they need, i.e: interface ParameterCodec { String getTextFormat(Object parameter); byte[] getBinaryFormat(Object parameter); void streamAsBinary(Object parameter, PGstream toStream); // etc. } This would also mean you can construct (for example) codec instances for Array types fairly painlessly; you have one instance per element type, configured to delegate to the element's codec for converting the elements. Also, separating the codec from storing the actual parameter data seems like a win when you need to handle Array/COPY/SQLOutput etc, since they all have different requirements. For the ResultSet side and binary data, I was looking at reading the entire DataRow message into a single array and passing array offsets to the codecs. That should help reduce the amount of garbage generation: we'd have one bytearray per row, not one per row per column. For results where we do not yet have a binary-format parser, I realized we can convert them to text on demand fairly easily: issue a "SELECT ?" passing the uninterpretable binary data as the parameter value (with correct type OID), and ask for results in text format. This should hopefully mean we can do binary format results by default without having to do an extra roundtrip to Describe the statement before every execution (to work out which result columns to make binary); you only pay the roundtrip cost when the user asks for a non-binary value, which should hopefully be rare (e.g. external PGobject implementations that haven't been updated for the changes). I've done some more detailed design, mostly in my head at the moment; if you want I can send you more details. -O
On Tue, 12 Oct 2004, Oliver Jowett wrote: > Barry Lind wrote: > > Kris, > > > > How would PGDataType relate to PGObject? I have always thought that > > PGObject should be extended and used for all datatypes (not just > > extensions). > > We have the problem that PGobject is a public interface, so we probably > need an adaption layer at a minimum to handle older PGobject > implementations that don't know about binary formats, streaming, etc > when they are introduced. > I had envisioned making PGDataType have some methods like supportsBinary() or similar to not require every datatype to support binary transfer. By making PGobject extend PGDataType it would not need to be changed much/at all to fit into this scheme. Kris Jurka
Hi, Oliver, On Tue, 12 Oct 2004 10:30:23 +1300 Oliver Jowett <oliver@opencloud.com> wrote: > To avoid threading/object creation issues you might want to look at a > strategy/flyweight implementation where the per-type codec objects are > stateless and are passed all the context they need, i.e: > > interface ParameterCodec { > String getTextFormat(Object parameter); > byte[] getBinaryFormat(Object parameter); > void streamAsBinary(Object parameter, PGstream toStream); > // etc. > } Sounds like a good idea, lightweight and stateless factories. Those instances By using an abstract superclass instead of an interface, we could gain default implementations (that use .toString() for getTextFormat or throw an exception for unsupported operations), and thus avoid code duplication. But on the other hand, an interface would be more flexible, and we could still provide an implementing class containing such default code that users could (but don't have to) use. > For the ResultSet side and binary data, I was looking at reading the > entire DataRow message into a single array and passing array offsets to > the codecs. That should help reduce the amount of garbage generation: > we'd have one bytearray per row, not one per row per column. Good idea. Less object instantiations and less garbage is always positive. > For results where we do not yet have a binary-format parser, I realized > we can convert them to text on demand fairly easily: issue a "SELECT ?" > passing the uninterpretable binary data as the parameter value (with > correct type OID), and ask for results in text format. This sounds genious. If we don't know how to deal, ask the server as it is guaranteed to have the knowledge. > This should > hopefully mean we can do binary format results by default without having > to do an extra roundtrip to Describe the statement before every > execution (to work out which result columns to make binary); you only > pay the roundtrip cost when the user asks for a non-binary value, which > should hopefully be rare (e.g. external PGobject implementations that > haven't been updated for the changes). I think this is a cost we can pay, all PGObject implementations I know are open source and so the new system can be adopted rather quick. > I've done some more detailed design, mostly in my head at the moment; if > you want I can send you more details. Please feel invited! :-) Markus -- markus schaber | dipl. informatiker logi-track ag | rennweg 14-16 | ch 8001 zürich phone +41-43-888 62 52 | fax +41-43-888 62 53 mailto:schabios@logi-track.com | www.logi-track.com
Hi, Kris, [Sorry for the duplicate message - I really should learn how to configure my mailer, this message should have gone to the list directly.] On Sat, 9 Oct 2004 02:19:37 -0500 (EST) Kris Jurka <books@ejurka.com> wrote: > One way of doing this would be provide a PGDataType abstract class > that provided every getXXX method (getInt, getFloat, getBoolean, ...) > which had a default implementation of saying "Invalid conversion...". > Then for each datatype an implementation would be provided that could > parse the underlying data and perform whatever conversions that are > applicable on it. This would work the same way on the reverse side > (converting from java -> pg) as the implementation would provide a > method for converting to the format pg is expecting. That sounds great. I assume the PGDataType subclasses would be used like factories. When carefully crafted, this could even allow the PGDataType factory to directly instantiate classes like org.postgis.Point or JTS geometry classes to be returned by getObject(), without need to subclassing PGObject. This would allow to unify custom and built-in data types, including ability to override built-ins. <shameless rant - consume with a grain of salt> One of the most annoying aspects of the current PGObject approach is that the current architecture instantiates the custom data types and then passes the data to them via setValue. The jdbc driver only knows its a geometry object, but not the appropriate subclass (Point, Polygon etc.) which is encoded in the data itsself. So currently, PGGeometry and other PGObject subclasses are factories that create and remember the appropriate geometry subclass in setValue. Then the user calls getObject() to get the PGGeometry wrapper, and then the getGeometry() method to get the actual geometry class. Another drawback is lacking support for binary transfer, the getValue() and setValue() only work with slow and inefficient String representations. This even lead to the development of wkb4j that uses a hex encoded binary representation to speed-up reading, see http://wkb4j.sourceforge.net/ </rant> This would need a "backwards-mapping" from setObject classes to the PGDataType factory for writing data. This could be accomplished by combining a Map where the PGDataTypes register the classes they know to handle, and a probing scheme with auto-caching for the case someone puts subclasses into setObject. Your concept would render my "Register arbitrary types framework" demand obsolete (which I posted here on March 10, and then abandoned temporarily). BTW, this would even allow someone to transparently emulate other manufacturers drivers by writing e. G. a PGDataType implementation that handles oracle spatial GIS objects. This could ease the transition to PostgreSQL (provided the user has the appropriate license for the oracle java classes that would still be needed). So, I'd give a +10 on this suggestion. Or am I completely misleaded? Markus -- markus schaber | dipl. informatiker logi-track ag | rennweg 14-16 | ch 8001 zürich phone +41-43-888 62 52 | fax +41-43-888 62 53 mailto:schabios@logi-track.com | www.logi-track.com