Обсуждение: Bytea network traffic: binary vs text result format
Hi! There was some debate recently about using text or binary format. There's people who would like to use it but have trouble converting binary encoded results into native types and there's people that say there's not much performance difference. I'd like to stress that performance is very different over slow network. The biggest difference is for byte where the text format performance is about 3.5 times worse at inserts and updates and about 2.9 times worse at selects . Here's the reasoning: In text format bytea are escaped using PQescapeBytea. In an average binary stream about 2/3 would be escaped. Each escaped byte becomes of form \\ooo at upload and of form \ooo for download, so the size of the escaped stream is 1/3 + 2/3 * 5 = 11/3 = 3.6 and 1/3 + 2/3 * 4 = 3 respectively. Here are the results of my test. I inserted and selected an OpenOffice document of size 2Mb over a 2M/512K cable. text format: insert: 120.1s select: 24.9s binary format: insert: 33.5s select: 8.6s factor: insert: 3.6 select: 2.9 The difference between the test and the above calculation comes from the estimate that 2/3 of bytes are escaped where in fact 95 out of 256 are escaped (63%). So there is a need (people asking) and reason (performance) to use binary format. But there's a huge drawback - the conversions. It's easy for varchar, not too bad for basic types (int, float, bool), effort is needed for timestamp, date, time and numeric is a pain. So with all the above there should be a utility for conversion between binary format and native types and/or string format in libpq. Regards, Miha Puc
Hi, Miha D. Puc wrote: > So there is a need (people asking) and reason (performance) to use > binary format. You are aware that PostgreSQL itself *can* transfer values in binary format? Check the Documentation: "43.1.3. Formats and Format Codes": http://www.postgresql.org/docs/8.1/interactive/protocol.html This works since protocol version 3, AFAICT. The client needs to support that, though. But the php pgsql binding, just as an example, doesn't use that feature. > So with all the above there should be a utility for conversion between > binary format and native types and/or string format in libpq. There already are. Not in libpq, though. Most (if not all) internal types have those functions. See for example: # SELECT typinput, typoutput, typreceive, typsend # FROM pg_type WHERE typname='int4'; typinput | typoutput | typreceive | typsend ----------+-----------+------------+---------- int4in | int4out | int4recv | int4send (1 row) The input and output functions deal with the textual representation, while send and receive convert to a binary representation in network byte order. Hope that helps. Regards Markus
On 6/4/07, Markus Schiltknecht <markus@bluegap.ch> wrote:
There already are. Not in libpq, though. Most (if not all) internal
types have those functions.
Well, I don't think it would be puzzling that someone will ask for them even if those functions already exist in PostgreSQL because, as you stated, it's not in libpq. People will actually have to dig in the source to pinpoint where in the source tree those functions exist and how to include them in the program, which, I assume, is less ideal than when it's already included in libpq in the first place. There was a post before on a user who got disappointed because of the "crappy" support of libpq (not PostgreSQL) for binary formats.
In line with this, I'm aware of the issues that this may pose: http://archives.postgresql.org/pgsql-hackers/1999-08/msg00374.php this is already very old (8 years ago!). And they discuss about the issue that the representation might change from version to version so it's not done. There have been plans to incorporate CORBA, which IMHO is an overkill, to solve this problem but I don't think it's done yet because it's probably too complex (?). I'd rather recommend them to use ASN.1 (if that's feasible..).
--
Life is too short for dial-up.
>There was some debate recently about using text or binary format. >There's people who would like to use it but have trouble converting >binary encoded results into native types and there's people that say >there's not much performance difference. > >I'd like to stress that performance is very different over slow >network. The biggest difference is for byte where the text format >performance is about 3.5 times worse at inserts and updates and about >2.9 times worse at selects . Here's the reasoning: You're referring to the worst case of the text format - handling pure binary data - and yes, in this case, up to 4 times as many bytes can flow over the network. If the network is your constraining factor, then this will be significant. But in many other cases (fast local network), the disks are the limiting factor, or other data types dominate, and in those cases, the text format can actually be smaller. >So there is a need (people asking) and reason (performance) to use >binary format. But there's a huge drawback - the conversions. It's easy >for varchar, not too bad for basic types (int, float, bool), effort is >needed for timestamp, date, time and numeric is a pain. > >So with all the above there should be a utility for conversion between >binary format and native types and/or string format in libpq. And there's the rub... the "basic" types you mention all have standard C representations, but there is no standard C type for timestamp, date, time and numeric, so what would libpq convert to? Any chosen format will only suite a subset of users, and will result in double conversions for others who are constrained by the existing types used within their application. I'm not sure what the answer is - certainly documenting the wire format would be a good first step. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/