Обсуждение: Let's use libpq for everything

Поиск
Список
Период
Сортировка

Let's use libpq for everything

От
Heikki Linnakangas
Дата:
Hi everyone,

In the long-term, I believe psqlodbc would be better off relying on
libpq for all operations, and not touching the socket directly. That
would allow getting rid of all the authentication stuff, and a lot of
other code. In general, less code to maintain is good. This also ties
into my recent work in the PostgreSQL and libpq side to support other
SSL implementations than OpenSSL. If we ever get to do that in
PostgreSQL, psqlodbc will still be stuck with OpenSSL until someone gets
around to adding support for the new libraries. If we pull that off in
PostgreSQL side, *and* switch to doing everything through libpq in
psqlodbc, we have a lot to gain: if we can replace OpenSSL with native
Windows calls, we would no longer need to ship OpenSSL with the
installer, and we would no longer be vulnerable to whatever security
issues OpenSSL happens to have that week.

I went ahead and replaced all the backend-interactions with libpq calls.
I've pushed that to a development branch at:
git@github.com:hlinnaka/psqlodbc.git, branch "libpq-integration4". It's
not 100% complete - a few regression tests are failing - but it mostly
works. Not surprisingly, this makes a lot of code unnecessary:

  46 files changed, 1357 insertions(+), 7545 deletions(-)


The principle is simple: instead of sending FE/BE protocol messages
directly, just call the corresponding libpq functions. There are a few
problems that are worth mentioning:

0. A few regression tests fail currently. Need to investigate; they are
probably some minor bugs, nothing insurmountable.

1. The code structure in some places doesn't make it easy to switch to
libpq. We have code where we send the Parse message in one function, and
in a completely different place in the codebase we send the Sync and
process the results. The corresponding libpq functions couple those two
operations. Some refactoring of the code is required to bend it to the
libpq way. I tried to change as little as possible, to avoid unraveling
the whole ball of spaghetti that the psqlodbc code is, but some changes
were necessary to the way statements are prepared and executed.

2. Using libpq functions requires more round-trips for some operations,
because when speaking the protocol directly, you can "pipeline" some
operations, and libpq doesn't expose functions to do that. I haven't
investigated that thoroughly, but I think we will need an extra
round-trip for some cases where a query is prepared, described, and
executed. We currently send a command sequence: Parse, Describe
(Params), Bind, Execute, Sync. With libpq, the equivalent is to do
PQprepare + PQdescribePrepared + PQexecute, but that requires three
round-trips vs. one with the current approach. That sounds like a lot,
but I believe it only happens when preparing a "permanent" prepared
query for the first time. This probably needs more testing and comparing
to make sure none of the common cases regress too much, but I don't
think it's a show-stopper.

3. I'm not sure if I got all the less common "modes" like
"DisallowPremature=1/0" and "Parse=1/0" to work the same they did
before. I think we need to have a discussion on which modes we still
need to support, and which are just legacy that we could remove.

Does anyone object to this as a goal? If not, I'm going to continue
working on this, fixing the regression failures, and looking closely at
the number of round-trips.

- Heikki


Re: Let's use libpq for everything

От
Michael Paquier
Дата:


On Tue, Nov 4, 2014 at 2:22 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
In the long-term, I believe psqlodbc would be better off relying on libpq for all operations, and not touching the socket directly. That would allow getting rid of all the authentication stuff, and a lot of other code. In general, less code to maintain is good. This also ties into my recent work in the PostgreSQL and libpq side to support other SSL implementations than OpenSSL. If we ever get to do that in PostgreSQL, psqlodbc will still be stuck with OpenSSL until someone gets around to adding support for the new libraries. If we pull that off in PostgreSQL side, *and* switch to doing everything through libpq in psqlodbc, we have a lot to gain: if we can replace OpenSSL with native Windows calls, we would no longer need to ship OpenSSL with the installer, and we would no longer be vulnerable to whatever security issues OpenSSL happens to have that week.
+1.
 
I went ahead and replaced all the backend-interactions with libpq calls. I've pushed that to a development branch at: git@github.com:hlinnaka/psqlodbc.git, branch "libpq-integration4". It's not 100% complete - a few regression tests are failing - but it mostly works. Not surprisingly, this makes a lot of code unnecessary:

 46 files changed, 1357 insertions(+), 7545 deletions(-)
Ugh. That's... Neat.
 
The principle is simple: instead of sending FE/BE protocol messages directly, just call the corresponding libpq functions. There are a few problems that are worth mentioning:
[...]
Does anyone object to this as a goal?
Reducing the number of extra dependencies that pgodbc has is definitely worth it. People have complained many times in the past that the msi installer bundles vulnerable versions of SSL... And this increases the work of maintainers.
--
Michael

Re: Let's use libpq for everything

От
Craig Ringer
Дата:
On 11/04/2014 01:22 AM, Heikki Linnakangas wrote:
> Hi everyone,
>
> In the long-term, I believe psqlodbc would be better off relying on
> libpq for all operations, and not touching the socket directly.

About +1000 from me after the recent fun I had debugging SSPI issues,
where I found it was sending version 1 protocol packets on the wire...

> I went ahead and replaced all the backend-interactions with libpq calls.
> I've pushed that to a development branch at:
> git@github.com:hlinnaka/psqlodbc.git, branch "libpq-integration4". It's
> not 100% complete - a few regression tests are failing - but it mostly
> works. Not surprisingly, this makes a lot of code unnecessary:
>
>  46 files changed, 1357 insertions(+), 7545 deletions(-)

That's what I like to see.

> 2. Using libpq functions requires more round-trips for some operations,
> because when speaking the protocol directly, you can "pipeline" some
> operations, and libpq doesn't expose functions to do that.

That's something that needs addressing in libpq, IMO.

It's been discussed on -hackers (I think) recently. I initially thought
we needed an API like JDBC's batch API, but was convinced that we really
should have something more generic than that.

The current async API:

http://www.postgresql.org/docs/8.1/static/libpq-async.html

doesn't permit pipelining, so we'd want to enhance it to allow that, by
lifting the restriction that:

"PQsendQuery may not be called again (on the same connection) until
PQgetResult has returned a null pointer, indicating that the command is
done."

and the same for the other variants.

So PGresult would have to carry more info and the global state less,
allowing libpq to keep track of the ordering of queries and wait until a
given result is ready.

It probably isn't that hard to do.

> 3. I'm not sure if I got all the less common "modes" like
> "DisallowPremature=1/0" and "Parse=1/0" to work the same they did
> before. I think we need to have a discussion on which modes we still
> need to support, and which are just legacy that we could remove.

I'd like to see a lot of that complexity and variation simply removed.

People who want the old stuff can use the old driver.

> Does anyone object to this as a goal?

I think it's absolutely great.

I don't use ODBC or have current support customers who use it, so I'm
going to be hard pressed to find time allocation for testing and helping
with this, but I'll see if I can dig some up. Or perhaps time to work on
pipeline support for libpq.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Let's use libpq for everything

От
Heikki Linnakangas
Дата:
On 11/18/2014 09:01 AM, Craig Ringer wrote:
> On 11/04/2014 01:22 AM, Heikki Linnakangas wrote:
>> I went ahead and replaced all the backend-interactions with libpq calls.
>> I've pushed that to a development branch at:
>> git@github.com:hlinnaka/psqlodbc.git, branch "libpq-integration4". It's
>> not 100% complete - a few regression tests are failing - but it mostly
>> works. Not surprisingly, this makes a lot of code unnecessary:
>>
>>   46 files changed, 1357 insertions(+), 7545 deletions(-)
>
> That's what I like to see.

I spent some time adding regression tests, and making it easier to run
them with different combinations of configuration options. I've pushed
those changes to the master branch.

I fixed the remaining regression test failures, and some new ones
revealed by the new tests, and pushed a new version of this patch to the
above github branch. It now passes all regression tests, in all
combinations of UseDeclareFetch, UseServerSidePrepare, and Protocol
(i.e. rollback on error behavior - it doesn't really control the
protocol anymore)

>> 2. Using libpq functions requires more round-trips for some operations,
>> because when speaking the protocol directly, you can "pipeline" some
>> operations, and libpq doesn't expose functions to do that.
>
> That's something that needs addressing in libpq, IMO.

I spent some time analyzing the situation with Wireshark, and managed to
eliminate the worst cases of extra round-trips. It turns out that the
driver doesn't really do much more pipelining than libpq already allows.
So this isn't such a big deal.

To measure the round-trips, I wrote a little hack. I patched the server
to count the number of Sync messages and Query messages it receives;
those are the messages that tell the server to send a reply back, i.e.
the number of round-trips. When a backend exits, it prints out the
counter, together with the application_name, to the log.

I then hacked the driver to use an application_name that includes the
test program's name, and the current UseServerSidePrepare,
UseDeclareFetch, and Protocol settings. Finally, I ran the regression
suite, with all combinations of those, with current git master and this
libpq-integration4 branch, and tabulated the results. See attached.

There is little difference in the number of round-trips. Some test cases
need a few extra round-trips, while others need less. I'm not sure where
all those little differences come from, but it seems acceptable. There
is a quite significant increase in round-trips in the insertreturning
test case, 808 vs 608, but that's not very representative of real-world
usage. In a loop, it prepares a statement with SQLPrepare, and calls
SQLNumResultCols on it, before SQLExecute. It then runs SQLExecute, and
closes the prepared statement, after just one execution. So I think
that's acceptable too.

>> 3. I'm not sure if I got all the less common "modes" like
>> "DisallowPremature=1/0" and "Parse=1/0" to work the same they did
>> before. I think we need to have a discussion on which modes we still
>> need to support, and which are just legacy that we could remove.
>
> I'd like to see a lot of that complexity and variation simply removed.
>
> People who want the old stuff can use the old driver.

Agreed. I'm going to spend a little time testing those modes, and see if
there are issues. If not, or they're easy to fix, we might as well not
touch them right now. Otherwise, we'll need to discuss the fate of those
options.

If anyone has any test cases in mind that are not covered by the current
regression tests, please speak up. I'm relying heavily on the regression
suite, and if there's some major functionality or usage pattern that's
not covered, it's likely to get broken.

Overall, I think this looks *very* good. I'm actually in favor of
committing this fairly soon, in the next week or so, so that the
09.04.0000 version would use libpq-only code. Any objections?

- Heikki


Вложения

Re: Let's use libpq for everything

От
Michael Paquier
Дата:
On Fri, Nov 28, 2014 at 1:14 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Overall, I think this looks *very* good. I'm actually in favor of committing
> this fairly soon, in the next week or so, so that the 09.04.0000 version
> would use libpq-only code. Any objections?
Not really, that's quite a lot of cleanup related to the socket handling!

Here are some comments:
1) statement.c has a NOTUSED block with SOCK_get_int still used. I
think you should simply remove it.
2) configure.ac, as well as the windows installers still have
references to USE_LIBPQ, USE_GSS and USE_SSPI
3) There are traces of USE_SSPI in connection.c and dlg_wingui.c
4) There are some whitespaces (expected with such a large patch btw)
5) In lobj.c, there are mentions of using lo_lseek64 and lo_tell64. I
am not sure that this is a good thing at this state: those functions
have been added in 9.3 and we still have many users with PG <= 9.2. It
may be better to add a note about updating to those functions when the
minimum support requirement is 9.3.
6) I am getting many regression failures after applying this patch and
running the tests on OSX, please see attached.
Regards,
--
Michael

Вложения

Re: Let's use libpq for everything

От
Heikki Linnakangas
Дата:
On 11/28/2014 09:09 AM, Michael Paquier wrote:
> On Fri, Nov 28, 2014 at 1:14 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> Overall, I think this looks *very* good. I'm actually in favor of committing
>> this fairly soon, in the next week or so, so that the 09.04.0000 version
>> would use libpq-only code. Any objections?
> Not really, that's quite a lot of cleanup related to the socket handling!
>
> Here are some comments:
> 1) statement.c has a NOTUSED block with SOCK_get_int still used. I
> think you should simply remove it.
> 2) configure.ac, as well as the windows installers still have
> references to USE_LIBPQ, USE_GSS and USE_SSPI
> 3) There are traces of USE_SSPI in connection.c and dlg_wingui.c
> 4) There are some whitespaces (expected with such a large patch btw)

Thanks, fixed all of these.

> 5) In lobj.c, there are mentions of using lo_lseek64 and lo_tell64. I
> am not sure that this is a good thing at this state: those functions
> have been added in 9.3 and we still have many users with PG <= 9.2. It
> may be better to add a note about updating to those functions when the
> minimum support requirement is 9.3.

Well, it's just a comment.

> 6) I am getting many regression failures after applying this patch and
> running the tests on OSX, please see attached.

Attached is a new version [1], with a lot of small fixes here and there.
It passes all the regression tests for me. Can you try again with this
version? If it's still failing on OS X, will need to investigate.

[1] Also available at
https://github.com/hlinnaka/psqlodbc/tree/libpq-integration4.

- Heikki


Вложения

Re: Let's use libpq for everything

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 11/28/2014 09:09 AM, Michael Paquier wrote:
>> 5) In lobj.c, there are mentions of using lo_lseek64 and lo_tell64. I
>> am not sure that this is a good thing at this state: those functions
>> have been added in 9.3 and we still have many users with PG <= 9.2. It
>> may be better to add a note about updating to those functions when the
>> minimum support requirement is 9.3.

> Well, it's just a comment.

Not qualified to review this patch in total, but as far as that goes:
libpq provides an inexpensive way to check the server version, so you
could easily have a code path using the newer functions when available.
The big problem is that odbc_lo_lseek/odbc_lo_tell are declared as
taking/returning int4 not int8; unless you can widen those values without
breaking the library ABI, there is zero point in using the newer
server-side functions.

            regards, tom lane


Re: Let's use libpq for everything

От
Michael Paquier
Дата:


On Wed, Dec 10, 2014 at 1:20 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> 6) I am getting many regression failures after applying this patch and
>> running the tests on OSX, please see attached.
>
>
> Attached is a new version [1], with a lot of small fixes here and there. It
> passes all the regression tests for me. Can you try again with this version?
> If it's still failing on OS X, will need to investigate.

With this version regression tests are passing on all my OSX/Linux dev machines. At least I do not see failures directly related to it.

Few comments:
1) Here an error message would be nice:
+               /*
+                * XXX: we don't check the result here. Should we? We're rolling back,
+                * so it's not clear what else we can do on error. Giving an error
+                * message to the application would be nice though.
+                */
+               if (pgres != NULL)
+               {
+                       PQclear(pgres);
+                       pgres = NULL;
+               }
2) Is this planned to be updated after this patch?
+
+       /*
+        * Update our copy of the transaction status.
+        *
+        * XXX: Once we stop using the socket directly, and do everything with
+        * libpq, we can get rid of the transaction_status field altogether
+        * and always ask libpq for it.
+        */
+       LIBPQ_update_transaction_status(self);
3) Did you do some performance tests here?
+       /*
+        * XXX: We need to do Prepare + Describe as two different round-trips
+        * to the server, while without libpq we send a Parse and Describe
+        * message followed by a single Sync.
+        */

Except that this looks good, and numbers speak by themselves:
$ git diff master --stat | tail -n1
 53 files changed, 1699 insertions(+), 8171 deletions(-)

Regards,
--
Michael

Re: Let's use libpq for everything

От
Heikki Linnakangas
Дата:
On 12/10/2014 07:57 AM, Michael Paquier wrote:
> On Wed, Dec 10, 2014 at 1:20 AM, Heikki Linnakangas <hlinnakangas@vmware.com>
> wrote:
>>> 6) I am getting many regression failures after applying this patch and
>>> running the tests on OSX, please see attached.
>>
>>
>> Attached is a new version [1], with a lot of small fixes here and there.
> It
>> passes all the regression tests for me. Can you try again with this
> version?
>> If it's still failing on OS X, will need to investigate.
>
> With this version regression tests are passing on all my OSX/Linux dev
> machines. At least I do not see failures directly related to it.

Great!

> Few comments:
> 1) Here an error message would be nice:
> +               /*
> +                * XXX: we don't check the result here. Should we? We're
> rolling back,
> +                * so it's not clear what else we can do on error. Giving
> an error
> +                * message to the application would be nice though.
> +                */
> +               if (pgres != NULL)
> +               {
> +                       PQclear(pgres);
> +                       pgres = NULL;
> +               }

Yeah. I left it as it is for now, with the XXX comment. (the
corresponding code in master also ignores errors, so this isn't a
regression)

> 2) Is this planned to be updated after this patch?
> +
> +       /*
> +        * Update our copy of the transaction status.
> +        *
> +        * XXX: Once we stop using the socket directly, and do everything
> with
> +        * libpq, we can get rid of the transaction_status field altogether
> +        * and always ask libpq for it.
> +        */
> +       LIBPQ_update_transaction_status(self);

That idea mentioned in the above comment make sense, but I'm not going
to immediately follow up on it. Besides updating the in-trans and
in-error-trans flags in the connection object,
LIBPQ_update_transaction_status also calls CC_on_abort and CC_on_commit
functions when the state changes, so we can't just directly remove
LIBPQ_update_transaction_status.

> 3) Did you do some performance tests here?
> +       /*
> +        * XXX: We need to do Prepare + Describe as two different
> round-trips
> +        * to the server, while without libpq we send a Parse and Describe
> +        * message followed by a single Sync.
> +        */

Not specifically, but I did analyze the number of round-trips performed
by the regression suite earlier.

I've merged this with latest changes in the master branch, and did some
error handling fixes. Latest version is again attached here, and also
available in github
(https://github.com/hlinnaka/psqlodbc/tree/libpq-integration4).

- Heikki

Вложения

Re: Let's use libpq for everything

От
Michael Paquier
Дата:
On Tue, Dec 23, 2014 at 5:53 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> I've merged this with latest changes in the master branch, and did some
> error handling fixes. Latest version is again attached here, and also
> available in github
> (https://github.com/hlinnaka/psqlodbc/tree/libpq-integration4).

Some minor comments after scanning the code:
1) The description below should be prepareParametersNoDesc.
+/*
+ * Process the original SQL query.
+ */
+RETCODE        prepareParametersNoDesc(StatementClass *stmt)
+{
+       QueryParse      query_org, *qp;
+       QueryBuild      query_crt, *qb;
+
+inolog("prepareParametersNoSync\n");
2) In psqlodbc.vcproj, you can safely remove the references to USE_LIBPQ.
3) s/Extrace/Extract
ISTM that this code is clean, I haven't found flaws in the new logic
where code is replaced with libpq stuff.
Regards,
--
Michael


Re: Let's use libpq for everything

От
Heikki Linnakangas
Дата:
On 12/24/2014 09:23 AM, Michael Paquier wrote:
> On Tue, Dec 23, 2014 at 5:53 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> I've merged this with latest changes in the master branch, and did some
>> error handling fixes. Latest version is again attached here, and also
>> available in github
>> (https://github.com/hlinnaka/psqlodbc/tree/libpq-integration4).
>
> Some minor comments after scanning the code:
> 1) The description below should be prepareParametersNoDesc.
> +/*
> + * Process the original SQL query.
> + */
> +RETCODE        prepareParametersNoDesc(StatementClass *stmt)
> +{
> +       QueryParse      query_org, *qp;
> +       QueryBuild      query_crt, *qb;
> +
> +inolog("prepareParametersNoSync\n");
> 2) In psqlodbc.vcproj, you can safely remove the references to USE_LIBPQ.
> 3) s/Extrace/Extract
> ISTM that this code is clean, I haven't found flaws in the new logic
> where code is replaced with libpq stuff.

Thanks, fixed those.

Well, I think this is about as much testing and review that we're going
to get for this patch before commit. I have no doubt that there are
bugs, or subtle differences in behaviour that might cause problems for
some, but let's fix them as we bump into them. So, committed.

- Heikki