Discussion: Tips/advice for implementing integrated RESTful HTTP API
On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
> The background workers can apparently only connect to a single database
> at a time, but I want to expose all the databases via the API.

I think the term "background worker" should be taken as a hint that
"foreground protocol endpoint" was not one of the envisioned use cases.
I think this sort of thing should be done as a separate process in
front of the postmaster.
On Sunday, August 31, 2014, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
>> The background workers can apparently only connect to a single database
>> at a time, but I want to expose all the databases via the API.
> I think the term "background worker" should be taken as a hint that
> "foreground protocol endpoint" was not one of the envisioned use cases.
> I think this sort of thing should be done as a separate process in
> front of the postmaster.
--
Consultoria/Coaching PostgreSQL
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
On 08/31/2014 12:40 PM, Dobes Vandermeer wrote:
> 1. Connecting to multiple databases
>
> The background workers can apparently only connect to a single database
> at a time, but I want to expose all the databases via the API.

bgworkers are assigned a database at launch time (if SPI is enabled),
and this database may not change during the worker's lifetime, same as a
normal backend.

Sometimes frustrating, but that's how it is.

> I think I could use libpq to connect to PostgreSQL on localhost but this
> might have weird side-effects in terms of authentication, pid use, stuff
> like that.

If you're going to do that, why use a bgworker at all?

In general, what do you gain from trying to do this within the database
server itself, not as an app in front of the DB?

> I could probably manage a pool of dynamic workers (as of 9.4), one per
> user/database combination or something along those lines. Even one per
> request? Is there some kind of IPC system in place to help shuttle the
> requests and responses between dynamic workers? Or do I need to come up
> with my own?

The dynamic shmem code apparently has some queuing functionality. I
haven't used it yet.

> It seems like PostgreSQL itself has a way to shuttle requests out to
> workers, is it possible to tap into that system instead? Basically some
> way to send the requests to a PostgreSQL backend from the background worker?

It does?

It's not the SPI; that executes work directly within the bgworker,
making it behave like a normal backend for the purpose of query execution.

> Or perhaps I shouldn't do this as a worker but rather modify PostgreSQL
> itself and do it in a more integrated/destructive manner?

Or just write a front-end.

The problem you'd have attempting to modify PostgreSQL itself for this
is that connection dispatch occurs via the postmaster, which is a
single-threaded process that already needs to do a bit of work to keep
an eye on how things are running. You don't want it constantly busy
processing and dispatching millions of tiny HTTP requests. It can't just
hand a connection off to a back-end immediately after accepting it,
either; it'd have to read the HTTP headers to determine what database to
connect to. Then launch a new backend for the connection, which is
horribly inefficient when doing tiny short-lived connections. The
postmaster has no concept of a pool of backends (unfortunately, IMO) to
re-use.

I imagine (it's not something I've investigated, really) that you'd want
a connection accepter process that watched the listening HTTP request
socket. It'd hand connections off to dispatcher processes that read the
message content to get the target DB and dispatch the request to a
worker backend for the appropriate user/db combo, then collect the
results and return them on the connection. Hopefully at this point
you're thinking "that sounds a lot like a connection pool"... because it
is. An awfully complicated one, probably, as you'd have to manage
everything using shared memory segments and latches.

In my view it's unwise to try to do this in the DB with PostgreSQL's
architecture. Hack PgBouncer or PgPool to do what you want. Or write a
server with Tomcat/Jetty using JAX-RS and PgJDBC and the built-in
connection pool facilities - you won't *believe* how easy it is.

> 3. Parallelism
>
> The regular PostgreSQL server can run many queries in parallel

Well, one PostgreSQL instance (postmaster) may have many backends, each
of which may run queries in series but not in parallel. Any given
process may only run one query at once.

> but it
> seems like if I am using SPI I could only run one query at a time - it's
> not an asynchronous API.

Correct.

> Any help, sage advice, tips, and suggestions how to move forward in
> these areas would be muchly appreciated!

Don't do it with bgworkers.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hmm yes I am learning that the BG worker system isn't as helpful as I had hoped due to the single database restriction.
As for writing a frontend, this might be the best solution.
A Java frontend would be easy but pointless, because the whole point here is to provide a lightweight access method to the database for environments that don't have the ability to use the JDBC or libpq libraries. Deploying a Java setup would be too much trouble.
I do see now that PG uses one worker per connection rather than a worker pool as I had thought before. So there's nothing already in there to help me dispatch requests, and making my own worker pool that distributes requests using sockets wouldn't be any better than connecting back using libpq.
A C frontend using libevent would be easy enough to make and deploy for this I guess.
But... Maybe nobody really wants this thing anyway, there seem to be some other options out there already.
Thanks for the feedback.
On 09/02/2014 12:50 AM, Dobes Vandermeer wrote:
> Hmm yes I am learning that the BG worker system isn't as helpful as I
> had hoped due to the single database restriction.
>
> As for writing a frontend, this might be the best solution.
>
> A Java frontend would be easy but pointless, because the whole point here
> is to provide a lightweight access method to the database for
> environments that don't have the ability to use the JDBC or libpq
> libraries. Deploying a Java setup would be too much trouble.

If you can't run libpq, you can't run *anything* really, it's very
lightweight. I think you misunderstood what I was saying; I'm talking
about it acting as a proxy for HTTP-based requests, running on or in
front of the PostgreSQL server like a server-side connection pool would.

You can have:

    [client machine]          [ PostgreSQL server machine ]
    client --- HTTP --- Tomcat/JAX-RS -- pgjdbc -- PostgreSQL

and despite what people often say, a .war deployed on Jetty or Tomcat
isn't exactly heavyweight.

Same idea as PgBouncer or PgPool. The advantage over hacking
PgBouncer/PgPool for the job is that Tomcat can already do a lot of what
you want using built-in, pre-existing functionality. Connection pool
management, low-level REST-style HTTP processing, JSON handling etc. are
all done for you.

> I do see now that PG uses one worker per connection rather than a worker
> pool as I had thought before. So there's nothing already in there to
> help me dispatch requests, and making my own worker pool that distributes
> requests using sockets wouldn't be any better than connecting back using
> libpq.

Yep.

> A C frontend using libevent would be easy enough to make and deploy for
> this I guess.
>
> But... Maybe nobody really wants this thing anyway, there seem to be
> some other options out there already.

It's something I think would be interesting to have, but IMO to be
really useful it'd need to support composing object graphs as JSON, a
JSON query format, etc. So you can say "get me this customer with all
their addresses and contact records" without having to issue a bunch of
queries (round trips) or use ORM-style left-join-and-deduplicate hacks
that waste bandwidth and are messy and annoying.

Just sending chunks of SQL and getting back arrays of tuples isn't very
interesting or, IMO, hugely useful for the kind of apps that want to
avoid JDBC/libpq. Especially webapps.

Close attention to security and auth would also need to be paid. You
don't want to be sending a username/password with each request; you need
a reasonable authentication token system, request signing to prevent
replay attacks, idempotent requests, etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
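[Editorial note: the "customer with all their addresses and contact records" request above could take a shape like the following. This is a purely hypothetical format sketched for illustration; no such JSON query API exists in PostgreSQL.]

```json
{
  "from": "customer",
  "where": { "id": 42 },
  "include": {
    "addresses":       { "on": "customer_id" },
    "contact_records": { "on": "customer_id" }
  }
}
```

The proxy would expand this server-side into one join or a few batched queries and return a single nested JSON document, so the client pays one round trip instead of N+1 and never sees the deduplication work.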
On 02/09/14 04:47, Dobes Vandermeer wrote:
>> Same idea as PgBouncer or PgPool. The advantage over hacking
>> PgBouncer/PgPool for the job is that Tomcat can already do a lot of what
>> you want using built-in, pre-existing functionality. Connection pool
>> management, low-level REST-style HTTP processing, JSON handling etc. are
>> all done for you.
>
> Yeah, those are nice conveniences, but I still think installing Java and
> getting something to run on startup is a bit more of a hurdle. Better to
> make life easier up front by having a simple standalone proxy you can
> compile and run with just whatever is already available on a typical AWS
> Ubuntu environment.

If instead of Tomcat you use Jetty, you can embed the whole
app+Jetty+dependencies in a single executable JAR, which eases
deployment a lot. Installing a JVM in an Ubuntu environment is just one
apt-get, and it's even easier if you use CloudFormation for automation.
I don't think it is a bad choice at all... you get most of the
functionality you want already there, as Craig said, and it's
lightweight.

Hope it helps,

Álvaro