Discussion: Asynchronous queries with bound data.


Asynchronous queries with bound data.

From:
Вячеслав Блинников
Date:
Database connection using "libpq":
So, generally speaking, I need to:
- send multiple queries using "PQsendQuery()" (or whatever else is appropriate)
- bind some data (void *) to each query
- obtain the results using "PQgetResult()" (or whatever else) and know which data is bound to each result (the correspondence between queries and results)
How can this be implemented?

Re: Asynchronous queries with bound data.

From:
Peter Geoghegan
Date:
2011/1/5 Вячеслав Блинников <slavmfm@gmail.com>:
> Database connection using "libpq":
> So, generally speaking, I need to:
> - send multiple queries using "PQsendQuery()" (or whatever else is appropriate)
> - bind some data (void *) to each query
> - obtain the results using "PQgetResult()" (or whatever else) and know which
> data is bound to each result (the correspondence between queries and results)
> How can this be implemented?

You cannot "bind" data to a query. You cannot execute two queries in
parallel with one connection. However, PGresults exist totally
independently of the connection or thread that originated them, so
feel free to stash them as you see fit.

libpq usually operates within the same thread context as the client
application, so in general it doesn't make sense to pass a function
pointer (which you suggested in a slightly earlier mail to the list)
that will be called back asynchronously like a signal handler.
Asynchronous command processing just exists as a way to keep a GUI
responsive and things like that, because PQexec() blocks.

Maybe it would help if you stepped back and described your problem in
broader terms.

--
Regards,
Peter Geoghegan

Re: Asynchronous queries with bound data.

From:
Вячеслав Блинников
Date:
The whole thing is:
- the server connects to the database and waits for incoming connections
- when an incoming connection arrives, the server asks the database for some data about the connected client - the server must do this asynchronously and without creating a thread just for that client (there can be more than a thousand clients)
- at the moment of requesting data from the database there may already be other active requests
- when the database responds, we need to know which client the data is for (and what kind of request it was)
That's all.

P.S. It is possible to implement this by requesting data one by one (adding each request to a queue and popping (FIFO) a request after each database response), but it is slow, because we have to wait for the data to travel across the network while the database server itself sits idle.

[Did I handle the "cc" correctly? And may I remove the quoted previous letters from each reply?]


Re: Asynchronous queries with bound data.

From:
Peter Geoghegan
Date:
2011/1/5 Вячеслав Блинников <slavmfm@gmail.com>:
> The whole thing is:
> - the server connects to the database and waits for incoming connections
> - when an incoming connection arrives, the server asks the database for some
> data about the connected client - the server must do this asynchronously and
> without creating a thread just for that client (there can be more than a
> thousand clients)
> - at the moment of requesting data from the database there may already be
> other active requests
> - when the database responds, we need to know which client the data is for
> (and what kind of request it was)
> That's all.
>
> P.S. It is possible to implement this by requesting data one by one (adding
> each request to a queue and popping (FIFO) a request after each database
> response), but it is slow, because we have to wait for the data to travel
> across the network while the database server itself sits idle.

I'm afraid I don't understand your problem, but may I suggest that you:

1. Wrap the connection in a semaphore. This isn't pretty, but if I've
understood you correctly, it will do the job.

or

2. Have one database connection per "server" client. Having one
process that itself has up to a thousand clients but uses only one PG
connection is a very questionable approach - the single database
connection is certain to become a bottleneck.

and

3. Use a connection pooler.
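A sketch of the pool idea in points 2 and 3, under my own assumption (not stated in the thread) that the server keeps a small fixed array of connections and deals them out to clients round-robin; the PGconn objects themselves appear only in a comment:

```c
#include <assert.h>

/* A fixed-size pool: neither one shared connection (a bottleneck)
   nor one connection per client (too many backends). */
#define POOL_SIZE 8

typedef struct {
    /* PGconn *conns[POOL_SIZE];   opened once with PQconnectdb() */
    int next;                      /* next slot to hand out */
} conn_pool;

/* Assign an incoming client a connection slot, round-robin. */
static int pool_assign(conn_pool *p)
{
    int slot = p->next;
    p->next = (p->next + 1) % POOL_SIZE;
    return slot;
}
```

Each connection still answers its own queries strictly in order, but eight of them can be in flight at once, which is the effect the original poster is after.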

--
Regards,
Peter Geoghegan

Re: Asynchronous queries with bound data.

From:
Вячеслав Блинников
Date:
1: I couldn't figure out what that means - can you explain it in more detail?

2: The operating system will refuse to let me create a thousand threads, and anyway the database will, on average, return the responses only once all of them are finished.

3: I never close a connection once it has been created, so a pool will not help me (if Google is right about what a "connection pool" is).

I found this statement: "you can't issue another PQsendQuery until you've gotten that NULL." - so if that is true, then it is impossible (since that is the only function available for sending an asynchronous request) to send multiple requests before any result has been received.

The problem can be viewed from this abstract point of view:
Transferring data from the application server (which connects to the database) takes 200 ms (and the same again in the other direction); adding data to the database and then selecting data (all in one request) takes 250 ms, so each database operation of this type takes 200 + 250 + 200 = 650 ms. Two such operations take 650 + 650 = 1300 ms, but if there were a way to send two queries at once and then get two results at once (tracking, of course, the correspondence between requests and responses), we could reduce those two "dialogs" from 1300 ms to 200 + 250 + 250 + 200 = 900 ms. So we save 400 ms - and with a thousand requests every few minutes, that adds up to a very significant amount of time.
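The arithmetic above can be checked in a few lines of C; the 200 ms and 250 ms figures are the poster's own example numbers, and the model assumes the transfers of back-to-back queries overlap so only the server work repeats:

```c
#include <assert.h>

/* Cost of n identical "dialogs", one at a time:
   every dialog pays the full up + work + down round trip. */
static int sequential_ms(int n, int up, int work, int down)
{
    return n * (up + work + down);
}

/* Cost when all n queries are sent before any result is read:
   the transfers overlap, so only the server work repeats. */
static int pipelined_ms(int n, int up, int work, int down)
{
    return up + n * work + down;
}
```

With n = 2, up = down = 200 and work = 250, sequential_ms gives 1300 and pipelined_ms gives 900, the 400 ms saving claimed above.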


Re: Asynchronous queries with bound data.

From:
Peter Geoghegan
Date:
2011/1/6 Вячеслав Блинников <slavmfm@gmail.com>:
> 1: I couldn't figure out what that means - can you explain it in more detail?


http://ru.wikipedia.org/wiki/%D0%A1%D0%B5%D0%BC%D0%B0%D1%84%D0%BE%D1%80_(%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%82%D0%B8%D0%BA%D0%B0)

> 2: The operating system will refuse to let me create a thousand threads,
> and anyway the database will, on average, return the responses only once
> all of them are finished.

I don't know how I can help you, since you haven't explained the
architecture of your application very well.

> 3: I never close a connection once it has been created, so a pool will not
> help me (if Google is right about what a "connection pool" is).

Maybe you should.

> The problem can be viewed from this abstract point of view:
> Transferring data from the application server (which connects to the
> database) takes 200 ms (and the same again in the other direction); adding
> data to the database and then selecting data (all in one request) takes
> 250 ms, so each database operation of this type takes 200 + 250 + 200 =
> 650 ms. Two such operations take 650 + 650 = 1300 ms, but if there were a
> way to send two queries at once and then get two results at once
> (tracking, of course, the correspondence between requests and responses),
> we could reduce those two "dialogs" from 1300 ms to 200 + 250 + 250 + 200
> = 900 ms. So we save 400 ms - and with a thousand requests every few
> minutes, that adds up to a very significant amount of time.

Databases are optimized for throughput, not latency. It isn't in
question that there would be less latency if we could parallelise the
queries. What is in question is:

1. Whether or not it matters.

2. Whether or not that's possible, given the restrictions you insist on.

--
Regards,
Peter Geoghegan

Re: Asynchronous queries with bound data.

From:
Вячеслав Блинников
Date:
> > 2: The operating system will refuse to let me create a thousand threads,
> > and anyway the database will, on average, return the responses only once
> > all of them are finished.
>
> I don't know how I can help you, since you haven't explained the
> architecture of your application very well.

It does not really matter, but anyway: the "application architecture" is an implementation of distributed computing in which multiple clients connect to the server, receive their piece of the job, do the calculation, and return the results. The server must keep track of all the work (generally speaking) done by each client (processor data, adapter data, the number of completed tasks, calculation time, etc.).

> > 3: I never close a connection once it has been created, so a pool will
> > not help me (if Google is right about what a "connection pool" is).
>
> Maybe you should.

No, I shouldn't. I am the only user of the database, so I am very welcome there.

> > The problem can be viewed from this abstract point of view:
> > Transferring data from the application server (which connects to the
> > database) takes 200 ms (and the same again in the other direction);
> > adding data to the database and then selecting data (all in one request)
> > takes 250 ms, so each database operation of this type takes 200 + 250 +
> > 200 = 650 ms. Two such operations take 650 + 650 = 1300 ms, but if there
> > were a way to send two queries at once and then get two results at once
> > (tracking, of course, the correspondence between requests and responses),
> > we could reduce those two "dialogs" from 1300 ms to 200 + 250 + 250 +
> > 200 = 900 ms. So we save 400 ms - and with a thousand requests every few
> > minutes, that adds up to a very significant amount of time.
>
> Databases are optimized for throughput, not latency. It isn't in
> question that there would be less latency if we could parallelise the
> queries

I was not talking about (nor asking for) any kind of parallelization - I only meant the database's ability to queue incoming requests and then serve them one by one - that is, simply not refusing to accept a request while "one is already in progress".
And it is not the moment to start working in that direction right now - it would take weeks or months - and I believe that if this is not implemented already, there were reasons for it - PostgreSQL has not existed since just yesterday (it is six years younger than I am).