Re: [HACKERS] I want to change libpq and libpgtcl for better handling of large query results

From: Peter T Mount
Subject: Re: [HACKERS] I want to change libpq and libpgtcl for better handling of large query results
Date:
Msg-id: Pine.LNX.3.95.980106113350.16836A-100000@maidast
In reply to: I want to change libpq and libpgtcl for better handling of large query results  (Constantin Teodorescu <teo@flex.ro>)
List: pgsql-hackers
On Mon, 5 Jan 1998, Constantin Teodorescu wrote:

> I have reached a point in developing PgAccess where I have discovered
> that some functions in libpgtcl are badly implemented, and even libpq
> does not help me a lot!
>
> What's the problem? It's about working with large query results: big
> tables with thousands of records that have to be processed in order to
> produce a full report, for example.

In the past, I've had a lot of people complaining about the performance
(or lack thereof) when handling large results in JDBC.

> Getting a query result from Tcl/Tk (the pg_select function) uses PQexec.
> But PQexec READS ALL THE RECORDS INTO MEMORY, and only after that can
> the user handle the query results.
> But what if the table has thousands of records? I would probably need
> more than 512 MB of RAM to get a report finished.

The only solution I was able to give was for them to use cursors and
fetch the result in chunks.
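
For reference, the sketch below is roughly what I point them at: plain
libpq, a cursor, and FETCH in chunks of 100 rows, so only one chunk is
ever held in client memory. The table name and database are placeholders:

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn = PQsetdb(NULL, NULL, NULL, NULL, "template1");
        PGresult *res;
        int       i, ntuples;

        if (PQstatus(conn) == CONNECTION_BAD)
            return 1;

        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "DECLARE c CURSOR FOR SELECT * FROM bigtable"));

        for (;;) {
            res = PQexec(conn, "FETCH 100 IN c");
            ntuples = PQntuples(res);
            if (ntuples == 0) {              /* no rows left */
                PQclear(res);
                break;
            }
            for (i = 0; i < ntuples; i++)    /* process one chunk */
                printf("%s\n", PQgetvalue(res, i, 0));
            PQclear(res);                    /* free this chunk before fetching more */
        }

        PQclear(PQexec(conn, "CLOSE c"));
        PQclear(PQexec(conn, "END"));
        PQfinish(conn);
        return 0;
    }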

> With some small changes in libpq-fe.h
>
>  (  void (* callback)(PGresult *, void *ptr, int stat);
>     void *usr_ptr;  )
>
> and also in libpq, we could allow a newly defined function in libpgtcl
> (pg_loop) to initiate a query and then call back a user-defined
> function after every record fetched from the connection.
>
> In order to do this, the connection is 'cloned' and the query is issued
> on this new connection. For every record fetched, the C callback
> function is called; there the Tcl interpreter is invoked for the source
> inside the loop, then the memory used by the record is released, and
> the next record is ready to come.

I understand the idea here, as I've used this trick before with Tcl, but
this could cause a problem with the other languages that we support. I
don't know how this would be done for Perl, but with Java, the JDBC spec
doesn't have this type of callback.
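
Just to check I've read the proposal right, this is roughly what I
picture the hook looking like from C. The field names come from the
fragment above; everything else (including how the query would be
started) is my guess, and none of it exists in libpq today:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* A user callback: called once per tuple as it arrives, before the
     * row's memory is released again. Each PGresult holds one record. */
    static void per_row(PGresult *res, void *usr_ptr, int stat)
    {
        long *count = (long *) usr_ptr;

        printf("%s\n", PQgetvalue(res, 0, 0));   /* first column only */
        (*count)++;
    }

    /* Hypothetical usage -- the fields and the driver function are
     * invented, so this stays commented out:
     *
     *     long count = 0;
     *     conn->callback = per_row;
     *     conn->usr_ptr  = &count;
     *     PQexecCallback(conn, "SELECT * FROM bigtable");
     */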

Some time back (around v6.0), I did look at having a separate thread on
the client that read the results in the background, so the foreground
thread would get the results almost immediately. It would only wait if it
had read everything transferred so far, and (as JDBC cannot go back a row
in a ResultSet) the rows were freed once read.

Although the idea was sound, in practice, it didn't work well. Not every
JVM implemented threading in the same way, so it locked up a lot. In the
end, the idea was dropped.

> More than that, after processing some records, the user can choose to
> break the loop (using the break command in Tcl), which actually breaks
> the connection.

What side effects could this have on the backend if the second connection
is broken? I think the existing code would simply terminate.

> What do we achieve by making these patches?
>
> First of all, the ability to process large tables sequentially.
> Then, increased performance, because receiving data over the network
> and processing it locally happen in parallel: the backend process on
> the server fills the communication channel with data while the local
> task processes it as it comes.
> In the old version, the local task had to wait until *all* the data had
> arrived (buffered in memory, if there was enough room) before
> processing it.

> What would I ask from you?
> 1) First of all, whether my needs could be satisfied some other way
> with the current functions in libpq or libpgtcl. I can assure you that
> with the current libpgtcl it is rather impossible. I am not sure
> whether there is another mechanism using some subtle functions that I
> don't know about.

We were talking about some changes to the protocol. Perhaps we could
change it so that it sends the result in blocks of tuples, rather than
everything in one block. Then, between each packet, an ACK- or CAN-style
packet could be sent to the backend, either asking for the next block or
cancelling the results.
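
Very roughly, the client side of such an exchange might look like the
sketch below. Every name in it is invented; it's only there to illustrate
the ACK/CAN idea:

    #include <libpq-fe.h>

    /* Invented stand-ins for whatever the protocol change would provide. */
    typedef struct TupleBlock TupleBlock;         /* a block of N tuples */
    extern TupleBlock *recv_tuple_block(PGconn *conn);
    extern int  process_block(TupleBlock *blk);   /* returns 0 to stop */
    extern void free_block(TupleBlock *blk);
    extern void send_ack(PGconn *conn);           /* "send the next block" */
    extern void send_can(PGconn *conn);           /* "cancel the rest" */

    static void read_result(PGconn *conn)
    {
        TupleBlock *blk;

        while ((blk = recv_tuple_block(conn)) != NULL) {
            int keep = process_block(blk);

            free_block(blk);          /* block memory is freed as we go */
            if (keep)
                send_ack(conn);       /* ACK: ready for the next block */
            else {
                send_can(conn);       /* CAN: stop sending, discard rest */
                break;
            }
        }
    }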

Another alternative is (as an option definable by the client at run time)
to have results open another connection on a per-result basis, much as
FTP does. However, I can see a performance hit from the overhead involved
in opening a new connection every time.

Also, I can see a possible problem:

Say a client has executed a query that returns a large number of rows. We
have read in the first 100 rows. The backend still has the majority of
the result queued up behind it.

Now, in JDBC we have getAsciiStream/getBinaryStream/getUnicodeStream,
which are the standard way of getting at BLOBs.

If one of the columns is a BLOB, and the client tries to read from the
BLOB, it will fail, because we are not in the main loop in the backend
(we're still transferring a result, and BLOBs use fastpath).

There are ways around this, but things could get messy if we're not
careful.

> 3) Is there any other normal way to tell the backend not to send any
> more tuples, instead of breaking the connection?

Apart from using cursors, not that I know of.

> 4) Even when working in C, using PQexec, it's impossible to handle
> large query results. Am I right?

Memory is the only limitation to this.

> Please cc to: mviorel@flex.ro      and also     teo@flex.ro

Done.

It would be interesting to see what the others think. Tcl & Java are
close relatives, and Sun are working on a Tcl extension to Java, so any
changes could (in the future) help both of us.

--
Peter T Mount  petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

