Re: Disk buffering of resultsets

From: Craig Ringer
Subject: Re: Disk buffering of resultsets
Date:
Msg-id: 54300AA6.3040309@2ndquadrant.com
In response to: Re: Disk buffering of resultsets  (Vitalii Tymchyshyn <vit@tym.im>)
Responses: Re: Disk buffering of resultsets
Re: Disk buffering of resultsets
List: pgsql-jdbc
On 09/23/2014 09:39 AM, Vitalii Tymchyshyn wrote:
> Some more thoughts:
> 1) is it really not possible to stream results in autocommit mode? The
> only problem I can see is that connection can't be used (e.g. by other
> statement objects) until after all the results are fetched. So what? In
> many cases it's OK.

It is possible to stream results in autocommit mode.

At the SQL level, cursors (DECLARE and FETCH) can't be used outside an
explicit transaction unless you make them "WITH HOLD" cursors, which
have their own problems. But you can stream at the protocol level, so
long as you don't send another query until you've consumed all the
results you want, or don't require any more results.

For details, see the Execute message documentation in the v3 protocol,
particularly the notes about the row-count limit. Basically, you send
repeated Execute messages to fetch chunks of rows, until you don't get
any more rows or don't want any more.
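As an illustration of the wire format described in those docs (the class name here is mine, not anything in PgJDBC), an Execute message is the byte 'E', an int32 length that counts itself, the portal name as a NUL-terminated string, and an int32 row-count limit, where 0 means "no limit":

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ExecuteMessage {
    // Builds a v3-protocol Execute message: 'E', int32 length
    // (counting itself but not the tag byte), C-string portal name,
    // int32 row-count limit. Sending this repeatedly for the same
    // portal fetches the resultset in chunks.
    static byte[] build(String portal, int maxRows) throws IOException {
        byte[] name = portal.getBytes(StandardCharsets.UTF_8);
        int len = 4 + name.length + 1 + 4;   // length field counts itself
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte('E');
        out.writeInt(len);
        out.write(name);
        out.writeByte(0);                    // NUL terminator
        out.writeInt(maxRows);               // rows per chunk
        return buf.toByteArray();
    }
}
```

After each chunk the server replies with PortalSuspended if more rows remain, so the client can keep sending Execute messages, or close the portal early.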

Here's something I wrote on the topic recently, which is accurate to the
best of my knowledge:

http://stackoverflow.com/a/25907922/398670

> 2) another alternative to temp files are direct buffers. They are not
> exactly the same, but are not counted towards heap and can be swapped
> out by system if needed.

"Direct buffer" appears to refer to a Java NIO ByteBuffer. Right?

They are indeed an alternative, something more like BigMemory.

The number of options available is exactly why I think this needs to be
a simple interface that you can plug implementations into. Start with a
tempfile implementation, but allow for future needs.
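A minimal sketch of what such a pluggable interface might look like. All of the names here are hypothetical, nothing like this exists in PgJDBC today:

```java
import java.io.*;
import java.nio.file.*;
import java.util.Iterator;

// Hypothetical plug point: the driver appends raw row bytes as they
// arrive off the wire, then replays them when the application iterates
// the resultset. A direct-buffer or BigMemory-style backend would
// implement the same interface.
interface RowBuffer extends Closeable {
    void append(byte[] row) throws IOException;
    Iterator<byte[]> rows() throws IOException;
}

// One possible implementation: spill rows to a temp file as
// length-prefixed records.
class TempFileRowBuffer implements RowBuffer {
    private final Path file = Files.createTempFile("pgjdbc-rows", ".bin");
    private final DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)));

    TempFileRowBuffer() throws IOException {}

    public void append(byte[] row) throws IOException {
        out.writeInt(row.length);           // length prefix
        out.write(row);
    }

    public Iterator<byte[]> rows() throws IOException {
        out.flush();
        DataInputStream in =
            new DataInputStream(new BufferedInputStream(Files.newInputStream(file)));
        return new Iterator<byte[]>() {
            byte[] next = read();
            byte[] read() {
                try {
                    int len = in.readInt();
                    byte[] b = new byte[len];
                    in.readFully(b);
                    return b;
                } catch (EOFException e) {
                    return null;            // end of spilled rows
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            public boolean hasNext() { return next != null; }
            public byte[] next() { byte[] r = next; next = read(); return r; }
        };
    }

    public void close() throws IOException {
        out.close();
        Files.deleteIfExists(file);
    }
}
```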

> 3) For any disk saving I'd
> - postpone it as long as possible (e.g. until after connection is used
> by another statement)

You can't really do that. You could fetch and flush results just before
issuing another statement, as part of the execute call, but not
afterwards. I think that's what you meant, though.

This could lead to surprises where the execution of the next statement
throws an error that's really from the prior statement, though. It also
makes no sense, as when you execute a new statement, the resultset of
the prior statement is automatically closed.

So your app would have to have already fetched the whole resultset.

If you're talking about adding support for resultsets that aren't
auto-closed when the next statement is run, that's a different matter
entirely to what we've discussed so far. It might be useful, but it's
not the same as just spilling big results to disk.

> - do as little as possible (may be by saving frames from network as is
> with little decoding)

That really won't work well. We have to read the stream and parse the
messages. Otherwise we'll fail to see asynchronous notifications, error
messages, etc.

Not only that, PgJDBC doesn't have access to "network frames" anyway.
It's working with a stream-oriented socket. The lowest level it can
possibly work with is the byte stream.
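For reference, every v3 backend message after startup is framed as a 1-byte type tag followed by an int32 length that counts itself but not the tag. A sketch of decoding that framing from the byte stream (class name is mine), which is what lets the driver notice a NoticeResponse ('N'), ErrorResponse ('E') or NotificationResponse ('A') interleaved with DataRow ('D') messages:

```java
import java.io.*;

// One v3 backend message: 1-byte type tag, int32 length (counting
// itself, not the tag), then the payload.
class BackendMessage {
    final char type;
    final byte[] payload;

    BackendMessage(char type, byte[] payload) {
        this.type = type;
        this.payload = payload;
    }

    static BackendMessage read(DataInputStream in) throws IOException {
        char type = (char) in.readUnsignedByte();
        int len = in.readInt();
        byte[] payload = new byte[len - 4];  // length includes itself
        in.readFully(payload);
        return new BackendMessage(type, payload);
    }
}
```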

> - do it in background (a little conflicting to postponing, but more
> thinking is needed). Return first fetchSize rows and start copying
> network to disk in background thread.

You don't want to go there. It's horribly complicated to work with
background threads portably in the JDBC driver. For an example, see the
recent discussion of Timer handling.

Not only that, it's also pointless because there's only one TCP
connection to work with. So while the "background" thread is saving the
resultset, everything else that wants to use the connection has to wait
anyway.

> Anyway one needs to go through everything to issue next command, does
> not he?

Yes, per the protocol documentation.

> The question is if all of this needs to be saved or simply
> skipped (as soon as result set is closed).

If we're fetching chunks of the resultset progressively, we can just
close the resultset and close the server-side portal.

I'm pretty sure this is already possible in PgJDBC (when you set a fetch
size) though I'd need to write a test case and do some code reading to
be totally sure.

I think you might be trying to solve a different problem to what Denis
is talking about.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

