Re: Streaming ResultSet rows instead of buffering them in memory

From: Dave Cramer
Subject: Re: Streaming ResultSet rows instead of buffering them in memory
Date:
Msg-id: CADK3HHJzr85hKSXnp5y6mqs-NgiSaYtdXL3F+OmgfUdypJvM9w@mail.gmail.com
In reply to: Re: Streaming ResultSet rows instead of buffering them in memory (Mikko Tiihonen <mikko.tiihonen@nitor.com>)
List: pgsql-jdbc
Hi Mikko,

Yes to both. Let me see what I can do.

Dave Cramer
www.postgres.rocks


On Sat, 9 Jan 2021 at 12:10, Mikko Tiihonen <mikko.tiihonen@nitor.com> wrote:
I have now finished the PR https://github.com/pgjdbc/pgjdbc/pull/1735 and addressed all the comments from the earlier discussions.

Would you be open to merging this at some point? At least a new review would be appreciated.

-Mikko


From: Mikko Tiihonen <mikko.tiihonen@nitor.com>
Sent: 15 March 2020 14:57
To: pgsql-jdbc@lists.postgresql.org <pgsql-jdbc@lists.postgresql.org>
Subject: Streaming ResultSet rows instead of buffering them in memory
 
Hi,

I'd like to propose that pgjdbc be modified to avoid buffering ResultSets in memory by default, whenever it is safe to do so.
I created a WIP PR that works for many basic scenarios: https://github.com/pgjdbc/pgjdbc/pull/1735

I tried to make a very minimal change that should still be maintainable in the long term. Basically, I just moved the protocol parsing variables into a state object and drag it along when processing the next row of the ResultSet. The effect on applications is that queries have an effective fetch size of 1, without any of the downsides of such a setting.
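For illustration, here is an ordinary JDBC read loop; the point of the patch is that this exact code would stream rows on demand instead of waiting for the whole result set to be buffered. The connection URL, credentials, and table are placeholders, not taken from the PR.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingReadExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL, credentials, and table: illustration only.
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "user", "secret");
             Statement st = con.createStatement();
             // With the proposed change, rows are parsed from the TCP
             // buffer as next() is called instead of being buffered up
             // front; the application code itself is unchanged.
             ResultSet rs = st.executeQuery("SELECT id, payload FROM big_table")) {
            while (rs.next()) {
                process(rs.getLong(1), rs.getString(2));
            }
        }
    }

    private static void process(long id, String payload) {
        // Application-side work that now overlaps with row parsing.
    }
}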

The motivation for the patch is to speed up queries and lower memory usage. The queries become faster indirectly because result parsing and result consumption run in parallel. Today the two steps happen in sequence:
1) the backend returns results, and the JDBC driver parses and buffers the result set (which can take a while)
2) the application consumes the result set

With this patch the steps occur concurrently:
1) the backend returns results
2) the application consumes the result set, with the JDBC driver parsing rows from the TCP buffer on demand (a sketch of the idea follows below)
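To make the on-demand parsing concrete, here is a minimal sketch of reading DataRow messages of the PostgreSQL v3 wire protocol one row at a time; ProtocolState and nextRow are hypothetical names for illustration, not the PR's actual code.

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of on-demand row parsing from the socket stream.
final class ProtocolState {
    private final DataInputStream in;

    ProtocolState(InputStream socketStream) {
        this.in = new DataInputStream(socketStream);
    }

    /**
     * Parses one DataRow ('D') message of the PostgreSQL v3 protocol,
     * or returns null when a non-DataRow message (e.g. CommandComplete)
     * ends the result set.
     */
    byte[][] nextRow() throws IOException {
        int type = in.read();          // message type byte
        int length = in.readInt();     // message length, includes itself
        if (type != 'D') {
            in.skipBytes(length - 4);  // not a row: consume and stop
            return null;
        }
        int columns = in.readShort();  // Int16 column count
        byte[][] row = new byte[columns][];
        for (int i = 0; i < columns; i++) {
            int colLen = in.readInt(); // Int32 value length, -1 = NULL
            if (colLen >= 0) {
                row[i] = new byte[colLen];
                in.readFully(row[i]);
            }
        }
        return row;
    }
}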

The cursors used by enabling a fetch size do a similar thing (see the example after this list), but:
- they tell the backend to optimize the query plan for fetching only the first few rows instead of the full result set
- each batch of rows requires a round-trip between the backend and the application, during which neither does any useful work
- a small fetch size causes many round-trips, while a large fetch size uses memory and increases latency
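For comparison, this is roughly how cursor-based fetching is enabled in pgjdbc today: the driver only uses a server-side portal when autocommit is off, the fetch size is positive, and the ResultSet is forward-only. The connection URL and table are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorFetchExample {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "user", "secret")) {
            // Autocommit must be off so the portal stays open
            // across fetches.
            con.setAutoCommit(false);
            try (Statement st = con.createStatement()) {
                // Each batch of 50 rows costs one round-trip during
                // which both sides sit idle waiting for the other.
                st.setFetchSize(50);
                try (ResultSet rs = st.executeQuery("SELECT id FROM big_table")) {
                    while (rs.next()) {
                        // consume row
                    }
                }
            }
            con.commit();
        }
    }
}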

The bad thing is that if the application does not consume the result set fast enough, the backend cannot release its locks as quickly. But this only affects large results that do not fit entirely into the TCP buffers.

The WIP code sometimes fails if the ResultSet is not properly closed before the next statement is executed; for many tests this happens in the tearDown phase. I also think that things might not currently work correctly if the application triggers queries on the connection while processing the results, for example by requesting metadata. In that case we have no option other than to fall back to buffering, to clear the connection for reuse.
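To illustrate the metadata case, here is the kind of pattern that would force the fallback; this is an assumed shape, not a test from the PR.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

class InterleavedQueryExample {
    // Sketch of the pattern that forces a fallback to buffering: a
    // second request is issued on the same connection while a streamed
    // ResultSet is still open, so the driver must first drain and
    // buffer the pending rows to free the connection.
    static void interleaved(Connection con) throws Exception {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id FROM big_table")) {
            while (rs.next()) {
                // Requesting metadata mid-iteration reuses the wire
                // connection underneath the open ResultSet.
                try (ResultSet cols = con.getMetaData()
                        .getColumns(null, "public", "big_table", null)) {
                    while (cols.next()) {
                        // ...
                    }
                }
            }
        }
    }
}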

-Mikko
