Re: Streaming ResultSet rows instead of buffering them in memory

From: Mikko Tiihonen
Subject: Re: Streaming ResultSet rows instead of buffering them in memory
Date:
Msg-id HE1PR0701MB23789296418AB624EDCD9D6888AD0@HE1PR0701MB2378.eurprd07.prod.outlook.com
In reply to: Streaming ResultSet rows instead of buffering them in memory  (Mikko Tiihonen <mikko.tiihonen@nitor.com>)
Responses: Re: Streaming ResultSet rows instead of buffering them in memory  (Dave Cramer <davecramer@postgres.rocks>)
List: pgsql-jdbc
I have now finished the PR https://github.com/pgjdbc/pgjdbc/pull/1735 and fixed all the comments from the earlier discussions.

Would you be open to merging this at some point? At least a new review would be appreciated.

-Mikko


From: Mikko Tiihonen <mikko.tiihonen@nitor.com>
Sent: 15 March 2020 14:57
To: pgsql-jdbc@lists.postgresql.org <pgsql-jdbc@lists.postgresql.org>
Subject: Streaming ResultSet rows instead of buffering them in memory
 
Hi,

I'd like to propose that pgjdbc be modified to avoid buffering ResultSets in memory by default, whenever it is safe to do so.
I created a WIP PR that works for many basic scenarios: https://github.com/pgjdbc/pgjdbc/pull/1735

I tried to make a very minimal change that should still be maintainable in the long term. Basically I just moved the protocol parsing variables into a state object and carry it along when processing the next row of the ResultSet. The effect on applications is that queries behave as if they had a fetch size of 1, without any of the downsides of such a setting.

The motivation for the patch is to speed up queries and lower memory usage. Queries become faster indirectly because query processing is pipelined. Currently these steps happen in sequence:
1) the backend returns results and the JDBC driver parses and buffers the whole result set (which can take a while)
2) the application consumes the result set

With this patch the steps occur concurrently:
1) the backend returns results
2) the application consumes the result set, with the JDBC driver parsing rows from the TCP buffer on demand
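To make the difference concrete, here is a toy model of the two modes. This is not pgjdbc code, just an illustrative sketch: the "wire" is a plain iterator, and a counter tracks how many rows have been parsed off it by the time the application reads its first row.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamingVsBuffering {
    // Simulated wire: each next() "parses" one row and bumps the counter.
    static Iterator<String> wire(List<String> packets, AtomicInteger parsed) {
        Iterator<String> it = packets.iterator();
        return new Iterator<>() {
            public boolean hasNext() { return it.hasNext(); }
            public String next() { parsed.incrementAndGet(); return it.next(); }
        };
    }

    public static void main(String[] args) {
        List<String> packets = List.of("row1", "row2", "row3", "row4");

        // Buffered mode: every row is parsed before the application sees row 1.
        AtomicInteger parsedBuffered = new AtomicInteger();
        List<String> buffered = new java.util.ArrayList<>();
        wire(packets, parsedBuffered).forEachRemaining(buffered::add);
        System.out.println("buffered: parsed " + parsedBuffered.get()
            + " rows before the first row was consumed");

        // Streaming mode: exactly one row is parsed when the application
        // reads one, so backend, network, and application work overlap.
        AtomicInteger parsedStreaming = new AtomicInteger();
        Iterator<String> rs = wire(packets, parsedStreaming);
        rs.next();
        System.out.println("streaming: parsed " + parsedStreaming.get()
            + " row after the first row was consumed");
    }
}
```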

Cursors, enabled by setting a fetch size, do a similar thing, but:
- they tell the backend to optimize the query plan for fetching only the first few rows instead of the full result set
- each batch of rows requires a round-trip between backend and application, during which neither does any useful work
- a small fetch size causes many round-trips, while a large fetch size uses memory and increases latency

The downside is that if the application does not consume the result set fast enough, the backend cannot release its locks as quickly. But this only affects large result sets that do not fit entirely into the TCP buffers.

The WIP code sometimes fails if the ResultSet is not properly closed before the next statement is executed; for many tests this happens in the tearDown phase. I also suspect that things currently do not work correctly if the application triggers queries on the connection while processing the results, for example by requesting metadata. In that case we have no option other than to fall back to buffering in order to clear the connection for reuse.
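The fallback described above can be sketched as follows. This is a hypothetical model, not the PR's actual code: when a second statement needs the connection while rows are still streaming, the driver must first drain the pending rows into a buffer so the wire is free for the new query.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class InterleavedQueryFallback {
    // Rows still sitting unparsed on the simulated wire.
    final ArrayDeque<String> pending = new ArrayDeque<>();
    // Rows forced into memory by the fallback.
    final List<String> buffered = new ArrayList<>();

    void startStreamingQuery(List<String> rows) {
        pending.addAll(rows);
    }

    // Called when another statement (e.g. a metadata query) needs the
    // connection: drain everything still pending, i.e. fall back to buffering.
    void executeOtherQuery() {
        while (!pending.isEmpty()) {
            buffered.add(pending.poll());
        }
    }

    public static void main(String[] args) {
        InterleavedQueryFallback conn = new InterleavedQueryFallback();
        conn.startStreamingQuery(List.of("r1", "r2", "r3"));
        conn.executeOtherQuery(); // forces buffering of the 3 unread rows
        System.out.println("rows buffered on fallback: " + conn.buffered.size());
    }
}
```

The cost of the fallback is exactly the memory use the patch normally avoids, which is why it should only trigger in this interleaved-query case.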

-Mikko
