Discussion: Streaming ResultSet rows instead of buffering them in memory
I have now finished the PR https://github.com/pgjdbc/pgjdbc/pull/1735 and addressed all the comments from the earlier discussions.
Would you be open to merging this at some point? At least a new review would be appreciated.
-Mikko
From: Mikko Tiihonen <mikko.tiihonen@nitor.com>
Sent: 15 March 2020 14:57
To: pgsql-jdbc@lists.postgresql.org <pgsql-jdbc@lists.postgresql.org>
Subject: Streaming ResultSet rows instead of buffering them in memory
Hi,
I'd like to propose that pgjdbc be modified to avoid buffering ResultSets in memory by default, whenever it is safe to do so.
I created a WIP PR that works for many basic scenarios https://github.com/pgjdbc/pgjdbc/pull/1735
I tried to make a very minimal change that should still be maintainable in the long term. Basically I just moved the protocol parsing variables into a state object and pass it along when processing the next row of the ResultSet. The effect on applications is that queries have an effective fetch size of 1, without any of the downsides of such a setting.
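To illustrate the idea of a state object advanced one row at a time (a toy sketch with a made-up row encoding, not the actual pgjdbc wire format or the PR's code):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Toy sketch: the parser state lives in an object and each call decodes one
// row from the buffer on demand, instead of materializing all rows up front.
// The int32-length + UTF-8 row encoding here is invented for the demo.
public class StreamingParser {
    private final ByteBuffer buf;   // protocol parsing state lives here

    public StreamingParser(ByteBuffer buf) {
        this.buf = buf;
    }

    // Decode the next row on demand; null when the buffer is exhausted.
    public String nextRow() {
        if (!buf.hasRemaining()) {
            return null;
        }
        int len = buf.getInt();            // each row: int32 length + payload
        byte[] payload = new byte[len];
        buf.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Helper to build a buffer of length-prefixed rows for the demo.
    public static ByteBuffer encode(List<String> rows) {
        int size = 0;
        for (String r : rows) size += 4 + r.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer b = ByteBuffer.allocate(size);
        for (String r : rows) {
            byte[] p = r.getBytes(StandardCharsets.UTF_8);
            b.putInt(p.length).put(p);
        }
        b.flip();
        return b;
    }

    public static void main(String[] args) {
        StreamingParser p = new StreamingParser(encode(List.of("row1", "row2")));
        // the application pulls rows one at a time; nothing else is buffered
        for (String row = p.nextRow(); row != null; row = p.nextRow()) {
            System.out.println(row);
        }
    }
}
```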
The motivation for the patch is to speed up queries and lower memory usage. The queries become faster indirectly because the query processing is parallelized. Currently the steps happen in sequence:
1) the backend returns results and the JDBC driver parses and buffers the result set (which can take a while)
2) the application consumes the result set
With this patch the steps occur concurrently:
1) the backend returns results
2) the application consumes the result set, with the JDBC driver parsing rows from the TCP buffer on demand
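The overlap of the two steps can be sketched with a producer thread standing in for the backend filling a small bounded buffer (the TCP buffer), while the application consumes concurrently. This is a minimal illustration of the pipelining, not driver code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal illustration of the pipelining: the "backend" produces rows into a
// small bounded buffer (standing in for the TCP buffer) while the application
// consumes them concurrently, instead of waiting for the full result set.
public class Pipelined {
    private static final String END = "\0END";   // sentinel marking end of results

    public static List<String> run(int rows) throws InterruptedException {
        BlockingQueue<String> tcpBuffer = new ArrayBlockingQueue<>(4);

        Thread backend = new Thread(() -> {
            try {
                for (int i = 0; i < rows; i++) {
                    tcpBuffer.put("row" + i);    // blocks when the buffer is full
                }
                tcpBuffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        backend.start();

        // The application consumes rows as soon as they arrive.
        List<String> consumed = new ArrayList<>();
        for (String row = tcpBuffer.take(); !row.equals(END); row = tcpBuffer.take()) {
            consumed.add(row);
        }
        backend.join();
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10).size());
    }
}
```

Note that the bounded buffer also models the back-pressure mentioned below: if the consumer stalls, the producer blocks once the buffer is full.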
The cursors used by enabling a fetch size do a similar thing, but:
- they tell the backend to optimize the query plan for fetching only the first few rows instead of the full result set
- each batch of rows requires a round-trip between the backend and the application, during which neither does any useful work
- a small fetch size causes many round-trips, while a large fetch size uses more memory and increases latency
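The trade-off in the last point is simple arithmetic. Assuming one round-trip per fetched batch, for a fixed result size the round-trip count and the peak number of rows buffered in the driver both follow directly from the fetch size:

```java
// Back-of-the-envelope trade-off for cursor-based fetching: assuming one
// round-trip per batch, round-trips and peak driver-side buffering both
// follow from the fetch size.
public class FetchTradeoff {
    // Number of backend round-trips needed to fetch `rows` rows.
    static int roundTrips(int rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize;   // ceil(rows / fetchSize)
    }

    // Peak number of rows buffered in driver memory at once.
    static int peakBufferedRows(int rows, int fetchSize) {
        return Math.min(rows, fetchSize);
    }

    public static void main(String[] args) {
        int rows = 1_000_000;
        // small fetch size: many round-trips, little memory
        System.out.println(roundTrips(rows, 10) + " trips, "
                + peakBufferedRows(rows, 10) + " rows buffered");
        // large fetch size: few round-trips, lots of memory
        System.out.println(roundTrips(rows, 100_000) + " trips, "
                + peakBufferedRows(rows, 100_000) + " rows buffered");
    }
}
```

With streaming there is no such knob to tune: the effective fetch size is 1, but no extra round-trips are added.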
The downside is that if the application does not consume the result set fast enough, the backend cannot release its locks as quickly. But this only affects large result sets that do not fit entirely into the TCP buffers.
The WIP code sometimes fails if the ResultSet is not properly closed before the next statement is executed. For many tests this happens in the tearDown phase. I also think that things might currently not work correctly if the application triggers queries on the connection while processing the results, for example by requesting metadata. In that case we have no option other than to go back to buffering, to clear the connection for reuse.
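That fallback could look roughly like this (a toy sketch of the control flow only; the class and method names are hypothetical, not the PR's actual API): rows normally stream from the connection, but if the connection is needed for another query, the remaining rows are drained into memory first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy sketch of the fallback described above: rows normally stream from the
// connection, but if the connection must be reused (e.g. a metadata query),
// the rest of the pending rows are buffered in memory first.
// All names here are hypothetical illustrations.
public class StreamingRows implements Iterator<String> {
    private final Deque<String> pending;     // stands in for rows still on the wire
    private List<String> buffered;           // filled only when releasing the connection

    public StreamingRows(List<String> rowsOnWire) {
        this.pending = new ArrayDeque<>(rowsOnWire);
    }

    // Called before another statement may use the connection: buffer the rest.
    public void detachFromConnection() {
        buffered = new ArrayList<>(pending);
        pending.clear();
    }

    public boolean isStreaming() {
        return buffered == null;
    }

    @Override public boolean hasNext() {
        return buffered != null ? !buffered.isEmpty() : !pending.isEmpty();
    }

    @Override public String next() {
        return buffered != null ? buffered.remove(0) : pending.poll();
    }

    public static void main(String[] args) {
        StreamingRows rs = new StreamingRows(List.of("a", "b", "c"));
        System.out.println(rs.next());        // streamed directly from the wire
        rs.detachFromConnection();            // e.g. a metadata query arrives
        System.out.println(rs.isStreaming()); // now fully buffered
        while (rs.hasNext()) System.out.println(rs.next());
    }
}
```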
-Mikko
Hi Mikko,
Yes to both. Let me see what I can do.
Dave Cramer
www.postgres.rocks
On Sat, 9 Jan 2021 at 12:10, Mikko Tiihonen <mikko.tiihonen@nitor.com> wrote:
I have now finished the PR https://github.com/pgjdbc/pgjdbc/pull/1735 and addressed all the comments from the earlier discussions. Would you be open to merging this at some point? At least a new review would be appreciated.
-Mikko