Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Vitalii Tymchyshyn
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
Date:
Msg-id: CABWW-d3xywucp_rCnTOe4aoUknL0TqmLB1ucDf7v1Khniynt+A@mail.gmail.com
In reply to: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres  (John Gorman <jgorman@eldocomp.com>)
Responses: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
List: pgsql-performance
For JDBC there are certain prerequisites for setFetchSize to work: the driver only uses cursor-based fetching when the connection has autocommit disabled (i.e. is inside a transaction) and the result set is forward-only.
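The prerequisites above can be sketched as follows (a minimal sketch, assuming the PostgreSQL JDBC driver is on the classpath; the connection URL, credentials, and table name are placeholders):

```java
import java.sql.*;

public class CursorFetchExample {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL/credentials for this sketch.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret")) {
            // 1. Cursor-based fetching only happens inside a transaction.
            conn.setAutoCommit(false);
            // 2. The result set must be forward-only (this is the default).
            try (Statement st = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                // 3. A positive fetch size tells the driver to pull rows in
                //    batches instead of buffering the whole result set.
                st.setFetchSize(1000);
                try (ResultSet rs = st.executeQuery("SELECT * FROM large_table")) {
                    while (rs.next()) {
                        // process one row at a time; memory stays bounded
                    }
                }
            }
            conn.commit();
        }
    }
}
```

If any of the three conditions is not met, the driver silently falls back to fetching the entire result set into memory, which is consistent with the behaviour described later in this thread.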

On Tue, Feb 21, 2017 at 09:06 John Gorman <jgorman@eldocomp.com> wrote:

My experience with cursors in PostgreSQL with Java has been to stay away from them. We support 2 databases with our product, PostgreSQL (default) and SQL Server. While re-encrypting data in a database the application used cursors with a fetch size of 1000.


It worked perfectly on SQL Server, and on PostgreSQL until we got to a PostgreSQL table with more than 11 million rows. After spending weeks trying to figure out what was happening, I realized that once it reached a table with more than 10 million rows, for some reason the cursor functionality silently stopped working and it was reading the entire table. I asked another very senior architect to look at it and he came to the same conclusion. Because of limited time, I ended up working around it using limit/offset.
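The limit/offset workaround described above can be sketched as repeated independent queries, so no cursor state is held between batches (the table name, ordering column, and page size here are illustrative, not from the original code):

```java
public class PagedScan {
    // Build one page of a limit/offset scan. Ordering by a stable key
    // is required so consecutive pages do not overlap or skip rows.
    static String pageQuery(String table, int pageSize, long offset) {
        return "SELECT * FROM " + table
             + " ORDER BY id LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Each iteration would be executed as its own statement;
        // here we just print the queries the scan would issue.
        for (long offset = 0; offset < 3000; offset += 1000) {
            System.out.println(pageQuery("big_table", 1000, offset));
        }
    }
}
```

Note that OFFSET still makes the server skip over all earlier rows on every page, so cost grows with the offset; it trades the cursor problem for a quadratic-ish scan, which is acceptable for a one-off re-encryption job.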


Again, we are using Java, so the problem could just be in the PostgreSQL JDBC driver. Also, we were on 9.1 at the time.


Regards

John


From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Mike Beaton
Sent: Tuesday, February 21, 2017 6:49 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres


Thanks, Tom.

Wouldn't this mean that cursors are noticeably non-optimal even for normal data sizes, since the entire data to be streamed from the table is always duplicated into another buffer and then streamed?


if you want the whole query result at once, why are you bothering with a cursor?


The PostgreSQL docs (https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551) clearly recommend cursors as a way to return a reference to a large result set from a function (as I understood, this is recommended precisely as a way to avoid tuple-based buffering of the data).


So following that advice, it's not unreasonable that I would actually have a cursor to a large dataset.


Then, I would ideally want to be able to fetch the data from that cursor without the entire data getting duplicated (even if only a bit at a time instead of all at once, which seems to be the best case behaviour) as I go.


Additionally, I thought that if I had a streaming use-case (which I do) and a streaming data-access layer (which I do), then since `SELECT * FROM large` is absolutely fine, end-to-end, in that situation, by symmetry and the principle of least astonishment `FETCH ALL FROM cursor` might be fine too.
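One way to stay on the cursor path while keeping memory bounded is to fetch in explicit batches rather than with `FETCH ALL` (a minimal sketch, assuming the PostgreSQL JDBC driver; cursor name, batch size, and table name are illustrative):

```java
import java.sql.*;

public class ExplicitCursor {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL/credentials for this sketch.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret")) {
            conn.setAutoCommit(false); // a cursor only lives inside a transaction
            try (Statement st = conn.createStatement()) {
                st.execute("DECLARE big_cur NO SCROLL CURSOR FOR "
                         + "SELECT * FROM large_table");
                int fetched;
                do {
                    fetched = 0;
                    // FETCH FORWARD n streams a bounded batch, unlike FETCH ALL,
                    // which materializes the whole remainder of the result.
                    try (ResultSet rs =
                            st.executeQuery("FETCH FORWARD 1000 FROM big_cur")) {
                        while (rs.next()) {
                            fetched++; // process each row here
                        }
                    }
                } while (fetched > 0);
                st.execute("CLOSE big_cur");
            }
            conn.commit();
        }
    }
}
```

Each `FETCH FORWARD` round trip holds at most one batch in client memory, so the client-side buffering concern raised above does not arise regardless of driver fetch-size behaviour.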

