Re: Disk buffering of resultsets

Поиск

Список

Период

Сортировка

От	Vitalii Tymchyshyn
Тема	Re: Disk buffering of resultsets
Дата	22 октября 2014 г. 00:48:45
Msg-id	CABWW-d1r9yc92jNAV67rokjcwWf-DTN3DA5Q8SeqW0i3+A-_8A@mail.gmail.com обсуждение исходный текст
Ответ на	R: Disk buffering of resultsets (Enrico Olivelli - Diennea <enrico.olivelli@diennea.com>)
Список	pgsql-jdbc

Дерево обсуждения

The disk buffering is enabled by default. Basically it works when:

1) you set fetchSize

2) your resultSet is forward only

3) it's autocommit (otherwise cursors are used as before)

I am not sure you can catch a stack trace, but you should see pgjdbc*.bin in your temp directory and

"Already received N tuples that is over N limit. Swapping remained to tmp file" in driver debug log if enabled. N should be your fetch size.

It will create files in java.io.tmpdir as per File.createTempFile contract.

Best regards, Vitalii Tymchyshyn

2014-10-21 6:47 GMT-04:00 Enrico Olivelli - Diennea <enrico.olivelli@diennea.com>:

Thank you

Is there any option to set on the JDBC connection URL or disk buffering is enabled by default ?
How can I see that this new feature is running (dumping stacktraces?)
Which directory will be used for temporary buffers ? java.io.tmpdir ?

Enrico Olivelli
Software Development Manager @Diennea
Tel.: (+39) 0546 066100 - Int. 925
Viale G.Marconi 30/14 - 48018 Faenza (RA)

MagNews - E-mail Marketing Solutions
http://www.magnews.it
Diennea - Digital Marketing Solutions
http://www.diennea.com

Da: tivv00@gmail.com [mailto:tivv00@gmail.com] Per conto di Vitalii Tymchyshyn
Inviato: martedì 21 ottobre 2014 04:20
A: Enrico Olivelli - Diennea
Cc: Craig Ringer; Tom Lane; John R Pierce; PG-JDBC Mailing List

Oggetto: Re: [JDBC] Disk buffering of resultsets

Hello, all.

Basically, alpha of p.1 is available here: https://github.com/tivv/pgjdbc/tree/offloading
It passes all the tests of cursor-based implementation, but I want to add some more (mostly with multiple open statements).

Best regards, Vitalii Tymchyshyn

2014-10-14 4:09 GMT-04:00 Enrico Olivelli - Diennea <enrico.olivelli@diennea.com>:
Hi,
we can give support doing some benchs with our platform as soon as some ALFA/BETA will be available.
We are longing for this series of improvements!
I hope that these features could be used with 9.3 server and that we won’t need to upgrade to 9.4 (which is not still stable )

Thank you very much

Enrico Olivelli
Software Development Manager @Diennea
Tel.: (+39) 0546 066100 - Int. 925
Viale G.Marconi 30/14 - 48018 Faenza (RA)

MagNews - E-mail Marketing Solutions
http://www.magnews.it
Diennea - Digital Marketing Solutions
http://www.diennea.com

Da: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] Per conto di Vitalii Tymchyshyn
Inviato: lunedì 13 ottobre 2014 17:34
A: Craig Ringer
Cc: Tom Lane; John R Pierce; PG-JDBC Mailing List
Oggetto: Re: [JDBC] Disk buffering of resultsets

Hello, again.

Sorry for the pause, I had a really busy week. Yet it allowed me to think a little more.
As for me, there are three independent goals that can be addressed independently:

1) Prevent OOMs
Unfortunately this can be addressed with out of heap saving only. The way I did in draft would still OOM when secondary query comes.
Note that it's not that unusual. It's usually used without any multithreading to perform a client-side join, e.g. when complicated inheritance scenario is in place or to load some dictionary data without much duplication (e.g. only few wide dictionary entries for the long query), ...
I am still thinking to do it without much parsing (you will need record type and size and that's all, without field parsing) by simply copying as-is to temp file. Pluggable interfaces can be done later if needed.

2) Fast first record
Here we need to introduce strategies for "who is doing copying and when" from (1). I propose pluggable strategies with few predefined (see below). User can pass predefined strategy name or an Executor as a DataSource parameter or static method reference that returns an Executor when a string is needed (e.g. in connection URI). This would also allow to easily point to Executors.* methods. We may think about ScheduledExecutor requirement to also reuse it for QueryTimeout stuff.

I propose to have next predefined strategies:
a) Direct executor, that does all loading at the very beginning, potentially saving to a temp file.
b) Postponed executor, that works much like in my draft: reads as needed without any disk saving. Performs disk saving only when connection is needed for some other statement.
c) JVM-wide Executors.newCachedThreadPool that will start offloading in parallel as fetchSize is reached.

Also I'd propose to set default fetchSize to some reasonable value, like 1000 and specify one of the strategies (e.g (a)) as default so that we won't get OOM on default settings. Or we should allow to set default fetch size on connection/data source level (or both).

3) Fast cancel/resultset close.
It's the only place where switching to portals is needed as far as I can see and it can be done orthogonal to (1) and (2). I don't see any other goal that will benefit from it. To be honest, I am willing to do (1) and (2), but not (3) because this would mean me to get much deeper into the protocol I know almost nothing about right now.

Best regards, Vitalii Tymchyshyn.

Rimani aggiornato sul mondo dell’email marketing e del digital marketing: visita il nostro blog! http://blog.magnews.it

Rimani aggiornato sul mondo dell’email marketing e del digital marketing: visita il nostro blog! http://blog.magnews.it

В списке pgsql-jdbc по дате отправления:

Предыдущее

От: Vitalii Tymchyshyn
Дата: 22 октября 2014 г., 00:44:03
Сообщение: Re: Disk buffering of resultsets

Следующее

От: Vladimir Crnčić (Infoart d.o.o.)
Дата: 22 октября 2014 г., 16:12:20
Сообщение: TIMESTAMP value binary transfer problem

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Disk buffering of resultsets

Предыдущее

Следующее