Обсуждение: execute same query only one time?

Поиск
Список
Период
Сортировка

execute same query only one time?

От
Johannes
Дата:
Hi,

is there a best practice to share data between two select statements?

Imaging following situation: I want to receive two result sets from two
tables, referring to a specific id from table t0 AND I try not to query
for that specific id a second time.

Table t0 returns 1 row and table t1 returns multiple rows.

begin;
select id, col1, col2, ... from t0 where id = (select max(id) from t0
where col1 = value1 and col2 = value2 and ...);
select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
value1 and col2 = value2 and ...);
commit;

Best regards Johannes


Вложения

Re: execute same query only one time?

От
"David G. Johnston"
Дата:
On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote:
Hi,

is there a best practice to share data between two select statements?

Imaging following situation: I want to receive two result sets from two
tables, referring to a specific id from table t0 AND I try not to query
for that specific id a second time.

Table t0 returns 1 row and table t1 returns multiple rows.

begin;
select id, col1, col2, ... from t0 where id = (select max(id) from t0
where col1 = value1 and col2 = value2 and ...);
select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
value1 and col2 = value2 and ...);
commit;

Please confirm:​

​You want the​ result of "SELECT max(id) FROM t0" to be used in the second query without having to recompute it?

What client are you using to execute these statements?

Dave

Re: execute same query only one time?

От
travis@traviswellman.com
Дата:
Not an expert, but I would try a temporary unlogged table.

Sent from my android device.

-----Original Message-----
From: Johannes <jotpe@posteo.de>
To: pgsql-general@postgresql.org
Sent: Mon, 08 Feb 2016 11:07
Subject: [GENERAL] execute same query only one time?

Hi,

is there a best practice to share data between two select statements?

Imaging following situation: I want to receive two result sets from two
tables, referring to a specific id from table t0 AND I try not to query
for that specific id a second time.

Table t0 returns 1 row and table t1 returns multiple rows.

begin;
select id, col1, col2, ... from t0 where id = (select max(id) from t0
where col1 = value1 and col2 = value2 and ...);
select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
value1 and col2 = value2 and ...);
commit;

Best regards Johannes

Re: execute same query only one time?

От
Adrian Klaver
Дата:
On 02/08/2016 11:05 AM, Johannes wrote:
> Hi,
>
> is there a best practice to share data between two select statements?

A join:

http://www.postgresql.org/docs/9.4/interactive/sql-select.html

Search for:

join_type
>
> Imaging following situation: I want to receive two result sets from two
> tables, referring to a specific id from table t0 AND I try not to query
> for that specific id a second time.

>
> Table t0 returns 1 row and table t1 returns multiple rows.
>
> begin;
> select id, col1, col2, ... from t0 where id = (select max(id) from t0
> where col1 = value1 and col2 = value2 and ...);
> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
> value1 and col2 = value2 and ...);
> commit;

Based on rough guess of the above, without seeing actual table schemas:

select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id =
t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2
= value2 and ...);



>
> Best regards Johannes
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, Johannes <jotpe@posteo.de> wrote:
> Hi,
>
> is there a best practice to share data between two select statements?
>
> Imaging following situation: I want to receive two result sets from two
> tables, referring to a specific id from table t0 AND I try not to query
> for that specific id a second time.
>
> Table t0 returns 1 row and table t1 returns multiple rows.
>
> begin;
> select id, col1, col2, ... from t0 where id = (select max(id) from t0
> where col1 = value1 and col2 = value2 and ...);
> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
> value1 and col2 = value2 and ...);
> commit;
>
> Best regards Johannes

Yes. You can use temporary autodeleting tables[1] for that. Similar to:

BEGIN;

CREATE TEMPORARY TABLE temptable(id int) ON COMMIT DROP;

INSERT INTO temptable
SELECT max(id)
FROM t0
WHERE col1 = value1 and col2 = value2 and ...;

SELECT id, col1, col2, ... FROM t0 INNER NATURAL JOIN temptable;

SELECT col1 FROM t1 INNER JOIN temptable ON (t0_id = temptable.id);

COMMIT;


[1]http://www.postgresql.org/docs/9.5/static/sql-createtable.html
--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 02/08/2016 11:05 AM, Johannes wrote:
>> Imaging following situation: I want to receive two result sets from two
>> tables, referring to a specific id from table t0 AND I try not to query
>> for that specific id a second time.
>
>> Table t0 returns 1 row and table t1 returns multiple rows.
>>
>> begin;
>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>> where col1 = value1 and col2 = value2 and ...);
>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>> value1 and col2 = value2 and ...);
>> commit;
>>
>> Best regards Johannes
>
> Based on rough guess of the above, without seeing actual table schemas:
>
> select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id =
> t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2
> = value2 and ...);

I don't think it is a good solution because it leads to copying
columns from the t0 which is wasting net traffic and increasing
complexity at the client side. Moreover it works iff t0 returns only
one row.

>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com


--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, travis@traviswellman.com <travis@traviswellman.com> wrote:
> Not an expert, but I would try a temporary unlogged table.

Note: temporary tables are always unlogged.

Please,
1. Don't top post.
2. Use "Reply to all" to be sure an author of an original letter gets
your answer even if he hasn't subscribed to the list.

> -----Original Message-----
> From: Johannes <jotpe@posteo.de>
> To: pgsql-general@postgresql.org
> Sent: Mon, 08 Feb 2016 11:07
> Subject: [GENERAL] execute same query only one time?
>
> Hi,
>
> is there a best practice to share data between two select statements?
>
> Imaging following situation: I want to receive two result sets from two
> tables, referring to a specific id from table t0 AND I try not to query
> for that specific id a second time.
>
> Table t0 returns 1 row and table t1 returns multiple rows.
>
> begin;
> select id, col1, col2, ... from t0 where id = (select max(id) from t0
> where col1 = value1 and col2 = value2 and ...);
> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
> value1 and col2 = value2 and ...);
> commit;
>
> Best regards Johannes

--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Johannes
Дата:
Am 08.02.2016 um 20:15 schrieb David G. Johnston:
> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote:
>
>> Hi,
>>
>> is there a best practice to share data between two select statements?
>>
>> Imaging following situation: I want to receive two result sets from two
>> tables, referring to a specific id from table t0 AND I try not to query
>> for that specific id a second time.
>>
>> Table t0 returns 1 row and table t1 returns multiple rows.
>>
>> begin;
>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>> where col1 = value1 and col2 = value2 and ...);
>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>> value1 and col2 = value2 and ...);
>> commit;
>
>
> Please confirm:​
>
> ​You want the​ result of "SELECT max(id) FROM t0" to be used in the second
> query without having to recompute it?

Yes.

> What client are you using to execute these statements?

JDBC. I execute both statements at once and iterate through the resultsets.

Johannes


Вложения

Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, Johannes <jotpe@posteo.de> wrote:
> Am 08.02.2016 um 20:15 schrieb David G. Johnston:
>> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote:
>>
>>> Hi,
>>>
>>> is there a best practice to share data between two select statements?
>>>
>>> Imaging following situation: I want to receive two result sets from two
>>> tables, referring to a specific id from table t0 AND I try not to query
>>> for that specific id a second time.
>>>
>>> Table t0 returns 1 row and table t1 returns multiple rows.
>>>
>>> begin;
>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>>> where col1 = value1 and col2 = value2 and ...);
>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>>> value1 and col2 = value2 and ...);
>>> commit;
>>
>>
>> Please confirm:​
>>
>> ​You want the​ result of "SELECT max(id) FROM t0" to be used in the
>> second
>> query without having to recompute it?
>
> Yes.
>
>> What client are you using to execute these statements?
>
> JDBC. I execute both statements at once and iterate through the resultsets.
>
> Johannes

Hmm. Could you clarify why you don't want to pass id from the first
query to the second one:

select col1 from t1 where t0_id = value_id_from_the_first_query

--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Johannes
Дата:

Am 08.02.2016 um 20:32 schrieb Vitaly Burovoy:
> On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> On 02/08/2016 11:05 AM, Johannes wrote:
>>> Imaging following situation: I want to receive two result sets from two
>>> tables, referring to a specific id from table t0 AND I try not to query
>>> for that specific id a second time.
>>
>>> Table t0 returns 1 row and table t1 returns multiple rows.
>>>
>>> begin;
>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>>> where col1 = value1 and col2 = value2 and ...);
>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>>> value1 and col2 = value2 and ...);
>>> commit;
>>>
>>> Best regards Johannes
>>
>> Based on rough guess of the above, without seeing actual table schemas:
>>
>> select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id =
>> t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2
>> = value2 and ...);
>
> I don't think it is a good solution because it leads to copying
> columns from the t0 which is wasting net traffic and increasing
> complexity at the client side. Moreover it works iff t0 returns only
> one row.

I had same doubts.
CTE would be first class, if it was be reusable for other statements.

Johannes


Вложения

Re: execute same query only one time?

От
Johannes
Дата:

Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy:
> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>> Am 08.02.2016 um 20:15 schrieb David G. Johnston:
>>> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> is there a best practice to share data between two select statements?
>>>>
>>>> Imaging following situation: I want to receive two result sets from two
>>>> tables, referring to a specific id from table t0 AND I try not to query
>>>> for that specific id a second time.
>>>>
>>>> Table t0 returns 1 row and table t1 returns multiple rows.
>>>>
>>>> begin;
>>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>>>> where col1 = value1 and col2 = value2 and ...);
>>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>>>> value1 and col2 = value2 and ...);
>>>> commit;
>>>
>>>
>>> Please confirm:​
>>>
>>> ​You want the​ result of "SELECT max(id) FROM t0" to be used in the
>>> second
>>> query without having to recompute it?
>>
>> Yes.
>>
>>> What client are you using to execute these statements?
>>
>> JDBC. I execute both statements at once and iterate through the resultsets.
>>
>> Johannes
>
> Hmm. Could you clarify why you don't want to pass id from the first
> query to the second one:
>
> select col1 from t1 where t0_id = value_id_from_the_first_query
>

Of course I could do that, but in that case I would not ask.

I thougt there could be a better solution to execute all statements at
once. Saving roundtrips, increase speed, a more sophistacted solution,
learn something new...

Johannes


Вложения

Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>
> Am 08.02.2016 um 20:32 schrieb Vitaly Burovoy:
>> On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> Based on rough guess of the above, without seeing actual table schemas:
>>>
>>> select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id =
>>> t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2
>>> = value2 and ...);
>>
>> I don't think it is a good solution because it leads to copying
>> columns from the t0 which is wasting net traffic and increasing
>> complexity at the client side. Moreover it works iff t0 returns only
>> one row.
>
> I had same doubts.
> CTE would be first class, if it was be reusable for other statements.
>
> Johannes

CTEs are temporary tables for a _statement_ for using a single
statement instead of several ones (create temp table, insert into,
select from it, select from it, drop temp table).

But it is not your case because CTEs are for a queries which return a
single set of rows. Your case is returning two sets (one row with
several columns from t0 and several rows with a single columns from
t1).

--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Johannes
Дата:

Am 08.02.2016 um 21:33 schrieb Vitaly Burovoy:
> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>>
>> Am 08.02.2016 um 20:32 schrieb Vitaly Burovoy:
>>> On 2/8/16, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>>> Based on rough guess of the above, without seeing actual table schemas:
>>>>
>>>> select id, t0.col1, t1.col1, col2, ... from t0 JOIN t1 ON t0.id =
>>>> t1.t0_id where id = (select max(id) from t0 where col1 = value1 and col2
>>>> = value2 and ...);
>>>
>>> I don't think it is a good solution because it leads to copying
>>> columns from the t0 which is wasting net traffic and increasing
>>> complexity at the client side. Moreover it works iff t0 returns only
>>> one row.
>>
>> I had same doubts.
>> CTE would be first class, if it was be reusable for other statements.
>>
>> Johannes
>
> CTEs are temporary tables for a _statement_ for using a single
> statement instead of several ones (create temp table, insert into,
> select from it, select from it, drop temp table).
>
> But it is not your case because CTEs are for a queries which return a
> single set of rows. Your case is returning two sets (one row with
> several columns from t0 and several rows with a single columns from
> t1).

Sure.
Thanks for the temporary table example!

Johannes



Вложения

Re: execute same query only one time?

От
Alban Hertroys
Дата:
> On 08 Feb 2016, at 20:05, Johannes <jotpe@posteo.de> wrote:
>
> select id, col1, col2, ... from t0 where id = (select max(id) from t0
> where col1 = value1 and col2 = value2 and …);


> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
> value1 and col2 = value2 and …);

select t0.id, t0.col1, t0.col2, t0…., t1.col1
from t0
join t1 on (t1.t0_id = t0.id)
group by t0.id, t0.col1, t0.col2, t0…., t1.col1
having t0.id = max(t0.id);

Low complexity and works with any number of rows from t0 (as does Adrian's solution, btw).
I'm not sure what you mean by "copying of columns" in your reply to Adrian's solution, but I don't think that happens
here.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, Johannes <jotpe@posteo.de> wrote:
> Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy:
>> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>>> Am 08.02.2016 um 20:15 schrieb David G. Johnston:
>>>> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> is there a best practice to share data between two select statements?
>>>>>
>>>>> Imaging following situation: I want to receive two result sets from
>>>>> two
>>>>> tables, referring to a specific id from table t0 AND I try not to
>>>>> query
>>>>> for that specific id a second time.
>>>>>
>>>>> Table t0 returns 1 row and table t1 returns multiple rows.
>>>>>
>>>>> begin;
>>>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>>>>> where col1 = value1 and col2 = value2 and ...);
>>>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>>>>> value1 and col2 = value2 and ...);
>>>>> commit;
>>
>> Hmm. Could you clarify why you don't want to pass id from the first
>> query to the second one:
>>
>> select col1 from t1 where t0_id = value_id_from_the_first_query
>>
>
> Of course I could do that, but in that case I would not ask.
>
> I thougt there could be a better solution to execute all statements at
> once.

What the reason to execute all statements which return different
columns at once?

> Saving roundtrips,

In most cases they are not so big. Getting a bunch of duplicated data
is wasting you network bandwidth and don't increase speed.

> increase speed,

Speed will be at least the same. In your case either you have to use
more DDL (like CREATE TEMP TABLE) or get copied columns that leads
more time to encode/decode and send it via network.

> a more sophisticated solution,

It usually depends on a task. Your case is simple enough and can't
lead any sophisticated solution. =(

> learn something new...

It makes sense. =)

> Johannes
--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, Alban Hertroys <haramrae@gmail.com> wrote:
>
>> On 08 Feb 2016, at 20:05, Johannes <jotpe@posteo.de> wrote:
>>
>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>> where col1 = value1 and col2 = value2 and …);
>>
>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>> value1 and col2 = value2 and …);
>
> select t0.id, t0.col1, t0.col2, t0…., t1.col1
> from t0
> join t1 on (t1.t0_id = t0.id)
> group by t0.id, t0.col1, t0.col2, t0…., t1.col1
> having t0.id = max(t0.id);
>
> Low complexity and works with any number of rows from t0 (as does Adrian's
> solution, btw).

I think it fully ruins speed at all. Try to create tables, insert at
least 100000 rows into each of them (note that cardinality between
them is 1:m) and see EXPLAIN of your query. You are joining two big
tables, sort and group a resulting table and remove most rows to fit
into one statement...

> I'm not sure what you mean by "copying of columns" in your reply to Adrian's
> solution, but I don't think that happens here.

In the original letter the first query returns one row: "(id, col1,
col2)", and the second one returns rows "(val1), (val2), (val3), ..."
(values of the t1.col1).

If you use joining, you get rows:

(id, col1, col2, val1)
(id, col1, col2, val2)
(id, col1, col2, val3)
...

where values of the first three columns are the same.

>
> Alban Hertroys

--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Johannes
Дата:

Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy:
> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>> Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy:
>>> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>>>> Am 08.02.2016 um 20:15 schrieb David G. Johnston:
>>>>> On Mon, Feb 8, 2016 at 12:05 PM, Johannes <jotpe@posteo.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> is there a best practice to share data between two select statements?
>>>>>>
>>>>>> Imaging following situation: I want to receive two result sets from
>>>>>> two
>>>>>> tables, referring to a specific id from table t0 AND I try not to
>>>>>> query
>>>>>> for that specific id a second time.
>>>>>>
>>>>>> Table t0 returns 1 row and table t1 returns multiple rows.
>>>>>>
>>>>>> begin;
>>>>>> select id, col1, col2, ... from t0 where id = (select max(id) from t0
>>>>>> where col1 = value1 and col2 = value2 and ...);
>>>>>> select col1 from t1 where t0_id = (select max(id) from t0 where col1 =
>>>>>> value1 and col2 = value2 and ...);
>>>>>> commit;
>>>
>>> Hmm. Could you clarify why you don't want to pass id from the first
>>> query to the second one:
>>>
>>> select col1 from t1 where t0_id = value_id_from_the_first_query
>>>
>>
>> Of course I could do that, but in that case I would not ask.
>>
>> I thougt there could be a better solution to execute all statements at
>> once.
>
> What the reason to execute all statements which return different
> columns at once?
>
>> Saving roundtrips,
>
> In most cases they are not so big. Getting a bunch of duplicated data
> is wasting you network bandwidth and don't increase speed.

In my and your example no duplicated data (result sets) is send over the
network. The server do not need to wait until the client snips out the
id and sends it id in the next query again. So the server can compute
the result set without external dependencies as fast as possible.

>> increase speed,
>
> Speed will be at least the same. In your case either you have to use
> more DDL (like CREATE TEMP TABLE) or get copied columns that leads
> more time to encode/decode and send it via network.

The time difference is small, yes.
My old variant with executing the first select, remember the returned id
value and paste it into the second query and execute it takes 32ms.

Your temp table variant need 29ms. Nice to see. That are 10% speed
improvement.

>> a more sophisticated solution,
>
> It usually depends on a task. Your case is simple enough and can't
> lead any sophisticated solution. =(
>

No problem.

>> learn something new...
>
> It makes sense. =)
>
>> Johannes

Good night.


Вложения

Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/8/16, Johannes <jotpe@posteo.de> wrote:
> Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy:
>> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>>> Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy:
>>>> Hmm. Could you clarify why you don't want to pass id from the first
>>>> query to the second one:
>>>>
>>>> select col1 from t1 where t0_id = value_id_from_the_first_query
>>>
>>> Of course I could do that, but in that case I would not ask.
>>>
>>> I thougt there could be a better solution to execute all statements at
>>> once.
>>
>> What the reason to execute all statements which return different
>> columns at once?
>>
>>> Saving roundtrips,
>>
>> In most cases they are not so big. Getting a bunch of duplicated data
>> is wasting you network bandwidth and don't increase speed.
>
> In my and your example no duplicated data (result sets) is send over the
> network. The server do not need to wait until the client snips out the
> id and sends it id in the next query again. So the server can compute
> the result set without external dependencies as fast as possible.

We are talking about executing all statements at once to save RTT. Are we?

And a parallel thread has advice to join tables (queries). It is a way
to run both queries at once, but it is not a solution.

>>> increase speed,
>>
>> Speed will be at least the same. In your case either you have to use
>> more DDL (like CREATE TEMP TABLE) or get copied columns that leads
>> more time to encode/decode and send it via network.
>
> The time difference is small, yes.
> My old variant with executing the first select, remember the returned id
> value and paste it into the second query and execute it takes 32ms.
>
> Your temp table variant need 29ms. Nice to see. That are 10% speed
> improvement.

I guess you measure it by your app. It is just a measurement error.
+-3ms can be a sum of TCP packet loss, system interrupts, system timer
inaccuracy, multiple cache missing, different layers (you are using
Java, it has a VM and a lot of intermediate abstraction layers).

Remember, my version has 6 statements each of them requires some work
at PG's side, plus my version has two joins which usually slower than
direct search by a value. Your version has only 4 statements and the
only one slow place -- "where" clause in the second select which can
be replaced by a value founded in the first select (your version sends
more data: value1, value2, ...).

You also can avoid "begin" and "commit" since default transaction
isolation is "READ COMMITTED"[1]:
> Also note that two successive SELECT commands can see different data,
> even though they are within a single transaction, if other transactions commit
> changes after the first SELECT starts and before the second SELECT starts.

If you want to measure time, run both versions 10000 times in 8
connections simultaneously and compare results. ;-)

32ms * 10k requests / 8 threads = 40000ms = 40sec

[1]http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-READ-COMMITTED
--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Harald Fuchs
Дата:
Johannes <jotpe@posteo.de> writes:

>> What the reason to execute all statements which return different
>> columns at once?
>>
>>> Saving roundtrips,
>>
>> In most cases they are not so big. Getting a bunch of duplicated data
>> is wasting you network bandwidth and don't increase speed.
>
> In my and your example no duplicated data (result sets) is send over the
> network. The server do not need to wait until the client snips out the
> id and sends it id in the next query again. So the server can compute
> the result set without external dependencies as fast as possible.

Sounds like what you're really after is a stored procedure, isn't it?

Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/9/16, Harald Fuchs <hari.fuchs@gmail.com> wrote:
> Johannes <jotpe@posteo.de> writes:
>
>>> What the reason to execute all statements which return different
>>> columns at once?
>>>
>>>> Saving roundtrips,
>>>
>>> In most cases they are not so big. Getting a bunch of duplicated data
>>> is wasting you network bandwidth and don't increase speed.
>>
>> In my and your example no duplicated data (result sets) is send over the
>> network. The server do not need to wait until the client snips out the
>> id and sends it id in the next query again. So the server can compute
>> the result set without external dependencies as fast as possible.
>
> Sounds like what you're really after is a stored procedure, isn't it?

Unfortunately, his case is different, because he needs to get two
different set of rows that is impossible even with stored procedures.

--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
"David G. Johnston"
Дата:
On Tuesday, February 9, 2016, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
On 2/9/16, Harald Fuchs <hari.fuchs@gmail.com> wrote:
> Johannes <jotpe@posteo.de> writes:
>
>>> What the reason to execute all statements which return different
>>> columns at once?
>>>
>>>> Saving roundtrips,
>>>
>>> In most cases they are not so big. Getting a bunch of duplicated data
>>> is wasting you network bandwidth and don't increase speed.
>>
>> In my and your example no duplicated data (result sets) is send over the
>> network. The server do not need to wait until the client snips out the
>> id and sends it id in the next query again. So the server can compute
>> the result set without external dependencies as fast as possible.
>
> Sounds like what you're really after is a stored procedure, isn't it?

Unfortunately, his case is different, because he needs to get two
different set of rows that is impossible even with stored procedures.


Correct, though it might be workable to use cursors in this situation.  Not exactly sure how, though...

David J. 

Re: execute same query only one time?

От
Marc Mamin
Дата:
>>>> Hi,
>>>>
>>>> is there a best practice to share data between two select statements?

Hi,
I didn't check the whole thread so forgive me if this was already proposed,
but maybe you could do something like:

create temp table result2 (...)

query_1:
WITH cte as (select ..),
tmp as ( INSERT INTO result2 select ...  from cte),
SELECT ... from cte;

query_2:
select * from result2;

regards,

Marc Mamin

Re: execute same query only one time?

От
"David G. Johnston"
Дата:
On Tue, Feb 9, 2016 at 12:16 PM, Marc Mamin <M.Mamin@intershop.de> wrote:

>>>> Hi,
>>>>
>>>> is there a best practice to share data between two select statements?

Hi,
I didn't check the whole thread so forgive me if this was already proposed,
but maybe you could do something like:

create temp table result2 (...)

query_1:
WITH cte as (select ..),
tmp as ( INSERT INTO result2 select ...  from cte),
SELECT ... from cte;

query_2:
select * from result2;

​It was, more or less.  I'm not sure you buy much using an updating CTE in lieu of a dedicated statement populating the temporary table.  It seems a bit more confusing to comprehend and the performance benefit has to be marginal given we expect to only insert a single row into the temp table.

David J.
 

Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/9/16, Marc Mamin <M.Mamin@intershop.de> wrote:
>
>>>>> Hi,
>>>>>
>>>>> is there a best practice to share data between two select statements?
>
> Hi,
> I didn't check the whole thread

Try it[1]. The thread is not so long (21 letters before yours) and it worth it.

> so forgive me if this was already proposed,
> but maybe you could do something like:
>
> create temp table result2 (...)
>
> query_1:
> WITH cte as (select ..),
> tmp as ( INSERT INTO result2 select ...  from cte),
> SELECT ... from cte;
>
> query_2:
> select * from result2;

There is a mistake here: query2 returns the same result as the query_1.

>
> regards,
> Marc Mamin

It is similar to the fourth answer[2] in the thread.

If you are able to create temporary table with all fields of the first
result only for avoiding one statement, it can be rewritten without
CTE and one INNER JOIN (five statements):

BEGIN;

CREATE TEMPORARY TABLE temptable(id ..., col1 ..., col2 ..., ...) ON
COMMIT DROP;

INSERT INTO temptable
SELECT id, col1, col2, ...
FROM t0
WHERE col1 = value1 and col2 = value2 and ...
RETURNING *;

SELECT col1 FROM t1 INNER JOIN temptable ON (t0_id = temptable.id);

COMMIT;

===
If it is hard to detect (or just lazy to write) what types temporary
table has, you can rewrite it as (also five statements):

BEGIN;

CREATE TEMPORARY TABLE temptable ON COMMIT DROP AS
SELECT id, col1, col2, ...
FROM t0
WHERE id = (
  SELECT max(id)
  FROM t0
  WHERE col1 = value1 and col2 = value2 and ...
);

SELECT * FROM temptable;

SELECT col1 FROM t1 INNER JOIN temptable ON (t0_id = temptable.id);

COMMIT;

===
But it is not so useful. In the case in original letter the best way
is to use only two queries and pass id from the result of the first
query as an argument to the second query. See the other letter[3] in
the thread.

Temporary tables are useful for keeping a lot of rows to prevent
copying them between client and server.

[1]http://www.postgresql.org/message-id/flat/56B8E6F9.9070600@posteo.de
[2]http://www.postgresql.org/message-id/CAKOSWNkRB0kfWHcw9CvOcRmkS8HuzrPVFLr0kKcjuYn6juK5NA@mail.gmail.com
[3]http://www.postgresql.org/message-id/CAKOSWN=wX9Myw4q9mpiqESyizU4tWrGivgbw+3Ef=5Q60viFSg@mail.gmail.com

--
Best regards,
Vitaly Burovoy


Re: execute same query only one time?

От
Johannes
Дата:

Am 09.02.2016 um 00:56 schrieb Vitaly Burovoy:
> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>> Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy:
>>> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>>>> Am 08.02.2016 um 21:17 schrieb Vitaly Burovoy:
>>>>> Hmm. Could you clarify why you don't want to pass id from the first
>>>>> query to the second one:
>>>>>
>>>>> select col1 from t1 where t0_id = value_id_from_the_first_query
>>>>
>>>> Of course I could do that, but in that case I would not ask.
>>>>
>>>> I thougt there could be a better solution to execute all statements at
>>>> once.
>>>
>>> What the reason to execute all statements which return different
>>> columns at once?
>>>
>>>> Saving roundtrips,
>>>
>>> In most cases they are not so big. Getting a bunch of duplicated data
>>> is wasting you network bandwidth and don't increase speed.
>>
>> In my and your example no duplicated data (result sets) is send over the
>> network. The server do not need to wait until the client snips out the
>> id and sends it id in the next query again. So the server can compute
>> the result set without external dependencies as fast as possible.
>
> We are talking about executing all statements at once to save RTT. Are we?

Yes, we do.

> And a parallel thread has advice to join tables (queries). It is a way
> to run both queries at once, but it is not a solution.

Right.

>>>> increase speed,
>>>
>>> Speed will be at least the same. In your case either you have to use
>>> more DDL (like CREATE TEMP TABLE) or get copied columns that leads
>>> more time to encode/decode and send it via network.
>>
>> The time difference is small, yes.
>> My old variant with executing the first select, remember the returned id
>> value and paste it into the second query and execute it takes 32ms.
>>
>> Your temp table variant need 29ms. Nice to see. That are 10% speed
>> improvement.
>
> I guess you measure it by your app. It is just a measurement error.
> +-3ms can be a sum of TCP packet loss, system interrupts, system timer
> inaccuracy, multiple cache missing, different layers (you are using
> Java, it has a VM and a lot of intermediate abstraction layers).

I know all these facts. I run it a "only" a few times, single threaded.
With high resolution. The rounded result of 3ms was reliable and thats
exact enough for my rule of thumb.
Your temp table variant looks more elegant, and is sightly faster. I'm
fine with that.

> Remember, my version has 6 statements each of them requires some work
> at PG's side, plus my version has two joins which usually slower than
> direct search by a value. Your version has only 4 statements and the
> only one slow place -- "where" clause in the second select which can
> be replaced by a value founded in the first select (your version sends
> more data: value1, value2, ...).

Anyway, it is faster ;)

> You also can avoid "begin" and "commit" since default transaction
> isolation is "READ COMMITTED"[1]:
>> Also note that two successive SELECT commands can see different data,
>> even though they are within a single transaction, if other transactions commit
>> changes after the first SELECT starts and before the second SELECT starts.

I know. I did not tell that I run my queries in with repeatable read
isolation.
And I read it is wise to bundle multiple queries in an transaction,
because the overhead of multiple transaction can be avoid to one.

> If you want to measure time, run both versions 10000 times in 8
> connections simultaneously and compare results. ;-)
>
> 32ms * 10k requests / 8 threads = 40000ms = 40sec

Thats more complicated, I think I learned enough about it. But thanks.

Ciao Johannes


Вложения

Re: execute same query only one time?

От
Vitaly Burovoy
Дата:
On 2/9/16, Johannes <jotpe@posteo.de> wrote:
> Am 09.02.2016 um 00:56 schrieb Vitaly Burovoy:
>> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>>> Am 08.02.2016 um 21:50 schrieb Vitaly Burovoy:
>>>> On 2/8/16, Johannes <jotpe@posteo.de> wrote:
>>>>> increase speed,
>>>>
>>>> Speed will be at least the same. In your case either you have to use
>>>> more DDL (like CREATE TEMP TABLE) or get copied columns that leads
>>>> more time to encode/decode and send it via network.
>>>
>>> The time difference is small, yes.
>>> My old variant with executing the first select, remember the returned id
>>> value and paste it into the second query and execute it takes 32ms.
>>>
>>> Your temp table variant need 29ms. Nice to see. That are 10% speed
>>> improvement.
>>
>> I guess you measure it by your app. It is just a measurement error.
>> +-3ms can be a sum of TCP packet loss, system interrupts, system timer
>> inaccuracy, multiple cache missing, different layers (you are using
>> Java, it has a VM and a lot of intermediate abstraction layers).
>
> I know all these facts. I run it a "only" a few times, single threaded.
> With high resolution. The rounded result of 3ms was reliable and thats
> exact enough for my rule of thumb.
> Your temp table variant looks more elegant, and is sightly faster. I'm
> fine with that.
>
>> Remember, my version has 6 statements each of them requires some work
>> at PG's side, plus my version has two joins which usually slower than
>> direct search by a value. Your version has only 4 statements and the
>> only one slow place -- "where" clause in the second select which can
>> be replaced by a value founded in the first select (your version sends
>> more data: value1, value2, ...).
>
> Anyway, it is faster ;)

I can't believe it. I insist it is a measurement error.

>> You also can avoid "begin" and "commit" since default transaction
>> isolation is "READ COMMITTED"[1]:
>>> Also note that two successive SELECT commands can see different data,
>>> even though they are within a single transaction, if other transactions
>>> commit
>>> changes after the first SELECT starts and before the second SELECT
>>> starts.
>
> I know. I did not tell that I run my queries in with repeatable read
> isolation.
> And I read it is wise to bundle multiple queries in an transaction,
> because the overhead of multiple transaction can be avoid to one.

It is not true for transactions with SELECTs only. It is wise to join
multiple queries in one transaction when you do multiple _changes_ in
a DB, because after each change DB must save it into WAL file and
increase shared "Transaction ID sequence".

Also as in your example, transactions with isolation SERIALIZABLE and
REPEATABLE READ are used to do (even RO) queries to the DB in a
consistent state. But RO queries don't become faster.

>> If you want to measure time, run both versions 10000 times in 8
>> connections simultaneously and compare results. ;-)
>>
>> 32ms * 10k requests / 8 threads = 40000ms = 40sec
>
> Thats more complicated, I think I learned enough about it. But thanks.

Real world is complex. 40 seconds per test is not long.
If you have to save only one param for the other query creating a temp
table is heavy enough.

I'm waiting for a result of 10k requests test.
--
Best regards,
Vitaly Burovoy