Обсуждение: Largely inconsistent query execution speed, involving psql_tmp

Поиск

Список

Период

Сортировка

Largely inconsistent query execution speed, involving psql_tmp

От

Spiros Ioannou

Дата:

08 июля 2014 г., 09:48:12

While executing the following query through psql :

SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt ON me.measurement_source_id=mt.measurement_source_id WHERE measurement_time > last_update_time

there are two behaviors observed by postgresql (8.4):

1) Either the query performs lots of reads on the database and completes in about 4 hours (that is the normal-expected behavior)

2) Either the query starts filling-up pgsql_tmp and this causes large write I/O on the server, and the query never actually completes on a reasonable time (we stop it after 10h).

For some strange reason, behaviour 2 is always observed when running psql through a bash script, while behavior 1 is only observed while running psql interactively from command line (but not always).

explain:

# explain SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt ON me.measurement_source_id=mt.measurement_source_id WHERE measurement_time > last_update_time;

QUERY PLAN

----------------------------------------------------------------------------------------------

Hash Join (cost=10111.78..422893652.69 rows=2958929695 width=103)

Hash Cond: (me.measurement_source_id = mt.measurement_source_id)

Join Filter: (me.measurement_time > mt.last_update_time)

-> Seq Scan on measurement_events me (cost=0.00..234251772.85 rows=8876789085 width=103)

-> Hash (cost=5733.57..5733.57 rows=350257 width=24)

-> Seq Scan on msrcs_timestamps mt (cost=0.00..5733.57 rows=350257 width=24)

(6 rows)

We have tried so far fiddling with work_mem up to 512M - no difference.

Any suggestions?

Thanks for any help,

-Spiros Ioannou

inaccess

Re: Largely inconsistent query execution speed, involving psql_tmp

От

Andy Colson

Дата:

08 июля 2014 г., 15:11:00

On 7/8/2014 4:47 AM, Spiros Ioannou wrote:
> While executing the following query through psql :
>
> SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt ON
> me.measurement_source_id=mt.measurement_source_id WHERE
>   measurement_time > last_update_time
>
> there are two behaviors observed by postgresql (8.4):
> 1) Either the query performs lots of reads on the database and completes
> in about 4 hours (that is the normal-expected behavior)
> 2) Either the query starts filling-up pgsql_tmp and this causes large
> write I/O on the server, and the query never actually completes on a
> reasonable time (we stop it after 10h).
>
> For some strange reason, behaviour 2 is always observed when running
> psql through a bash script, while behavior 1 is only observed while
> running psql interactively from command line (but not always).
>
> explain:
> # explain SELECT me.* FROM measurement_events me JOIN msrcs_timestamps
> mt ON me.measurement_source_id=mt.measurement_source_id WHERE
>   measurement_time > last_update_time;
>                                            QUERY PLAN
> ----------------------------------------------------------------------------------------------
>   Hash Join  (cost=10111.78..422893652.69 rows=2958929695 width=103)
>     Hash Cond: (me.measurement_source_id = mt.measurement_source_id)
>     Join Filter: (me.measurement_time > mt.last_update_time)
>     ->  Seq Scan on measurement_events me  (cost=0.00..234251772.85
> rows=8876789085 width=103)
>     ->  Hash  (cost=5733.57..5733.57 rows=350257 width=24)
>           ->  Seq Scan on msrcs_timestamps mt  (cost=0.00..5733.57
> rows=350257 width=24)
> (6 rows)
>
>
> We have tried so far fiddling with work_mem up to 512M - no difference.
> Any suggestions?
>
>
>
> Thanks for any help,
> -Spiros Ioannou
> inaccess
>

Is there any reason you don't have an index?

One, or both, of these will help:

create index measurement_events_pk on
measurement_events(measurement_source_id);

create index msrcs_timestamps_pk on msrcs_timestamps(measurement_source_id);



measurement_events has 8 billion rows, so expect it to take a while, but
its a one time cost and should _dramatically_ increase your query
performance.

-Andy

Re: Largely inconsistent query execution speed, involving psql_tmp

От

Jeff Janes

Дата:

08 июля 2014 г., 16:21:32

On Tue, Jul 8, 2014 at 2:47 AM, Spiros Ioannou <sivann@inaccess.com> wrote:

While executing the following query through psql :

SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt ON me.measurement_source_id=mt.measurement_source_id WHERE measurement_time > last_update_time

there are two behaviors observed by postgresql (8.4):
1) Either the query performs lots of reads on the database and completes in about 4 hours (that is the normal-expected behavior)
2) Either the query starts filling-up pgsql_tmp and this causes large write I/O on the server, and the query never actually completes on a reasonable time (we stop it after 10h).

For some strange reason, behaviour 2 is always observed when running psql through a bash script, while behavior 1 is only observed while running psql interactively from command line (but not always).

explain:
# explain SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt ON me.measurement_source_id=mt.measurement_source_id WHERE measurement_time > last_update_time;
QUERY PLAN
----------------------------------------------------------------------------------------------
Hash Join (cost=10111.78..422893652.69 rows=2958929695 width=103)
Hash Cond: (me.measurement_source_id = mt.measurement_source_id)
Join Filter: (me.measurement_time > mt.last_update_time)
-> Seq Scan on measurement_events me (cost=0.00..234251772.85 rows=8876789085 width=103)
-> Hash (cost=5733.57..5733.57 rows=350257 width=24)
-> Seq Scan on msrcs_timestamps mt (cost=0.00..5733.57 rows=350257 width=24)
(6 rows)

Is this plan from a situation where it would probably take 4 hours, or from the situation where it would probably fail to complete in 10 hours?

Cheers,

Jeff

Re: Largely inconsistent query execution speed, involving psql_tmp

От

Spiros Ioannou

Дата:

09 июля 2014 г., 11:27:48

There are indices:

table:measurement_events "measurement_events_pkey" PRIMARY KEY, btree (measurement_source_id, measurement_time)

table:measurement_sources "measurement_sources_pkey" PRIMARY KEY, btree (measurement_source_id)

-Spiros

On 8 July 2014 18:10, Andy Colson <andy@squeakycode.net> wrote:

On 7/8/2014 4:47 AM, Spiros Ioannou wrote:
While executing the following query through psql :

SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt ON
me.measurement_source_id=mt.measurement_source_id WHERE
measurement_time > last_update_time

there are two behaviors observed by postgresql (8.4):
1) Either the query performs lots of reads on the database and completes
in about 4 hours (that is the normal-expected behavior)
2) Either the query starts filling-up pgsql_tmp and this causes large
write I/O on the server, and the query never actually completes on a
reasonable time (we stop it after 10h).

For some strange reason, behaviour 2 is always observed when running
psql through a bash script, while behavior 1 is only observed while
running psql interactively from command line (but not always).

explain:
# explain SELECT me.* FROM measurement_events me JOIN msrcs_timestamps
mt ON me.measurement_source_id=mt.measurement_source_id WHERE
measurement_time > last_update_time;
QUERY PLAN
----------------------------------------------------------------------------------------------
Hash Join (cost=10111.78..422893652.69 rows=2958929695 width=103)
Hash Cond: (me.measurement_source_id = mt.measurement_source_id)
Join Filter: (me.measurement_time > mt.last_update_time)
-> Seq Scan on measurement_events me (cost=0.00..234251772.85
rows=8876789085 width=103)
-> Hash (cost=5733.57..5733.57 rows=350257 width=24)
-> Seq Scan on msrcs_timestamps mt (cost=0.00..5733.57
rows=350257 width=24)
(6 rows)

We have tried so far fiddling with work_mem up to 512M - no difference.
Any suggestions?

Thanks for any help,
-Spiros Ioannou
inaccess

Is there any reason you don't have an index?

One, or both, of these will help:

create index measurement_events_pk on measurement_events(measurement_source_id);

create index msrcs_timestamps_pk on msrcs_timestamps(measurement_source_id);

measurement_events has 8 billion rows, so expect it to take a while, but its a one time cost and should _dramatically_ increase your query performance.

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Largely inconsistent query execution speed, involving psql_tmp

Largely inconsistent query execution speed, involving psql_tmp

Re: Largely inconsistent query execution speed, involving psql_tmp

Re: Largely inconsistent query execution speed, involving psql_tmp

Re: Largely inconsistent query execution speed, involving psql_tmp