Discussion: Slow query with big tables


Slow query with big tables

From:
Tommi Kaksonen
Date:
Hello, 

I have the following tables and query. I would like some help figuring out why it is slow and how its performance could be improved.

Thanks,
Tommi K.


---Table definitions---
CREATE TABLE "Measurement"
(
  id bigserial NOT NULL,
  product_id bigserial NOT NULL,
  nominal_data_id bigserial NOT NULL,
  description text,
  serial text,
  measurement_time timestamp without time zone,
  status smallint,
  system_description text,
  CONSTRAINT "Measurement_pkey" PRIMARY KEY (id),
  CONSTRAINT "Measurement_nominal_data_id_fkey" FOREIGN KEY (nominal_data_id)
      REFERENCES "Nominal_data" (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT "Measurement_product_id_fkey" FOREIGN KEY (product_id)
      REFERENCES "Product" (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX measurement_time_index
  ON "Measurement"
  USING btree
  (measurement_time);
ALTER TABLE "Measurement" CLUSTER ON measurement_time_index;

CREATE TABLE "Product"
(
  id bigserial NOT NULL,
  name text,
  description text,
  info text,
  system_name text,
  CONSTRAINT "Product_pkey" PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);


CREATE TABLE "Extra_info"
(
  id bigserial NOT NULL,
  measurement_id bigserial NOT NULL,
  name text,
  description text,
  info text,
  type text,
  value_string text,
  value_double double precision,
  value_integer bigint,
  value_bool boolean,
  CONSTRAINT "Extra_info_pkey" PRIMARY KEY (id),
  CONSTRAINT "Extra_info_measurement_id_fkey" FOREIGN KEY (measurement_id)
      REFERENCES "Measurement" (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX extra_info_measurement_id_index
  ON "Extra_info"
  USING btree
  (measurement_id);

CREATE TABLE "Feature"
(
  id bigserial NOT NULL,
  measurement_id bigserial NOT NULL,
  name text,
  description text,
  info text,
  CONSTRAINT "Feature_pkey" PRIMARY KEY (id),
  CONSTRAINT "Feature_measurement_id_fkey" FOREIGN KEY (measurement_id)
      REFERENCES "Measurement" (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX feature_measurement_id_and_name_index
  ON "Feature"
  USING btree
  (measurement_id, name COLLATE pg_catalog."default");

CREATE INDEX feature_measurement_id_index
  ON "Feature"
  USING hash
  (measurement_id);


CREATE TABLE "Point"
(
  id bigserial NOT NULL,
  feature_id bigserial NOT NULL,
  x double precision,
  y double precision,
  z double precision,
  status_x smallint,
  status_y smallint,
  status_z smallint,
  difference_x double precision,
  difference_y double precision,
  difference_z double precision,
  CONSTRAINT "Point_pkey" PRIMARY KEY (id),
  CONSTRAINT "Point_feature_id_fkey" FOREIGN KEY (feature_id)
      REFERENCES "Feature" (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX point_feature_id_index
  ON "Point"
  USING btree
  (feature_id);

CREATE TABLE "Warning"
(
  id bigserial NOT NULL,
  feature_id bigserial NOT NULL,
  "number" smallint,
  info text,
  CONSTRAINT "Warning_pkey" PRIMARY KEY (id),
  CONSTRAINT "Warning_feature_id_fkey" FOREIGN KEY (feature_id)
      REFERENCES "Feature" (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX warning_feature_id_index
  ON "Warning"
  USING btree
  (feature_id);


---Query---
SELECT
f.name,
f.description,
SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND warning.id IS NULL THEN 1 ELSE 0 END) AS green_count, 
SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND warning.id IS NOT NULL THEN 1 ELSE 0 END) AS green_warned_count,
SUM(CASE WHEN NOT (p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0) AND NOT (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id IS NULL THEN 1 ELSE 0 END) AS yellow_count, 
SUM(CASE WHEN NOT (p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0) AND NOT (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id IS NOT NULL THEN 1 ELSE 0 END) AS yellow_warned_count, 
SUM(CASE WHEN (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id IS NULL THEN 1 ELSE 0 END) AS red_count, 
SUM(CASE WHEN (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id IS NOT NULL THEN 1 ELSE 0 END) AS red_warned_count, 
SUM(CASE WHEN (p.status_x = 1000 OR p.status_y = 1000 OR p.status_z = 1000) AND warning.id IS NOT NULL THEN 1 ELSE 0 END) AS unable_to_measure_count 
FROM "Point" p 
JOIN "Feature" f ON f.id = p.feature_id
JOIN "Measurement" measurement ON measurement.id = f.measurement_id 
JOIN "Product" product ON product.id = measurement.product_id 
LEFT JOIN "Warning" warning ON f.id = warning.feature_id
WHERE (product.name ILIKE 'Part 1') AND 
measurement.measurement_start_time >= '2015-06-18 17:00:00' AND 
measurement.measurement_start_time <= '2015-06-18 18:00:00' AND 
measurement.id NOT IN (SELECT measurement_id FROM "Extra_info" e 
WHERE e.measurement_id = measurement.id AND e.description = 'Clamp' AND e.value_string ILIKE 'Clamped%')
GROUP BY f.name, f.description;


---Explain Analyze---
GroupAggregate  (cost=1336999.08..1337569.18 rows=5562 width=33) (actual time=6223.622..6272.321 rows=255 loops=1)
  Buffers: shared hit=263552 read=996, temp read=119 written=119
  ->  Sort  (cost=1336999.08..1337012.98 rows=5562 width=33) (actual time=6223.262..6231.106 rows=26265 loops=1)
        Sort Key: f.name, f.description
        Sort Method: external merge  Disk: 936kB
        Buffers: shared hit=263552 read=996, temp read=119 written=119
        ->  Nested Loop Left Join  (cost=0.00..1336653.08 rows=5562 width=33) (actual time=55.792..6128.875 rows=26265 loops=1)
              Buffers: shared hit=263552 read=996
              ->  Nested Loop  (cost=0.00..1220487.17 rows=5562 width=33) (actual time=55.773..5910.852 rows=26265 loops=1)
                    Buffers: shared hit=182401 read=954
                    ->  Nested Loop  (cost=0.00..22593.53 rows=8272 width=27) (actual time=30.980..3252.869 rows=38831 loops=1)
                          Buffers: shared hit=972 read=528
                          ->  Nested Loop  (cost=0.00..657.24 rows=22 width=8) (actual time=0.102..109.577 rows=103 loops=1)
                                Join Filter: (measurement.product_id = product.id)
                                Rows Removed by Join Filter: 18
                                Buffers: shared hit=484 read=9
                                ->  Seq Scan on "Product" product  (cost=0.00..1.04 rows=1 width=8) (actual time=0.010..0.019 rows=1 loops=1)
                                      Filter: (name ~~* 'Part 1'::text)
                                      Rows Removed by Filter: 2
                                      Buffers: shared hit=1
                                ->  Index Scan using measurement_start_time_index on "Measurement" measurement  (cost=0.00..655.37 rows=67 width=16) (actual time=0.042..109.416 rows=121 loops=1)
                                      Index Cond: ((measurement_start_time >= '2015-06-18 17:00:00'::timestamp without time zone) AND (measurement_start_time <= '2015-06-18 18:00:00'::timestamp without time zone))
                                      Filter: (NOT (SubPlan 1))
                                      Buffers: shared hit=483 read=9
                                      SubPlan 1
                                        ->  Index Scan using extra_info_measurement_id_index on "Extra_info" e  (cost=0.00..9.66 rows=1 width=8) (actual time=0.900..0.900 rows=0 loops=121)
                                              Index Cond: (measurement_id = measurement.id)
                                              Filter: ((value_string ~~* 'Clamped%'::text) AND (description = 'Clamp'::text))
                                              Rows Removed by Filter: 2
                                              Buffers: shared hit=479 read=7
                          ->  Index Scan using feature_measurement_id_and_name_index on "Feature" rf  (cost=0.00..993.40 rows=370 width=35) (actual time=28.152..30.407 rows=377 loops=103)
                                Index Cond: (measurement_id = measurement.id)
                                Buffers: shared hit=488 read=519
                    ->  Index Scan using point_feature_id_index on "Point" p  (cost=0.00..144.80 rows=1 width=14) (actual time=0.067..0.067 rows=1 loops=38831)
                          Index Cond: (feature_id = f.id)
                          Buffers: shared hit=181429 read=426
              ->  Index Scan using warning_feature_id_index on "Warning" warning  (cost=0.00..20.88 rows=1 width=16) (actual time=0.007..0.007 rows=0 loops=26265)
                    Index Cond: (f.id = feature_id)
                    Buffers: shared hit=81151 read=42
Total runtime: 6273.312 ms


---Version---
PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit


---Table sizes---
Extra_info 1223400 rows
Feature 185436000 rows
Measurement 500000 rows
Point 124681000 rows
Warning 11766800 rows

---Hardware---
Intel Core i5-2320 CPU 3.00GHz (4 CPUs)
6GB Memory
64-bit Operating System (Windows 7 Professional)
WD Blue 500GB HDD - 7200 RPM SATA 6 Gb/s 16MB Cache

---History---
Query gets slower as more data is added to the database

---Maintenance---
Autovacuum is used with default settings

Re: Slow query with big tables

From:
Andreas Kretschmer
Date:
Tommi Kaksonen <t2nn2t@gmail.com> wrote:

> ---Version---
> PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit

The current point release for 9.2 is 9.2.18; you are some years behind.

The plan seems okay to me, apart from the on-disk sort: increase
work_mem to avoid that.
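
For example, a minimal sketch (32MB is an illustrative value, not a
recommendation from this thread; the 9.2 default is 1MB):

    SET work_mem = '32MB';              -- per session
    -- or cluster-wide in postgresql.conf, followed by a reload:
    -- work_mem = 32MB

The sort above spilled only 936kB to disk, so even a modest increase
should keep it in memory.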

If I were you I would switch to PG 9.5 - or wait for 9.6 and parallel
execution of aggregates.



Regards, Andreas Kretschmer
--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Slow query with big tables

From:
Tommi K
Date:
Hello,
thanks for the response. I did not receive the reply to my email even though I am subscribed to the pgsql-performance mailing list. Let's hope that I get the next one :)

Increasing work_mem did not have a great impact on performance. But I will try to update the PostgreSQL version to see if it speeds things up.

However, is there a way to keep query time constant as the database size grows? Should I use partitioning or partial indexes?

Best Regards,
Tommi Kaksonen

Re: Slow query with big tables

From:
Craig James
Date:


On Fri, Aug 26, 2016 at 6:17 AM, Tommi K <t2nn2t@gmail.com> wrote:
Hello,
thanks for the response. I did not receive the reply to my email even though I am subscribed to the pgsql-performance mailing list. Let's hope that I get the next one :)

Please include the email you are replying to when you respond. It saves everyone time if they don't have to dig up your old emails, and many of us discard old emails anyway and have no idea what you wrote before.

Craig


Increasing work_mem did not have a great impact on performance. But I will try to update the PostgreSQL version to see if it speeds things up.

However, is there a way to keep query time constant as the database size grows? Should I use partitioning or partial indexes?

Best Regards,
Tommi Kaksonen



--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------

Re: Slow query with big tables

From:
Tommi K
Date:
Ok, sorry that I did not add the original message. I thought that it would be automatically added to the message thread.

Here is the question again:

Is there a way to keep query time constant as the database size grows? Should I use partitioning or partial indexes?

Thanks,
Tommi Kaksonen



> [full original message and earlier replies quoted; trimmed]

Re: Slow query with big tables

From:
"Mike Sofen"
Date:

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tommi K
Sent: Friday, August 26, 2016 7:25 AM
To: Craig James <cjames@emolecules.com>
Cc: andreas kretschmer <akretschmer@spamfence.net>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query with big tables

Ok, sorry that I did not add the original message. I thought that it would be automatically added to the message thread.

Here is the question again:

Is there a way to keep query time constant as the database size grows? Should I use partitioning or partial indexes?

Thanks,
Tommi Kaksonen

> [full original message quoted; trimmed]

I don't see a reason to partition such small data. What I do see is you attempting to run a big query on what looks like a small desktop PC. 6GB of RAM, especially under Windows 7, isn't enough for a database server. Run the query on a normal small server with, say, 16GB of RAM and it should perform fine. IMO.
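
As a rough starting point for a dedicated 16GB Linux box (illustrative
values only, not settings taken from this thread; Windows generally
wants a much smaller shared_buffers):

    # postgresql.conf sketch
    shared_buffers = 4GB            # ~25% of RAM (nearer 512MB on Windows)
    effective_cache_size = 12GB     # how much the OS is expected to cache
    work_mem = 32MB                 # per sort/hash node -- raise with care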

Mike

Re: Slow query with big tables

From:
Jim Nasby
Date:
On 8/26/16 3:26 PM, Mike Sofen wrote:
> Is there a way to keep query time constant as the database size grows?

No. More data == more time. Unless you find a way to break the laws of
physics.

> Should I use partitioning or partial indexes?

Neither technique is a magic bullet. I doubt either would help here.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


Re: Slow query with big tables

From:
Pavel Stehule
Date:


2016-08-26 22:26 GMT+02:00 Mike Sofen <msofen@runbox.com>:

> Is there a way to keep query time constant as the database size grows? Should I use partitioning or partial indexes?


Try disabling nested loops (set enable_nestloop = off) - there are bad row estimates.

This query is not going to be fast - there are two ILIKE filters with a negative impact on the estimates.
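
A minimal, session-local way to test this (assuming psql; SET LOCAL
keeps the change inside the transaction):

    BEGIN;
    SET LOCAL enable_nestloop = off;  -- planner falls back to hash/merge joins
    -- re-run the original query here with EXPLAIN (ANALYZE, BUFFERS)
    -- and compare the join methods and timings
    COMMIT;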

Regards

Pavel

> [remainder of quoted thread trimmed]


Re: Slow query with big tables

From:
Craig James
Date:
On Fri, Aug 26, 2016 at 9:11 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 8/26/16 3:26 PM, Mike Sofen wrote:
Is there a way to keep query time constant as the database size grows?

No. More data == more time. Unless you find a way to break the laws of physics.

Straight hash-table indexes (which Postgres doesn't use) have O(1) access time. The amount of data has no effect on the access time. Postgres uses B-trees which have O(logN) performance. There is no "law of physics" that says Postgres couldn't employ pure hash indexes.

Craig


Should I use partitioning or partial indexes?

Neither technique is a magic bullet. I doubt either would help here.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461





--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------

Re: Slow query with big tables

From:
Tom Lane
Date:
Craig James <cjames@emolecules.com> writes:
> Straight hash-table indexes (which Postgres doesn't use) have O(1) access
> time. The amount of data has no effect on the access time.

This is wishful thinking --- once you have enough data, O(1) goes out the
window.  For example, a hash index is certainly not going to continue to
scale linearly once you reach its maximum possible number of buckets
(2^N for N-bit hashes, and remember you can't get very many useful hash
bits out of small objects like integers).  But even before that, large
numbers of buckets put enough stress on your storage system that you will
see some not very O(1)-ish behavior, just because too little of the index
fits in whatever cache and RAM you have.  Any storage hierarchy is
ultimately going to impose O(log N) access costs, that's the way they're
built.

I think it's fairly pointless to discuss such matters in the abstract.
If you want to make useful engineering tradeoffs you have to talk about
specific data sets and available hardware.

            regards, tom lane


Re: Slow query with big tables

From:
Jeff Janes
Date:
On Sat, Aug 27, 2016 at 7:13 AM, Craig James <cjames@emolecules.com> wrote:
On Fri, Aug 26, 2016 at 9:11 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 8/26/16 3:26 PM, Mike Sofen wrote:
Is there a way to keep query time constant as the database size grows?

No. More data == more time. Unless you find a way to break the laws of physics.

Straight hash-table indexes (which Postgres doesn't use) have O(1) access time.

But he isn't doing single-row lookups, he is doing large aggregations.  If you have to aggregate N rows, doing an O(1) operation on N different occasions is still O(N).

Not that big-O is useful here anyway.  It assumes that either everything fits in RAM (and is already there), or that nothing fits in RAM and it all has to be fetched from disk, even the index root pages, every time it is needed.  Tommi is not operating under an environment where the first assumption holds, and no one operates in an environment where the second assumption holds.

As N increases beyond available RAM, your actual time for a single look-up is going to be a weighted average of two different constant-time operations, one with a small constant and one with a large constant.  Big-O notation ignores this nicety and assumes all operations are at the slower speed, because that is what the limit of the weighted average will be as N gets very large. But real world systems do not operate at the infinite limit.
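
In symbols (a paraphrase of this argument, not notation from the thread): the expected cost of one lookup is roughly

    T(N) = p(N) * t_cache + (1 - p(N)) * t_disk

where p(N) is the fraction of lookups served from cache. Because t_disk is orders of magnitude larger than t_cache, and p(N) falls as N outgrows RAM, total runtime can grow much faster than the idealized O(N) would suggest.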

So his run time could easily be proportional to N^2, if he aggregates more rows and each one of them is less likely to be a cache hit.

Cheers,

Jeff

Re: Slow query with big tables

From:
Jeff Janes
Date:
On Fri, Aug 26, 2016 at 6:17 AM, Tommi K <t2nn2t@gmail.com> wrote:
Hello,
thanks for the response. I did not receive the reply to my email even though I am subscribed to the pgsql-performance mailing list. Let's hope that I get the next one :)

Increasing work_mem did not have a great impact on performance. But I will try to update the PostgreSQL version to see if it speeds things up.

However, is there a way to keep query time constant as the database size grows?

Not likely.  If the number of rows you are aggregating grows, it will take more work to do those aggregations.

If the number of rows being aggregated doesn't grow, because all the growth occurs outside of the measurement_time range, even then the new data will still make it harder to keep the stuff you want cached in memory.  If you really want more-constant query time, you could approach that by giving the machine as little RAM as possible.  This works not by making the large database case faster, but by making the small database case slower.  That usually is not what people want.

 
Should I use partitioning or partial indexes?

Partitioning the Feature and Point tables on measurement_time (or measurement_start_time, you are not consistent on what it is called) might be helpful.  However, measurement_time does not exist in those tables, so you would first have to de-normalize by introducing it into them.
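
For what it's worth, a rough sketch of 9.2-style inheritance partitioning, assuming measurement_time has first been denormalized into "Feature" (the child table and index names here are illustrative):

    ALTER TABLE "Feature" ADD COLUMN measurement_time timestamp without time zone;

    CREATE TABLE feature_2015_06 (
        CHECK (measurement_time >= '2015-06-01' AND measurement_time < '2015-07-01')
    ) INHERITS ("Feature");

    CREATE INDEX feature_2015_06_mid_name_idx
        ON feature_2015_06 (measurement_id, name);

    -- Inserts must be routed to the right child table (via trigger or
    -- application logic), and constraint_exclusion = partition must be
    -- enabled so the planner can skip partitions outside the time range.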

More likely to be helpful would be precomputing the aggregates and storing them in a materialized view (not available in 9.2).   Also, more RAM and better hard-drives can't hurt.
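
A sketch of that precomputation on 9.3 or later (it folds in only part of the original CASE logic and omits the "Warning" and "Extra_info" handling; the view and column names are illustrative):

    CREATE MATERIALIZED VIEW feature_status_counts AS
    SELECT m.product_id,
           date_trunc('hour', m.measurement_time) AS bucket,
           f.name,
           f.description,
           SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0
                    THEN 1 ELSE 0 END) AS green_count,
           SUM(CASE WHEN p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2
                    THEN 1 ELSE 0 END) AS red_count
    FROM "Point" p
    JOIN "Feature" f ON f.id = p.feature_id
    JOIN "Measurement" m ON m.id = f.measurement_id
    GROUP BY 1, 2, 3, 4;

    CREATE INDEX ON feature_status_counts (product_id, bucket);

    REFRESH MATERIALIZED VIEW feature_status_counts;  -- e.g. after each load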

Cheers,

Jeff

Re: Slow query with big tables

From:
Tommi Kaksonen
Date:
On Sat, Aug 27, 2016 at 18:33 GMT+03:00, Jeff Janes <jeff.janes@gmail.com> wrote:

> Partitioning the Feature and Point tables on measurement_time (or
> measurement_start_time,
> you are not consistent on what it is called) might be helpful.  However,
> measurement_time does not exist in those tables, so you would first have to
> de-normalize by introducing it into them.
> More likely to be helpful would be precomputing the aggregates and storing
> them in a materialized view (not available in 9.2).   Also, more RAM and
> better hard-drives can't hurt.

Thanks a lot for the help and all the suggestions. Before this I tried partitioning by measurement_id (Feature table) and by result_feature_id (Point table), but the performance was worse than without partitioning. Partitioning on measurement_time might be a better idea (measurement_start_time was meant to be measurement_time).

I think I will update to a newer version, use better hardware, and try materialized views for better performance.

Best Regards,
Tommi Kaksonen

