Обсуждение: Query that took a lot of time in Postgresql when not using trim in order by
I have a query that produce a different query plan if I put a trim in one of the columns in the order by.
When i put the trim in any column it use hashaggregate and took 3 seconds against 30 when not.
Is wear because the columns is clean not need to be trimmed, I have check that.
The problem is that I can't change the query because is generate by the mondrian.
I do research and found in postgres list that I need to crank work_mem up high but don't work for me.
My postgresql.conf
# Add settings for extensions here
default_statistics_target = 50 # pgtune wizard 2014-06-04
maintenance_work_mem = 1GB # pgtune wizard 2014-06-04
constraint_exclusion = on # pgtune wizard 2014-06-04
checkpoint_completion_target = 0.9 # pgtune wizard 2014-06-04
effective_cache_size = 44GB # pgtune wizard 2014-06-04
work_mem = 1536MB # pgtune wizard 2014-06-04
#work_mem = 16GB # I have try this but don't work
wal_buffers = 32MB # pgtune wizard 2014-06-04
checkpoint_segments = 16 # pgtune wizard 2014-06-04
shared_buffers = 15GB # pgtune wizard 2014-06-04
max_connections = 20 # pgtune wizard 2014-06-04
___________________________________________________
Query with trim
SELECT "dim_cliente"."tipocliente" AS "c0",
"dim_cliente"."a1_ibge" AS "c1",
"dim_cliente"."a1_cod" AS "c2",
"dim_cliente"."a1_nome" AS "c3",
"dim_vendedor"."a3_nome" AS "c4"
FROM "public"."dim_cliente" AS "dim_cliente",
"public"."fato_ventas_productos" AS "fato_ventas_productos",
"public"."dim_vendedor" AS "dim_vendedor"
WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"
AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"
GROUP
BY "dim_cliente"."tipocliente" ,
"dim_cliente"."a1_ibge",
"dim_cliente"."a1_cod",
"dim_cliente"."a1_nome",
"dim_vendedor"."a3_nome"
ORDER
BY trim("dim_cliente"."tipocliente") ASC NULLS LAST,
"dim_cliente"."a1_ibge" ASC NULLS LAST, -- the same result if I put the trim here
"dim_cliente"."a1_cod" ASC NULLS LAST, -- or here
"dim_cliente"."a1_nome" ASC NULLS LAST; -- or here
-- this query took 3845.895 ms
___________________________________________________
Query Plan when using trim
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))
Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
Sort Method: quicksort Memory: 13121kB
-> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)
-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=28.746..1183.691 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor
Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)
-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..699.779 rows=907357 loops=1)
Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor
-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=28.699..28.699 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
Buckets: 2048 Batches: 1 Memory Usage: 1980kB
-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.008..16.537 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.747..0.747 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Buckets: 1024 Batches: 1 Memory Usage: 56kB
-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.026..0.423 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Total runtime: 3845.895 ms
(25 filas)
___________________________________________________
Query without trim
SELECT "dim_cliente"."tipocliente" AS "c0",
"dim_cliente"."a1_ibge" AS "c1",
"dim_cliente"."a1_cod" AS "c2",
"dim_cliente"."a1_nome" AS "c3",
"dim_vendedor"."a3_nome" AS "c4"
FROM "public"."dim_cliente" AS "dim_cliente",
"public"."fato_ventas_productos" AS "fato_ventas_productos",
"public"."dim_vendedor" AS "dim_vendedor"
WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"
AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"
GROUP
BY "dim_cliente"."tipocliente" ,
"dim_cliente"."a1_ibge",
"dim_cliente"."a1_cod",
"dim_cliente"."a1_nome",
"dim_vendedor"."a3_nome"
ORDER
BY "dim_cliente"."tipocliente" ASC NULLS LAST,
"dim_cliente"."a1_ibge" ASC NULLS LAST,
"dim_cliente"."a1_cod" ASC NULLS LAST,
"dim_cliente"."a1_nome" ASC NULLS LAST;
-- this query took 37249.268 ms
___________________________________________________
Query Plan when not using trim
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
-> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Method: quicksort Memory: 265592kB
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)
-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=25.980..1203.775 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor
Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)
-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..680.283 rows=907357 loops=1)
Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor
-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=25.931..25.931 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
Buckets: 2048 Batches: 1 Memory Usage: 1980kB
-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.005..13.736 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.715..0.715 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Buckets: 1024 Batches: 1 Memory Usage: 56kB
-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.024..0.405 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Total runtime: 37249.268 ms
(25 filas)
___________________________________________________
Is anything that I can do to solve this problem, is that a bug or a config problem?
Here the link with a dump of the tables
https://drive.google.com/file/d/0Bwupj61i9BtWZ1NiVXltaWc0dnM/view?usp=sharing
I appreciate your help
Re: Query that took a lot of time in Postgresql when not using trim in order by
Hello!I have a query that produce a different query plan if I put a trim in one of the columns in the order by.
When i put the trim in any column it use hashaggregate and took 3 seconds against 30 when not.
Is wear because the columns is clean not need to be trimmed, I have check that.
The problem is that I can't change the query because is generate by the mondrian.
I do research and found in postgres list that I need to crank work_mem up high but don't work for me.
My postgresql.conf
# Add settings for extensions here
default_statistics_target = 50 # pgtune wizard 2014-06-04
maintenance_work_mem = 1GB # pgtune wizard 2014-06-04
constraint_exclusion = on # pgtune wizard 2014-06-04
checkpoint_completion_target = 0.9 # pgtune wizard 2014-06-04
effective_cache_size = 44GB # pgtune wizard 2014-06-04
work_mem = 1536MB # pgtune wizard 2014-06-04
#work_mem = 16GB # I have try this but don't work
wal_buffers = 32MB # pgtune wizard 2014-06-04
checkpoint_segments = 16 # pgtune wizard 2014-06-04
shared_buffers = 15GB # pgtune wizard 2014-06-04
max_connections = 20 # pgtune wizard 2014-06-04
___________________________________________________
Query with trim
SELECT "dim_cliente"."tipocliente" AS "c0",
"dim_cliente"."a1_ibge" AS "c1",
"dim_cliente"."a1_cod" AS "c2",
"dim_cliente"."a1_nome" AS "c3",
"dim_vendedor"."a3_nome" AS "c4"
FROM "public"."dim_cliente" AS "dim_cliente",
"public"."fato_ventas_productos" AS "fato_ventas_productos",
"public"."dim_vendedor" AS "dim_vendedor"
WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"
AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"
GROUP
BY "dim_cliente"."tipocliente" ,
"dim_cliente"."a1_ibge",
"dim_cliente"."a1_cod",
"dim_cliente"."a1_nome",
"dim_vendedor"."a3_nome"
ORDER
BY trim("dim_cliente"."tipocliente") ASC NULLS LAST,
"dim_cliente"."a1_ibge" ASC NULLS LAST, -- the same result if I put the trim here
"dim_cliente"."a1_cod" ASC NULLS LAST, -- or here
"dim_cliente"."a1_nome" ASC NULLS LAST; -- or here
-- this query took 3845.895 ms
___________________________________________________
Query Plan when using trim
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))
Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
Sort Method: quicksort Memory: 13121kB
-> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)
-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=28.746..1183.691 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor
Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)
-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..699.779 rows=907357 loops=1)
Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor
-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=28.699..28.699 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
Buckets: 2048 Batches: 1 Memory Usage: 1980kB
-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.008..16.537 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.747..0.747 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Buckets: 1024 Batches: 1 Memory Usage: 56kB
-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.026..0.423 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Total runtime: 3845.895 ms
(25 filas)
___________________________________________________
Query without trim
SELECT "dim_cliente"."tipocliente" AS "c0",
"dim_cliente"."a1_ibge" AS "c1",
"dim_cliente"."a1_cod" AS "c2",
"dim_cliente"."a1_nome" AS "c3",
"dim_vendedor"."a3_nome" AS "c4"
FROM "public"."dim_cliente" AS "dim_cliente",
"public"."fato_ventas_productos" AS "fato_ventas_productos",
"public"."dim_vendedor" AS "dim_vendedor"
WHERE "fato_ventas_productos"."key_cliente" = "dim_cliente"."key_cliente"
AND "fato_ventas_productos"."key_vendedor" = "dim_vendedor"."key_vendedor"
GROUP
BY "dim_cliente"."tipocliente" ,
"dim_cliente"."a1_ibge",
"dim_cliente"."a1_cod",
"dim_cliente"."a1_nome",
"dim_vendedor"."a3_nome"
ORDER
BY "dim_cliente"."tipocliente" ASC NULLS LAST,
"dim_cliente"."a1_ibge" ASC NULLS LAST,
"dim_cliente"."a1_cod" ASC NULLS LAST,
"dim_cliente"."a1_nome" ASC NULLS LAST;
-- this query took 37249.268 ms
___________________________________________________
Query Plan when not using trim
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
-> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Method: quicksort Memory: 265592kB
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Hash Cond: (fato_ventas_productos.key_vendedor = dim_vendedor.key_vendedor)
-> Hash Join (cost=830.02..68126.13 rows=907357 width=86) (actual time=25.980..1203.775 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, fato_ventas_productos.key_vendedor
Hash Cond: (fato_ventas_productos.key_cliente = dim_cliente.key_cliente)
-> Seq Scan on public.fato_ventas_productos (cost=0.00..46880.57 rows=907357 width=16) (actual time=0.004..680.283 rows=907357 loops=1)
Output: fato_ventas_productos.key_cliente, fato_ventas_productos.key_vendedor
-> Hash (cost=618.90..618.90 rows=16890 width=86) (actual time=25.931..25.931 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
Buckets: 2048 Batches: 1 Memory Usage: 1980kB
-> Seq Scan on public.dim_cliente (cost=0.00..618.90 rows=16890 width=86) (actual time=0.005..13.736 rows=16890 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_cliente.key_cliente
-> Hash (cost=18.90..18.90 rows=590 width=59) (actual time=0.715..0.715 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Buckets: 1024 Batches: 1 Memory Usage: 56kB
-> Seq Scan on public.dim_vendedor (cost=0.00..18.90 rows=590 width=59) (actual time=0.024..0.405 rows=590 loops=1)
Output: dim_vendedor.a3_nome, dim_vendedor.key_vendedor
Total runtime: 37249.268 ms
(25 filas)
___________________________________________________
Is anything that I can do to solve this problem, is that a bug or a config problem?
Here the link with a dump of the tables
https://drive.google.com/file/d/0Bwupj61i9BtWZ1NiVXltaWc0dnM/view?usp=sharing
I appreciate your help
What is your Postgres version?
Do you have correct statistics on this tables?
Please show yours execution plans with buffers i.e. explain (analyze,buffers) ...
-- Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Query that took a lot of time in Postgresql when not using trim in order by
> What is your Postgres version? > Do you have correct statistics on this tables? > Please show yours execution plans with buffers i.e. explain (analyze,buffers) ... > Fast: Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1) Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text)) Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome Sort Method: quicksort Memory: 13121kB -> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1) Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome,btrim((dim_cliente.tipocliente)::text) -> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1) Slow: Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1) Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome -> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1) Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome Sort Method: quicksort Memory: 265592kB -> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1) The difference is in the top of plans. As we see, hashjoin time is practically the same. But fast plan uses hashagg first and only 43k rows require sorting. Slow plan dominated by sorting 900k rows. I wonder if increasing cpu_tuple_cost will help. As cost difference between two plans is negligible now.
> What is your Postgres version?
> Do you have correct statistics on this tables?
> Please show yours execution plans with buffers i.e. explain (analyze,buffers) ...
>
Fast:
Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, (btrim((dim_cliente.tipocliente)::text))
Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome
Sort Method: quicksort Memory: 13121kB
-> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome, btrim((dim_cliente.tipocliente)::text)
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1)
Slow:
Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
-> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1)
Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome
Sort Method: quicksort Memory: 265592kB
-> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1)
The difference is in the top of plans.
As we see, hashjoin time is practically the same.
But fast plan uses hashagg first and only 43k rows require sorting.
Slow plan dominated by sorting 900k rows.
I wonder if increasing cpu_tuple_cost will help.
As cost difference between two plans is negligible now.
Re: Query that took a lot of time in Postgresql when not using trim in order by
On 2015-11-25 19:35:15 +0300, Evgeniy Shishkin wrote: > Fast: > > Sort (cost=193101.41..195369.80 rows=907357 width=129) (actual time=3828.176..3831.261 rows=43615 loops=1) > Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome,(btrim((dim_cliente.tipocliente)::text)) > Sort Key: (btrim((dim_cliente.tipocliente)::text)), dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome > Sort Method: quicksort Memory: 13121kB > -> HashAggregate (cost=91970.52..103312.49 rows=907357 width=129) (actual time=2462.690..2496.729 rows=43615 loops=1) > Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome,btrim((dim_cliente.tipocliente)::text) > -> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=29.524..1533.880 rows=907357 loops=1) > > > Slow: > > Group (cost=170417.48..184027.84 rows=907357 width=129) (actual time=36649.329..37235.158 rows=43615 loops=1) > Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome > -> Sort (cost=170417.48..172685.88 rows=907357 width=129) (actual time=36649.315..36786.760 rows=907357 loops=1) > Output: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome > Sort Key: dim_cliente.tipocliente, dim_cliente.a1_ibge, dim_cliente.a1_cod, dim_cliente.a1_nome, dim_vendedor.a3_nome > Sort Method: quicksort Memory: 265592kB > -> Hash Join (cost=856.30..80628.56 rows=907357 width=129) (actual time=26.719..1593.693 rows=907357 loops=1) > > > The difference is in the top of plans. > As we see, hashjoin time is practically the same. > But fast plan uses hashagg first and only 43k rows require sorting. > Slow plan dominated by sorting 900k rows. > > I wonder if increasing cpu_tuple_cost will help. > As cost difference between two plans is negligible now. Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an in-memory sort of 900k rows seems slow to me. I tested this on my PC at home (1.8 GHz Core2 Dual, so a rather old and slow box) and I could sort 1E6 rows of 128 random bytes in 5.6 seconds. Even if I kept the first 96 bytes constant (so only the last 32 were random), it took only 21 seconds. Either this CPU is really slow or the data is heavily skewed - is it possible that all dimensions except dim_vendedor.a3_nome have only one or very few values? In that case changing the sort order might help. hp -- _ | Peter J. Holzer | I want to forget all about both belts and |_|_) | | suspenders; instead, I want to buy pants | | | hjp@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/
Вложения
"Peter J. Holzer" <hjp-pgsql@hjp.at> writes: > Seems plausible. Also I'm wondering what CPU this is: 36 seconds for an > in-memory sort of 900k rows seems slow to me. I'm wondering if it's textual data in some locale whose strcoll() behavior is exceptionally slow. regards, tom lane