Обсуждение: Slow GroupAggregate and Sort

Поиск
Список
Период
Сортировка

Slow GroupAggregate and Sort

От
Darwin Correa
Дата:
Hello great day, we have a strange case with slow query and would like some help.









We have citus cluster with the following configuration: 1 master + 3 data nodes, each machine have:
- 24 cores (Intel Xeon E5 2620)
- 192 GB RAM
- 1TB SSD

each node has configured postgres settings using tuning.sql

The main Table DDL is in (ddl.sql)

also distributed are as follow:

SELECT create_distributed_table('salert_post', 'id',shard_count := 72);

SELECT create_distributed_table('salert_q56', 'post',
                                colocate_with => 'salert_post');

when run the query (query.sql)  as you can see in explain (plan4_v3.txt) citus take about 18s to run all fragments
but each fragment take at most 2s, so my questions are- why citus take this time in run all fragments?
- if I tuned each postgres node efficiently why take much time to make sort and aggregate with citus results?

good night, I hope you can help me with some ideas


also we remove partitions, and test only with citus, but query took more than a minute.
as a note, we not have 72 shards on the same node we have 72 in total, 24 shards each node.

I think the problem was in Sort and in GroupAggregate  I no have idea how speed up this in master node, because the Custom Scan (Citus Adaptive)  is not too slow, the most time is consumed in master on Sort and group

I hope you can help me.

Re: Slow GroupAggregate and Sort

От
Jeff Janes
Дата:
On Thu, Dec 28, 2023 at 12:03 PM Darwin Correa <dcorrea@jedai.group> wrote:

when run the query (query.sql)  as you can see in explain (plan4_v3.txt) citus take about 18s to run all fragments

Where is plan4_v3.txt?  Is that hidden in some non-obvious way in one of your links?
 
but each fragment take at most 2s, so my questions are- why citus take this time in run all fragments?

I only see that one arbitrary fragment takes 2.7s, with no indication whether that one is the slowest one or not.  But I am not used to reading citus plans.
 
also we remove partitions, and test only with citus, but query took more than a minute.
as a note, we not have 72 shards on the same node we have 72 in total, 24 shards each node.

I thought the point of sharding was to bring more CPU and RAM to bear than can feasibly be obtained in one machine.  Doesn't that make 24 shards per machine completely nuts?
 

I think the problem was in Sort and in GroupAggregate  I no have idea how speed up this in master node, because the Custom Scan (Citus Adaptive)  is not too slow, the most time is consumed in master on Sort and group

You want to know why citus is so slow here, but also say it isn't slow and something else is slow instead? 

I'd break this down into more manageable chunks for investigation.  Populate one scratch table (on one node, not a hypertable) with all 2.6 million rows.  See how long it takes to populate it based on the citus query, and separately see how long it takes to run the aggregate query on the populated scratch table.

What version of PostgreSQL (and citus) are you using?  In my hands (without citus being involved), the sort includes "users" as the last column, to support the count(distinct users) operation.  I don't know why yours doesn't do that.

Cheers,

Jeff

Re: Slow GroupAggregate and Sort

От
Darwin Correa
Дата:
Hello, Happy New Year! I add my responses in blue.



---- El Thu, 28 Dec 2023 13:06:18 -0500, Jeff Janes <jeff.janes@gmail.com> escribió ----

On Thu, Dec 28, 2023 at 12:03 PM Darwin Correa <dcorrea@jedai.group> wrote:


when run the query (query.sql)  as you can see in explain (plan4_v3.txt) citus take about 18s to run all fragments

Where is plan4_v3.txt?  Is that hidden in some non-obvious way in one of your links?

sorry by the wrong name, Yes The explain plan is in the link that said plan, is this




 

but each fragment take at most 2s, so my questions are- why citus take this time in run all fragments?

I only see that one arbitrary fragment takes 2.7s, with no indication whether that one is the slowest one or not.  But I am not used to reading citus plans.

In the explain plan citus show one of 72 subtask and show the most slow

 
also we remove partitions, and test only with citus, but query took more than a minute.
as a note, we not have 72 shards on the same node we have 72 in total, 24 shards each node.

I thought the point of sharding was to bring more CPU and RAM to bear than can feasibly be obtained in one machine.  Doesn't that make 24 shards per machine completely nuts?

Based o citus docs the recommended shards is 2x cpu cores in my case I've tested with few shards and 1:1, 2:1 shards but always have slow query time in the last step (sorting and grouping) in máster node.



I think the problem was in Sort and in GroupAggregate  I no have idea how speed up this in master node, because the Custom Scan (Citus Adaptive)  is not too slow, the most time is consumed in master on Sort and group

You want to know why citus is so slow here, but also say it isn't slow and something else is slow instead? 

I'm refering in general that this query run slow in Citus cluster, but analizing explain plan I think that the specific part of citus (Adaptive executor) is not the slow part, instead of I can show that the “postgres only part” is slow (Sort and GroupAggregate)


I'd break this down into more manageable chunks for investigation.  Populate one scratch table (on one node, not a hypertable) with all 2.6 million rows.  See how long it takes to populate it based on the citus query, and separately see how long it takes to run the aggregate query on the populated scratch table.

Populate table based with citus query, took 1.45 seconds each fragment, I don't know how citus run all fragments in parallel  but running secuential each fragment, total took 51s

After scratch table filled sort took 32s, explain (https://explain.dalibo.com/plan/8a3h26hcc6328c11)

and sort+aggregation took 34s explain (https://explain.dalibo.com/plan/c5e4d62ge87cafg4)

I don't understand "actual time" metric, because accordind plan (citus) startup time is high in Sort step


What version of PostgreSQL (and citus) are you using?  In my hands (without citus being involved), the sort includes "users" as the last column, to support the count(distinct users) operation.  I don't know why yours doesn't do that.
I'm using citus 12.0 wich comes with postgreSQL 16, I upgrade to 12.1 this is the updated plan:  (now took more time)




Cheers,

Jeff



Darwin

Correa P.

//    software architect














Veintimilla y Leonidas Plaza 

0999965925     

  //   DESARROLLO E INNOVACIÓN TECNOLÓGICA
 












Re: Slow GroupAggregate and Sort

От
Jeff Janes
Дата:
On Mon, Jan 1, 2024 at 9:57 AM Darwin Correa <dcorrea@jedai.group> wrote:
Hello, Happy New Year! I add my responses in blue.



---- El Thu, 28 Dec 2023 13:06:18 -0500, Jeff Janes <jeff.janes@gmail.com> escribió ----

I thought the point of sharding was to bring more CPU and RAM to bear than can feasibly be obtained in one machine.  Doesn't that make 24 shards per machine completely nuts?

Based o citus docs the recommended shards is 2x cpu cores in my case I've tested with few shards and 1:1, 2:1 shards but always have slow query time in the last step (sorting and grouping) in máster node.

That might make sense if PostgreSQL didn't do parallelization itself.  But according to your plan, PostgreSQL itself tries to parallelize 4 ways (although fails, as it can't find any available workers) and then you have 24 nodes all doing the same thing, all with only 12 CPU.  That doesn't seem good. although it now does seem unrelated to the issue at hand.


I'd break this down into more manageable chunks for investigation.  Populate one scratch table (on one node, not a hypertable) with all 2.6 million rows.  See how long it takes to populate it based on the citus query, and separately see how long it takes to run the aggregate query on the populated scratch table.

After scratch table filled sort took 32s, explain (https://explain.dalibo.com/plan/8a3h26hcc6328c11)

So that plan shows the sort to be egregiously slow, and with no involvement of citus and no apparent reason for slowness.  I'm thinking you have a pathological collation being used.  What is your default collation?  (Your DDL shows that no non-default collations are in use, but doesn't indicate what the default is)

Cheers,

Jeff

Re: Slow GroupAggregate and Sort

От
Darwin Correa
Дата:
Hello, my answers in blue again



---- El Wed, 03 Jan 2024 21:43:15 -0500, Jeff Janes <jeff.janes@gmail.com> escribió ----

On Mon, Jan 1, 2024 at 9:57 AM Darwin Correa <dcorrea@jedai.group> wrote:
Hello, Happy New Year! I add my responses in blue.



---- El Thu, 28 Dec 2023 13:06:18 -0500, Jeff Janes <jeff.janes@gmail.com> escribió ----

I thought the point of sharding was to bring more CPU and RAM to bear than can feasibly be obtained in one machine.  Doesn't that make 24 shards per machine completely nuts?

Based o citus docs the recommended shards is 2x cpu cores in my case I've tested with few shards and 1:1, 2:1 shards but always have slow query time in the last step (sorting and grouping) in máster node.

That might make sense if PostgreSQL didn't do parallelization itself.  But according to your plan, PostgreSQL itself tries to parallelize 4 ways (although fails, as it can't find any available workers) and then you have 24 nodes all doing the same thing, all with only 12 CPU.  That doesn't seem good. although it now does seem unrelated to the issue at hand.

But the coordinator (who make sort and aggr) are in separate server (each node si phisically other server) I no understand why if cooridnator not aree too busy, and I've already test with less shards, and time increment.



I'd break this down into more manageable chunks for investigation.  Populate one scratch table (on one node, not a hypertable) with all 2.6 million rows.  See how long it takes to populate it based on the citus query, and separately see how long it takes to run the aggregate query on the populated scratch table.

After scratch table filled sort took 32s, explain (https://explain.dalibo.com/plan/8a3h26hcc6328c11)

So that plan shows the sort to be egregiously slow, and with no involvement of citus and no apparent reason for slowness.  I'm thinking you have a pathological collation being used.  What is your default collation?  (Your DDL shows that no non-default collations are in use, but doesn't indicate what the default is)

The collation is en_US.UFT-8, can you give more detail or which refer to "pathological collation" please to research about that? and the data store in this tables and this column specific are only alphanumeric charactres a-z,A-Z and numbers, nothing special

Cheers,

Jeff



Darwin

Correa P.

//    software architect














Veintimilla y Leonidas Plaza 

0999965925     

  //   DESARROLLO E INNOVACIÓN TECNOLÓGICA