On 1/12/24 6:34 AM, Jeremiah Bauer wrote:
>
> That'd be a band-aid at best, because we know that the query used to
> define the materialized view runs in a reasonable amount of time on it's
> own, as does a CTAS. So either the REFRESH is doing something odd when
> writing into the new relation (which looking at the code seems very
> unlikely), or REFRESH is getting a different query plan for some reason.
> Unfortunately, I don't know of any easy way to get the query plan for
> the REFRESH (it might be possible via gdb, but I'm not sure). We do at
> least know that the REFRESH is using parallel workers.
>
> Can you post the output of EXPLAIN ANALYZE for the SELECT? That might
> provide some clues.
>
> Sure, here is the explain analyze for the select:
>
> HashAggregate (cost=123986470.20..129486707.63 rows=164493082 width=24)
> (actual time=697250.385..741548.965 rows=59015171 loops=1)
> Group Key: id1, id2, id3
> Planned Partitions: 64 Batches: 65 Memory Usage: 328209kB Disk
> Usage: 6750176kB
> -> Gather (cost=64653301.50..107228737.47 rows=328986164 width=24)
> (actual time=357598.331..594226.355 rows=161151623 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> HashAggregate (cost=64652301.50..74329121.07
> rows=164493082 width=24) (actual time=357572.082..578038.457
> rows=53717208 loops=3)
> Group Key: id1, id2, id3
> Planned Partitions: 64 Batches: 65 Memory Usage:
> 328209kB Disk Usage: 25774088kB
> Worker 0: Batches: 65 Memory Usage: 328209kB Disk
> Usage: 25375784kB
> Worker 1: Batches: 65 Memory Usage: 328209kB Disk
> Usage: 25382936kB
> -> Parallel Seq Scan on large_table
> (cost=0.00..29740358.40 rows=685387840 width=24) (actual
> time=12.954..99596.289 rows=548310252 loops=3)
> Planning Time: 5.380 ms
> Execution Time: 745750.371 ms
Ok, so that's using a parallel query as well, just like REFRESH, so no
help there.
At this point I think you'd need to do some OS-level investigation using
a tool like perf to determine what the parent process is doing once the
workers finish.
BTW, I did just come across a script[1] that will print the query plan
for a running query. Based on what you've said I suspect it wouldn't
tell us much here, but I wanted to mention it.
1: https://github.com/StarfishStorage/explain-running-query
--
Jim Nasby, Data Architect, Austin TX