I'm working on a custom aggregate that generates a serialized data format. The preparation of the geometry before it is formatted is fairly intense, so it would be a good thing for that work to be done in parallel, in partial aggregates. Here's an example SQL call:
EXPLAIN ANALYZE
SELECT length(ST_AsMVT(a)) FROM (
SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
FROM pts_10 p
WHERE p.geom && ::geometry_literal
AND p.geom IS NOT NULL
) a;
The ST_AsMVTGeom() function can be comically expensive, so it's a big win when it runs in partial aggregates. But the cost of the function seems to be ignored.
Whether I get a parallel aggregate seems entirely determined by the number of rows, not the cost of preparing those rows.
When changing the number of rows in the subquery with a LIMIT, I can move from a seq scan, to a parallel seq scan, and finally to a parallel aggregate, as the number of rows goes up.
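For reference, the LIMIT experiment looked roughly like this (the row count here is illustrative, not the actual threshold):

```sql
EXPLAIN ANALYZE
SELECT length(ST_AsMVT(a)) FROM (
SELECT ST_AsMVTGeom(p.geom, ::geometry_literal, 4096, 0, true), gid, fed_num
FROM pts_10 p
WHERE p.geom && ::geometry_literal
AND p.geom IS NOT NULL
LIMIT 100000
) a;
```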
An odd effect: when I have enough rows to get a parallel seq scan, I can flip it back to a plain seq scan by *increasing* the cost of ST_AsMVTGeom(). That seems odd and backwards.
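I've been varying the function cost like so (the COST value is just an example, and I'm spelling out the full argument signature here for clarity):

```sql
-- Raise the planner's estimated per-call cost of the function;
-- the default for a C-language function is 1.
ALTER FUNCTION ST_AsMVTGeom(geometry, box2d, integer, integer, boolean)
    COST 1000;
```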
Is there a guide or rough description anywhere of how costs are used in determining parallel plans? The empirical approach starts to wear one down after a while :)
P.