Thanks very much for your help so far.
>
> (it is pretty confusing that the HashAggregate reports ~6M rows while
> the sort does 41M rows, but maybe I am misreading this).
> Anyway, I think that if You up the work_mem for this query to 512M,
> the sort will be done in memory, and thus be plenty faster.
Tried this (with work_mem set to 2GB). It makes a difference, but not
enough: the query time is roughly halved (from 220 sec to 120 sec).
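For reference, the setting can be scoped to a single session or even a
single transaction (via SET LOCAL) rather than changed globally; a
minimal sketch, with the slow query left as a placeholder:

```sql
-- Raise the per-sort memory budget for this session only.
-- Note: each sort/hash node in a plan may use up to this amount.
SET work_mem = '2GB';

-- Or scope it to one transaction so concurrent queries are unaffected:
BEGIN;
SET LOCAL work_mem = '2GB';
EXPLAIN (ANALYZE) SELECT ...;  -- the slow query goes here
COMMIT;
```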
>
> Also, You say You are experiencing unstable query plans, and this may
> mean that geqo is kicking in (but Your query seems too simple for
> that, even considering the views involved). A quick way to check that
> would be to run EXPLAIN <the query> a couple of tens of times, and
> check if the plans change. If they do, try upping geqo_threshold.
It's not unstable from one run to the next; it's unstable from one day
to the next (more on this later).
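For anyone who does want to run the check described above, it looks
roughly like this (the threshold value is illustrative):

```sql
-- Run this a couple of dozen times; if the join order changes between
-- runs, the genetic query optimizer (geqo) is likely kicking in.
EXPLAIN SELECT ...;

-- geqo engages once a query joins geqo_threshold or more relations
-- (default 12); raising it forces exhaustive planning instead.
SET geqo_threshold = 16;
```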
>
> You have random_page_cost 4 times larger than seq_page_cost. You say
> You are on SSD, so there is no random access penalty. Try setting them
> equal.
>
>
Again, experimentally, non-equal values seem to work better. I haven't
benchmarked this on our hardware, but the random access tests done by
TomsHardware et al. suggest roughly a 2.5x penalty for random vs
sequential access on SSDs. That is very much better than rotational
disks, but still significant.
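Given that measurement, the corresponding planner settings would look
something like the following (a sketch; the exact ratio should really
be benchmarked on the actual hardware):

```sql
-- Defaults assume rotational disks:
--   seq_page_cost = 1.0, random_page_cost = 4.0
-- A ~2.5x random-access penalty on SSD suggests:
SET seq_page_cost = 1.0;
SET random_page_cost = 2.5;
```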
> Your plan is full of merge-joins, some indices may be in order. Merge
> join is a kind of "last-chance" plan.
>
I think the fix here is going to be to do more work at write time and
less at read time: rather than having really complex views, we'll
generate some extra tables and keep them synchronized with triggers.
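As a sketch of that write-time approach (table, column, and function
names here are made up for illustration, and ON CONFLICT assumes
PostgreSQL 9.5+):

```sql
-- Hypothetical pre-aggregated table kept in sync by a trigger,
-- replacing an expensive aggregate view at read time.
CREATE TABLE order_totals (
    customer_id integer PRIMARY KEY,
    total       numeric NOT NULL DEFAULT 0
);

CREATE FUNCTION bump_order_total() RETURNS trigger AS $$
BEGIN
    -- Fold each new order into the running per-customer total.
    INSERT INTO order_totals (customer_id, total)
    VALUES (NEW.customer_id, NEW.amount)
    ON CONFLICT (customer_id)
    DO UPDATE SET total = order_totals.total + EXCLUDED.total;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_total_sync
    AFTER INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE bump_order_total();
```

Reads then hit the small pre-aggregated table instead of re-sorting
and re-aggregating millions of rows per query.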
Richard