Thank you Dave. I've opened an SR with GP and see if they have any good suggestion on changing the plan.
Thanks, Suya
From: David Rowley [dgrowleyml@gmail.com] Sent: Tuesday, October 28, 2014 6:06 PM To: Huang, Suya Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] unnecessary sort in the execution plan when doing group by
Most likely the reason you're getting the difference in plan is because the planner is probably decided that there will be too many hash entries for a hash table based on the 3 grouping columns... Look at the estimates, 1683 with 2 columns and 1196982 with the 3 columns. If those estimates turned out to be true, then the hash table for 3 columns will be massively bigger than it would be with 2 columns. With PostgreSQL you might see the plan changing if you increased the work_mem setting. For greenplum, I've no idea if that's the same.
Databases are often not very good at knowing with the number of distinct values would be over more than 1 column. Certain databases have solved this with multi column statistics, but PostgreSQL does not have these. Although I just noticed last night that someone is working on them.