I'd expect a much bigger overhead in building millions of Python objects than in building 20: not only the roughly 37 bytes of per-string overhead that sys.getsizeof() reports on my interpreter, but also the cost of having the GC track objects in the millions.
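As a quick sanity check, you can measure that overhead directly; a minimal sketch (the exact byte counts vary by Python version and build, so your numbers may differ from the 37 quoted above):

```python
import sys

# Fixed per-object cost of a CPython str, before any payload characters.
print(sys.getsizeof(""))         # e.g. 49 on a 64-bit CPython 3.x
print(sys.getsizeof("a" * 10))   # 10 payload bytes plus the fixed overhead

# Back-of-the-envelope: pure object overhead for a million one-string rows.
n_rows = 1_000_000
per_string = sys.getsizeof("")
print(f"~{n_rows * per_string / 2**20:.0f} MiB of overhead alone")
```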
For the record, I eventually settled on a solution using pgnumpy.
As far as I could tell, it handles result sets of millions of rows with very little overhead.
Since my original goal was to feed the data into Pandas down the line, pgnumpy seems spot on.
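The hand-off to Pandas is cheap because libraries in this style return one NumPy structured array for the whole result set rather than millions of boxed Python objects. A minimal sketch of that last step; note I'm not reproducing pgnumpy's actual query API here, so `rows` below is just a hypothetical stand-in for what such a query would return:

```python
import numpy as np
import pandas as pd

# Stand-in for a pgnumpy-style result: a single structured array holding
# the entire result set (hypothetical placeholder, not pgnumpy's real API).
rows = np.zeros(1_000_000, dtype=[("id", "i8"), ("value", "f8")])

# pandas accepts a structured array directly and pulls the data over
# column by column as whole arrays, so no per-row Python objects are built.
df = pd.DataFrame(rows)
print(df.dtypes)
print(len(df))
```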