COPY cannot be optimised correctly if we have before triggers or
volatile default expressions.
The multi-insert code detects those cases and falls back to the single
row mechanism in those cases.
There a common class of volatile functions that wouldn't cause
problems: any volatile function that doesn't touch the table being
loaded and still works correctly when called with alternately ordered
data.
I claim this is a common class, since sequence next_val functions and
uuid generators meet that criteria and most common forms of auditing
trigger, as well as any other form of data-reformatting trigger. Since
this is a common case, it seems worth optimising.
What I'd like to do is to invent a new form of labelling that allows
us to understand that COPY can still be optimised. I'm thinking to add
a new function label, something like one of
* IDEMPOTENT
* ORDER INDEPENDENT
* BATCHABLE
* NON SELF REFERENCING
* GO FASTER DAMMIT
etc
I'm sure many people will have a much more exact description and a
better name than I do.
This becomes more important when we think about parallelising SQL,
since essentially the same problem exists with parallel SQL calling
volatile functions. Oracle handles that by having a pragma to allow a
function to be declared as parallel safe.
I was also thinking that the best way to do this would be to invent a
new flexible function labelling scheme, so use something like hstore
to store a list of function attributes. Something that would mean we
don't have to invent new keywords every time we have a new function
label.
Suggestions please.
--Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services