1) While comparing this to the FILTER clause we already have for aggregates, I've noticed the aggregate version is
FILTER '(' WHERE a_expr ')'
while here we have
FILTER '(' a_expr ')'
For a while I was thinking that maybe we should use the same syntax here, but I don't think so. The WHERE bit seems rather unnecessary and we probably implemented it only because it's required by SQL standard, which does not apply to COPY. So I think this is fine.
ok
2) The various parser checks emit errors like this:
case EXPR_KIND_COPY_FILTER: err = _("cannot use subquery in copy from FILTER condition"); break;
I think the "copy from" should be capitalized, to make it clear that it refers to a COPY FROM command and not a copy of something.
3) I think there should be regression tests for these prohibited things, i.e. for a set-returning function, for a non-existent column, etc.
fixed
4) What might be somewhat confusing for users is that the filter uses a single snapshot to evaluate the conditions for all rows. That is, one might do this
create or replace function f() returns int as $$ select count(*)::int from t; $$ language sql;
and hope that
copy t from '/...' filter (f() <= 100);
only ever imports the first 100 rows - but that's not true, of course, because f() uses the snapshot acquired at the very beginning. For example INSERT SELECT does behave differently:
test=# copy t from '/home/user/t.data' filter (f() < 100); COPY 81 test=# insert into t select * from t where f() < 100; INSERT 0 19
Obviously, this is not an issue when the filter clause references only the input row (which I assume will be the primary use case).
Not sure if this is expected / appropriate behavior, and if the patch needs to do something else here.
Comparing with overhead of setting snapshot before evaluating every row and considering this
kind of usage is not frequent it seems to me the behavior is acceptable