Hi everybody,
I would like to add query sampling support to PostgreSQL (at least as
part of my project, if someone feels strongly against checking it into
the main branch).
I have been going over the code and I do see a lot of sampling stuff
in backend/commands/analyze.c. However, I plan to add sampling support
to the executor, allowing the following types of queries:
SELECT STORE, AVG(SALES) FROM TRANSACTIONS TABLESAMPLE BERNOULLI(10) REPEATABLE(5) GROUP BY STORE
(This is supported by DB2).
For starters I think this should be doable in the executor by cannibalizing
nodeSeqscan.c and adding sampling support to it.
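To make the idea concrete, here is a minimal standalone sketch of the per-row decision a sampling scan would make. It is not PostgreSQL code: BernoulliSampler, sampler_init, sampler_accept and sample_count are hypothetical names, and the SplitMix64 generator is just an assumption chosen so that the same REPEATABLE seed always yields the same sample. Each row is kept independently with probability percent/100.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler state; not a PostgreSQL structure. */
typedef struct BernoulliSampler
{
    uint64_t state;     /* PRNG state, initialised from the REPEATABLE seed */
    double   fraction;  /* sampling probability in [0, 1] */
} BernoulliSampler;

/* SplitMix64 step: a small deterministic PRNG (assumed choice). */
static uint64_t
splitmix64_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);

    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Set up the sampler for BERNOULLI(percent) REPEATABLE(seed). */
static void
sampler_init(BernoulliSampler *s, double percent, uint64_t seed)
{
    assert(percent >= 0.0 && percent <= 100.0);
    s->state = seed;
    s->fraction = percent / 100.0;
}

/* Return 1 if the next row should be emitted, 0 if it should be skipped. */
static int
sampler_accept(BernoulliSampler *s)
{
    /* Use the top 53 bits to build a uniform double in [0, 1). */
    double u = (double) (splitmix64_next(&s->state) >> 11)
               * (1.0 / 9007199254740992.0);

    return u < s->fraction;
}

/* Stand-in for a scan loop: count how many of nrows rows pass the filter. */
static size_t
sample_count(size_t nrows, double percent, uint64_t seed)
{
    BernoulliSampler s;
    size_t kept = 0;

    sampler_init(&s, percent, seed);
    for (size_t i = 0; i < nrows; i++)
        if (sampler_accept(&s))
            kept++;
    return kept;
}
```

In a real scan node the accept test would sit where the sequential scan returns each tuple, and the sample is only repeatable if rows are visited in the same order every time, which is part of why I want a sequential scan here.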
However, I am concerned about planner optimizations, since the planner
might decide to run an index scan (instead of a sequential scan) for a
particular base relation.
My question is: is there an easy way of forcing the optimizer to choose
a sequential scan for a particular relation? (I apologize if this is
documented in the planner code, as I am still going over it.)
I would appreciate any other comments.
Thanks much,
Varun