Re: WIP: Collecting statistics on CSV file data

Поиск
Список
Период
Сортировка
От Shigeru Hanada
Тема Re: WIP: Collecting statistics on CSV file data
Дата
Msg-id 4E9C6573.8050507@gmail.com
обсуждение исходный текст
Ответ на Re: WIP: Collecting statistics on CSV file data  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Ответы Re: WIP: Collecting statistics on CSV file data
Re: WIP: Collecting statistics on CSV file data
Re: WIP: Collecting statistics on CSV file data
Список pgsql-hackers
(2011/10/07 18:09), Etsuro Fujita wrote:
> Thank you for the review and the helpful information.
> I rebased. Please find attached a patch. I'll add the patch to the next CF.
> 
> Changes:
> 
>    * cleanups and fixes
>    * addition of the following to ALTER FOREIGN TABLE
>        ALTER [COLUMN] column SET STATISTICS integer
>        ALTER [COLUMN] column SET ( n_distinct = val ) (n_distinct only)
>        ALTER [COLUMN] column RESET ( n_distinct )
>    * reflection of the force_not_null info in acquiring sample rows
>    * documentation

The new patch could be applied with some shifts.  Regression tests of
core and file_fdw have passed cleanly.  Though I've tested only simple
tests, ANALYZE works for foreign tables for file_fdw, and estimation of
costs and selectivity seem appropriate.

New API AnalyzeForeignTable
===========================
In your design, new handler function is called instead of
do_analylze_rel.  IMO this hook point would be good for FDWs which can
provide statistics in optimized way.  For instance, pgsql_fdw can simply
copy statistics from remote PostgreSQL server if they are compatible.
Possible another idea is to replace acquire_sample_rows so that other
FDWs can reuse most part of fileAnalyzeForeignTable and
file_fdw_do_analyze_rel.

And I think that AnalyzeForeignTable should be optional, and it would be
very useful if a default handler is provided.  Probably a default
handler can use basic FDW APIs to acquire sample rows from the result of
"SELECT * FROM foreign_table" with skipping periodically.  It won't be
efficient but I think it's not so unreasonable.

Other issues
============
There are some other comments about non-critical issues.
- When there is no analyzable column, vac_update_relstats is not called.Is this behavior intentional?
- psql can't complete foreign table name after ANALYZE.
- A new parameter has been added to vac_update_relstats in a recent
commit.  Perhaps 0 is OK for that parameter.
- ANALYZE without relation name ignores foreign tables because
get_rel_oids doesn't list foreign tables.
- IMO logging "analyzing foo.bar" should not be done in
AnalyzeForeignTable handler of each FDW because some FDW might forget to
do it.  Maybe it should be pulled up to analyze_rel or somewhere in core.
- It should be mentioned in a document that foreign tables are not
analyzed automatically because they are read-only.

Regards,
-- 
Shigeru Hanada


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Underspecified window queries in regression tests
Следующее
От: Alvaro Herrera
Дата:
Сообщение: HeapTupleHeaderAdvanceLatestRemovedXid doing the wrong thing with multixacts