Re: Benchmark Data requested

From: Dimitri Fontaine
Subject: Re: Benchmark Data requested
Msg-id: 200802051815.25800.dfontaine@hi-media.com
In reply to: Re: Benchmark Data requested  (Simon Riggs <simon@2ndquadrant.com>)
Responses: Re: Benchmark Data requested  (Simon Riggs <simon@2ndquadrant.com>)
  Re: Benchmark Data requested  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
  Re: Benchmark Data requested --- pgloader CE design ideas  (Dimitri Fontaine <dfontaine@hi-media.com>)
List: pgsql-performance

On Tuesday 05 February 2008, Simon Riggs wrote:
> I'll look at COPY FROM internals to make this faster. I'm looking at
> this now to refresh my memory; I already had some plans on the shelf.

Maybe borrowing some ideas from pg_bulkload could help here?
  http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf

IIRC it's mainly about how to optimize index updates while loading data, and
I've heard complaints along the lines of "this external tool has to know too
much about PostgreSQL internals to be trustworthy as non-core code"... so...

> > The basic idea is for pgloader to ask PostgreSQL about
> > constraint_exclusion, pg_inherits and pg_constraint and if pgloader
> > recognize both the CHECK expression and the datatypes involved, and if we
> > can implement the CHECK in python without having to resort to querying
> > PostgreSQL, then we can run a thread per partition, with as many COPY
> > FROM running in parallel as there are partition involved (when threads =
> > -1).
> >
> > I'm not sure this will be quicker than relying on PostgreSQL trigger or
> > rules as used for partitioning currently, but ISTM Jignesh quoted § is
> > just about that.
>
> Much better than triggers and rules, but it will be hard to get it to
> work.

Well, I'm thinking of a somewhat modular approach, where pgloader recognizes
CHECK constraints, loads a module registered for the operator and data types
involved, then uses it.
Modules and their registration will be handled at the configuration level:
I'll provide some defaults and users will be able to add their own code, the
same way on-the-fly reformat modules are handled now.
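
To make the registration idea concrete, here is a minimal sketch of a registry
keyed on (operator, data type). Nothing here is pgloader code: the names
EXCLUSION_MODULES, register and lookup are hypothetical, the registry is a
plain dict rather than a configuration section, and it assumes Python 3.7+ and
ISO-formatted date constants.

```python
from datetime import date

# Hypothetical registry: (operator, data type name) -> factory that builds
# a Python predicate from the constant found in the CHECK expression.
EXCLUSION_MODULES = {}


def register(op, typname):
    """Register a factory for CHECK clauses of the form `column <op> constant`."""
    def decorator(factory):
        EXCLUSION_MODULES[(op, typname)] = factory
        return factory
    return decorator


@register(">=", "date")
def date_ge(bound):
    limit = date.fromisoformat(bound)
    return lambda value: date.fromisoformat(value) >= limit


@register("<", "date")
def date_lt(bound):
    limit = date.fromisoformat(bound)
    return lambda value: date.fromisoformat(value) < limit


def lookup(op, typname):
    """Return the registered factory, or None when no module matches."""
    return EXCLUSION_MODULES.get((op, typname))
```

A user-supplied module would simply be one more decorated factory; lookup()
returning None is the signal that no Python implementation is available for
that constraint.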

This means that I should (hopefully) be able to quickly provide the basic
cases (CHECK on dates >= x and < y, numeric ranges, etc.), and users will be
able to take care of more complex setups.
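
As an illustration of those basic cases, a half-open range check can be
written once and reused for dates and numbers. This is only a sketch under
assumptions: make_range_check and the partition names are made up, and the
bounds are taken as already extracted from the CHECK expression (parsing
pg_constraint is not shown).

```python
from datetime import date


def make_range_check(lower, upper, parse):
    """Python equivalent of CHECK (col >= lower AND col < upper)."""
    lo, hi = parse(lower), parse(upper)

    def check(raw):
        # `raw` is the field as read from the input file, still a string.
        return lo <= parse(raw) < hi

    return check


# Hypothetical monthly partitions of a table partitioned on a date column.
date_checks = {
    "log_2008_01": make_range_check("2008-01-01", "2008-02-01", date.fromisoformat),
    "log_2008_02": make_range_check("2008-02-01", "2008-03-01", date.fromisoformat),
}

# The same helper covers a numeric range such as CHECK (id >= 0 AND id < 1000).
id_check = make_range_check("0", "1000", int)
```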

When the constraint doesn't match any configured pgloader exclusion module,
the trigger/rule code will be used (COPY will go to the main table). When the
Python CHECK implementation is wrong (worst case), PostgreSQL will reject the
data and pgloader will fill your reject data and log files. And you're back
to debugging your Python CHECK implementation...
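
The fallback could look roughly like the sketch below: rows are grouped by
target table, and anything no Python check claims goes to the main table,
where the existing trigger/rule takes over. route_rows and the table names are
hypothetical; the actual COPY (one per bucket, possibly one thread each) and
PostgreSQL rejecting misrouted rows are not simulated here.

```python
from collections import defaultdict


def route_rows(rows, column, checks, parent):
    """Group rows by target table; unmatched rows go to the parent table."""
    buckets = defaultdict(list)
    for row in rows:
        target = parent  # default: let the trigger/rule on the main table decide
        for table, check in checks.items():
            if check(row[column]):
                target = table
                break
        buckets[target].append(row)
    return dict(buckets)


# ISO date strings sort like dates, so plain string comparison is enough here.
checks = {
    "log_2008_01": lambda d: "2008-01-01" <= d < "2008-02-01",
    "log_2008_02": lambda d: "2008-02-01" <= d < "2008-03-01",
}
rows = [("2008-01-15", "a"), ("2008-02-05", "b"), ("2009-07-01", "c")]
routed = route_rows(rows, 0, checks, "log")
```

Each key of routed would then feed its own COPY FROM; the "log" bucket is the
slow path through the main table.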

All of this is only a braindump as of now, and maybe quite an optimistic
one... but barring any 'I know this can't work' objection, that's what I'm
gonna try to implement for the next pgloader version.

Thanks for the comments, input is really appreciated!
--
dim
