Re: Benchmark Data requested

From	Dimitri Fontaine
Subject	Re: Benchmark Data requested
Date	5 February 2008, 13:07:02
Msg-id	200802051506.51581.dfontaine@hi-media.com
In reply to	Re: Benchmark Data requested ("Jignesh K. Shah" <J.K.Shah@Sun.COM>)
Replies	Re: Benchmark Data requested Re: Benchmark Data requested Re: Benchmark Data requested
List	pgsql-performance

Hi,

On Monday 04 February 2008, Jignesh K. Shah wrote:
> Single stream loader of PostgreSQL takes hours to load data. (Single
> stream load... wasting all the extra cores out there)

I wanted to work on this at the pgloader level, so the CVS version of pgloader
is now able to load data in parallel, with one Python thread per configured
section (1 section = 1 data file = 1 table is often the case).
This is not configurable at the moment, but I plan to provide a "threads" knob
which will default to 1, and could be set to -1 for "as many threads as
sections".
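
To make the planned knob concrete, here is a minimal sketch of how a "threads"
setting could map onto one worker per section. It is not pgloader code: the
names resolve_threads, load_sections and load_one are hypothetical, and
capping the worker count at the number of sections is my own assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_threads(threads, n_sections):
    """Turn the proposed 'threads' knob into a worker count.

    -1 means "as many threads as sections"; any other value must be >= 1.
    Capping at the number of sections is an assumption of this sketch.
    """
    if threads == -1:
        return max(1, n_sections)
    if threads < 1:
        raise ValueError("threads must be -1 or a positive integer")
    return min(threads, max(1, n_sections))


def load_sections(sections, load_one, threads=1):
    """Call load_one(name, config) for each section, in parallel if allowed.

    `sections` maps a section name to its configuration; `load_one` is a
    hypothetical stand-in for whatever loads one data file into one table.
    """
    workers = resolve_threads(threads, len(sections))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(load_one, name, config)
            for name, config in sections.items()
        }
        # result() re-raises any exception raised inside a worker.
        return {name: future.result() for name, future in futures.items()}
```

With threads=1 the sections are loaded one after the other, which matches the
proposed default; threads=-1 gives every section its own worker.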

> Multiple table loads ( 1 per table) spawned via script  is bit better
> but hits wal problems.

pgloader will hit the WAL problem too, but it may still have its benefits, or
at least we will soon be able (you already can if you take it from CVS) to
measure whether parallel loading on the client side is a good idea
performance-wise.

[...]
> I have not even started Partitioning of tables yet since with the
> current framework, you have to load the tables separately into each
> tables which means for the TPC-H data you need "extra-logic" to take
> that table data and split it into each partition child table. Not stuff
> that many people want to do by hand.

I'm planning to add ddl-partitioning support to pgloader:
  http://archives.postgresql.org/pgsql-hackers/2007-12/msg00460.php

The basic idea is for pgloader to ask PostgreSQL about constraint_exclusion,
pg_inherits and pg_constraint. If pgloader recognizes both the CHECK
expression and the datatypes involved, and if we can implement the CHECK in
Python without having to resort to querying PostgreSQL, then we can run a
thread per partition, with as many COPY FROM commands running in parallel as
there are partitions involved (when threads = -1).
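
As a rough illustration of that idea, the sketch below evaluates a range-style
CHECK on the client side, routes each row to its partition, then starts one
thread per partition. Everything here is hypothetical: make_range_check,
route_rows, load_partitions and copy_rows are names I made up for the sketch,
and the step that would read the constraints from pg_inherits and
pg_constraint is left out.

```python
import threading


def make_range_check(column, low, high):
    """Python equivalent of CHECK (column >= low AND column < high)."""
    def check(row):
        return low <= row[column] < high
    return check


def route_rows(rows, partitions):
    """Split rows into per-partition buckets using the Python-side checks.

    `partitions` maps a child table name to its check function.
    """
    buckets = {name: [] for name in partitions}
    for row in rows:
        for name, check in partitions.items():
            if check(row):
                buckets[name].append(row)
                break
        else:
            raise ValueError("no partition accepts row %r" % (row,))
    return buckets


def load_partitions(buckets, copy_rows):
    """Run copy_rows(name, rows) in one thread per partition.

    `copy_rows` is a hypothetical stand-in for issuing COPY FROM against
    the child table; error handling inside workers is omitted here.
    """
    results = {}

    def worker(name, rows):
        results[name] = copy_rows(name, rows)

    threads = [
        threading.Thread(target=worker, args=(name, rows))
        for name, rows in buckets.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A row that matches no known CHECK raises an error in this sketch; falling back
to loading such rows through the parent table would be another option.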

I'm not sure this will be quicker than relying on the PostgreSQL triggers or
rules currently used for partitioning, but ISTM the paragraph from Jignesh
quoted above is about just that.

Comments?
--
dim
