Re: Benchmark Data requested

From: NikhilS
Subject: Re: Benchmark Data requested
Date:
Msg-id: d3c4af540802052338s7bd3649tafe1b53d3894b4e9@mail.gmail.com
In response to: Re: Benchmark Data requested  (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-performance
Hi,

On Feb 6, 2008 9:05 AM, Greg Smith <gsmith@gregsmith.com> wrote:
On Tue, 5 Feb 2008, Simon Riggs wrote:

> On Tue, 2008-02-05 at 15:50 -0500, Jignesh K. Shah wrote:
>>
>> Even if it is a single core, the mere fact that the loading process will
>> eventually wait for a read from the input file which cannot be
>> non-blocking, the OS can timeslice it well for the second process to use
>> those wait times for the index population work.
>
> If Dimitri is working on parallel load, why bother?

pgloader is a great tool for a lot of things, particularly if there's any
chance that some of your rows will get rejected.  But the way things pass
through the Python/psycopg layer made it uncompetitive (more than 50%
slowdown) against the straight COPY path from a rows/second perspective
the last time (V2.1.0?) I did what I thought was a fair test of it (usual
caveat of "with the type of data I was loading").  Maybe there's been some
gigantic improvement since then, but it's hard to beat COPY when you've
got an API layer or two in the middle.
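To make the per-row overhead concrete, here is a small sketch (hypothetical helper names, not pgloader's actual code) of pre-encoding rows into COPY's default text format: tab-delimited fields, backslash escapes, and \N for NULL. A loader that streams one buffer like this through the COPY protocol (e.g. psycopg2's copy_expert) pays the API cost once per batch rather than once per row.

```python
# Sketch: encode rows into PostgreSQL COPY text format.
# Helper names are illustrative, not from any particular loader.

def copy_escape(value):
    """Escape a single field for COPY's default text format."""
    if value is None:
        return r"\N"  # COPY spells NULL as \N in text format
    s = str(value)
    # Backslash first, then the control characters COPY treats specially.
    for raw, esc in (("\\", "\\\\"), ("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r")):
        s = s.replace(raw, esc)
    return s

def rows_to_copy_buffer(rows):
    """Render an iterable of row tuples as one COPY-ready text buffer."""
    return "".join("\t".join(copy_escape(f) for f in row) + "\n" for row in rows)
```

The encoded buffer for `[(1, "a\tb", None)]` is `"1\ta\\tb\t\\N\n"`: the embedded tab is escaped so it cannot be confused with a field delimiter, and the NULL becomes \N.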

I think it's time we jazzed COPY up a bit to include all of the discussed functionality. Heikki's batch-indexing idea is pretty useful too. Another thing pg_bulkload does is load tuples directly into the relation: it constructs the tuples and writes them straight to the physical file backing the relation, bypassing the engine completely (the limitations that arise from this, of course, are that rules, triggers, constraints, default expression evaluation, etc. are not supported). ISTM we could optimize the COPY code to attempt direct loading too (not necessarily the way pg_bulkload does it) to speed it up further in certain cases.
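As a loose illustration of why batch-indexing pays off (this is an analogy, not Heikki's actual design): inserting N keys into a sorted structure one at a time costs N separate searches, while sorting the batch once and merging it in is a single sequential pass. Buffering heap tuples and pushing them into the index in batches exploits the same effect.

```python
# Illustrative only: one-at-a-time vs batched insertion into a sorted list.
import bisect
import heapq

def insert_one_by_one(index, keys):
    for k in keys:                    # one binary search + shift per key
        bisect.insort(index, k)
    return index

def insert_batched(index, keys):
    # Sort the batch once, then merge the two sorted runs in one pass.
    return list(heapq.merge(index, sorted(keys)))
```

Both produce the same sorted result; the batched variant touches the existing "index" sequentially instead of probing it per key, which is the cache- and I/O-friendly pattern bulk index builds aim for.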

Another thing we should add to COPY is the ability to continue a data load across errors, as was discussed on -hackers some time back.
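Until COPY itself can skip bad rows, the usual client-side workaround is bisection: load a batch in one transaction, and on failure split it and retry the halves until the offending rows are isolated. A minimal sketch, where `load` stands in for any function that COPYs a chunk and raises on error (the function names are hypothetical):

```python
# Sketch: continue a bulk load across bad rows by bisecting failed batches.
# `load(chunk)` is assumed to load the chunk atomically and raise on error.

def load_skipping_errors(rows, load):
    """Load rows via `load`, returning (loaded_count, rejected_rows)."""
    rejected = []

    def attempt(chunk):
        if not chunk:
            return 0
        try:
            load(chunk)
            return len(chunk)
        except Exception:
            if len(chunk) == 1:          # isolated a bad row: reject it
                rejected.append(chunk[0])
                return 0
            mid = len(chunk) // 2        # otherwise bisect and retry halves
            return attempt(chunk[:mid]) + attempt(chunk[mid:])

    return attempt(list(rows)), rejected
```

The cost is O(log batch-size) retries per bad row, which is exactly the overhead a native continue-on-error COPY would eliminate.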

Regards,
Nikhils
--
EnterpriseDB               http://www.enterprisedb.com
