Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: Joshua Tolley
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date:
Msg-id: e7e0a2570811021641s560a7c27r6816946e766102f3@mail.gmail.com
In reply to: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Lawrence, Ramon" <ramon.lawrence@ubc.ca>)
List: pgsql-hackers

On Sun, Nov 2, 2008 at 4:48 PM, Lawrence, Ramon <ramon.lawrence@ubc.ca> wrote:
> Joshua,
>
> Thank you for offering to review the patch.
>
> The easiest way to test would be to generate your own TPC-H data and
> load it into a database for testing.  I have posted the TPC-H generator
> at:
>
> http://people.ok.ubc.ca/rlawrenc/TPCHSkew.zip
>
> The generator can produce skewed data sets.  It was produced by
> Microsoft Research.
>
> After unzipping, on a Windows machine, you can just run the command:
>
> dbgen -s 1 -z 1
>
> This will produce a TPC-H database of scale 1 GB with a Zipfian skew of
> z=1.  More information on the generator is in the document README-S.DOC.
> Source is provided for the generator, so you should be able to run it on
> other operating systems as well.
>
> The schema DDL is at:
>
> http://people.ok.ubc.ca/rlawrenc/tpch_pg_ddl.txt
>
> Note that the load time for 1 GB of data is 1-2 hours and for 10 GB of
> data is about 24 hours.  I recommend you do not add the foreign keys
> until after the data is loaded.
>
> The other alternative is to do a pg_dump on our data sets.  However, the
> download size would be quite large, and it will take a couple of days
> for us to get you the data in that form.
>
> --
> Dr. Ramon Lawrence
> Assistant Professor, Department of Computer Science, University of
> British Columbia Okanagan
> E-mail: ramon.lawrence@ubc.ca

I'll try out the TPC-H generator first :) Thanks.

- Josh
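
For readers unfamiliar with the "Zipfian skew of z=1" mentioned in the quoted message, here is a minimal sketch of what such a skew implies for value frequencies. It uses the textbook Zipf definition (the probability of the value at rank k is proportional to 1/k^z). How the generator actually applies the skew is described in README-S.DOC and is not reproduced here. The function name `zipf_weights` and the choice of 1000 distinct values are assumptions made purely for illustration.

```python
def zipf_weights(n, z):
    """Return normalized Zipf probabilities for ranks 1..n with exponent z.

    Textbook definition only: P(rank k) is proportional to 1 / k**z.
    This is not taken from the TPC-H skew generator's source.
    """
    raw = [1.0 / (k ** z) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical example: 1000 distinct join-key values with z = 1.
weights = zipf_weights(1000, 1.0)
top10_share = sum(weights[:10])

print(f"share of the most frequent value: {weights[0]:.3f}")
print(f"share of the 10 most frequent values: {top10_share:.3f}")
```

With z = 0 the same function yields a uniform distribution, so the exponent is what concentrates rows on a few join-key values. That concentration is the case a skew-aware multi-batch hash join is meant to handle.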

