Re: Hash Indexes

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Hash Indexes
Дата
Msg-id CAA4eK1+8b++bT+-doUta1K2bob-iJjA7H_zcd5+3FX4Eq74Yxg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hash Indexes  (Jesper Pedersen <jesper.pedersen@redhat.com>)
Ответы Re: Hash Indexes  (Jesper Pedersen <jesper.pedersen@redhat.com>)
Список pgsql-hackers
On Wed, Sep 14, 2016 at 12:29 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> On 09/13/2016 07:26 AM, Amit Kapila wrote:
>>
>> Attached, new version of patch which contains the fix for problem
>> reported on write-ahead-log of hash index thread [1].
>>
>
> I have been testing patch in various scenarios, and it has a positive
> performance impact in some cases.
>
> This is especially seen in cases where the values of the indexed column are
> unique - SELECTs can see a 40-60% benefit over a similar query using b-tree.
>

Here, I think it is better if we have the data comparing the situation
of hash index with respect to HEAD as well.  What I mean to say is
that you are claiming that after the hash index improvements SELECT
workload is 40-60% better, but where do we stand as of HEAD?

> UPDATE also sees an improvement.
>

Can you explain this more?  Is it more compare to HEAD or more as
compare to Btree?  Isn't this contradictory to what the test in below
mail shows?

> In cases where the indexed column value isn't unique, it takes a long time
> to build the index due to the overflow page creation.
>
> Also in cases where the index column is updated with a high number of
> clients, ala
>
> -- ddl.sql --
> CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val;
> CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id);
> CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val);
> ANALYZE;
>
> -- test.sql --
> \set id random(1,10)
> \set val random(0,10)
> BEGIN;
> UPDATE test SET val = :val WHERE id = :id;
> COMMIT;
>
> w/ 100 clients - it takes longer than the b-tree counterpart (2921 tps for
> hash, and 10062 tps for b-tree).
>

Thanks for doing the tests.  Have you applied both concurrent index
and cache the meta page patch for these tests?  So from above tests,
we can say that after these set of patches read-only workloads will be
significantly improved even better than btree in quite-a-few useful
cases.  However when the indexed column is updated, there is still a
large gap as compare to btree (what about the case when the indexed
column is not updated in read-write transaction as in our pgbench
read-write transactions, by any chance did you ran any such test?).  I
think we need to focus on improving cases where index columns are
updated, but it is better to do that work as a separate patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Sharma
Дата:
Сообщение: Re: Write Ahead Logging for Hash Indexes
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Write Ahead Logging for Hash Indexes