Discussion: Disabling nested loops - worst case performance
Hello list,

I am working on an Entity-Attribute-Value (EAV) database using PostgreSQL 8.4.7. The basic problem is that when joining different entities multiple times, the planner thinks there are vastly fewer rows to join than there are in reality, and decides to use multiple nested loops for the join chain. This results in queries where, with nested loops enabled, query time is somewhere around 35 seconds, but with nested loops disabled, it is somewhere around 100 ms. I don't think there is much hope for getting better statistics, as EAV is just not statistics friendly: the values of an attribute depend on the type of the attribute, and different entities have different attributes defined. The planner has no idea of these correlations.

Now, my question is: if I disable nested loops completely for the users of the EAV database, what kind of worst-case performance loss can I expect? I don't mind if a query that normally runs in 100 ms now takes 200 ms, but I am worried about cases where a query takes much more time to complete than it would with nested loops enabled. As far as I understand, such cases should be pretty rare, if not non-existent?

 - Anssi
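For readers reproducing this, the planner setting in question can be toggled per session. A minimal sketch follows; the query is an illustrative stand-in for the real join chain, not taken from the thread:

```sql
-- enable_nestloop does not forbid nested loops outright; it makes the
-- planner cost them prohibitively high so other join methods win.
SET enable_nestloop = off;

-- Hypothetical two-attribute join over the EAV value table:
EXPLAIN ANALYZE
SELECT a.olio_id
FROM attr_value a
JOIN attr_value b ON b.olio_id = a.olio_id
                 AND b.attr_tunniste = 'attr_b'
WHERE a.attr_tunniste = 'attr_a';

RESET enable_nestloop;  -- restore the default for the rest of the session
```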
Hello,

for example, queries with a LIMIT clause can be significantly faster with a nested loop. But you don't need to disable nested loops globally: you can wrap your queries in SQL functions and disable nested loops just for those functions.

Regards

Pavel Stehule

2011/3/18 Anssi Kääriäinen <anssi.kaariainen@thl.fi>:
> I am working on a Entity-Attribute-Value (EAV) database using PostgreSQL
> 8.4.7. [...] if I disable nested loops completely for the users of
> the EAV database what kind of worst case performance loss can I expect?
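Pavel's suggestion can be sketched as follows; the function name, signature, and body are illustrative, not from the thread. Attaching a SET clause to a function (available since PostgreSQL 8.3) scopes the planner setting to the function's execution only:

```sql
-- The per-function setting is applied on entry and reverted on exit,
-- so callers keep their own enable_nestloop value.
CREATE OR REPLACE FUNCTION entities_with_attr(varchar)
RETURNS SETOF attr_value AS
$$
    SELECT * FROM attr_value WHERE attr_tunniste = $1;
$$
LANGUAGE sql STABLE
SET enable_nestloop = off;
```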
On 03/18/2011 09:02 AM, Pavel Stehule wrote:
> for example queries with LIMIT clause can be significantly faster with
> nested loop. But you don't need to disable nested loop globally.
>
> You can wrap your query to sql functions and disable nested loop just
> for these functions.

Thank you for your help; the LIMIT example was something I was not aware of. The problem is that we are replacing an old database, and we need to replicate certain views for external users, with minimal impact on those users. Maybe it would be best to create special user accounts for these external users and disable nested loops only for those accounts, and otherwise disable nested loops only when absolutely necessary.

 - Anssi
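The per-account approach mentioned above can be done with a per-role setting, which takes effect in every new session of that role and leaves other users untouched. A sketch, with an illustrative role name:

```sql
-- ALTER ROLE ... SET stores a session default for that role only.
CREATE ROLE external_reader LOGIN;
ALTER ROLE external_reader SET enable_nestloop = off;
GRANT SELECT ON attr_value TO external_reader;
```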
On 18.03.11 09:15, Anssi Kääriäinen wrote:
> I am working on a Entity-Attribute-Value (EAV) database using
> PostgreSQL 8.4.7. [...] The planner has no idea of these
> correlations.

Hello.

If your queries work on a single attribute, you can try adding partial indexes for the different attributes. Note that in this case parameterized statements may prevent index usage, so check also with the attribute id inlined.

Best regards, Vitalii Tymchyshyn
Anssi Kääriäinen, 18.03.2011 08:15:
> I am working on a Entity-Attribute-Value (EAV) database using
> PostgreSQL 8.4.7. The basic problem is that when joining multiple
> times different entities the planner thinks that there is vastly less
> rows to join than there is in reality and decides to use multiple
> nested loops for the join chain.

Did you consider using hstore instead? I think that in the PostgreSQL world this is a better alternative than EAV, and most probably faster as well.

Regards
Thomas
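For context, the hstore alternative would look roughly like this; the table name is illustrative, and on 8.4 hstore is a contrib module that must be installed from its SQL script (CREATE EXTENSION only arrived in 9.1):

```sql
-- One row per entity, all attributes in a single hstore column; a GIN
-- index supports the containment (@>) and key-exists (?) operators.
CREATE TABLE entity (
    id    serial PRIMARY KEY,
    attrs hstore
);
CREATE INDEX entity_attrs_idx ON entity USING gin (attrs);

-- Entities whose 'suhde_hyvaksytty' attribute is 't':
SELECT id FROM entity WHERE attrs @> 'suhde_hyvaksytty=>t';
```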
On 03/18/2011 12:52 PM, Vitalii Tymchyshyn wrote:
> If your queries work on single attribute, you can try adding partial
> indexes for different attributes. Note that in this case parameterized
> statements may prevent index usage, so check also with attribute id inlined.

Unfortunately this does not help with the statistics, and (I guess) nested loops will still be used when joining:

hot2=> explain analyze select * from attr_value where attr_tunniste = 'suhde_hyvaksytty' and arvo_text = 't';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using attr_value_arvo_text_idx1 on attr_value  (cost=0.00..343.59 rows=152 width=118) (actual time=0.076..7.768 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
   Filter: ((attr_tunniste)::text = 'suhde_hyvaksytty'::text)
 Total runtime: 10.855 ms
(4 rows)

hot2=> create index suhde_hyvaksytty_idx on attr_value(arvo_text) where attr_tunniste = 'suhde_hyvaksytty';
CREATE INDEX
hot2=> analyze attr_value;
ANALYZE
hot2=> explain analyze select * from attr_value where attr_tunniste = 'suhde_hyvaksytty' and arvo_text = 't';
                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using suhde_hyvaksytty_idx on attr_value  (cost=0.00..43.72 rows=152 width=118) (actual time=0.093..4.776 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
 Total runtime: 7.817 ms
(3 rows)

The partial index is used, but the estimate is still rows=152 against an actual 3096 rows.

 - Anssi
On 03/18/2011 01:14 PM, Thomas Kellerer wrote:
> Did you consider using hstore instead?
>
> I think in the PostgreSQL world, this is a better alternative than EAV
> and most probably faster as well.

No, we did not. The reason is that we want to track each attribute with bi-temporal timestamps. The actual database schema for the attribute value table is:

CREATE TABLE attr_value (
    id SERIAL PRIMARY KEY,
    olio_id INTEGER NOT NULL REFERENCES base_olio,            -- entity identifier
    attr_tunniste VARCHAR(20) NOT NULL REFERENCES base_attr,  -- attr identifier
    kieli_tunniste VARCHAR(20) REFERENCES kieli,              -- lang identifier
    arvo_number DECIMAL(18, 9),                               -- value number
    arvo_ts timestamptz,                                      -- value timestamp
    arvo_text TEXT,                                           -- value text
    arvo_valinta_tunniste VARCHAR(20),                        -- for choice lists: "value_choice_identifier"
    real_valid_from TIMESTAMPTZ NOT NULL,                     -- real_valid_from - real_valid_until define when
    real_valid_until TIMESTAMPTZ NOT NULL,                    --   things have been in the "real" world
    db_valid_from TIMESTAMPTZ NOT NULL,                       -- db_valid_* defines when things have been
    db_valid_until TIMESTAMPTZ NOT NULL,                      --   in the database
    tx_id_insert INTEGER default txid_current(),
    tx_id_delete INTEGER
    -- foreign keys & checks skipped
);

Naturally, we have other tables defining the objects, joins between objects, and metadata for the EAV. All data modifications are done through procedures, which ensure uniqueness etc. for the attributes and joins.

The data set is small, and performance in general is not that important, as long as the UI is responsive and data can be transferred to other systems in reasonable time. Insert performance is at least 10x worse than with a traditional schema, but that doesn't matter (we have somewhere around 1000 inserts/updates a day, max). The only real problem so far is the chained nested loop problem, which really kills performance for some queries.
Surprisingly (at least to me) this schema has worked really well, although sometimes there is a feeling that we are implementing a database using a database... - Anssi
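As an aside for readers unfamiliar with bi-temporal tables, a "current state" lookup against a schema like the one above might look as follows. This is a sketch only; the thread does not state how open-ended validity is encoded, so the use of now() on both axes is an assumption:

```sql
-- Attributes valid right now, both in the real world
-- (real_valid_*) and in the database's own history (db_valid_*).
-- Assumes open-ended rows store a far-future/'infinity' timestamp.
SELECT olio_id, attr_tunniste, arvo_text
FROM attr_value
WHERE real_valid_from <= now() AND now() < real_valid_until
  AND db_valid_from   <= now() AND now() < db_valid_until;
```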
On Fri, Mar 18, 2011 at 7:52 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> On 18.03.11 09:15, Anssi Kääriäinen wrote:
> Hello.
>
> If your queries work on single attribute, you can try adding partial indexes
> for different attributes. Note that in this case parameterized statements
> may prevent index usage, so check also with attribute id inlined.

And if your queries work on a single entity instead, you can partition the table per entity, and thus "teach" the database engine about the correlation.
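On 8.4 there is no declarative partitioning, so per-entity partitioning would be done with inheritance plus CHECK constraints; constraint exclusion then lets the planner skip irrelevant children, and each child table gets its own statistics. The child names and attribute lists below are illustrative, not from the thread:

```sql
-- Each child holds the attributes of one entity type; the CHECK
-- constraint is what constraint_exclusion uses to prune children.
CREATE TABLE attr_value_person (
    CHECK (attr_tunniste IN ('first_name', 'last_name'))
) INHERITS (attr_value);

CREATE TABLE attr_value_org (
    CHECK (attr_tunniste IN ('org_name', 'org_type'))
) INHERITS (attr_value);

-- 'partition' (the 8.4 default) applies exclusion only to
-- inheritance/UNION ALL queries.
SET constraint_exclusion = partition;
```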