Discussion: Disabling nested loops - worst case performance
Hello list,

I am working on an Entity-Attribute-Value (EAV) database using PostgreSQL 8.4.7. The basic problem is that when joining different entities multiple times, the planner thinks there are vastly fewer rows to join than there are in reality, and decides to use multiple nested loops for the join chain. This results in queries where, with nested loops enabled, query time is somewhere around 35 seconds, but with nested loops disabled, it is somewhere around 100 ms. I don't think there is much hope for getting better statistics, as EAV is just not statistics friendly: the values of an attribute depend on the type of the attribute, and different entities have different attributes defined. The planner has no idea of these correlations.

Now, my question is: if I disable nested loops completely for the users of the EAV database, what kind of worst-case performance loss can I expect? I don't mind if a query that normally runs in 100 ms now takes 200 ms, but I am worried about cases where a query takes much more time to complete than it would with nested loops enabled. As far as I understand, such cases should be pretty rare, if not non-existent?

 - Anssi
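For readers reproducing this, the planner setting in question can be toggled per session. A minimal sketch follows; the query is an illustrative stand-in for the real join chain, not taken from the thread:

```sql
-- enable_nestloop does not forbid nested loops outright; it makes the
-- planner cost them prohibitively high so other join methods win.
SET enable_nestloop = off;

-- Hypothetical two-attribute join over the EAV value table:
EXPLAIN ANALYZE
SELECT a.olio_id
FROM attr_value a
JOIN attr_value b ON b.olio_id = a.olio_id
                 AND b.attr_tunniste = 'attr_b'
WHERE a.attr_tunniste = 'attr_a';

RESET enable_nestloop;  -- restore the default for the rest of the session
```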
Hello,

for example, queries with a LIMIT clause can be significantly faster with a nested loop. But you don't need to disable nested loops globally: you can wrap your queries in SQL functions and disable nested loops just for those functions.

Regards

Pavel Stehule

2011/3/18 Anssi Kääriäinen <anssi.kaariainen@thl.fi>:
> I am working on a Entity-Attribute-Value (EAV) database using PostgreSQL
> 8.4.7. [...] if I disable nested loops completely for the users of
> the EAV database what kind of worst case performance loss can I expect?
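Pavel's suggestion can be sketched as follows; the function name, signature, and body are illustrative, not from the thread. Attaching a SET clause to a function (available since PostgreSQL 8.3) scopes the planner setting to the function's execution only:

```sql
-- The per-function setting is applied on entry and reverted on exit,
-- so callers keep their own enable_nestloop value.
CREATE OR REPLACE FUNCTION entities_with_attr(varchar)
RETURNS SETOF attr_value AS
$$
    SELECT * FROM attr_value WHERE attr_tunniste = $1;
$$
LANGUAGE sql STABLE
SET enable_nestloop = off;
```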
On 03/18/2011 09:02 AM, Pavel Stehule wrote:
> for example queries with LIMIT clause can be significantly faster with
> nested loop. But you don't need to disable nested loop globally.
>
> You can wrap your query to sql functions and disable nested loop just
> for these functions.

Thank you for your help; the LIMIT example was something I was not aware of. The problem is that we are replacing an old database, and we need to replicate certain views for external users, with minimal impact on those users. Maybe it would be best to create special user accounts for these external users and disable nested loops only for those accounts, and otherwise disable nested loops only when absolutely necessary.

 - Anssi
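The per-account approach mentioned above can be done with a per-role setting, which takes effect in every new session of that role and leaves other users untouched. A sketch, with an illustrative role name:

```sql
-- ALTER ROLE ... SET stores a session default for that role only.
CREATE ROLE external_reader LOGIN;
ALTER ROLE external_reader SET enable_nestloop = off;
GRANT SELECT ON attr_value TO external_reader;
```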
On 18.03.11 09:15, Anssi Kääriäinen wrote:
> I am working on a Entity-Attribute-Value (EAV) database using
> PostgreSQL 8.4.7. [...] The planner has no idea of these
> correlations.

Hello.

If your queries work on a single attribute, you can try adding partial indexes for the different attributes. Note that in this case parameterized statements may prevent index usage, so check also with the attribute id inlined.

Best regards, Vitalii Tymchyshyn
Anssi Kääriäinen, 18.03.2011 08:15:
> I am working on a Entity-Attribute-Value (EAV) database using
> PostgreSQL 8.4.7. The basic problem is that when joining multiple
> times different entities the planner thinks that there is vastly less
> rows to join than there is in reality and decides to use multiple
> nested loops for the join chain.

Did you consider using hstore instead? I think that in the PostgreSQL world this is a better alternative than EAV, and most probably faster as well.

Regards
Thomas
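For context, the hstore alternative would look roughly like this; the table name is illustrative, and on 8.4 hstore is a contrib module that must be installed from its SQL script (CREATE EXTENSION only arrived in 9.1):

```sql
-- One row per entity, all attributes in a single hstore column; a GIN
-- index supports the containment (@>) and key-exists (?) operators.
CREATE TABLE entity (
    id    serial PRIMARY KEY,
    attrs hstore
);
CREATE INDEX entity_attrs_idx ON entity USING gin (attrs);

-- Entities whose 'suhde_hyvaksytty' attribute is 't':
SELECT id FROM entity WHERE attrs @> 'suhde_hyvaksytty=>t';
```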
On 03/18/2011 12:52 PM, Vitalii Tymchyshyn wrote:
> If your queries work on single attribute, you can try adding partial
> indexes for different attributes. Note that in this case parameterized
> statements may prevent index usage, so check also with attribute id inlined.

Unfortunately this does not help with the statistics, and (I guess) nested loops will still be used when joining:

hot2=> explain analyze select * from attr_value where attr_tunniste = 'suhde_hyvaksytty' and arvo_text = 't';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using attr_value_arvo_text_idx1 on attr_value  (cost=0.00..343.59 rows=152 width=118) (actual time=0.076..7.768 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
   Filter: ((attr_tunniste)::text = 'suhde_hyvaksytty'::text)
 Total runtime: 10.855 ms
(4 rows)

hot2=> create index suhde_hyvaksytty_idx on attr_value(arvo_text) where attr_tunniste = 'suhde_hyvaksytty';
CREATE INDEX
hot2=> analyze attr_value;
ANALYZE
hot2=> explain analyze select * from attr_value where attr_tunniste = 'suhde_hyvaksytty' and arvo_text = 't';
                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using suhde_hyvaksytty_idx on attr_value  (cost=0.00..43.72 rows=152 width=118) (actual time=0.093..4.776 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
 Total runtime: 7.817 ms
(3 rows)

The partial index is used, but the estimate is still rows=152 against an actual 3096 rows.

 - Anssi
On 03/18/2011 01:14 PM, Thomas Kellerer wrote:
> Did you consider using hstore instead?
>
> I think in the PostgreSQL world, this is a better alternative than EAV
> and most probably faster as well.

No, we did not. The reason is that we want to track each attribute with bi-temporal timestamps. The actual database schema for the attribute value table is:

CREATE TABLE attr_value (
    id SERIAL PRIMARY KEY,
    olio_id INTEGER NOT NULL REFERENCES base_olio,            -- entity identifier
    attr_tunniste VARCHAR(20) NOT NULL REFERENCES base_attr,  -- attr identifier
    kieli_tunniste VARCHAR(20) REFERENCES kieli,              -- lang identifier
    arvo_number DECIMAL(18, 9),                               -- value number
    arvo_ts timestamptz,                                      -- value timestamp
    arvo_text TEXT,                                           -- value text
    arvo_valinta_tunniste VARCHAR(20),                        -- for choice lists: "value_choice_identifier"
    real_valid_from TIMESTAMPTZ NOT NULL,                     -- real_valid_from - real_valid_until define when
    real_valid_until TIMESTAMPTZ NOT NULL,                    --   things have been in the "real" world
    db_valid_from TIMESTAMPTZ NOT NULL,                       -- db_valid_* defines when things have been
    db_valid_until TIMESTAMPTZ NOT NULL,                      --   in the database
    tx_id_insert INTEGER default txid_current(),
    tx_id_delete INTEGER
    -- foreign keys & checks skipped
);

Naturally, we have other tables defining the objects, joins between objects, and metadata for the EAV. All data modifications are done through procedures, which ensure uniqueness etc. for the attributes and joins.

The data set is small, and performance in general is not that important, as long as the UI is responsive and data can be transferred to other systems in reasonable time. Insert performance is at least 10x worse than with a traditional schema, but that doesn't matter (we have somewhere around 1000 inserts/updates a day, max). The only real problem so far is the chained nested loop problem, which really kills performance for some queries.
Surprisingly (at least to me) this schema has worked really well, although sometimes there is a feeling that we are implementing a database using a database... - Anssi
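As an aside for readers unfamiliar with bi-temporal tables, a "current state" lookup against a schema like the one above might look as follows. This is a sketch only; the thread does not state how open-ended validity is encoded, so the use of now() on both axes is an assumption:

```sql
-- Attributes valid right now, both in the real world
-- (real_valid_*) and in the database's own history (db_valid_*).
-- Assumes open-ended rows store a far-future/'infinity' timestamp.
SELECT olio_id, attr_tunniste, arvo_text
FROM attr_value
WHERE real_valid_from <= now() AND now() < real_valid_until
  AND db_valid_from   <= now() AND now() < db_valid_until;
```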
On Fri, Mar 18, 2011 at 7:52 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> On 18.03.11 09:15, Anssi Kääriäinen wrote:
> Hello.
>
> If your queries work on single attribute, you can try adding partial indexes
> for different attributes. Note that in this case parameterized statements
> may prevent index usage, so check also with attribute id inlined.

And if your queries work on a single entity instead, you can partition the table per entity, and thus "teach" the database engine about the correlation.
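On 8.4 there is no declarative partitioning, so per-entity partitioning would be done with inheritance plus CHECK constraints; constraint exclusion then lets the planner skip irrelevant children, and each child table gets its own statistics. The child names and attribute lists below are illustrative, not from the thread:

```sql
-- Each child holds the attributes of one entity type; the CHECK
-- constraint is what constraint_exclusion uses to prune children.
CREATE TABLE attr_value_person (
    CHECK (attr_tunniste IN ('first_name', 'last_name'))
) INHERITS (attr_value);

CREATE TABLE attr_value_org (
    CHECK (attr_tunniste IN ('org_name', 'org_type'))
) INHERITS (attr_value);

-- 'partition' (the 8.4 default) applies exclusion only to
-- inheritance/UNION ALL queries.
SET constraint_exclusion = partition;
```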