Обсуждение: index organized tables use case
Hello, I'm thinking about migrating from another DBMS to postgresql. I have an almost working proof of concept, but still have some doubts about the following use case. I have a table like the following CREATE TABLE test ( code character varying(32) NOT NULL, tag integer NOT NULL, value double precision, CONSTRAINT test_pkey PRIMARY KEY (code, tag) ); It represents a sequence (with holes) of values associated with a code. The application code usually performs selection queries like select tag, value from test where code='XXX'; also, deletions are like delete from test where code='XXX'; and insertions follow the same pattern (all the data for a code is inserted using a loop in a single transaction). That's more or less all. so this type of workload is greatly enhanced by an index-organized table (oracle) or a clustered index (SQL Server/Sybase). From what I understood this kind of table is presently not supported by postgresql (is it?) so, what do you advice? I would think about something like CREATE TABLE testnew ( code character varying(32) NOT NULL, first_tag integer, /* the tag value associated with the first value */ "values" double precision[], /* the datum, or NaN if not valid */ valid_values bit(1)[], /* true if a datum is present */ CONSTRAINT testnew_pkey PRIMARY KEY (code) ); but this would require an application refactoring. Any idea? TIA, e.
On 12/12/07, Enrico Sirola <enrico.sirola@gmail.com> wrote: > Hello Isak, > > Isak Hansen ha scritto: > > > Have a look at the cluster operation; > > <http://www.postgresql.org/docs/8.3/static/sql-cluster.html>. > > > > AFAIK it does lock & duplicate the whole table during reordering, > > which may or may not be an issue for you. > Sorry Enrico and list, the respond-to setting on this list gets me every time.. > thanks for the reply; I was aware about this option, anyway I think > probably it won't be practical: the table is very big and this exclusive > lock would probably be a pain. I think probably I will start with the > present relation and then redesign the application at a second stage This is a long shot, but if the table is huge you could also consider partitioning. That really depends on the contents of 'code', though.. See <http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html>. Isak > Thanks, > e. >
Enrico Sirola wrote: > The application code usually performs selection queries like > > select tag, value from test where code='XXX'; > > also, deletions are like > > delete from test where code='XXX'; > > and insertions follow the same pattern (all the data for a code is > inserted using a loop in a single transaction). That's more or less all. > > so this type of workload is greatly enhanced by an index-organized table > (oracle) or a clustered index (SQL Server/Sybase). Hmm - I'm not sure it does benefit that much. I mean, if you're going to be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps to have the table with the same order as your primary key. Otherwise, I'd be doubtful you'd see that much benefit. -- Richard Huxton Archonet Ltd
Richard Huxton, 12.12.2007 16:12: > Hmm - I'm not sure it does benefit that much. I mean, if you're going to > be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps > to have the table with the same order as your primary key. Otherwise, > I'd be doubtful you'd see that much benefit. > At least for Oracle it's not mainly the order that improves the performance, but the fact that all the data is kept in the index, so Oracle does not need to go back to the table data after looking up the index entry. There is no "table data" for an index-organized table in Oracle, so only a single lookup is needed. Thomas
On Wednesday 12 December 2007 10:53, Thomas Kellerer wrote: > Richard Huxton, 12.12.2007 16:12: > > Hmm - I'm not sure it does benefit that much. I mean, if you're going to > > be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps > > to have the table with the same order as your primary key. Otherwise, > > I'd be doubtful you'd see that much benefit. > > At least for Oracle it's not mainly the order that improves the > performance, but the fact that all the data is kept in the index, so > Oracle does not need to go back to the table data after looking up the > index entry. There is no "table data" for an index-organized table in > Oracle, so only a single lookup is needed. > Yeah, thats a nice feature, and one thats not directly available in Postgres. The thing to concentrate on here is the techniques that are available in Postgres that might help keep the same schema performant. I think looking at partitioning or at partial indexing (making a new index on the code clause) could help keep performance with minimal impact to the schema. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Dec 12, 2007 9:16 PM, Robert Treat <xzilla@users.sourceforge.net> wrote: > On Wednesday 12 December 2007 10:53, Thomas Kellerer wrote: > > At least for Oracle it's not mainly the order that improves the > > performance, but the fact that all the data is kept in the index, so > > Oracle does not need to go back to the table data after looking up the > > index entry. There is no "table data" for an index-organized table in > > Oracle, so only a single lookup is needed. > > > > Yeah, thats a nice feature, and one thats not directly available in Postgres. > The thing to concentrate on here is the techniques that are available in > Postgres that might help keep the same schema performant. I think looking at > partitioning or at partial indexing (making a new index on the code clause) > could help keep performance with minimal impact to the schema. And keep in mind, there's no such thing as a free lunch here. Oracle makes tradeoffs to do this that mean that rolling back a transaction is MUCH more expensive than it is in pgsql. I'm sure there are other tradeoffs as well. BTW, apparently, InnoDB does the same type of thing, and also suffers from the VERY expensive rollback issues as well.