Discussion: Joins and links
Hello all!

You probably remember me - recently I complained about the speed of joins in Postgres. After a short investigation a way was found in which Postgres's optimizer can do the right job. It was a constructive discussion. Now I want to tell you what could make Postgres better and faster, and what would make us (our development group) happy. Maybe I am bothering someone; if I am, tell me so.

Let me begin.

First of all, some accounting reports need to be delivered very fast - within minutes or so. And what's bad is that quite a few of these reports are time-consuming and search-intensive. In particular, the internals of these reports involve a lot of joins on tables.

Secondly, almost all accounting information naturally fits the network data model, which can be implemented very efficiently.

The things described here are not accounting-specific; they can be found in every database which uses master-detail tables and other such relations.

So, how is a join performed in such cases? Although I am not an expert, I can imagine the way: first there is an (index) scan on the first table, and then an (index) scan on the second. That is the best case; reality can be much worse, as we have seen.

How can we radically improve performance in such cases? There is a simple and quite obvious way. (So you don't think I am hallucinating, I will mention that there exist real servers which offer the features I am talking about.) We should make a real reference from one table to another! That means there could be a special data type called, say, "link", which is a physical record number in the foreign table.

Queries could look like this:

table1:
    a int4
    b link (->table2)

table2:
    c int4
    recnum (system auxiliary field, really a record number in the table)

select * from table2 where table1.a > 5 and table1.b = table2.recnum

Such joins can fly really fast, as practice shows :) Just consider: table1.b = table2.recnum is a READY-MADE join, so the server doesn't have to build anything on top of it. It can simply perform a lookup through the link, and since the link is a physical record number, this is done with the efficiency of C pointers! Thus the performance gain is ENORMOUS.

It also simplifies the optimizer, because the optimizer doesn't have to decide anything about whether to use indices and the like. The join is always performed the same way, and it is the best way.

This feature, if implemented, could put Postgres ahead of most commercial servers, proving the creative abilities of the free software community. Let us make a step into the future!

Best regards,
Leon
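Leon's proposed link join can be sketched as a toy model (the list layout, field names, and dict-based rows are all invented for illustration - this is not how Postgres stores tuples): the link field holds a record number into the other table, so the "join" is a direct array lookup instead of an index scan.

```python
# Toy model of the proposed "link" type. table2 is a plain list, and
# table1.b holds a physical record number into it, so joining needs no
# index at all - just a list subscript.

table2 = [{"recnum": i, "c": i * 10} for i in range(5)]

# table1.b is a link: the record number of the matching table2 row.
table1 = [{"a": 3, "b": 0}, {"a": 7, "b": 2}, {"a": 9, "b": 4}]

# select * from table2 where table1.a > 5 and table1.b = table2.recnum
result = [table2[row["b"]] for row in table1 if row["a"] > 5]
```

Each qualifying table1 row resolves its link with a single subscript, which is the "efficiency of C pointers" the message describes.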
At 15:46 +0300 on 05/07/1999, Leon wrote:

> How can we radically improve performance in such cases? There
> is a simple and quite obvious way. (For you not to think that
> I am hallucinating I will tell you that there exist some
> real servers that offer such features I am talking about)
> We should make a real reference in one table to another! That
> means there could be special data type called, say, "link",
> which is a physical record number in the foreign table.
>
> Queries could look like this:
>
> table1:
>     a int4
>     b link (->table2)
>
> table2:
>     c int4
>     recnum (system auxiliary field, really a record number in the table)
>
> select * from table2 where table1.a > 5 and table1.b = table2.recnum
>
> Such joins can fly really fast, as practice shows :)

If you are interested in such a feature, I would send it to the hackers list and not the general list, which is not intended for development issues but for general problems and issues with existing versions. The best would be, of course, to get hold of CVS and develop the needed code yourself. That's what open software is all about. Perhaps, if it's so important to you, you could pay PostgreSQL Incorporated, buying programmer time for this feature.

In any case, a message to the hackers list may help you understand how things are implemented in Postgres and how much work would be needed for such a development.

On the face of it, I can see several problems with this idea, namely: inefficiency of deletions; the need for fixed-length records, with no ability to add or drop columns or to have variable-length fields without a maximum length; and needing to know all the tables that reference a table for the sake of vacuuming. But maybe the hackers (who write Postgres) would think differently. Ask them.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
Leon,

I have a few questions about what you are proposing.

My background is in non-SQL DBMSs (DataFlex), which supported recnums, and as you pointed out they have a number of advantages. However, recnums also have a significant number of problems. Some features, like replication, have significant difficulties. Others, such as exporting and re-loading your data, also need special work-arounds.

A solution I have seen in some SQL DBMSs (e.g. MS SQL Server) is to be able to choose one index and have the database table sorted by this index. Then you don't need a separate index for that sort order. It means that almost any index can work like a recnum and avoid looking in both the index and the data. I am trying to remember the name of this feature but cannot at the moment.

Regards

Dave

--
David Warnock
Sundayta Ltd
> Leon,
>
> I have a few questions about what you are proposing.
>
> My background is in non-SQL DBMSs (DataFlex), which supported recnums,
> and as you pointed out they have a number of advantages. However, recnums
> also have a significant number of problems. Some features like
> replication have significant difficulties. Others such as exporting and
> re-loading your data also need special work-arounds.
>
> A solution I have seen in some SQL DBMSs (e.g. MS SQL Server) is to be able
> to choose one index and have the database table sorted by this index.
> Then you don't need a separate index for that sort order. It means that
> almost any index can work like a recnum and avoid looking in both the
> index and the data. I am trying to remember the name of this feature but
> cannot at the moment.

We have the CLUSTER command, but it is something that has to be run manually.

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
David Warnock wrote: > A solution I have seen in some sql dbms (eg MS SQL Server) is to be able > to choose one index and have the database table sorted by this index. > Then you don't need a separate index for that sort-order. It means that > almost any index can work like a recnum and avoid looking in both the > index and the data. I am trying to remember the name of this feature but > cannot at the moment. CLUSTER ? -- Maarten Boekhold, boekhold@tibco.com TIBCO Finance Technology Inc. The Atrium Strawinskylaan 3051 1077 ZX Amsterdam, The Netherlands tel: +31 20 3012158, fax: +31 20 3012358 http://www.tibco.com
Maarten Boekhold wrote:
>
> David Warnock wrote:
> > A solution I have seen in some sql dbms (eg MS SQL Server) is to be able
> > to choose one index and have the database table sorted by this index.
> > Then you don't need a separate index for that sort-order. It means that
> > almost any index can work like a recnum and avoid looking in both the
> > index and the data. I am trying to remember the name of this feature but
> > cannot at the moment.
>
> CLUSTER ?

That's the one. Thanks.

Dave
Bruce,

I did not know Postgresql had that. I have just looked at the docs, and it seems that the Postgresql CLUSTER is similar to the feature in MS SQL Server, but at present is rather more limited:

a) It is static, whereas in MS SQL Server CLUSTER can be set as an attribute on an index; the index is then not created separately, and the table is kept permanently sorted by the index. This is obviously very, very slow if you add to the middle of the index, and wasteful of space if you delete rows from the table. However, for a sequential ID it is supposed to work well.

b) As the Postgresql CLUSTER is static, it does not replace the index.

I should say at this point that I never actually used the CLUSTER feature in MS SQL Server, as we decided not to use that product after evaluating it. So I have no practical experience to know how much of the speed improvement wanted by Leon it would deliver.

Dave

--
David Warnock
Sundayta Ltd
> > > David Warnock wrote: > > A solution I have seen in some sql dbms (eg MS SQL Server) is to be able > > to choose one index and have the database table sorted by this index. > > Then you don't need a separate index for that sort-order. It means that > > almost any index can work like a recnum and avoid looking in both the > > index and the data. I am trying to remember the name of this feature but > > cannot at the moment. > > CLUSTER ? > man cluster. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Bruce,
>
> I did not know Postgresql had that. I have just looked at the docs and
> it seems that the postgresql CLUSTER is similar to the feature in MS SQL
> Server but at present is rather more limited as

It is in the FAQ under Performance too.

> a) It is static whereas CLUSTER can be set as an attribute on an index
> in MS SQL Server and then the index is not created separately; the table
> is kept permanently sorted by the index. This obviously is very very
> slow if you add to the middle of the index and wasteful of space if you
> delete rows from the table. However, for a sequential ID it is supposed
> to work well.

Well, sometimes it is better to have control over when this happens. That is why CLUSTER is nice, and we don't have time to add dynamic clustering right now.

> b) as the Postgresql CLUSTER is static it does not replace the index.
>
> I should say at this point that I never actually used the CLUSTER
> feature in MS SQL Server as we decided not to use that product after
> evaluating it. So I have no practical experience to know how much of the
> speed improvement wanted by Leon it would deliver.

See the performance FAQ. If you look in pgsql/contrib/fulltextindex/README, we mention it too, because without it full-text indexing is slow.

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello David,

Monday, July 05, 1999 you wrote:

D> I should say at this point that I never actually used the CLUSTER
D> feature in MS SQL Server as we decided not to use that product after
D> evaluating it. So I have no practical experience to know how much of the
D> speed improvement wanted by Leon it would deliver.

Not much. The main idea is to eliminate index scans entirely, whereas CLUSTER merely means, as the docs put it: "CLUSTER will help because once the index identifies the heap page for the first row that matches, all other rows that match are probably already on the same heap page, saving disk accesses and speeding up the query." That is at best a few percent gain, and it means nothing if the database is entirely in memory (as it often is).

Best regards,
Leon
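The page-locality effect that CLUSTER buys - and that Leon argues is marginal - can be simulated in a few lines. The page size and layout below are invented purely to model the behaviour the docs describe; real heap pages hold variable numbers of tuples.

```python
import random

PAGE_SIZE = 8  # rows per simulated heap page (an invented figure)

def pages_touched(rows, key):
    # Rows are laid out sequentially, so row i lives on page i // PAGE_SIZE;
    # count how many distinct pages hold rows matching the key.
    return len({i // PAGE_SIZE for i, r in enumerate(rows) if r == key})

random.seed(0)
keys = [k for k in range(16) for _ in range(PAGE_SIZE)]  # 8 rows per key

clustered = sorted(keys)   # table kept in index order, as after CLUSTER
scattered = keys[:]
random.shuffle(scattered)  # rows left in arrival order

# Clustered, the 8 matching rows share one page; scattered, they are
# spread across many pages, each costing a separate page read.
assert pages_touched(clustered, 5) == 1
assert pages_touched(scattered, 5) > 1
```

This saves disk reads, but once the whole table is cached in memory the page count stops mattering - which is Leon's point.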
Bruce,

Thanks for your informative messages.

> Well, sometimes it is better to have control over when this happens.
> That is why cluster is nice, and we don't have time to add dynamic
> cluster right now.

I personally would not see it as a high priority. My only reason for suggesting it was as a possible way to help provide more performance for Leon without adding the full record number support that he was asking for.

Dave

--
David Warnock
Sundayta Ltd
> Bruce,
>
> Thanks for your informative messages.
>
> > Well, sometimes it is better to have control over when this happens.
> > That is why cluster is nice, and we don't have time to add dynamic
> > cluster right now.
>
> I personally would not see it as a high priority.
>
> My only reason for suggesting it was as a possible way to help provide
> more performance for Leon without adding the full record number support
> that he was asking for.

In fact, you were mentioning that inserting into the middle is slow but that sequential adding to the end is good - but heap already does this, doesn't it? I guess if you only add occasionally, it is OK. Also, our no-overwrite table structure has a tendency to mess up that ordering, because updated rows do not go into the same place as the original row.

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Leon,

I agree that the static CLUSTER that Postgresql currently supports will not help you much. When I suggested looking for a CLUSTER-type feature I only knew of dynamic clustering that removed the need for an index. The discussion is not going anywhere, as static clustering will not help you and dynamic clustering is not about to be added.

If you are interested in other solutions that do not involve adding record number support (which I personally still feel to be a mistake in a set-oriented DBMS), then have you considered an application server linked to triggers? For some applications it is possible for an application server to maintain the latest reports on-line, recalculating as required when a trigger notifies it of relevant changes. Then reporting comes instantly from the app server. If there are a large number of different reports, or the reports have a lot of selections and options, this may not be possible (but a half-way house might still be, by using a slightly flattened table structure for reporting).

Regards

Dave

--
David Warnock
Sundayta Ltd
Bruce,

It is amazing when you get responses written this fast (so that the response arrives before the copy of the message from the list).

> In fact, you were mentioning that inserting into the middle is slow, but
> that sequential adding to the end is good,

Yes, this is what I was told about the way MS SQL Server does clustering.

> but in fact, heap already does this, doesn't it?

Heap? I am not sure what you mean.

> I guess if you only add occasionally, it is OK.
> Also, our no-over-write table structure had a tendency to mess up that
> ordering because updated rows do not go into the same place as the
> original row.

I have just been thinking a bit more and have realised that the multi-generational architecture of 6.5 (which I have used in Interbase) means that probably both clustering (in the dynamic sense) and full record number support as requested by Leon are impractical.

It seems to me that record number relationships will fail completely if there can be more than one version of a record. (Well, even if they are forced to work, they will lose some or all of their speed advantage.) Dynamically clustered indexes might still work, but unless tables are only appended to, with no inserts or updates, maintaining the table in index order when there can be multiple versions of each row would be very slow.

Dave

--
David Warnock
Sundayta Ltd
> Bruce,
>
> It is amazing when you get responses written this fast (so that the
> response arrives before the copy of the message from the list).

Yep.

> > but in fact, heap already does this, doesn't it?
>
> Heap? I am not sure what you mean.

Heap is our base table structure. Records are always inserted at the end of the heap file. Vacuum removes old, superseded rows.

> > I guess if you only add occasionally, it is OK.
> > Also, our no-over-write table structure had a tendency to mess up that
> > ordering because updated rows do not go into the same place as the
> > original row.
>
> I have just been thinking a bit more and have realised that the
> multi-generational architecture of 6.5 (which I have used in Interbase)
> means that probably both clustering (in the dynamic sense) and full
> record number support as requested by Leon are impractical.

Yes, it would be a big problem. Most commercial databases have found that the network data model is impractical in most cases.

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello David,

Monday, July 05, 1999 you wrote:

D> If you are interested in other solutions that do not involve adding
D> record number support (which I personally still feel to be a mistake in
D> a set-oriented DBMS)

Why? There would be no such field as "record number"; the only place it would exist is the field which references another table. I can quite share your feeling about the wrongness of physical-oriented things in abstract tables, but don't plain old indices deal with physical record numbers? We could do the same - hide the value stored in such a field, and only offer the user the ability to use it in queries without knowing the value.

D> then have you considered an application server
D> linked to triggers.

Unfortunately, every day users demand new types of reports for financial analysis, and nobody knows what the user's wish will be tomorrow.

And besides, it is not only my personal wish. What I am proposing is a huge (dozen-fold) performance gain on widespread tasks. If you implement this, happy users will erect a gold monument to the Postgres development team.

Best regards,
Leon
> And, besides, it is not only my personal wish. What I am > proposing is huge (dozen-fold) performance gain on widespread > tasks. If you implement this, happy users will erect a gold > monument to Postgres development team. We(Vadim) did MVCC, and I haven't seen any monuments yet. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello Bruce,

Monday, July 05, 1999 you wrote:

>> I have just been thinking a bit more and have realised that the
>> multi-generational architecture of 6.5 (which I have used in Interbase)
>> means that probably both clustering (in the dynamic sense) and full
>> record number support as requested by Leon are impractical.

B> Yes, it would be a big problem. Most commercial databases have found that
B> the network data model is impractical in most cases.

That's exactly why such a powerful tool as SQL is incomparably slower than plain dbf in most cases. Ignoring the network data model was a big mistake.

Best regards,
Leon
Hello David,

Monday, July 05, 1999 you wrote:

D> I have just been thinking a bit more and have realised that the
D> multi-generational architecture of 6.5 (which I have used in Interbase)
D> means that probably both clustering (in the dynamic sense) and full
D> record number support as requested by Leon are impractical.

D> It seems to me that record number relationships will fail completely if
D> there can be more than one version of a record.

Maybe it is a silly question, but what is "more than one version of a record"? In my opinion a record is an atomic, unique entity. Isn't it?

D> (well even if they are
D> forced to work they will lose some/all of their speed advantage).

Best regards,
Leon
Leon wrote:
> Why? There will be no such field as "record number", the only
> place where it can exist is the field which references another
> table. I can quite share your feeling about wrongness of
> physical-oriented things in abstract tables, but don't
> plain old indices deal with physical record numbers? We could
> do the same - hide the value stored in such field and only
> offer the user ability to use it in queries without knowing
> the value.

Leon,

In my understanding, pointer-based approaches like the one you are recommending have been implemented in several prototype object-oriented databases. They have been shown to be orders of magnitude slower than set-oriented techniques; thus many OO databases are implemented as wrappers over relational systems!

In general, the best way to handle stuff like this for reports is to cache small tables which are joined (like product lookups) in memory, to make the queries run much faster. To do this, your design has to be smart, separating the tuples which are "active" products from the "inactive" products, so that the database can cache the active records and not the inactive ones. Perhaps something like:

1. CREATE VIEW PRODUCT AS
     (SELECT * FROM PRODUCT_ACTIVE_CACHED
      UNION ALL
      SELECT * FROM PRODUCT_INACTIVE);

2. SELECT ORDER_NO, PRODUCT_NAME
   FROM ORDER_LINE, PRODUCT
   WHERE PRODUCT.PRODUCT = ORDER_LINE.PRODUCT
     AND ORDER_LINE.ORDER = 120;

This would be a general solution, where orders with active products come up quickly, since the join is done in memory, but orders with inactive products take much longer, since the query on the active table is a cache miss, leaving a disk access on the inactive table. Perhaps there are several other nicer ways to do this; from my understanding, a hash-based cache could allow frequently accessed tuples to be cached in memory? ... anyway, I'm no expert.

A more traditional method (which I use all the time) is to have canned reports that are pre-generated using common conditions. These are then saved on a web server and updated daily. It is a bit less accurate, but often, for 99% of the purposes, day-old information is just fine...

Hope this helps! ;)

Clark
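Clark's split-table view pattern can be sketched with SQLite (the table and column names follow his example, but the schema and sample data are invented, and SQLite here only demonstrates the UNION ALL view, not the cached/uncached storage split):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE product_active_cached (product INTEGER, product_name TEXT);
CREATE TABLE product_inactive (product INTEGER, product_name TEXT);

-- The view unions the hot and cold halves, so queries see one table.
CREATE VIEW product AS
    SELECT * FROM product_active_cached
    UNION ALL
    SELECT * FROM product_inactive;

CREATE TABLE order_line (order_no INTEGER, product INTEGER);

INSERT INTO product_active_cached VALUES (1, 'widget');
INSERT INTO product_inactive VALUES (2, 'old widget');
INSERT INTO order_line VALUES (120, 1), (120, 2);
""")

# Joining against the view transparently reaches both halves.
rows = cur.execute("""
    SELECT ol.order_no, p.product_name
    FROM order_line ol JOIN product p ON p.product = ol.product
    WHERE ol.order_no = 120
    ORDER BY p.product_name
""").fetchall()
```

The application decides which rows live in the "active" half; the view hides the split from every report query.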
Hello Clark,

Monday, July 05, 1999 you wrote:

C> In my understanding, pointer-based approaches like you
C> are recommending have been implemented in several prototype
C> object-oriented databases. They have been shown to be
C> orders of magnitude slower than set-oriented techniques, thus
C> many OO databases are implemented as wrappers over
C> relational systems!

I can't guess where you got such information. On the contrary, I know at least one (commercial) network database server which is orders of magnitude faster than ANY SQL server. It is simply no match for them. That experience is exactly what made me write to the Postgres mailing list.

As I wrote (maybe to the hackers' list), a pointer lookup ideally takes three CPU instructions - read, multiply, look up - whereas an index scan takes dozens of them and puts a strain on the optimizer's intellectual abilities, and as we have seen it can hardly choose the optimum way of performing a join. In the pointer-field case the optimizer can be quite dumb, because there is only one way to perform the query.

Best regards,
Leon
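Leon's "read, multiply, look up" arithmetic can be shown literally: with fixed-width records, a record number converts to a byte offset with one multiplication. The byte buffer below is a toy stand-in for a heap file, and the 4-byte record width is an assumption for the sketch.

```python
import struct

# A fixed-width "table" as one byte buffer: each record is a single
# little-endian int4, so the record width is 4 bytes.
RECORD_WIDTH = 4
heap = b"".join(struct.pack("<i", v) for v in [11, 22, 33, 44])

def lookup(recnum):
    # The three steps Leon describes: read the link value (recnum),
    # multiply by the record width, and fetch at that offset.
    offset = recnum * RECORD_WIDTH
    return struct.unpack_from("<i", heap, offset)[0]
```

Note this only works while records are fixed-width and never move - the very constraints Herouth raised about variable-length fields and vacuuming.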
> Hello David, > > Monday, July 05, 1999 you wrote: > > D> I have just been thinking a bit more and have realised that the > D> multi-generational architecture of 6.5 (which I have used in Interbase) > D> means that probably both clustering (in thr dynamic sense) and full > D> record number support as request by Leon are impractical. > > D> It seems to me that record number relationships will fail completely if > D> there can be more than one version of a record. > > Maybe it is a silly question, but what are "more than one version > of a record"? In my opinion record is a atomic unique entity. > Isn't it? Read how MVCC works in the manuals. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hello Bruce, Tuesday, July 06, 1999 you wrote: >> >> Maybe it is a silly question, but what are "more than one version >> of a record"? In my opinion record is a atomic unique entity. >> Isn't it? B> Read how MVCC works in the manuals. Ah, you mean MVCC! That's what I replied to Tom Lane: > This problem can be solved. An offhand solution is to have > an additional system field which will point to new tuple left after > update. It is filled at the same time as the original tuple is > marked invalid. So the scenario is as follows: we follow the link, > and if we find that in the tuple where we arrived this system field > is not NULL, we go to (the same table of course) where it is pointing > to. Sure VACUUM will eliminate these. Performance penalty is small. Best regards, Leon
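Leon's forward-pointer scheme - each superseded tuple pointing at its newer version, so a link may need to chase a short chain until VACUUM cleans up - can be sketched as a version chain (the class and field names are invented for illustration):

```python
class Tuple:
    """A tuple version; `newer` is the system field Leon proposes."""
    def __init__(self, value):
        self.value = value
        self.newer = None   # set when this version is superseded by UPDATE

def update(t, value):
    # Mark the old version as superseded by linking it to the new one.
    t.newer = Tuple(value)
    return t.newer

def current(t):
    # Following a link may land on a dead version; chase the forward
    # pointers until the live version is reached.
    while t.newer is not None:
        t = t.newer
    return t

v1 = Tuple("draft")
v2 = update(v1, "revised")
update(v2, "final")
```

A link stored against v1 still resolves correctly after two updates, at the cost of walking the chain - the "small performance penalty" Leon mentions.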
Hi Leon,

In the long wars between the object databases - Versant, which has logical record ids, and ObjectStore, which has physical record ids - Versant has consistently beaten ObjectStore in the sort of queries that financial people are likely to do. (On graphic-design-type apps, ObjectStore tends to win.) Versant's object ids actually go through an index to get to the physical location - admittedly a VERY highly optimised index, but an index nevertheless.

What this says to me is that Postgres's concept of oids is OK as is. If your queries are too slow, either the optimiser is not using an index when it should, or else the indexing mechanism is not fast enough. I suspect it would be nice, from an object database perspective, if a special-case oid-index index type were written.

Physical record ids have lots of problems. You can't re-organise space dynamically, so you have to take your database off-line for a while to totally re-organise it. You lose space because it can't be re-used dynamically. There are problems with backup.

I'm writing a web page on what would be needed to make Postgres into an object database...

http://www.tech.com.au/postgres

Leon wrote:
>
> Hello all!
>
> You probably remember me - recently I complained about speed
> of joins in Postgres. After a short investigation the way was
> found in which Postgres's optimizer can do the right job. It
> was constructive discussion. Now I want to tell you what could
> make Postgres better and faster. And what will make us (our
> development group) happy. Maybe I am bothering someone, if
> I do - tell me that.
>
> Let me begin.
>
> First of all, some accounting reports need to be delivered
> very fast - within minutes or so. And what's bad is that
> quite a few of these reports are quite time-consuming and search
> intensive. In particular, internals of these reports include
> a lot of joins on tables.
> > Secondly, almost all of accounting information naturally > fits into network data model, which can be implemented very > efficiently. > > This stuff described here is not accounting-specific, it > can be found in every database which uses master-detail > tables and other such types of relations. > > So. How is join being performed in such cases? Although I am > not an expert, I can imagine the way: first there is an (index) > scan on first table, and then an (index) scan on the second. > It is the best way, reality could be much worse as we have seen. > > How can we radically improve performance in such cases? There > is a simple and quite obvious way. (For you not to think that > I am hallucinating I will tell you that there exist some > real servers that offer such features I am talking about) > We should make a real reference in one table to another! That > means there could be special data type called, say, "link", > which is a physical record number in the foreign table. > > Queries could look like this: > > table1: > a int4 > b link (->table2) > > table2: > c int4 > recnum (system auxiliary field, really a record number in the table) > > select * from table2 where table1.a > 5 and table1.b = table2.recnum > > Such joins can fly really fast, as practice shows :) > Just consider: the thing table1.b = table2.recnum is a READY-MADE > join, so server doesn't have to build anything on top of that. It > can simply perform lookup through link, and since it is a physical > record number, this is done with the efficiency of C pointers! Thus > performance gain is ENORMOUS. > > And it simplifies the optimizer, because it doesn't have to decide > anything about whether to use indices and such like. The join is > performed always the same way, and it is the best way. > > This feature, being implemented, could bring Postgres ahead > of most commercial servers, so proving creative abilities of > free software community. Let us make a step in the future! 
> > Best regards, > Leon
Leon wrote:
>
> Ah, you mean MVCC! That's what I replied to Tom Lane:
>
> > This problem can be solved. An offhand solution is to have
> > an additional system field which will point to new tuple left after
> > update. It is filled at the same time as the original tuple is
> > marked invalid. So the scenario is as follows: we follow the link,
> > and if we find that in the tuple where we arrived this system field
> > is not NULL, we go to (the same table of course) where it is pointing
> > to. Sure VACUUM will eliminate these. Performance penalty is small.

The old tuple version points to the new version right now -:) How else could we handle updates in read-committed mode (a new tuple version is not visible to a concurrent update)?

Let's go to the hackers list. But also let's relax for some more time, Leon... -:)

Vadim