Re: Table Design question for gurus (without going to "NoSQL")...
| От | Gavin Flower |
|---|---|
| Тема | Re: Table Design question for gurus (without going to "NoSQL")... |
| Дата | |
| Msg-id | 4EC98CAE.20105@archidevsys.co.nz обсуждение исходный текст |
| Ответ на | Table Design question for gurus (without going to "NoSQL")... (Phoenix Kiula <phoenix.kiula@gmail.com>) |
| Ответы |
Re: Table Design question for gurus (without going to "NoSQL")...
|
| Список | pgsql-general |
On 21/11/11 02:33, Phoenix Kiula wrote: > Hi. > > Want to start another thread, loosely related to the performance > problems thread I have going. > > Need some DB design guidance from the gurus here. > > My big table now has about 70 million rows, with the following columns: > > > alias | character varying(35) > url | text > modify_date | timestamp without time zone > ip | bigint > > > For each IP address (user of my application) I want to have a unique > URL. So I used to have a UNIQUE constraint on IP, URL. But the index > based on this became huge, as some URLs are gigantic. so I introduced > an md5 of the URL: > > > url_md5 | varchar(32) > > > I now have two scenarios: > > 1. To have an index (unique?) on "(ip, url_md5)" > > 2. To not have an index on just the "ip". This way a query that tries > to match "...WHERE ip = 999 AND url_md5 = '<md5 here>'..." will > still look only at the ip bit of the index, then refine it with the > url_md5. > > The good thing about #2 is the size of index remains very small with > only a bigint field (ip) being indexed. > > The bad thing about #2 is that each query of "...WHERE ip = 999 AND > url_md5 = '<md5 here>'..." will have to refine the indexed IP. If one > IP address has put in a lot of URLs, then this becomes a bit slow. As > is now happening, where I have users who have over 1 million URLs > each! > > Questions: > > 1. Instead of md5, is there any integer hashing algorithm that will > allow me to have a bigint column and save a lot hopefully in both > storage space and speed? (Some very useful points mentioned here: > http://stackoverflow.com/questions/1422725/represent-md5-hash-as-an-integer > ) > > 2. If I do go with the above scenario #1 of a joint index, is there > any way I can save space and maintain speed? Partitioning etc are out > of the question. > > With a growing web database, I am sure many people face this > situation. Are nosql type databases the only sane solution to such > massive volumes and throughput expectations (e.g., CouchDb's MemBase)? > > Many thanks for any ideas or pointers! > How about having 2 indexes: one on each of ip & url_md5? Pg will combine the indexes as required, or will just use one if that is best. Why don't you have a time zone on your timestamp??? Regards, Gavin
В списке pgsql-general по дате отправления: