Table Design question for gurus (without going to "NoSQL")...

From: Phoenix Kiula
Subject: Table Design question for gurus (without going to "NoSQL")...
Msg-id: CAFWfU=sS4C1-FT=EyxjKAtWzqoxMHrDL7+o_unWDbUB9kNwQFg@mail.gmail.com
Replies: Re: Table Design question for gurus (without going to "NoSQL")...  (Phoenix Kiula <phoenix.kiula@gmail.com>)
         Re: Table Design question for gurus (without going to "NoSQL")...  (David Johnston <polobo@yahoo.com>)
         Re: Table Design question for gurus (without going to "NoSQL")...  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
         Re: Table Design question for gurus (without going to "NoSQL")...  (Ondrej Ivanič <ondrej.ivanic@gmail.com>)
List: pgsql-general
Hi.

I want to start another thread, loosely related to the performance
problems thread I already have going.

I need some DB design guidance from the gurus here.

My big table now has about 70 million rows, with the following columns:


 alias           | character varying(35)
 url             | text
 modify_date     | timestamp without time zone
 ip              | bigint
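
For reference, a minimal DDL sketch of that table (the table name
"links" below is my assumption, not something stated in the post):

  -- Hypothetical reconstruction of the table described above.
  CREATE TABLE links (
      alias        varchar(35),
      url          text,
      modify_date  timestamp without time zone,
      ip           bigint
  );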


For each IP address (a user of my application) I want every URL to be
unique, so I used to have a UNIQUE constraint on (ip, url). But the
index based on this became huge, as some URLs are gigantic, so I
introduced an MD5 of the URL:


 url_md5             | varchar(32)
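
Adding and backfilling that column might look roughly like this (a
sketch only; md5() is PostgreSQL's built-in function, and the table
name is my assumption):

  -- Add the md5 column and backfill it from url.
  ALTER TABLE links ADD COLUMN url_md5 varchar(32);
  UPDATE links SET url_md5 = md5(url);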


I now have two scenarios:

1. To have a (unique?) index on (ip, url_md5).

2. To have an index on just "ip". This way a query that matches
   "...WHERE ip = 999 AND url_md5 = '<md5 here>'..." will first use the
   ip index and then refine the matching rows with the url_md5
   condition (both options are sketched in DDL below).
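
For concreteness, the two options in DDL would be roughly as follows
(table and index names are assumptions):

  -- Option 1: composite (unique) index enforcing one URL per IP.
  CREATE UNIQUE INDEX links_ip_url_md5_key ON links (ip, url_md5);

  -- Option 2: index on ip alone; the url_md5 condition is then applied
  -- as a filter on the rows found through this index.
  CREATE INDEX links_ip_idx ON links (ip);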

The good thing about #2 is that the index stays very small, since only
a single bigint column (ip) is indexed.

The bad thing about #2 is that each query of "...WHERE ip = 999 AND
url_md5 = '<md5 here>'..." has to fetch all the rows for the indexed IP
and then filter them by url_md5. If one IP address has put in a lot of
URLs, this becomes slow, which is happening now: I have users with over
1 million URLs each!
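
To see how the planner handles such a lookup under either option, one
can run EXPLAIN on the query itself (the md5 value here is just a
placeholder):

  -- With option 2, expect an index scan on ip plus a Filter on url_md5;
  -- with option 1, the composite index matches both conditions directly.
  EXPLAIN ANALYZE
  SELECT alias, url
  FROM links
  WHERE ip = 999
    AND url_md5 = 'd41d8cd98f00b204e9800998ecf8427e';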

Questions:

1. Instead of MD5, is there an integer hashing approach that would let
me store the hash in a bigint column and hopefully save a lot in both
storage space and speed? (Some very useful points are mentioned here:
http://stackoverflow.com/questions/1422725/represent-md5-hash-as-an-integer
; see the sketch after these questions.)

2. If I do go with scenario #1 above (the composite index), is there
any way I can save space and maintain speed? Partitioning etc. is out
of the question.
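
Regarding question 1, one commonly suggested trick (also discussed in
the Stack Overflow link above) is to keep only the first 64 bits of the
MD5 as a bigint. This is a sketch, not a recommendation: the column and
index names are assumptions, and truncating the hash makes collisions
somewhat more likely, so a UNIQUE constraint on it can report false
conflicts:

  -- Store a 64-bit slice of the MD5 instead of the full 32-char hex text.
  ALTER TABLE links ADD COLUMN url_hash bigint;

  -- Reinterpret the first 16 hex digits of md5(url) as a (signed) bigint.
  UPDATE links
  SET url_hash = ('x' || substr(md5(url), 1, 16))::bit(64)::bigint;

  -- The composite key is then two 8-byte integers per row instead of a
  -- bigint plus 32 characters of hex text.
  CREATE UNIQUE INDEX links_ip_url_hash_key ON links (ip, url_hash);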

With a growing web database, I am sure many people face this
situation. Are NoSQL-type databases the only sane solution to such
massive volumes and throughput expectations (e.g., CouchDB's Membase)?

Many thanks for any ideas or pointers!
