Re: Postgresql as a dictionary coder backend?

From: Attila Nagy
Subject: Re: Postgresql as a dictionary coder backend?
Date:
Msg-id: 4D3DE255.7030605@fsn.hu
In reply to: Re: Postgresql as a dictionary coder backend?  (Fredric Fredricson <Fredric.Fredricson@bonetmail.com>)
List: pgsql-general
On 01/24/2011 05:27 AM, Fredric Fredricson wrote:
>>
>> I can easily do this with any RDBMS, with a table of three columns:
>> an auto-incremented ID, word and refcount, with a unique index on word.
>> The challenge could be:
>> - that it should scale to several TBs of size and several (hundred)
>> billion records. One scenario would be to store about 40 TBs of
>> words with an average word length of about 50-60 bytes (that's
>> about 800*10^9 records). It should work well both for inserting and
>> searching (encoding and decoding) words.
> 50-60 bytes! Oh, so we are not talking about natural language here.
> Sorry, I just assumed that.
Yes, sorry.
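For concreteness, the table I have in mind is roughly this (just a
sketch, the names are illustrative):

CREATE TABLE words (
    id       bigserial PRIMARY KEY,      -- auto-incremented unique ID
    word     bytea     NOT NULL UNIQUE,  -- the ~50-60 byte token itself
    refcount bigint    NOT NULL DEFAULT 1
);

The unique constraint on word is what gives one ID per distinct word,
and it has to be bigserial, since 800*10^9 rows overflow a plain serial.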
> Still, I think performance will be a big issue here. I have never
> tried postgresql on anything faster than a fast PC but in my humble
> experience an insert will take at least one ms. With this speed
> 800*10^9 records would take 25 years to insert.
> I think you have to think bigger than a single server (ok, that was
> stating the obvious).
Well, to be honest, that would be for a single server. And I would like
to have more than a single server. :)
One ms per insert is 1,000 TPS, if you can't drive the DB any harder
with multiple threads. I haven't really got a fast PC yet, but I could
easily achieve 12k TPS with some multicore machines and BBWC RAID
controllers.
It's only 771 days then. :)
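Back-of-envelope, straight from psql:

SELECT 800e9 / 12000 / 86400 AS days;  -- ~771.6 days at a sustained 12k TPS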
>> - I need atomicity and durability, but having these on a word
>> (record) level takes too much IOPS and has no use, so it would be
>> good to have an interface for inserting about 1000-500000 words in
>> one call, assign a unique ID to each unique word and store them (if
>> the word already has an ID, increment its refcount) and give back
>> the IDs for each word. This transaction could be committed as one,
>> so the transactions could be big, sparing IOPS.
>> - I need concurrency, so when the above happens from two sources at
>> the same time, the same word in the two transactions must get the
>> same ID
> Unless the ID is some kind of hash you will have to serialize inserts
> of new words.
I'm starting to think this too, if the unique constraint is there...
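To make the batch interface concrete, here is roughly what I mean (only
a sketch: incoming_batch is a hypothetical staging table filled per
call, e.g. with COPY, and INSERT ... ON CONFLICT needs PostgreSQL 9.5 or
later):

BEGIN;

-- dedupe the batch first, then upsert: new words get fresh IDs,
-- already-known words just get their refcount bumped
INSERT INTO words (word, refcount)
SELECT word, count(*)
FROM incoming_batch
GROUP BY word
ON CONFLICT (word)
  DO UPDATE SET refcount = words.refcount + EXCLUDED.refcount
RETURNING id, word;

COMMIT;

The nice part is that the unique index then only serializes writers that
actually collide on the same word, not every insert.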
>>
>> Is postgresql a good choice for doing this and if yes, what would be
>> the optimal (for both time and space efficiency at encoding and
>> decoding) use case?
> I might be wrong but this kind of project should probably not rely on
> a "standard" RDBMS. The data structure itself does not seem too complex
> and the performance requirements are quite demanding.
I've done some experiments with BerkeleyDB and Tokyo/Kyoto Cabinet, but
I was curious about the opinion of the RDBMS masters.

Thanks for sharing.
