Re: Millions of tables
From | Álvaro Hernández Tortosa
---|---
Subject | Re: Millions of tables
Date |
Msg-id | 7413c596-f2e0-77eb-ad6c-0ce456fc2665@8kdata.com
In reply to | Millions of tables (Greg Spiegelberg <gspiegelberg@gmail.com>)
Responses | Re: Millions of tables
List | pgsql-performance
On 26/09/16 05:50, Greg Spiegelberg wrote:
> Hey all,
>
> Obviously everyone who's been in PostgreSQL or almost any RDBMS for a
> time has said not to have millions of tables. I too have long
> believed it until recently.
>
> AWS d2.8xlarge instance with 9.5 is my test rig using XFS on EBS (io1)
> for PGDATA. Over the weekend, I created 8M tables with 16M indexes on
> those tables. Table creation initially took 0.018031 secs, average
> 0.027467, and after tossing out outliers (qty 5) the maximum creation
> time found was 0.66139 seconds. Total time: 30 hours, 31 minutes and
> 8.435049 seconds. Tables were created by a single process. Do note
> that table creation is done via a plpgsql function, as there are other
> housekeeping tasks necessary, though minimal.
>
> No system tuning, but here is a list of PostgreSQL knobs and switches:
> shared_buffers = 2GB
> work_mem = 48 MB
> max_stack_depth = 4 MB
> synchronous_commit = off
> effective_cache_size = 200 GB
> pg_xlog is on its own file system
>
> There are still some obvious problems. General DBA functions such as
> VACUUM and ANALYZE should not be done. Each will run forever and
> cause much grief. Backups are problematic in the traditional pg_dump
> and PITR space. Large JOINs by VIEW, SELECT or via table inheritance
> (I am abusing it in my test case) are no-nos. A system or database
> crash could potentially take hours to days to recover. There are
> likely other issues ahead.
>
> You may wonder, "why is Greg attempting such a thing?" I looked at
> DynamoDB, BigTable, and Cassandra. I like Greenplum but, let's face
> it, it's antiquated, and don't get me started on "Hadoop". I looked at
> many others, and ultimately the recommendation from each vendor was to
> have one table for all data. That overcomes the millions-of-tables
> problem, right?
>
> The problem with the "one big table" solution is that I anticipate
> 1,200 trillion records. Random access is expected, and the customer
> expects <30ms reads for a single record fetch.
>
> No data is loaded... yet. Table and index creation only. I am
> interested in the opinions of all, including tests I may perform. If
> you had this setup, what would you capture / analyze? I have a job
> running preparing data. I did this on a much smaller scale (50k
> tables), and data load via function allowed close to 6,000
> records/second. The schema has been simplified since, and the last
> test reached just over 20,000 records/second with 300k tables.
>
> I'm not looking for alternatives yet but input to my test. Takers?
>
> I can't promise immediate feedback but will do my best to respond
> with results.
>
> TIA,
> -Greg

Hi Greg,

    This is a problem (creating a large number of tables; a really large number indeed) that we researched at my company a while ago. You might want to read about it:

    https://www.pgcon.org/2013/schedule/events/595.en.html

    Cheers,

    Álvaro

--
Álvaro Hernández Tortosa
-----------
8Kdata
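As context for the table-creation step Greg mentions, a minimal plpgsql sketch of a per-table creation function is shown below. It creates one table plus a second index per call (so two indexes per table, matching the 8M tables / 16M indexes ratio), but the function name, column layout, and housekeeping are illustrative assumptions, not the actual code used in the test.

```sql
-- Hypothetical sketch only; Greg's actual function and housekeeping logic
-- are not shown in the thread.
CREATE OR REPLACE FUNCTION create_shard_table(shard_id bigint)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    tbl text := format('data_%s', shard_id);
BEGIN
    -- One table per shard; the PRIMARY KEY contributes the first index.
    EXECUTE format(
        'CREATE TABLE %I (id bigint PRIMARY KEY, payload jsonb, created_at timestamptz DEFAULT now())',
        tbl);

    -- Second index, so each table carries two indexes in total.
    EXECUTE format('CREATE INDEX %I ON %I (created_at)',
                   tbl || '_created_idx', tbl);

    -- Other housekeeping (registering the table in a catalog, attaching it to
    -- an inheritance parent, etc.) would go here; omitted because the thread
    -- does not describe it.
END;
$$;

-- Example driver: create 1,000 such tables from a single session.
-- SELECT create_shard_table(g) FROM generate_series(1, 1000) AS g;
```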