Re: Half billion records in one table? RDS

From: Aaron Werman
Subject: Re: Half billion records in one table? RDS
Date:
Msg-id: CAER1-9rYmb=UMM8psNeOPVMAvYgXsuzfC1Yw4YBGKEWWZsNdAg@mail.gmail.com
In reply to: Half billion records in one table? RDS (Jean Baro <jfbaro@gmail.com>)
List: pgsql-performance
Why not store metadata in pg and the payload in S3? 
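A minimal sketch of what that split might look like, assuming a hypothetical body_s3_key column (none of these names come from the thread): the narrow metadata row stays in Postgres, and the up-to-10K JSON payload lives in S3 under a key derived from the message id.

    -- Hypothetical sketch of the metadata/payload split: the wide JSON
    -- body is replaced by a pointer to an S3 object, keeping rows small.
    CREATE TABLE message_meta (
        message_id   uuid PRIMARY KEY,
        user_country char(2)      NOT NULL,
        user_id      varchar(30)  NOT NULL,
        title        varchar(150),
        created      timestamptz  NOT NULL DEFAULT now(),
        body_s3_key  text         NOT NULL  -- e.g. 'messages/<message_id>.json'
    );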

On Mon, Nov 27, 2017 at 11:58 AM Jean Baro <jfbaro@gmail.com> wrote:
Hi there,

We are creating a new DB which will behave much like a file system; that is, there will be no complex queries or joins running in the DB. The idea is to grab the WHOLE set of messages for a particular user and then filter, order, combine or full-text search within the function itself (AWS Lambda). The maximum number of messages is limited to 1,000 per user, so we expect Postgres to deliver amazing performance for this scenario.

As I am not really familiar with PG (9.6, or 10, in case RDS releases it before February), I would like to share what we are planning to do for this DB. If you guys could share your thoughts, that would be great! :)

Table structure (see the DDL sketch after this list):

• MessageID (UUID) – PK
• UserCountry (ISO)
• UserRole (TEXT 15)
• UserID (TEXT 30) – FK (although there is no constraint)
• LifeCycle (a DATE RANGE? Or 2 TIMESTAMP WITH TIME ZONE columns, start_date and end_date?)
• Channel (TEXT 15)
• Tags (TEXT 2000)
• Menu (TEXT 200)
• Icon (TEXT 500) – URL to an image which will be used as an icon
• Title (TEXT 150)
• Body (JSON – up to 10K) – Metadata describing all the data for a specific type of message. The JSON changes according to the type of message. We assume most messages will use less than 1K for this field.
• Delete (BOOLEAN) – soft delete
• Created (TIMESTAMP WITH TIME ZONE)
• CreatedBy (TEXT 50)
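A minimal DDL sketch of the list above, as one possible reading (the column spellings, tstzrange for LifeCycle, and jsonb instead of plain json are assumptions, not decisions from the post):

    -- Sketch only. "Delete" is a reserved word in SQL, so the soft-delete
    -- flag is spelled "deleted" here; LifeCycle is modeled as a tstzrange,
    -- but two timestamptz columns (start_date, end_date) would work too.
    CREATE TABLE messages (
        message_id   uuid PRIMARY KEY,
        user_country char(2)      NOT NULL,  -- ISO country code
        user_role    varchar(15),
        user_id      varchar(30)  NOT NULL,  -- logical FK, no constraint
        life_cycle   tstzrange,
        channel      varchar(15),
        tags         varchar(2000),
        menu         varchar(200),
        icon         varchar(500),           -- URL to the icon image
        title        varchar(150),
        body         jsonb,                  -- up to ~10K, usually under 1K
        deleted      boolean      NOT NULL DEFAULT false,
        created      timestamptz  NOT NULL DEFAULT now(),
        created_by   varchar(50)
    );

jsonb is assumed here because it can be indexed and compared efficiently; plain json would also satisfy the description.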

 

Only 1 table:

• Messages

3 indexes (see the DDL sketch after this list):

• MessageID PK (UUID)
• Main fetch key (UserCountry + UserID) – *****
• End_date (to locate old messages that can be moved to another DB, which will hold the old messages)
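In DDL, assuming the column names from the table sketch above, those three indexes could look like this (the primary key already creates its own unique index):

    -- Main fetch key: one composite index covers the
    -- (UserCountry, UserID) lookup.
    CREATE INDEX messages_user_fetch_idx ON messages (user_country, user_id);

    -- Archiving key: with a tstzrange LifeCycle column this can be an
    -- expression index on the range's end; with a plain end_date column
    -- it would be a simple btree index on that column.
    CREATE INDEX messages_end_date_idx ON messages (upper(life_cycle));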

 

Sizing and worst-case scenario:

• 500MM messages in the main DB
• 4K queries per second (by UserID), with a max time of 500ms per query. Simple SELECTs, with no ORDER BY or GROUP BY and no WHERE clause beyond the user key (see the sample query after this list); just grab all the messages for a particular user, max 1,000 messages per user.
• 1K inserts per second on average (so that in 1 hour we can insert around 3MM messages)
• 1K deletes per second on average (so that in 1 hour we can remove around 3MM messages)
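A sketch of the only query shape described above, fetching everything for one user via the main fetch key (the literal values are illustrative, not from the post):

    SELECT *
    FROM messages
    WHERE user_country = 'BR'
      AND user_id = 'user-123';
    -- add AND NOT deleted if soft-deleted rows should be hidden

This should resolve to a single scan on the (user_country, user_id) index plus heap fetches, which is the access pattern the 500ms budget has to cover.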


My questions are:

  • Can we use any kind of compression in PostgreSQL which would result in reduced I/O and disk size?
  • We are not relying on any kind of table partitioning; is that the best approach for this scenario? (See the partitioning sketch after this list.)
  • Is PG on RDS capable of delivering this type of performance while requiring low maintenance?
  • What about autovacuum? Any suggestions on how to optimize it for such a workload (we will insert and delete millions of rows every day)?
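On the partitioning question, a minimal sketch of what PostgreSQL 10 declarative partitioning could look like here, assuming range partitioning on the LifeCycle end date (the archiving key); the table name, partition names, and bounds are illustrative only:

    -- Sketch only (PostgreSQL 10+): partition by the archiving key, so an
    -- old month can be detached in one cheap DDL step instead of deleting
    -- ~3MM rows per hour. Columns are trimmed down for brevity.
    CREATE TABLE messages_p (
        message_id   uuid         NOT NULL,
        user_country char(2)      NOT NULL,
        user_id      varchar(30)  NOT NULL,
        end_date     timestamptz  NOT NULL,
        body         jsonb
    ) PARTITION BY RANGE (end_date);

    CREATE TABLE messages_2017_12 PARTITION OF messages_p
        FOR VALUES FROM ('2017-12-01') TO ('2018-01-01');

    -- Archiving then becomes:
    -- ALTER TABLE messages_p DETACH PARTITION messages_2017_12;

On the autovacuum question, per-table storage parameters (for example, lowering autovacuum_vacuum_scale_factor on this one hot table) are the usual lever, and detaching partitions instead of deleting rows removes most of the dead-tuple churn autovacuum would otherwise have to clean up.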
P.S.: We are going to test all this, but if we don't get the performance we are expecting, all optimization tips from you guys will be really appreciated. :)

Thanks



--

Regards,
/Aaron
