Обсуждение: Creating A GIN index on JSONB column (large database)

Поиск
Список
Период
Сортировка

Creating A GIN index on JSONB column (large database)

От
Taylor Smith
Дата:
Hi all,

I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis.

There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to make the json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage maxes out which crashes the create process resulting in unhealthy created indexes. 

This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the temp storage to blow up and crash out and is it possible to mitigate or a necessary evil. 

The only solution we have so far is scale up our RDS instance.

Kind regards,

Taylor Smith

Re: Creating A GIN index on JSONB column (large database)

От
Rob Sargent
Дата:

> On Aug 9, 2022, at 7:04 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Hi all,
>
> I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis.
>
> There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to
makethe json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage
maxesout which crashes the create process resulting in unhealthy created indexes.  
>
> This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the
tempstorage to blow up and crash out and is it possible to mitigate or a necessary evil.  
>
> The only solution we have so far is scale up our RDS instance.
>
> Kind regards,
>
> Taylor Smith

The scale-up might be needed only for the create index.  Scale down afterwards?


Re: Creating A GIN index on JSONB column (large database)

От
Taylor Smith
Дата:
Thanks Rob,

This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own indexes if needed?




On Tue, Aug 9, 2022 at 3:39 PM Rob Sargent <robjsargent@gmail.com> wrote:


> On Aug 9, 2022, at 7:04 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Hi all,
>
> I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis.
>
> There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to make the json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage maxes out which crashes the create process resulting in unhealthy created indexes.
>
> This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the temp storage to blow up and crash out and is it possible to mitigate or a necessary evil.
>
> The only solution we have so far is scale up our RDS instance.
>
> Kind regards,
>
> Taylor Smith

The scale-up might be needed only for the create index.  Scale down afterwards?

Re: Creating A GIN index on JSONB column (large database)

От
Rob Sargent
Дата:

> On Aug 9, 2022, at 7:45 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Thanks Rob,
>
> This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own
indexesif needed? 
>
>

No I would not.  The scale-up + rebuild should tell you how much you actually need and if you’ve over done the scale up
youcan scale down to what you need.  Was the original spec expecting 20+T?  Any chance of rolling older months off to
someother node? 




Re: Creating A GIN index on JSONB column (large database)

От
Taylor Smith
Дата:
Yes, the database was always going to be a minimum of 20TB however we build the indexes on the partition which at max is about 1.3TB, which in temp files will use about 800GB temporary files. Note it doesn't have the same effect when doing non-concurrently. I don't think rolling off would help as its likely largest partition will either be the current month or the previous month so the problem would still persist.



On Tue, Aug 9, 2022 at 3:50 PM Rob Sargent <robjsargent@gmail.com> wrote:


> On Aug 9, 2022, at 7:45 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Thanks Rob,
>
> This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own indexes if needed?
>
>

No I would not.  The scale-up + rebuild should tell you how much you actually need and if you’ve over done the scale up you can scale down to what you need.  Was the original spec expecting 20+T?  Any chance of rolling older months off to some other node?

Re: Creating A GIN index on JSONB column (large database)

От
Ron
Дата:
On 8/9/22 09:39, Rob Sargent wrote:
>
>> On Aug 9, 2022, at 7:04 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis.
>>
>> There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to
makethe json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage
maxesout which crashes the create process resulting in unhealthy created indexes.
 
>>
>> This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the
tempstorage to blow up and crash out and is it possible to mitigate or a necessary evil.
 
>>
>> The only solution we have so far is scale up our RDS instance.
>>
>> Kind regards,
>>
>> Taylor Smith
> The scale-up might be needed only for the create index.  Scale down afterwards?

Scaling down disks means using DMS (which costs money) to migrate the 
instance to a new instance.

-- 
Angular momentum makes the world go 'round.



Re: Creating A GIN index on JSONB column (large database)

От
Ron
Дата:
800GB/20TB = 4%.

You've got a 20TB database, and run out of space when 4% is added.

Either you're running way to close to the edge, or I misunderstand something.

Either way, enable auto-scaling.  https://aws.amazon.com/about-aws/whats-new/2019/06/rds-storage-auto-scaling/


On 8/9/22 10:29, Taylor Smith wrote:
Yes, the database was always going to be a minimum of 20TB however we build the indexes on the partition which at max is about 1.3TB, which in temp files will use about 800GB temporary files. Note it doesn't have the same effect when doing non-concurrently. I don't think rolling off would help as its likely largest partition will either be the current month or the previous month so the problem would still persist.



On Tue, Aug 9, 2022 at 3:50 PM Rob Sargent <robjsargent@gmail.com> wrote:


> On Aug 9, 2022, at 7:45 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Thanks Rob,
>
> This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own indexes if needed?
>
>

No I would not.  The scale-up + rebuild should tell you how much you actually need and if you’ve over done the scale up you can scale down to what you need.  Was the original spec expecting 20+T?  Any chance of rolling older months off to some other node?


--
Angular momentum makes the world go 'round.