JSON vs Text + Regexp Index Searching

From: Eliot Gable
Subject: JSON vs Text + Regexp Index Searching
Date:
Msg-id: CAD-6L_WZpd2oEPWK6jOzZYnxwHwytziFoSuwnxu7=ODiJ=+j3g@mail.gmail.com
Whole thread / raw text
Replies: Re: JSON vs Text + Regexp Index Searching  (Merlin Moncure <mmoncure@gmail.com>)
         Re: JSON vs Text + Regexp Index Searching  (David Johnston <polobo@yahoo.com>)
List: pgsql-general
I am working on a project for which I require some assistance. Any input would be appreciated.

We have a table with millions of records and dozens of columns. On some systems it takes up more than 10 GB of disk space, and it runs on a single, rather slow (7200 RPM) disk. We typically don't run any complicated queries against it, so performance has not been a problem so far.

The table is actually defined as three tables using inheritance. One table holds only the last 7 days of data, and a process moves the 8th day into a history table for archiving. For performance reasons, we typically query only the table with 7 days of data, which generally does not exceed 1 million rows; in fact, it is typically under 50k rows on most systems, though a few systems may contain a few million records at most. There are indices on each of the fields we query, and we also put limits on the number of records returned. The table is effectively append-only.

We are looking at adding some additional columns to the table and planning for some future features. One of those features will allow cross-referencing the records in this table with a list of IDs. If a record's list of IDs contains the one we are looking for, we want to include the record in our query. I advocated creating a separate mapping table that maps the ID of these records to the other ID we are searching for, and performing a JOIN on the two tables with appropriate foreign key relationships and indices. However, I was asked to instead put the list into a single column on each row to reduce implementation complexity.
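For reference, a minimal sketch of the mapping-table approach described above (table and column names here are illustrative, not from the actual system):

```sql
-- Hypothetical names: "records" is the main (append-only) table,
-- "record_ids" maps each record to the external IDs it references.
CREATE TABLE record_ids (
    record_id   bigint NOT NULL REFERENCES records (id),
    external_id bigint NOT NULL
);

-- Index the column being searched so the JOIN can use an index scan.
CREATE INDEX record_ids_external_id_idx ON record_ids (external_id);

-- Find all records that reference external ID 67.
SELECT r.*
FROM records r
JOIN record_ids m ON m.record_id = r.id
WHERE m.external_id = 67;
```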

Assuming the list of IDs is in a column on each row as TEXT in the format of a JSON array, what is the best way to index the column so I can quickly find the rows with the given ID?


Limitations:

We run version 9.0.x on these systems right now. We plan on updating later this year, but for now we don't have the JSON type. Once we do, we plan to make use of it. The column value will also be passed to the UI, which is JavaScript-based, so a JSON format would be ideal.

We cannot modify the disks we run on to make them faster. It is cost prohibitive in our particular scenario.

The systems in question are real-time systems, and are carefully optimized. Any excessively lengthy queries which have high disk IO or CPU usage will adversely affect the entire system.


My Thoughts:

Is it possible to use a REGEXP INDEX in this case? For example: column ~ '^\[67|,67,|,67\]$'
Will such an index perform well? How would I create it?
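One caveat with the pattern as written: in POSIX regular expressions, alternation binds more loosely than the anchors, so `^` applies only to the first branch and `$` only to the last. The first branch `^\[67` therefore matches any list whose first ID merely *starts* with 67. A pattern that requires a delimiter on both sides of the ID avoids this (the examples below can be checked directly in psql):

```sql
-- The original pattern's first branch matches '[678,...' too:
SELECT '[678,9]' ~ '^\[67|,67,|,67\]$';   -- true: a false positive

-- Requiring '[' or ',' before the ID and ',' or ']' after it
-- avoids that, and also handles the single-element list '[67]':
SELECT '[678,9]'  ~ '[[,]67[],]';         -- false
SELECT '[1,67,9]' ~ '[[,]67[],]';         -- true
SELECT '[67]'     ~ '[[,]67[],]';         -- true
```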

How badly would a simple LIKE operation perform? Is there any way to optimize it?
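As a rough sketch (column name `ids_text` is illustrative): a LIKE with a leading wildcard cannot use a plain btree index, so it scans every row in the queried partition, and a single pattern misses IDs at either end of the list:

```sql
-- Sequential scan; also misses '[67,...' and '...,67]' forms:
SELECT * FROM records WHERE ids_text LIKE '%,67,%';

-- On PostgreSQL 9.1+ the contrib pg_trgm module can index LIKE
-- queries with a trigram GIN index (not available on 9.0):
-- CREATE EXTENSION pg_trgm;
-- CREATE INDEX records_ids_trgm_idx ON records
--     USING gin (ids_text gin_trgm_ops);
```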

Would using a JSON type in the future allow us to do any better type of index on the column?

If we wanted to put an index on a JSON data type column whose value is a simple array of IDs, what would the index look like, and how would we construct the query to make use of it?
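For what it's worth, on PostgreSQL 9.4+ the jsonb type (the plain json type added in 9.2 has no containment operator) supports exactly this via a GIN index and the `@>` containment operator. A hedged sketch with illustrative names:

```sql
-- Assumes PostgreSQL 9.4+ and illustrative table/column names.
ALTER TABLE records ADD COLUMN ids jsonb;

-- jsonb_path_ops yields a smaller, faster index when only
-- containment (@>) queries are needed.
CREATE INDEX records_ids_gin ON records USING gin (ids jsonb_path_ops);

-- Array containment: true when the stored array contains 67,
-- e.g. '[1,67,9]'::jsonb @> '[67]'::jsonb.
SELECT * FROM records WHERE ids @> '[67]'::jsonb;
```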


Thanks in advance for any input / insight on this.

In pgsql-general by date sent:

Previous
From: Pavel Stehule
Date:
Message: Re: execute query from inside C function
Next
From: Merlin Moncure
Date:
Message: Re: JSON vs Text + Regexp Index Searching