Re: How To: A large [2D] matrix, 100,000+ rows/columns

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: How To: A large [2D] matrix, 100,000+ rows/columns
Дата
Msg-id 3A7FE461-18C4-4109-9997-F28C3D4CD314@gmail.com
обсуждение исходный текст
Ответ на How To: A large [2D] matrix, 100,000+ rows/columns  (Pat Trainor <pat.trainor@gmail.com>)
Список pgsql-general
> On 9 Jun 2023, at 04:17, Pat Trainor <pat.trainor@gmail.com> wrote:

(…)

> Imagine something akin to stocks, where you have a row for every stock, and a column for every stock. Except where
thesame stock is the row & col, a number is at each X-Y (row/column), and that is the big picture. I need to have a
verylarge matrix to maintain & query, and if not (1,600 column limit), then how could such data be broken down to work? 

If your matrix contains values that are all of the same type, as matrices usually do, then a matrix can be described as
theCarthesian product of rows and columns, with values connecting those. 

For rows and columns you could enumerate them using generate_series() or a pair of recursive CTEs, or you could put
theminto their own table. 
For the values (or cells), a tuple of (row, column, value) would be sufficient.

Then in the end, the matrix would be a presentation of the left joins of the Carthesian product of rows and columns
withyour cell values. The left joins are to account for missing cell values (empty cells), or you could explicitly add
tuplesfor those with an ‘empty’ value. 

For presentation, I would use something like Python Pandas and the xlsxwriter.

Data-entry is going to be a bit of a pain if you cannot automate it, and it’s not going to be very space-efficient, but
itdoes fit the relational model this way and it would be easy to expand the matrix in either direction. 

> By wanting postgresql as a solution, am I round-hole, square-pegging myself?

I expect that there are solutions that were explicitly designed for handling (large) matrices and that those would
perhapsperform better. 

> I don't mind keeping, say, a 1,500 column max per table, but then adding new items (stocks in the above analogy)
mightmake it difficult to keep track of things...  

That’s also a possibility, but that sort of pushes the column lookups down to the catalog level and induces overhead on
allother catalog lookups as well. It’s not a huge amount though. 

An alternative approach would be a table of rows (or columns) with each an array of values, especially since you don’t
expectmany updates. That would be a first attempt at optimisation if the pure relational model doesn’t work out. 

> Hoping someone has tackled this before, and performance isn't too big a concern, as the data changes seldom.

Not in practice, AFAIR, but I was thinking of a solution like this for small matrices (Sudoku’s, I hate the things, but
Ineed some solutions to a few to help me with a number of test scripts). 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




В списке pgsql-general по дате отправления:

Предыдущее
От: Alex Lee
Дата:
Сообщение: How to securely isolate databases/users in a multi-tenant Postgresql?
Следующее
От: Ron
Дата:
Сообщение: Re: How to store query result into another table using stored procedure