Re: Atomic access to large arrays

From: Victor de Buen
Subject: Re: Atomic access to large arrays
Msg-id: f432ce640907221128u7310da7eqb1076db56e0acf14@mail.gmail.com
In reply to: Re: Atomic access to large arrays (Merlin Moncure <mmoncure@gmail.com>)
List: pgsql-performance
Thanks a lot, Merlin.

I will try filling a new table with a sample of grids, using different subgrid sizes, in order to find the best trade-off between space and speed.
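
For instance, one candidate layout along the lines Merlin suggests in the quoted message below might look like the following. This is only a sketch: the 4x4 split into at-most-91x180-cell subgrids, the table name grid_f_data_i2_sub, and the sub_y/sub_x columns are assumptions, not a tested design.

CREATE TABLE grid_f_data_i2_sub (
  id_inventory integer REFERENCES grid_d_inventory(id_inventory),
  dh_date timestamp,
  -- Position of this subgrid inside the full 361x720 grid (0..3 each)
  sub_y smallint,
  sub_x smallint,
  -- Each subgrid holds at most 91x180 cells instead of 361x720
  vl_grid smallint[91][180],
  CONSTRAINT grid_f_data_i2_sub_pk
    PRIMARY KEY (id_inventory, dh_date, sub_y, sub_x)
);

-- Historical series for one point (y=123, x=152): only the subgrid
-- containing the point has to be fetched and detoasted per row.
SELECT dh_date, vl_grid[(123 - 1) % 91 + 1][(152 - 1) % 180 + 1]
FROM grid_f_data_i2_sub
WHERE id_inventory = 6
  AND sub_y = (123 - 1) / 91
  AND sub_x = (152 - 1) / 180
ORDER BY dh_date;

Each row's array would then be roughly 1/16th of the original size, so a single-cell lookup detoasts and decompresses far less data.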

Regards

2009/7/22 Merlin Moncure <mmoncure@gmail.com>
On Tue, Jul 21, 2009 at 7:43 PM, Victor de Buen
(Bayes)<vdebuen@bayesinf.com> wrote:
> Hi
>
> I'm storing historical meteorological gridded data from GFS
> (http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field in a
> table like this:
>
> CREATE TABLE grid_f_data_i2 (
>   -- Specifies the variable and other features of the data
>   id_inventory integer REFERENCES grid_d_inventory(id_inventory),
>   -- A new grid has been available every 3 hours for the last 5 years
>   dh_date timestamp,
>   -- Values are scaled to be stored as 2-byte signed integers
>   vl_grid smallint[361][720],
>   CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
>     (id_inventory, dh_date)
> );
>
> Dimensions of each value of field vl_grid are (lat: 361 x lon: 720 = 259,920
> cells) for a grid of 0.5 degrees (about every 55 km) around the world. So,
> vl_grid[y][x] stores the value at dh_date of the meteorological variable
> specified by id_inventory at the geodesic point
>
> latitude  = -90 + y*0.5
> longitude = x*0.5
>
> The reverse formula for the closest grid point to an arbitrary
> geodesic point is (a worked check follows the quoted message):
>
> y = Round((latitude + 90) * 2)
> x = Round(longitude * 2)
>
> Field vl_grid is stored in the TOAST table and gets a good compression level.
> PostgreSQL is the only database that has been able to store this huge amount
> of data in only 34 GB of disk. It's a really great system. Queries returning
> big rectangular areas are very fast, but the target of almost all queries is
> to get the historical series for a single geodesic point:
>
> SELECT  dh_date, vl_grid[123][152]
> FROM  grid_f_data_i2
> WHERE  id_inventory = 6
> ORDER BY dh_date
>
> In this case, atomic access to just one cell in each of only a few
> thousand rows becomes too slow.

That's a side effect of your use of arrays.  Arrays are very compact,
and ideal if you always want the whole block of data at once, but
asking for a particular point is the downside of your trade-off.  I
would suggest experimenting with smaller grid sizes...maybe divide
your big grid into approximately 16 (4x4) separate subgrids.  These
should still 'toast' and give decent compression, but mitigate the
impact of single-point lookups somewhat.

merlin
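
As a quick check of the reverse formulas quoted above: the sample query reads vl_grid[123][152], which by the forward formulas corresponds to latitude -90 + 123*0.5 = -28.5 and longitude 152*0.5 = 76.0. Plugging those coordinates back in recovers the indices (an illustrative calculation only):

-- Reverse lookup for the point at latitude -28.5, longitude 76.0
SELECT round((-28.5 + 90) * 2) AS y,  -- 123
       round(76.0 * 2)         AS x;  -- 152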



--
Víctor de Buen Remiro
Statistical Consultant
Bayes Forecast
www.bayesforecast.com
Tol Development Team member
www.tol-project.org
