Re: Database/Table Design for Global Country Statistics

Поиск
Список
Период
Сортировка
От Richard Huxton
Тема Re: Database/Table Design for Global Country Statistics
Дата
Msg-id 46E7A91B.5070706@archonet.com
обсуждение исходный текст
Ответ на Database/Table Design for Global Country Statistics  (Stefan Schwarzer <stefan.schwarzer@grid.unep.ch>)
Список pgsql-general
Stefan Schwarzer wrote:
> Hi there,
>
> I learned in another posting that my table design - in a polite way -
> "could be improved".
>
> So, before doing any additional design errors, I would like to get
> feedback, if possible.
>
> I am dealing with some 500 tables for worldwide national statistics
> (GDP, population, environment etc.), covering approx. 30 years each. For
> each of these variables, I usually have as well (pre-prepared)
> subregional and regional aggregations too. These could - and should - at
> the end be calculated on-the-fly, and not pre-calculated and imported
> from Excel as it is for the moment.

Might want a trigger to keep them up to date.

> My (national) table for a given variable is in the moment as follows (id
> being the identifier for a specific country):
>
> id   |    1970    |    1971    |    ...    |    2004    |    2005
> -------------------------------------------------------------------
>  1   |   NULL    |      36       |   ...     |      42      |      45
>  2 ......

Ick - fiddly if you we ever reach 2006...

> The new design would be like this:
>
> id   |    year    |    value
> -------------------------------
>  1   |   1970    |     NULL
>  1   |   1971    |      36
>  1   ....
>  1   |   2005    |      45
>  2   |   1970    |      ....
>  2   .....
>
>
> Would that be considered as "good table design" then?

Well if "id" is a country I'd rename it "country_id".

If all years+countries should have the same set of measurements you
might want it in one table:
  (country_id, year, population, area, roads_in_km, ...)

However, if some measurements aren't relevant you're probably better off
with separate table for each measurement: country_population,
country_area etc.

The issue with NULLs is what do they mean. They should mean "unknown",
nothing more and nothing less. However, frequently you want to
distinguish between "no figure available" and "not applicable" (e.g.
"monarch" isn't relevant except in monarchies).

In this case it's probably best practice to separate out the values:
  (country_id integer, year integer, has_monarch boolean, monarch text)
This way you can distinguish between has_monarch=false and monarch=''

--
   Richard Huxton
   Archonet Ltd

В списке pgsql-general по дате отправления:

Предыдущее
От: Richard Huxton
Дата:
Сообщение: Re: Question to schema public
Следующее
От: Tino Wildenhain
Дата:
Сообщение: Re: Database/Table Design for Global Country Statistics