Discussion: General 'big data' advice....

General 'big data' advice....

From
James David Smith
Date:
Hi,

Bit of an abstract question I appreciate, however I just thought I'd
see what people thought. I have an anonymised dataset of travel
behaviour of some people in a major city (I'd rather not go into
details if that's ok). What I intend to do is to work out where they
each are for every minute of the day. So for ~80,000 people x 1440
minutes = 115,200,000 rows of data! So a few questions:

1) Is PostgreSQL going to be able to cope with this? In terms of the
table size? I think so...

2) My columns will be something like
person_id integer,
person_timestamp timestamp,
person_location_geom geometry
Any thoughts on those? The format of the columns?

3) I'll probably create a Primary Key which is a combination of
person_id and person_timestamp. Does this sound like a good idea?

4) Should I use some indexes to improve performance maybe?

Best wishes

James


Re: General 'big data' advice....

From
Simon Riggs
Date:
On 5 August 2013 12:38, James David Smith <james.david.smith@gmail.com> wrote:

> Bit of an abstract question I appreciate, however I just thought I'd
> see what people thought. I have an anonymised dataset of travel
> behaviour of some people in a major city (I'd rather not go into
> details if that's ok). What I intend to do is to work out where they
> each are for every minute of the day. So for ~80,000 people x 1440
> minutes = 115,200,000 rows of data! So a few questions:
>
> 1) Is PostgreSQL going to be able to cope with this? In terms of the
> table size? I think so...
>
> 2) My columns will be something like
> person_id integer,
> person_timestamp timestamp,
> person_location_geom geometry
> Any thoughts on those? The format of the columns?
>
> 3) I'll probably create a Primary Key which is a combination of
> person_id and person_timestamp. Does this sound like a good idea?
>
> 4) Should I use some indexes to improve performance maybe?

Try it and see. It really depends on the queries you will run.
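
One way to act on this advice is to load synthetic data at the planned scale and time a representative query. The sketch below is only an illustration: the table name person_location_test, the date, and the sample query are assumptions, and the geometry column is left out so that it runs without PostGIS.

```sql
-- Hypothetical test table; geometry column omitted so PostGIS is not required.
CREATE TABLE person_location_test (
    person_id        integer   NOT NULL,
    person_timestamp timestamp NOT NULL
);

-- One row per person per minute of a single (arbitrary) day.
-- Reduce 80000 to a smaller number for a first trial run.
INSERT INTO person_location_test (person_id, person_timestamp)
SELECT p.id,
       timestamp '2013-08-05 00:00:00' + m.n * interval '1 minute'
FROM generate_series(1, 80000) AS p(id)
CROSS JOIN generate_series(0, 1439) AS m(n);

-- 80,000 people x 1,440 minutes
SELECT count(*) FROM person_location_test;  -- 115200000

-- On-disk size of the table, including any indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('person_location_test'));

-- Time a query that resembles the real workload (this one is a guess).
EXPLAIN ANALYZE
SELECT *
FROM person_location_test
WHERE person_id = 42
  AND person_timestamp >= timestamp '2013-08-05 08:00:00'
  AND person_timestamp <  timestamp '2013-08-05 09:00:00';
```

Running the EXPLAIN ANALYZE step before and after adding an index shows whether that index actually helps the queries that matter.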

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: General 'big data' advice....

From
Luca Ferrari
Date:
On Mon, Aug 5, 2013 at 1:38 PM, James David Smith
<james.david.smith@gmail.com> wrote:

> 1) Is PostgreSQL going to be able to cope with this? In terms of the
> table size? I think so...
>

Yes.

> 2) My columns will be something like
> person_id integer,
> person_timestamp timestamp,
> person_location_geom geometry
> Any thoughts on those? The format of the columns?
>

I would think about partitioning the data on person_id (ranges) or
person_timestamp (ranges), so as to have several smaller tables rather
than one huge table.
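
A minimal sketch of range partitioning on person_id, assuming PostgreSQL 10 or later (declarative partitioning, which postdates this thread; on releases current at the time the same effect was achieved with table inheritance and CHECK constraints). The table name, the assumption that ids run from 1 to 80,000, and the choice of four partitions are all illustrative; the geometry type requires the PostGIS extension.

```sql
-- Hypothetical parent table, partitioned by ranges of person_id.
CREATE TABLE person_location_partitioned (
    person_id            integer   NOT NULL,
    person_timestamp     timestamp NOT NULL,
    person_location_geom geometry
) PARTITION BY RANGE (person_id);

-- Four partitions of 20,000 ids each; the upper bound of each range is exclusive.
CREATE TABLE person_location_p1 PARTITION OF person_location_partitioned
    FOR VALUES FROM (1) TO (20001);
CREATE TABLE person_location_p2 PARTITION OF person_location_partitioned
    FOR VALUES FROM (20001) TO (40001);
CREATE TABLE person_location_p3 PARTITION OF person_location_partitioned
    FOR VALUES FROM (40001) TO (60001);
CREATE TABLE person_location_p4 PARTITION OF person_location_partitioned
    FOR VALUES FROM (60001) TO (80001);
```

Partitioning on person_timestamp instead would use the same syntax with timestamp bounds; which key is better depends on whether queries mostly filter by person or by time.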


> 3) I'll probably create a Primary Key which is a combination of
> person_id and person_timestamp. Does this sound like a good idea?
>

Is the same person at different timestamps a different person for your logic?
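
One reading of this question: with a composite key, each row identifies an observation (one person at one minute) rather than a person, so the same person_id legitimately appears in many rows. A minimal sketch under that assumption; the table name and sample values are hypothetical, and the geometry type requires PostGIS.

```sql
-- Unpartitioned sketch using the columns from the original post.
CREATE TABLE person_location (
    person_id            integer   NOT NULL,
    person_timestamp     timestamp NOT NULL,
    person_location_geom geometry,
    PRIMARY KEY (person_id, person_timestamp)
);

-- Same person at two different minutes: both rows are accepted.
INSERT INTO person_location VALUES (1, '2013-08-05 08:00:00', NULL);
INSERT INTO person_location VALUES (1, '2013-08-05 08:01:00', NULL);

-- Same person at the same minute again: rejected as a duplicate key.
INSERT INTO person_location VALUES (1, '2013-08-05 08:00:00', NULL);
```

The primary key also creates a unique B-tree index on (person_id, person_timestamp), which serves lookups of one person's track over a time range.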


> 4) Should I use some indexes to improve performance maybe?

It depends on which queries you are going to run, on how your database
is organized (partitioned or not), and on other details.
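
As a hedged illustration of how the choice follows from the queries, the sketch below assumes a table named person_location with the three columns from the original post, and assumes two kinds of query: "where was everyone at a given minute" and spatial filters. The index names are made up, and the GiST index requires PostGIS.

```sql
-- For queries that filter on time alone, across all people.
CREATE INDEX person_location_ts_idx
    ON person_location (person_timestamp);

-- For spatial predicates on the geometry column (PostGIS).
CREATE INDEX person_location_geom_idx
    ON person_location USING gist (person_location_geom);

-- Refresh planner statistics after a bulk load so the indexes get used sensibly.
ANALYZE person_location;
```

If the workload only ever looks up one person over a time range, a composite primary key on (person_id, person_timestamp) may already be enough, and extra indexes would only slow down loading.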

Luca