Discussion: General 'big data' advice...
Hi,

Bit of an abstract question, I appreciate, however I just thought I'd see what people thought. I have an anonymised dataset of the travel behaviour of some people in a major city (I'd rather not go into details, if that's OK). What I intend to do is work out where each of them is for every minute of the day. So ~80,000 people x 1,440 minutes = 115,200,000 rows of data! A few questions:

1) Is PostgreSQL going to be able to cope with this, in terms of table size? I think so...

2) My columns will be something like:
person_id integer,
person_timestamp timestamp,
person_location_geom geometry
Any thoughts on those, or on the column types?

3) I'll probably create a primary key on the combination of person_id and person_timestamp. Does this sound like a good idea?

4) Should I add some indexes to improve performance?

Best wishes
James
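The row count in the question is easy to check, and the same arithmetic gives a very rough feel for table size. A minimal Python sketch follows; the integer (4 bytes) and timestamp (8 bytes) sizes are PostgreSQL's, but the tuple-header, line-pointer and point-geometry figures are assumptions rounded for illustration, and the estimate ignores alignment padding, page headers, free space and all indexes.

```python
# Back-of-the-envelope sizing for one row per person per minute.
# The ASSUMED_* byte figures are illustrative assumptions, not measured values;
# alignment padding, page headers, free space and indexes are all ignored.

PEOPLE = 80_000              # "~80,000 people" from the question
MINUTES_PER_DAY = 24 * 60    # 1440 one-minute slots per day

ASSUMED_TUPLE_HEADER = 24    # per-row tuple header, rounded up
ASSUMED_LINE_POINTER = 4     # item pointer stored in the page
INT_BYTES = 4                # person_id integer
TIMESTAMP_BYTES = 8          # person_timestamp timestamp
ASSUMED_POINT_BYTES = 32     # a 2D point geometry (assumed size)

BYTES_PER_ROW = (ASSUMED_TUPLE_HEADER + ASSUMED_LINE_POINTER
                 + INT_BYTES + TIMESTAMP_BYTES + ASSUMED_POINT_BYTES)


def row_count(people: int, minutes: int) -> int:
    """One row for every (person_id, minute) pair."""
    return people * minutes


rows = row_count(PEOPLE, MINUTES_PER_DAY)
heap_bytes = rows * BYTES_PER_ROW

print(f"{rows:,} rows")                                   # 115,200,000 rows
print(f"~{heap_bytes / 1024**3:.1f} GiB before overheads")  # ~7.7 GiB before overheads
```

A composite primary key on (person_id, person_timestamp) is backed by a unique index of its own, so it would add to this figure rather than be included in it.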
On 5 August 2013 12:38, James David Smith <james.david.smith@gmail.com> wrote:
> Bit of an abstract question I appreciate, however I just thought I'd
> see what people thought. I have an anonymised dataset of travel
> behaviour of some people in a major city (I'd rather not go into
> details if that's ok). What I intend to do is to work out where they
> each are for every minute of the day. So for ~80,000 people x 1440
> minutes = 115,200,000 rows of data! So a few questions:
>
> 1) Is PostgreSQL going to be able to cope with this? In terms of the
> table size? I think so...
>
> 2) My columns will be something like
> person_id integer,
> person_timestamp timestamp,
> person_location_geom geometry
> Any thoughts on those? The format of the columns?
>
> 3) I'll probably create a Primary Key which is a combination of
> person_id and person_timestamp. Does this sound like a good idea?
>
> 4) Should I use some indexes to improve performance maybe?

Try it and see. It really depends on the queries you will run.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Aug 5, 2013 at 1:38 PM, James David Smith <james.david.smith@gmail.com> wrote:
> 1) Is PostgreSQL going to be able to cope with this? In terms of the
> table size? I think so...

Yes.

> 2) My columns will be something like
> person_id integer,
> person_timestamp timestamp,
> person_location_geom geometry
> Any thoughts on those? The format of the columns?

I would consider partitioning the data by ranges of person_id or by ranges of person_timestamp, so as to have a few smaller tables rather than one huge table.

> 3) I'll probably create a Primary Key which is a combination of
> person_id and person_timestamp. Does this sound like a good idea?

Is the same person at different timestamps a different person in your logic?

> 4) Should I use some indexes to improve performance maybe?

It depends on which queries you are going to run, on how your database is organized (partitioned or not), and on other details.

Luca
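The range-partitioning suggestion can be illustrated by modelling how rows would be routed to child tables. This is a minimal Python sketch assuming a hypothetical layout of four person_id ranges of 20,000 ids each and hypothetical table names person_p0 to person_p3. In PostgreSQL the database does this routing itself: at the time of this thread via table inheritance, CHECK constraints and an insert trigger, and from version 10 via declarative PARTITION BY RANGE.

```python
from bisect import bisect_right
from collections import Counter

# HYPOTHETICAL layout: four person_id ranges of 20,000 ids each.
# Like PostgreSQL range partitions, each range is [lower, upper):
# the lower bound is inclusive and the upper bound is exclusive.
BOUNDS = [0, 20_000, 40_000, 60_000, 80_000]
PARTITIONS = ["person_p0", "person_p1", "person_p2", "person_p3"]  # hypothetical names
MINUTES_PER_DAY = 1440


def partition_for(person_id: int) -> str:
    """Return the name of the partition a person_id would be routed to."""
    if not BOUNDS[0] <= person_id < BOUNDS[-1]:
        raise ValueError(f"person_id {person_id} is outside every partition range")
    # bisect_right gives the index of the first bound greater than person_id,
    # so the partition index is one less than that.
    return PARTITIONS[bisect_right(BOUNDS, person_id) - 1]


# Rows per partition if every person has one row per minute of the day.
rows_per_partition = Counter()
for person_id in range(BOUNDS[0], BOUNDS[-1]):
    rows_per_partition[partition_for(person_id)] += MINUTES_PER_DAY

for name in PARTITIONS:
    print(name, f"{rows_per_partition[name]:,}")
```

Under this layout each child table holds 20,000 people x 1,440 minutes = 28,800,000 rows, so a query that filters on a single person_id range only needs to touch one of the four tables.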