Re: Partitioning with overlapping and non overlapping constraints
From | Brent Wood |
---|---|
Subject | Re: Partitioning with overlapping and non overlapping constraints |
Date | |
Msg-id | 458ed1b06da840d98ce5d01a0b052dbd@welwex02.niwa.local |
In reply to | Re: Partitioning with overlapping and non overlapping constraints (Melvin Davidson <melvin6925@gmail.com>) |
List | pgsql-general |
Hi Mark,

We have a somewhat similar situation: we have years of continuous data which are managed in PostGIS. The tables are partitioned into annual subsets. The previous (static) years' underlying tables have a clustered index on UTC timestamp; the current year's table has a conventional index. It works well, with 20-odd partitions to date.

An annual basis for partitions may not be ideal in your case, but you have not specified how long it takes for your data to become fixed, or whether there is a way the database can tell that records are now static. If there is, a scheduled task which migrates such records from a live partition to a fixed one would perhaps be appropriate.

Organising your data by UTC timestamp may be the simplest approach for you.

Cheers,

Brent Wood
Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Melvin Davidson [melvin6925@gmail.com]
Sent: Tuesday, February 10, 2015 6:01 AM
To: Marc Mamin
Cc: Tim Uckun; pgsql-general
Subject: Re: [GENERAL] Partitioning with overlapping and non overlapping constraints

Well, without knowing too much about your application, it certainly sounds like using the metrics_YYYYMMDD naming is the way to go. As for modifying the constraint daily, couldn't you just use WHERE timestamp > current_date - interval '1 day'?

On Mon, Feb 9, 2015 at 5:14 AM, Marc Mamin <M.Mamin@intershop.de> wrote:

> I have two partitioning questions I am hoping somebody can help me with.
>
> I have a fairly busy metric(ish) table. It gets a few million records per day; the data is transactional for a while but then settles down and is used for analytical purposes later.
>
> When a metric is reported, both the UTC time and the local time are stored along with the other data belonging to the metric.

Don't you have duplicate information within your UTC, location and local_time data?
Maybe you can just attach a timezone to each location...

> I want to partition this table to both make it faster to query and also to spread out the writes. Ideally the partitions would be based on the UTC timestamp and the sending location. For example:
>
> metrics_location_XXXXX_2015_01_01
>
> First problem with this approach is that there could be tens of thousands of locations, so this is going to result in hundreds of thousands of tables. I know there are no upper limits on how many tables there can be, but I am thinking this might really get me into trouble later.

With only a few million rows per day, weekly or even monthly partitions without regard to location should be sufficient for older data.

It should be possible to partition your hot data differently. Instead of using one partition per location, you could use a hash/modulo approach to keep the number of partitions to a reasonable count, if partitioning is required at all (this can be helpful: https://github.com/markokr/pghashlib). Here I would avoid including time information except for the boundary between old and hot tables. Depending on the pattern and performance requirements of your analytic queries, this may be sufficient (i.e. don't partition on time at all).

With smaller partitions for the hot data, it should be quite fast to move them one by one to the old data. I have no experience with the trigger-based partitioning of Postgres (we handle partitions logically at the application level), so I'm not sure how difficult this approach is. I suppose that you'll need a function that moves data from the hot to the old partitions and fixes the triggers accordingly.

> Second and possibly more vexing problem is that often the local time is queried. Ideally I would like to put three constraints on the child tables: location id, UTC timestamp and the local time, but obviously the local timestamps would overlap with other locations in the same timezone. Even if I were to partition only by UTC, the local timestamps would overlap between tables.
> So the questions in a nutshell are:
>
> 1. Should I be worried about having possibly hundreds of thousands of shards?
> 2. Is PG smart enough to handle overlapping constraints on tables and limit its querying to only those tables that have the correct time constraint?

If you partition on the UTC time only, you don't have overlapping constraints. When querying on the local time, the planner will consider all partitions, but an additional index or constraint on this column should be sufficient as long as your partition count remains small.

regards,

Marc Mamin

--
Melvin Davidson
I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.
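[Editor's note] To make the scheme discussed above concrete, here is a minimal sketch of the UTC-only partitioning that Brent and Marc converge on, using the inheritance-based partitioning available in Postgres at the time (9.x). All table and column names are illustrative assumptions, not taken from the thread:

```sql
-- Parent table; one child per year, partitioned on the UTC timestamp only.
CREATE TABLE metrics (
    location_id integer     NOT NULL,
    utc_ts      timestamptz NOT NULL,
    local_ts    timestamp   NOT NULL,
    value       double precision
);

-- A closed (static) year: the CHECK constraint enables constraint
-- exclusion, and CLUSTER physically orders the rows by UTC time,
-- as Brent describes for past years.
CREATE TABLE metrics_2014 (
    CHECK (utc_ts >= '2014-01-01' AND utc_ts < '2015-01-01')
) INHERITS (metrics);
CREATE INDEX metrics_2014_utc_idx ON metrics_2014 (utc_ts);
CLUSTER metrics_2014 USING metrics_2014_utc_idx;

-- The current year keeps conventional indexes; the local-time index
-- serves the queries that cannot be pruned.
CREATE TABLE metrics_2015 (
    CHECK (utc_ts >= '2015-01-01' AND utc_ts < '2016-01-01')
) INHERITS (metrics);
CREATE INDEX metrics_2015_utc_idx   ON metrics_2015 (utc_ts);
CREATE INDEX metrics_2015_local_idx ON metrics_2015 (local_ts);

-- With constraint_exclusion = partition (the default), a predicate on
-- utc_ts scans only the matching child ...
EXPLAIN SELECT * FROM metrics
WHERE utc_ts >= '2015-02-01' AND utc_ts < '2015-02-02';

-- ... while a predicate on local_ts visits every child, but the
-- per-child index keeps each visit cheap, as Marc notes.
EXPLAIN SELECT * FROM metrics WHERE local_ts >= '2015-02-01 00:00';
```

Marc's hash/modulo idea for the hot data would follow the same INHERITS/CHECK pattern, with a constraint such as `CHECK (location_id % 8 = 0)` on each of a fixed number of buckets instead of a time range.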