Re: [GENERAL] Large data and slow queries
From:        Martijn Tonies (Upscene Productions)
Subject:     Re: [GENERAL] Large data and slow queries
Date:
Msg-id:      5D5368E3CC1D4078AE18C506C5CE5867@MARTIJNWS
In reply to: [GENERAL] Large data and slow queries (Samuel Williams <space.ship.traveller@gmail.com>)
Responses:   Re: [GENERAL] Large data and slow queries
             Re: [GENERAL] Large data and slow queries
List:        pgsql-general
Samuel, others,

Perhaps I'm missing something, but I'd be interested in the reasoning behind this.

For column 'what', it seems you have no index covering all values, only indices restricted to specific values of 'what'. How does this speed up the search? Will PostgreSQL use those indices instead of a generic index on 'what' and, optionally, other columns? (A sketch contrasting the two index shapes follows at the end of this message.)

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

-----Original Message-----
From: Samuel Williams
Sent: Wednesday, April 19, 2017 6:01 AM
To: pgsql-general
Subject: [GENERAL] Large data and slow queries

Hi.

We have 400,000,000 records in a table (soon to be 800,000,000); here is the schema (\d+):

https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121

We want the following kind of query to be fast:

SELECT DISTINCT "user_event"."user_id"
FROM "user_event"
WHERE "user_event"."what" IN ('poll', 'location_change', 'midnight_location')
  AND ("user_event"."created_at" >= '2016-04-19 01:23:55')
  AND (latitude > -37.03079375089291 AND latitude < -36.67086424910709
   AND longitude > 174.6307139779924 AND longitude < 175.0805140220076);

We have a btree index and it appears to be working. However, it's still pretty slow. EXPLAIN ANALYZE gives the following:

https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314

I'm thinking that I need to do the following to help:

CLUSTER user_event ON index_user_event_for_visits_3, followed by analyze... Our data is mostly time series, but sometimes we get some dumps with historical records.

Perhaps add a BRIN index on created_at.

I'm wondering if we can use an index to cache all user_ids seen on a given day. If we change our index to be more discrete, e.g. created_at::date, would this help? The set union of user_ids for 365 days should be pretty fast? (Illustrative statements for each of these ideas also follow at the end of this message.)

I'm open to any ideas or suggestions; ideally we can keep optimisations within the database, rather than adding a layer of caching on top.

Kind regards,
Samuel

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
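
For concreteness, the two index shapes Martijn is asking about might look as follows. This is a minimal sketch: the column names are taken from the query above, but the index names are hypothetical and the real definitions live in the linked gist.

    -- A generic b-tree index covering every value of 'what' (hypothetical name):
    CREATE INDEX index_user_event_generic
        ON user_event (what, created_at);

    -- A partial index restricted to the values the query filters on (hypothetical
    -- name). The planner considers it only when a query's WHERE clause provably
    -- implies the index predicate, e.g. the IN list in the query above:
    CREATE INDEX index_user_event_partial
        ON user_event (created_at)
        WHERE what IN ('poll', 'location_change', 'midnight_location');

The trade-off: the partial index is smaller and cheaper to scan, but it cannot serve queries that filter on other values of 'what', which is the crux of the question.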
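
The CLUSTER step Samuel describes would, in current syntax, be the following (index name as given in the message). Note that CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the whole table, so at 400M rows it needs a maintenance window:

    CLUSTER user_event USING index_user_event_for_visits_3;
    ANALYZE user_event;

CLUSTER is a one-time reordering; rows inserted afterwards, including the historical dumps mentioned above, are not kept in index order.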
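
The BRIN idea is a one-liner (hypothetical index name). BRIN stores min/max summaries per block range, so it stays tiny even at this row count, but it only prunes well when the physical row order correlates with created_at - which the CLUSTER above would establish and later historical dumps would erode:

    CREATE INDEX index_user_event_created_at_brin
        ON user_event USING brin (created_at);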
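
The created_at::date idea maps to an expression index. A sketch under two assumptions: the index name is hypothetical, and created_at is timestamp without time zone (for timestamptz the plain cast to date is not immutable, and CREATE INDEX would reject it without an explicit time zone):

    -- Expression index keyed by day; the extra parentheses around the cast
    -- are required syntax for an expression index:
    CREATE INDEX index_user_event_on_day_user
        ON user_event ((created_at::date), user_id);

    -- A query must use the same expression for the index to match:
    SELECT DISTINCT user_id
    FROM user_event
    WHERE created_at::date >= DATE '2016-04-19';

Whether this beats the existing btree depends on how many rows per day survive the other filters; EXPLAIN ANALYZE on a copy of the data would tell.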