Re: [GENERAL] Large data and slow queries

From: Martijn Tonies (Upscene Productions)
Subject: Re: [GENERAL] Large data and slow queries
Date:
Msg-id: 5D5368E3CC1D4078AE18C506C5CE5867@MARTIJNWS
In reply to: [GENERAL] Large data and slow queries (Samuel Williams <space.ship.traveller@gmail.com>)
Responses: Re: [GENERAL] Large data and slow queries (Samuel Williams <space.ship.traveller@gmail.com>)
Re: [GENERAL] Large data and slow queries (vinny <vinny@xs4all.nl>)
List: pgsql-general
Samuel, others,

Perhaps I'm missing something, but I'd be interested in the reasoning behind
this.

For the column 'what', it seems you have no index covering all values, only
partial indices restricted to specific values of 'what'.

How does this speed up the search? Will PostgreSQL use those indices
instead of a generic index on 'what' and, optionally, other columns?
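
To make the contrast concrete, here is roughly what I mean (column names
taken from your query; index names are made up):

CREATE INDEX idx_user_event_poll ON user_event (created_at, user_id)
    WHERE what = 'poll';  -- partial: covers only rows with what = 'poll'

CREATE INDEX idx_user_event_what ON user_event (what, created_at);  -- generic

As far as I know, the planner will only consider a partial index when it
can prove that the query's WHERE clause implies the index predicate.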


With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com


-----Original Message-----
From: Samuel Williams
Sent: Wednesday, April 19, 2017 6:01 AM
To: pgsql-general
Subject: [GENERAL] Large data and slow queries

Hi.

We have 400,000,000 records in a table (soon to be 800,000,000); here
is the schema (\d+):

https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121

We want the following kinds of query to be fast:

SELECT DISTINCT "user_event"."user_id"
FROM "user_event"
WHERE "user_event"."what" IN ('poll', 'location_change', 'midnight_location')
  AND "user_event"."created_at" >= '2016-04-19 01:23:55'
  AND latitude > -37.03079375089291 AND latitude < -36.67086424910709
  AND longitude > 174.6307139779924 AND longitude < 175.0805140220076;

We have a btree index and it appears to be working. However, it's
still pretty slow.
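
For context, if index_user_event_for_visits_3 is something like the
following (I'm guessing the column order; the real definition is in the
gist):

CREATE INDEX index_user_event_for_visits_3 ON user_event
    (what, created_at, latitude, longitude);

then only the leading columns narrow the btree scan; the latitude and
longitude range conditions are checked against each index entry inside the
created_at range rather than shrinking the range itself.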

EXPLAIN ANALYZE gives the following:
https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314

I'm thinking that I need to do the following to help:

CLUSTER user_event USING index_user_event_for_visits_3, followed by
ANALYZE... Our data is mostly time-series, but sometimes we get dumps
with historical records.
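
For reference, that would be:

CLUSTER user_event USING index_user_event_for_visits_3;
ANALYZE user_event;

One caveat: CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the whole
table, so at 400M rows it needs a maintenance window.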

Perhaps add a BRIN index on created_at
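
Something like this, presumably (index name made up):

CREATE INDEX index_user_event_created_at_brin ON user_event
    USING brin (created_at);

BRIN only pays off if created_at correlates well with the physical row
order, which is why it would pair naturally with the CLUSTER step above.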

I'm wondering if... we can use an index to cache all user_ids seen on
a given day. If we change our index to be more discrete, e.g. on
created_at::date, would that help? The set union of user_ids over 365
days should be pretty fast?
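
A sketch of that idea, assuming created_at is timestamp without time zone
(the cast has to be immutable for an expression index; the name is made
up):

CREATE INDEX index_user_event_on_day ON user_event
    ((created_at::date), what, user_id);

With the visibility map mostly set, the per-day user_ids might then come
from an index-only scan.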

I'm open to any ideas or suggestions; ideally we can keep
optimisations within the database rather than adding a layer of
caching on top.

Kind regards,
Samuel




