Re: [GENERAL] Large data and slow queries

From: Stephen Frost
Subject: Re: [GENERAL] Large data and slow queries
Date:
Msg-id: 20170419124815.GL9812@tamriel.snowman.net
Whole thread / Raw message
In response to: [GENERAL] Large data and slow queries  (Samuel Williams <space.ship.traveller@gmail.com>)
Responses: Re: [GENERAL] Large data and slow queries  (Samuel Williams <space.ship.traveller@gmail.com>)
List: pgsql-general
Greetings,

* Samuel Williams (space.ship.traveller@gmail.com) wrote:
> We want the following kinds of query to be fast:

"kinds of query" isn't helpful, you should be reviewing exactly the
queries you care about because statistics and your exact data set and
what the exact query you're running is will all have an impact.

> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE
> "user_event"."what" IN ('poll', 'location_change',
> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19
> 01:23:55') AND (latitude > -37.03079375089291 AND latitude <
> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude <
> 175.0805140220076);

Ugh.  You should really look at and consider PostGIS whenever you're
working with geospatial data.
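As a sketch of what that could look like (column and index names here are illustrative, and this assumes the PostGIS extension is installed): store the point in a geometry column with a GiST index, and express the bounding box with ST_MakeEnvelope rather than four range comparisons.

```sql
-- Hypothetical PostGIS setup for the table above.
CREATE EXTENSION IF NOT EXISTS postgis;

ALTER TABLE user_event ADD COLUMN position geometry(Point, 4326);

UPDATE user_event
   SET position = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);

CREATE INDEX user_event_position_idx ON user_event USING gist (position);

-- The lat/long range predicates become one indexable bounding-box test:
SELECT DISTINCT user_id
  FROM user_event
 WHERE what IN ('poll', 'location_change', 'midnight_location')
   AND created_at >= '2016-04-19 01:23:55'
   AND position && ST_MakeEnvelope(174.6307139779924, -37.03079375089291,
                                   175.0805140220076, -36.67086424910709,
                                   4326);
```

The `&&` operator is a bounding-box overlap test, which the GiST index can satisfy directly.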

> We have a btree index and it appears to be working. However, it's
> still pretty slow.

You're finding some 17M records and then reducing that with DISTINCT to
only 114k.  Hard to say if it's faster to just brute-force your way
through that with a HashAgg (as your explain analyze shows), or if a
loose index scan would work better (check out how to do one in PG here:
https://wiki.postgresql.org/wiki/Loose_indexscan).
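The basic shape from that wiki page, applied to this table, is a recursive CTE that hops from one distinct user_id to the next via the index (the extra `what`/`created_at`/bounding-box predicates from your query would need to be repeated in both branches, which may or may not still win):

```sql
-- Loose index scan sketch, assuming an index with user_id as the
-- leading column.
WITH RECURSIVE t AS (
   SELECT min(user_id) AS user_id FROM user_event
   UNION ALL
   SELECT (SELECT min(user_id) FROM user_event
            WHERE user_id > t.user_id)
     FROM t
    WHERE t.user_id IS NOT NULL
)
SELECT user_id FROM t WHERE user_id IS NOT NULL;
```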

> EXPLAIN ANALYZE gives the following:
> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314

That explain analyze shows a whole ton of heap fetches.  When was the
last time a VACUUM was run on this table, to build the visibility map?
Without the visibility map being current, an Index-Only Scan, as is
happening here, can really suck.
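Something along these lines would show when the table was last vacuumed, and rebuild the visibility map so the Index-Only Scan can skip most heap fetches:

```sql
-- When did (auto)vacuum last touch this table?
SELECT relname, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname = 'user_event';

-- Rebuild the visibility map (VERBOSE reports what was done).
VACUUM (VERBOSE) user_event;
```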

> CLUSTER user_event ON index_user_event_for_visits_3 followed by
> analyze... Our data is mostly time series but sometimes we get some
> dumps with historical records.
>
> Perhaps add a BRIN index on created_at

BRIN would be helpful if you wanted a smaller index.  That doesn't seem
to be the case here.
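For reference, a BRIN index on the timestamp is a one-liner and is typically orders of magnitude smaller than the equivalent btree, at the cost of only being effective when created_at correlates with physical row order:

```sql
CREATE INDEX user_event_created_at_brin
    ON user_event USING brin (created_at);
```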

> I'm wondering if... we can use an index to cache, all user_ids seen on
> a given day. If we change our index to be more discrete, e.g.
> created_at::date, would this help? The set union of user_ids for 365
> days should be pretty fast?

Materialized views are often useful, particularly when the results are
(relatively) slow moving.
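A minimal sketch of the per-day rollup you describe, with hypothetical names:

```sql
-- Precompute the set of user_ids seen per day for the events of interest.
CREATE MATERIALIZED VIEW daily_user_visits AS
SELECT created_at::date AS day, user_id
  FROM user_event
 WHERE what IN ('poll', 'location_change', 'midnight_location')
 GROUP BY 1, 2;

CREATE INDEX ON daily_user_visits (day);

-- Refresh on whatever cadence the data allows:
REFRESH MATERIALIZED VIEW daily_user_visits;
```

The union of user_ids over a 365-day window is then a small indexed scan of the rollup instead of a 17M-row scan of user_event.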

> I'm open to any ideas or suggestions, ideally we can keep
> optimisations within the database, rather than adding a layer of
> caching on top.

If you need to go the materialized view route, I'd definitely recommend
doing that in the database rather than outside it.

Thanks!

Stephen

