Re: Using indexes for partial index builds

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: Using indexes for partial index builds
Дата
Msg-id 5137E498.1050901@nasby.net
обсуждение исходный текст
Ответ на Using indexes for partial index builds  (Paul Norman <penorman@mac.com>)
Ответы Re: Using indexes for partial index builds  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
On 2/2/13 4:05 AM, Paul Norman wrote:
> Hello,
> After a discussion on IRC in #postgresql, I had a feature suggestion and it
> was suggested I write it up here.
>
> I have a large (200GB, 1.7b rows) table with a number of columns, but the
> two of interest here are a hstore column, tags and a postgis geometry
> column, geom.
>
> There is a GIN index on tags and a gist index on geom. These took about
> 36-48 hours to build in total. Obviously index building on a table this size
> is not trivial.
>
> Periodically I want to do a number of specialized queries on objects with a
> particular tag or in a particular area. To do this I often want to create a
> partial index. For example, I created the index btree ((tags ->
> 'name_1'::text) text_pattern_ops) WHERE tags ? 'name_1'::text. My
> understanding is to create this index PostgreSQL does a scan of the entire
> table, even though the GIN index on tags could be used to identify which
> rows could belong in the index. Where the WHERE condition selects only a
> small portion of the table this is scanning a lot more data than is
> necessary.
>
> Another case where it would be useful is when I am conducting a detailed
> analysis of some aspect of the rows in a particular city. This leads to all
> the queries being of the form SELECT ... FROM ... WHERE
> is_in_my_area(geom)[1].
>
> My current project is doing analysis involving addresses. The ability to
> create an index like btree((tags -> 'addr:housenumber'), (tags ->
> 'addr:street'), (tags -> 'addr:city')) WHERE is_in_my_area(geom) in a
> reasonable time would allow me to use a view instead of copying the local
> area to a temporary table and indexing that table. The local area is about
> 350k rows, or about 0.02% of the database.
>
> [1] The actual function for determining if it's in my area is long and not
> really essential to the point here.

Something worth considering on this... I suspect it's possible to use an index-only scan to do this, regardless of
whetherthe heap page is all visible. The reason is that the newly created index would just use the same access
methodologyas the original index, so any dead rows would be ignored.
 

We'd almost certainly need to block vacuums during the build however. Obviously not an issue for regular index builds,
butit would be for concurrent ones.
 



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jim Nasby
Дата:
Сообщение: Re: Enabling Checksums
Следующее
От: Jim Nasby
Дата:
Сообщение: Re: proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement