Re: queries on xmin

Поиск

Список

Период

Сортировка

От	Marko Kreen
Тема	Re: queries on xmin
Дата	11 июня 2009 г. 11:22:59
Msg-id	e51f66da0906110422y2953acb4k40430acc9a6524aa@mail.gmail.com обсуждение исходный текст
Ответ на	queries on xmin (Matt Amos <zerebubuth@gmail.com>)
Ответы	Re: queries on xmin (Brett Henderson <brett@bretth.com>)
Список	pgsql-general

Дерево обсуждения

On 6/11/09, Matt Amos <zerebubuth@gmail.com> wrote:
> the openstreetmap project (http://osm.org/) recently moved from using
>  mysql to postgres and we're trying to improve some of our tools using
>  the new functionality that postgres provides.
>
>  in particular, we are dumping changes to the database at short
>  intervals (currently every minute, hour and day [1,2]) so that 3rd
>  party sites can use this to keep up-to-date with the main database. it
>  previously worked by examining the timestamp of each modified element,
>  but this is no longer practical due to new features in the
>  openstreetmap API which can cause long-running transactions [3].
>
>  we've been working out a scheme based on taking txid_snapshots at
>  short intervals and dumping the new rows (due to the way it's
>  implemented, all edits are inserted rows) and querying xmin. the query
>  looks something like this:
>
>  select id,version from (nodes|ways|relations) where timestamp > (now()
>  - '1 hour'::interval) and xmin in (...)
>
>  and we build up the txid list from the two snapshots we're dumping
>  between on the client. however, we're finding that this becomes much
>  less efficient as the txid list becomes longer. in an effort to reduce
>  the query time we're looking to index the xmin column. it seems that
>  hash indexes are already supported on the txid type, but btree are not
>  [4].
>
>  the queries we're doing would usually be of the form "xmin in
>  previous_unfinished_txids or (xmin > previous_max_txid and xmin <=
>  current_max_txid and not in current_unfinished_txids)" except when
>  wrap-around occurs, so it would seem that a btree index would be
>  superior to building this list client-side and using a hash index.
>
>  what problems are we going to create for ourselves if we create a
>  btree index on xmin casted to int4? would it be as efficient to use a
>  hash index, create a temporary table of txids that we're querying with
>  a hash index and do an explicit join? have i missed the point
>  entirely?

Hash index would not work as you cannot do range queries on that.

4-byte xids on btree may create data corruption.

Solution is to use 8-byte txids via txid_current() for indexing. [1]

See pgq.batch_event_sql() function in Skytools [2] for how to
query txids between snapshots efficiently and without being affected
by long transactions.

In fact perhaps you can use PgQ directly instead building your own.
It is built quite similarly to what you are planning - periodic
snapshots and then queries on txids to get the data.

--
marko

[1] http://www.postgresql.org/docs/8.3/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
[2] http://wiki.postgresql.org/wiki/Skytools

В списке pgsql-general по дате отправления:

Предыдущее

От: Greg Stark
Дата: 11 июня 2009 г., 11:11:13
Сообщение: Re: queries on xmin

Следующее

От: Steve Clark
Дата: 11 июня 2009 г., 11:41:50
Сообщение: Re: When to use cascading deletes?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: queries on xmin

Предыдущее

Следующее