Re: Growth planning

Поиск

Список

Период

Сортировка

От	Israel Brewster
Тема	Re: Growth planning
Дата	6 октября 2021 г. 02:03:54
Msg-id	ED254FD0-853A-4EA4-9B7A-43F3A2745394@alaska.edu обсуждение исходный текст
Ответ на	Re: Growth planning (Ron <ronljohnsonjr@gmail.com>)
Список	pgsql-general

Дерево обсуждения

Just a quick update to this topic from my testing: I whipped up a quick python script to create the partition tables for me, which went smoothly enough, and created a table LIST partitioned on station. Once populated with my current data, this table proved marginally faster than the unpartitioned table, especially on the initial select (~4 seconds vs ~6 seconds, speeding up to around 2.4 seconds on subsequent queries). Of course, it is entirely possible that performance will remain higher than with the unpartitioned table as the column count grows.

Then I tried partitioning by station, with the station tables sub-partitioned by channel, on the logic that most queries (especially the ones that need to be fast) are only interested in a single channel on a single station. This made a HUGE improvement (relatively speaking). Initial query time dropped to ~2.5 seconds, with subsequent queries coming in at closer to 1 second!

I’ll have to think about the maintenance side - what happens if a new station/channel comes online that I don’t have a partition for? I’m thinking try to catch the error in my python code when I try to insert such a record, create the relevant table(s), then try the INSERT again, but I’ll have to investigate more to figure out if this is an option (what sort of error do I get), or if there is a better one.

I guess time will tell if this is a sustainable/good schema, but at least for the 1,171,575,191 rows I currently have, this gives much better performance than the non-partitioned table, and presumably will continue to do so into the future.

Of course, if anyone else has any other suggestions other than simple partitioning, I’m all ears! Like I said, this is far outside my experience in terms of sheer data size (I will be talking to the timescaledb people tomorrow)!

Thanks again!

---

Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320

Work: 907-474-5172
cell: 907-328-9145

On Oct 4, 2021, at 12:46 PM, Ron <ronljohnsonjr@gmail.com> wrote:

On 10/4/21 12:36 PM, Israel Brewster wrote:
[snip]
Indeed. Table per station as opposed to partitioning? The *most* I can reasonably envision needing is to query two stations, i.e. I could see potentially wanting to compare station a to some “baseline” station b. In general, though, the stations are independent, and it seems unlikely that we will need any multi-station queries. Perhaps query one station, then a second query for a second to display graphs for both side-by-side to look for correlations or something, but nothing like that has been suggested at the moment.

Postgresql partitions are tables. What if you partition by station (or range of stations)?

--
Angular momentum makes the world go 'round.

В списке pgsql-general по дате отправления:

Предыдущее

От: Brent Wood
Дата: 05 октября 2021 г., 11:11:52
Сообщение: Re: help implementing OGR Postgres foreign data wrapper

Следующее

От: ml@ft-c.de
Дата: 06 октября 2021 г., 06:50:25
Сообщение: type bug?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Growth planning

Предыдущее

Следующее