Natural key woe

From: Oliver Kohll - Mailing Lists
Subject: Natural key woe
Date:
Message-ID: 713571A8-7DDF-4A06-B3EE-22D5B45D2A8F@agilebase.co.uk
List: pgsql-general

I'm sure no one else on this list has done anything like this, but here's a cautionary tale.

I wanted to synchronise data in two tables (issue lists), i.e. whenever a record is added into one, add a similar
record into the other. The two tables are similar in format but not exactly the same, so only a subset of fields is
copied. Both tables have synthetic primary keys, but these can't be used to match data as they are auto-incrementing
sequences that might interfere. What I could have done perhaps is get both tables to use the same sequence, but what I
actually did is:

* join both tables based on a natural key
* use that to copy any missing items from table1 to table2
* truncate table1 and copy all of table2's rows to table1
* run this routine once an hour

The natural key was based on the creation timestamp (stored on insert) and one of the text fields, called
'subject'.

The problem came when someone entered a record with no subject and left it null. When this was copied over and present
in both tables, the *next* time the join was done, a duplicate was created because the join didn't see them as matching
(null != null).
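
To illustrate why the join missed the null-subject row, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column names (created, subject) and the sample rows are assumptions for illustration only; the original schema is not shown in this post. In SQLite the null-safe comparison is spelled IS, while PostgreSQL spells it IS NOT DISTINCT FROM.

```python
import sqlite3

# Hypothetical schema and rows, made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, created TEXT NOT NULL, subject TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, created TEXT NOT NULL, subject TEXT);
    INSERT INTO table1 (created, subject) VALUES
        ('2000-01-01 09:00:00', 'Printer jam'),
        ('2000-01-01 09:05:00', NULL);
    -- Both tables start out holding the same two records.
    INSERT INTO table2 (created, subject) SELECT created, subject FROM table1;
""")

# Anti-join on the natural key: rows of table1 with no partner in table2.
MISSING_ROWS = """
    SELECT t1.created, t1.subject
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
      ON t1.created = t2.created AND t1.subject {op} t2.subject
    WHERE t2.id IS NULL
"""

# With "=", NULL = NULL evaluates to NULL, so the null-subject row never matches.
plain_missing = conn.execute(MISSING_ROWS.format(op="=")).fetchall()

# With the null-safe operator, two NULL subjects compare as equal.
null_safe_missing = conn.execute(MISSING_ROWS.format(op="IS")).fetchall()

print(plain_missing)      # the null-subject row is wrongly reported as missing
print(null_safe_missing)  # nothing is missing
```

Declaring subject as NOT NULL, or matching on a shared identifier instead of a natural key, would be other ways to rule this out; which one suits the real tables is not something the post settles.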

So after one hour there were two records. After two hours there were four, after three, eight, and so on.

When I logged in after 25 hrs and noticed table access was a little slow, there were 2^25 = roughly 33.5 million records.
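
The doubling can be reproduced with a small simulation of the hourly routine described above. This is only a sketch under assumptions: it uses Python's sqlite3 module instead of PostgreSQL, the column names and sample rows are hypothetical, and DELETE stands in for TRUNCATE, which SQLite does not have.

```python
import sqlite3

def run_sync(conn):
    """One pass of the hourly routine, joining on (created, subject)."""
    # Copy rows that appear to be missing from table2.
    conn.execute("""
        INSERT INTO table2 (created, subject)
        SELECT t1.created, t1.subject
        FROM table1 AS t1
        LEFT JOIN table2 AS t2
          ON t1.created = t2.created AND t1.subject = t2.subject
        WHERE t2.id IS NULL
    """)
    # Empty table1 and refill it from table2.
    conn.execute("DELETE FROM table1")
    conn.execute("INSERT INTO table1 (created, subject) "
                 "SELECT created, subject FROM table2")

def null_row_counts(hours):
    """Return the number of null-subject rows in table2 after each run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, created TEXT NOT NULL, subject TEXT);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, created TEXT NOT NULL, subject TEXT);
        INSERT INTO table1 (created, subject) VALUES
            ('2000-01-01 09:00:00', 'Printer jam'),
            ('2000-01-01 09:05:00', NULL);
        INSERT INTO table2 (created, subject) SELECT created, subject FROM table1;
    """)
    counts = []
    for _ in range(hours):
        run_sync(conn)
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM table2 WHERE subject IS NULL").fetchone()
        counts.append(n)
    conn.close()
    return counts

counts = null_row_counts(8)
print(counts)   # [2, 4, 8, 16, 32, 64, 128, 256]
print(2 ** 25)  # 33554432
```

The record with a non-null subject stays at a single copy throughout; only the null-subject record doubles on each run.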

That's a learning experience for me at least. It's lucky I did check it at the end of that day rather than leaving it
overnight, otherwise I think our server would have ground to a halt.

One other wrinkle to note. After clearing out these rows and running 'VACUUM table2', 'ANALYZE table2' and 'REINDEX TABLE
table2', some queries with simple sequential scans were taking a few seconds to run even though there are only a thousand
rows in the table. I finally found that running CLUSTER on the table sorted that out, even though we're on an SSD so I
would have thought seeking all over the place for a seq. scan wouldn't have made that much difference. It obviously does
still make some.

Oliver Kohll
www.agilebase.co.uk





