Thanks Shaun.
We reindexed all the primary and unique keys of all the tables, But we
did not reindex the tables. You think we should do that also ?
Also, do you think we should do a clean dump restore to eliminate all
data inconsistencies.
One more query :
We managed to get the old server up. We are trying to play the
difference in data by checking the log files(statement logs). You think
there is any other easy alternatives ?
Thanks in advance,
- Karthik
On Thursday 26 June 2014 01:09 AM, Shaun Thomas wrote:
> On 06/25/2014 06:29 AM, Karthik Iyer wrote:
>
>> [2] We also have a daemon process which copies the latest partial WAL
>> log file (which postgres is currently writing to, under pg_xlog/) every
>> 3 secs to a different location.
>
> No. No, no, no, no no. No.
>
> Also, no. Partial WAL files are not valid for recovery. In fact, I'm
> surprised the standby node even applied it at all.
>
>> We are seeing these problems in the newly promoted master now:
>>
>> 1. when we run queries on primary key, we don't get the rows even
>> if it exist in db. However if we force query not to use index, we get
>> those entries.
>> 2. there are duplicate values for primary keys
>
> This is no surprise. Your slave has partial data commits, which means
> your table, index, or possibly both, are corrupt.
>
> The first thing you need to do is back up any tables you've noticed
> are having this behavior. Second, try to reindex the tables that are
> having problems. The errors you are seeing are due to the data and
> indexes being out of sync. If you get an error that says the reindex
> fails due to duplicate values, you can do this:
>
> SET enable_indexscan TO false;
> SET enable_bitmapscan TO false;
>
> SELECT primary_key, count(1)
> FROM broken_table
> GROUP BY 1
> HAVING count(1) > 1;
>
> For any ID that comes back, do this:
>
> SELECT ctid, *
> FROM broken_table
> WHERE primary_key = [value(s) from above];
>
> Then you need to delete one of the bad rows after deciding which. Use
> the CTID of the row you want to delete:
>
> DELETE FROM broken_table
> WHERE ctid = 'whatever';
>
> Then reindex the table so the correct values are properly covered.
> Doing this for all of your corrupt tables may take a while depending
> on how many there are.
>