Обсуждение: Addled index

Поиск

Список

Период

Сортировка

Addled index

От

Oleg Alexeev

Дата:

15 марта 2013 г., 18:29:55

We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От

Steve Crawford

Дата:

15 марта 2013 г., 21:21:21

On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
> We've faced with strange index problem.
>
> At some moment index became bad and queries does not return any data.
>
> For example, there are two tables - A (id, name) and B (id, name,
> a_id). B.a_id is foreign key to A. Both name columns in tables
> contains identical values for A.id = B.a_id. A.name column has unique
> constraint and additional index by it.
>
> So, in some moment results for queries like [select id from A where
> name = 'petya'] became empty (row with 'petya' name exist in  A).
>
> But query [select a_id from B where name = 'petya'] returns A.id and
> [select * from A where id = <found id>] returns row.
>
> This problem can be solved by index recreation only.
>
> How can we avoid such situation?
>
What version??

Cheers,
Steve

Re: Addled index

От

Oleg Alexeev

Дата:

16 марта 2013 г., 08:33:22

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com> wrote:

On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??

The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less than 0.5% every day modifications)

--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От

Alban Hertroys

Дата:

16 марта 2013 г., 10:32:39

On Mar 16, 2013, at 9:33, Oleg Alexeev <oalexeev@gmail.com> wrote:

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??

The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less than 0.5% every day modifications)

Perhaps the name you're not finding is spelled differently than what you're typing, due to collation?

If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:

Maybe you turned fsync off?

What type of index is that? A standard btree or one of the newer types?

Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?

Alban Hertroys

If you can't see the forest for the trees,

cut the trees and you'll find there is no forest.

Re: Addled index

От

Tom Lane

Дата:

16 марта 2013 г., 15:10:43

Alban Hertroys <haramrae@gmail.com> writes:
> If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of
syncsomehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do
withit: 
> Maybe you turned fsync off?
> What type of index is that? A standard btree or one of the newer types?
> Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back
file-systemtransactions? It this database perhaps a replicated node? 

More generally: since we're not hearing this type of complaint from
other people, there must be something pretty unusual about your
installation.  You've provided no information that would suggest what,
though.  Aside from Alban's questions, some other things come to mind:

* is that a plain text column, or some other data type?
* what collation/ctype is your database using?
* what nondefault parameter settings are you using?
* where did you get the Postgres executables from?  Some distro (whose)?
  If they're self-built, what compiler and configuration settings did
  you use?
* what platform is this?  I would not rule out kernel bugs or flaky
  hardware.

            regards, tom lane

Re: Addled index

От

Oleg Alexeev

Дата:

16 марта 2013 г., 20:39:34

On 16 March 2013 14:32, Alban Hertroys <haramrae@gmail.com> wrote:

On Mar 16, 2013, at 9:33, Oleg Alexeev <oalexeev@gmail.com> wrote:

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??

The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less than 0.5% every day modifications)

Perhaps the name you're not finding is spelled differently than what you're typing, due to collation?

If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:
Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?

Oh, it is not about short experiments. :)

Both failed queries are part of 24/7 application. And one of the query results became empty for existing key at some moment. We've recreated index and same query returned to the normal work.

fsync is in off state, yes

Both failed indexes are btree type.

Database located on software md raid 1 based on two SSD disks array. Ext4 filesystem. Database is master node.

--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От

Oleg Alexeev

Дата:

16 марта 2013 г., 20:54:47

On 16 March 2013 19:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alban Hertroys <haramrae@gmail.com> writes:
> If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:
> Maybe you turned fsync off?
> What type of index is that? A standard btree or one of the newer types?
> Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?

More generally: since we're not hearing this type of complaint from
other people, there must be something pretty unusual about your
installation. You've provided no information that would suggest what,
though. Aside from Alban's questions, some other things come to mind:

* is that a plain text column, or some other data type?
* what collation/ctype is your database using?
* what nondefault parameter settings are you using?
* where did you get the Postgres executables from? Some distro (whose)?
If they're self-built, what compiler and configuration settings did
you use?
* what platform is this? I would not rule out kernel bugs or flaky
hardware.

regards, tom lane

* it is varchar columns, 256 and 32 symbols length

* encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8

* autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms, commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min, checkpoint_completion_target = 0.7

* postgres 9.2.3 installed via yum repository for version 9.2

* 64 bit Centos 6, installed and updated from yum repository

--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От

Tom Lane

Дата:

17 марта 2013 г., 04:30:34

Oleg Alexeev <oalexeev@gmail.com> writes:
> * it is varchar columns, 256 and 32 symbols length
> * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
> * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
> commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
> checkpoint_completion_target = 0.7
> * postgres 9.2.3 installed via yum repository for version 9.2
> * 64 bit Centos 6, installed and updated from yum repository

fsync off?  Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

> Database located on software md raid 1 based on two SSD disks array. Ext4
> filesystem. Database is master node.

Meh.  I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
> recommended for use on SSDs.

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there.  I'd certainly suggest making
sure you are on a *recent* kernel.  If that doesn't help, reconsider
your filesystem choices.

(Disclaimer: I work for Red Hat, but not in the filesystem group,
so I don't necessarily know what I'm talking about.  But I have the
feeling you have chosen a configuration that's pretty bleeding-edge
for RHEL6.)

            regards, tom lane

Re: Addled index

От

Greg Jaskiewicz

Дата:

18 марта 2013 г., 11:55:38

On 17 Mar 2013, at 04:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Oleg Alexeev <oalexeev@gmail.com> writes:
>> * it is varchar columns, 256 and 32 symbols length
>> * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
>> * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
>> commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
>> checkpoint_completion_target = 0.7
>> * postgres 9.2.3 installed via yum repository for version 9.2
>> * 64 bit Centos 6, installed and updated from yum repository
>
> fsync off?  Have you had any power failures or other system crashes?
> ext4 is *way* more prone than ext3 was to corrupt data when fsync is
> disabled, because it caches and reorders writes much more aggressively.
>
>> Database located on software md raid 1 based on two SSD disks array. Ext4
>> filesystem. Database is master node.
>
> Meh.  I quote from the RHEL6 documentation (Storage Administration
> Guide, Chapter 20: Solid-State Disk Deployment Guidelines):
>
>> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
>> recommended for use on SSDs.
>
>
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html
>
> The part of the docs I'm looking at only asserts that performance is
> bad, but considering that it's a deprecated combination, it may well be
> that there are data-loss bugs in there.  I'd certainly suggest making
> sure you are on a *recent* kernel.  If that doesn't help, reconsider
> your filesystem choices.
>
Yeah, I don't think I'd consider using software raid for SSDs any time a good idea

Re: Addled index

От

Oleg Alexeev

Дата:

18 марта 2013 г., 19:55:56

On 17 March 2013 08:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleg Alexeev <oalexeev@gmail.com> writes:
> * it is varchar columns, 256 and 32 symbols length
> * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
> * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
> commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
> checkpoint_completion_target = 0.7
> * postgres 9.2.3 installed via yum repository for version 9.2
> * 64 bit Centos 6, installed and updated from yum repository

fsync off? Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

> Database located on software md raid 1 based on two SSD disks array. Ext4
> filesystem. Database is master node.

Meh. I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
> recommended for use on SSDs.

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there. I'd certainly suggest making
sure you are on a *recent* kernel. If that doesn't help, reconsider
your filesystem choices.

(Disclaimer: I work for Red Hat, but not in the filesystem group,
so I don't necessarily know what I'm talking about. But I have the
feeling you have chosen a configuration that's pretty bleeding-edge
for RHEL6.)

regards, tom lane

I think fsync=off was really bad idea.

--
Oleg V Alexeev
E:oalexeev@gmail.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Addled index