Обсуждение: Addled index

Поиск
Список
Период
Сортировка

Addled index

От
Oleg Alexeev
Дата:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in  A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От
Steve Crawford
Дата:
On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
> We've faced with strange index problem.
>
> At some moment index became bad and queries does not return any data.
>
> For example, there are two tables - A (id, name) and B (id, name,
> a_id). B.a_id is foreign key to A. Both name columns in tables
> contains identical values for A.id = B.a_id. A.name column has unique
> constraint and additional index by it.
>
> So, in some moment results for queries like [select id from A where
> name = 'petya'] became empty (row with 'petya' name exist in  A).
>
> But query [select a_id from B where name = 'petya'] returns A.id and
> [select * from A where id = <found id>] returns row.
>
> This problem can be solved by index recreation only.
>
> How can we avoid such situation?
>
What version??

Cheers,
Steve


Re: Addled index

От
Oleg Alexeev
Дата:
On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in  A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??


The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less than 0.5% every day modifications)


--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От
Alban Hertroys
Дата:
On Mar 16, 2013, at 9:33, Oleg Alexeev <oalexeev@gmail.com> wrote:

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in  A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??


The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less than 0.5% every day modifications)

Perhaps the name you're not finding is spelled differently than what you're typing, due to collation?

If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:
Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: Addled index

От
Tom Lane
Дата:
Alban Hertroys <haramrae@gmail.com> writes:
> If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of
syncsomehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do
withit: 
> Maybe you turned fsync off?
> What type of index is that? A standard btree or one of the newer types?
> Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back
file-systemtransactions? It this database perhaps a replicated node? 

More generally: since we're not hearing this type of complaint from
other people, there must be something pretty unusual about your
installation.  You've provided no information that would suggest what,
though.  Aside from Alban's questions, some other things come to mind:

* is that a plain text column, or some other data type?
* what collation/ctype is your database using?
* what nondefault parameter settings are you using?
* where did you get the Postgres executables from?  Some distro (whose)?
  If they're self-built, what compiler and configuration settings did
  you use?
* what platform is this?  I would not rule out kernel bugs or flaky
  hardware.

            regards, tom lane


Re: Addled index

От
Oleg Alexeev
Дата:
On 16 March 2013 14:32, Alban Hertroys <haramrae@gmail.com> wrote:
On Mar 16, 2013, at 9:33, Oleg Alexeev <oalexeev@gmail.com> wrote:

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in  A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??


The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less than 0.5% every day modifications)

Perhaps the name you're not finding is spelled differently than what you're typing, due to collation?

If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:
Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?



Oh, it is not about short experiments. :)

Both failed queries are part of 24/7 application. And one of the query results became empty for existing key at some moment. We've recreated index and same query returned to the normal work.

fsync is in off state, yes

Both failed indexes are btree type.

Database located on software md raid 1 based on two SSD disks array. Ext4 filesystem. Database is master node.

--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От
Oleg Alexeev
Дата:
On 16 March 2013 19:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alban Hertroys <haramrae@gmail.com> writes:
> If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:
> Maybe you turned fsync off?
> What type of index is that? A standard btree or one of the newer types?
> Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?

More generally: since we're not hearing this type of complaint from
other people, there must be something pretty unusual about your
installation.  You've provided no information that would suggest what,
though.  Aside from Alban's questions, some other things come to mind:

* is that a plain text column, or some other data type?
* what collation/ctype is your database using?
* what nondefault parameter settings are you using?
* where did you get the Postgres executables from?  Some distro (whose)?
  If they're self-built, what compiler and configuration settings did
  you use?
* what platform is this?  I would not rule out kernel bugs or flaky
  hardware.

                        regards, tom lane

* it is varchar columns, 256 and 32 symbols length
* encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
* autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms, commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,  checkpoint_completion_target = 0.7
* postgres 9.2.3 installed via yum repository for version 9.2
* 64 bit Centos 6, installed and updated from yum repository

--
Oleg V Alexeev
E:oalexeev@gmail.com

Re: Addled index

От
Tom Lane
Дата:
Oleg Alexeev <oalexeev@gmail.com> writes:
> * it is varchar columns, 256 and 32 symbols length
> * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
> * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
> commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
> checkpoint_completion_target = 0.7
> * postgres 9.2.3 installed via yum repository for version 9.2
> * 64 bit Centos 6, installed and updated from yum repository

fsync off?  Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

> Database located on software md raid 1 based on two SSD disks array. Ext4
> filesystem. Database is master node.

Meh.  I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
> recommended for use on SSDs.


https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there.  I'd certainly suggest making
sure you are on a *recent* kernel.  If that doesn't help, reconsider
your filesystem choices.

(Disclaimer: I work for Red Hat, but not in the filesystem group,
so I don't necessarily know what I'm talking about.  But I have the
feeling you have chosen a configuration that's pretty bleeding-edge
for RHEL6.)

            regards, tom lane


Re: Addled index

От
Greg Jaskiewicz
Дата:
On 17 Mar 2013, at 04:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Oleg Alexeev <oalexeev@gmail.com> writes:
>> * it is varchar columns, 256 and 32 symbols length
>> * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
>> * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
>> commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
>> checkpoint_completion_target = 0.7
>> * postgres 9.2.3 installed via yum repository for version 9.2
>> * 64 bit Centos 6, installed and updated from yum repository
>
> fsync off?  Have you had any power failures or other system crashes?
> ext4 is *way* more prone than ext3 was to corrupt data when fsync is
> disabled, because it caches and reorders writes much more aggressively.
>
>> Database located on software md raid 1 based on two SSD disks array. Ext4
>> filesystem. Database is master node.
>
> Meh.  I quote from the RHEL6 documentation (Storage Administration
> Guide, Chapter 20: Solid-State Disk Deployment Guidelines):
>
>> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
>> recommended for use on SSDs.
>
>
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html
>
> The part of the docs I'm looking at only asserts that performance is
> bad, but considering that it's a deprecated combination, it may well be
> that there are data-loss bugs in there.  I'd certainly suggest making
> sure you are on a *recent* kernel.  If that doesn't help, reconsider
> your filesystem choices.
>
Yeah, I don't think I'd consider using software raid for SSDs any time a good idea

Re: Addled index

От
Oleg Alexeev
Дата:
On 17 March 2013 08:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Oleg Alexeev <oalexeev@gmail.com> writes:
> * it is varchar columns, 256 and 32 symbols length
> * encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
> * autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
> commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
> checkpoint_completion_target = 0.7
> * postgres 9.2.3 installed via yum repository for version 9.2
> * 64 bit Centos 6, installed and updated from yum repository

fsync off?  Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

> Database located on software md raid 1 based on two SSD disks array. Ext4
> filesystem. Database is master node.

Meh.  I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

> Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
> recommended for use on SSDs.

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there.  I'd certainly suggest making
sure you are on a *recent* kernel.  If that doesn't help, reconsider
your filesystem choices.

(Disclaimer: I work for Red Hat, but not in the filesystem group,
so I don't necessarily know what I'm talking about.  But I have the
feeling you have chosen a configuration that's pretty bleeding-edge
for RHEL6.)

                        regards, tom lane


I think fsync=off was really bad idea.



--
Oleg V Alexeev
E:oalexeev@gmail.com