Обсуждение: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

Поиск
Список
Период
Сортировка

weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Rajesh Kumar Mallah
Дата:

I recovered my database by pg_resetxlog and then did a dump , initdb ,
reload.
one of the tables exhibited this phenomenon. I thought it was worth
mentioning
to the developers.

Regds
mallah.


rt2=# SELECT name from users group by name having count(*) > 1 limit 20;

+----------------------+
| name |
+----------------------+
| �˾ |
| ����163.com |
| �@hotmail.com |
| p� |
+----------------------+
(4 rows)

Time: 1961.199 ms
rt2=#
rt2=# SELECT name from users group by name having count(*) > 1 ;
+------+
| name |
+------+
+------+
(0 rows)

Time: 789.184 ms


Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Tom Lane
Дата:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> rt2=# SELECT name from users group by name having count(*) > 1 limit 20;
> (4 rows)
> rt2=# SELECT name from users group by name having count(*) > 1 ;
> (0 rows)

Weird.  What PG version is this?  What does EXPLAIN show for each of
these queries (maybe EXPLAIN ANALYZE too)?  Can you provide a
self-contained test case?

            regards, tom lane

Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Rajesh Kumar Mallah
Дата:

Dear Tom,

Sorry i could not replicate that issue . But
here is something which was the original problem
and is replicable.


Regds
Mallah.



rt2=# CREATE UNIQUE INDEX users_1 on users_sample(name);
ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.
rt2=#
rt2=# SELECT name from users_sample group by name having count(*) > 1 limit 20;
+------+
| name |
+------+
+------+
(0 rows)

Time: 766.725 ms
rt2=# SELECT version();
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+------+
| name |
+------+
+------+
(0 rows)

Time: 905.586 ms
rt2=# SELECT version();
+----------------------------------------------------------------------------------------------------------+
|                                                 version                                                  |
+----------------------------------------------------------------------------------------------------------+
| PostgreSQL 7.4.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-34) |
+----------------------------------------------------------------------------------------------------------+
(1 row)

Time: 338.191 ms


Tom Lane wrote:
Rajesh Kumar Mallah <mallah@trade-india.com> writes: 
rt2=# SELECT name from users group by name having count(*) > 1 limit 20;
(4 rows)
rt2=# SELECT name from users group by name having count(*) > 1 ;
(0 rows)   
Weird.  What PG version is this?  What does EXPLAIN show for each of
these queries (maybe EXPLAIN ANALYZE too)?  Can you provide a
self-contained test case?
		regards, tom lane
 

Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Tom Lane
Дата:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> here is something which was the original problem
> and is replicable.

Hm.  Does EXPLAIN show that the GROUP BY query is using hash
aggregation?  Does its behavior change if you turn off enable_hashagg?

            regards, tom lane

Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Rajesh Kumar Mallah
Дата:

Dear Tom,

It does.


rt2=# explain SELECT name from users_sample group by name having count(*) > 1 ;
+-----------------------------------------------------------------------+
|                              QUERY PLAN                               |
+-----------------------------------------------------------------------+
| HashAggregate  (cost=25.00..26.00 rows=200 width=78)                  |
|   Filter: (count(*) > 1)                                              |
|   ->  Seq Scan on users_sample  (cost=0.00..20.00 rows=1000 width=78) |
+-----------------------------------------------------------------------+
(3 rows)

Time: 375.619 ms
rt2=# explain analyze SELECT name from users_sample group by name having count(*) > 1 ;
+------------------------------------------------------------------------------------------------------------------------+
|                                                       QUERY PLAN                                                       |
+------------------------------------------------------------------------------------------------------------------------+
| HashAggregate  (cost=25.00..26.00 rows=200 width=78) (actual time=567.981..567.981 rows=0 loops=1)                     |
|   Filter: (count(*) > 1)                                                                                               |
|   ->  Seq Scan on users_sample  (cost=0.00..20.00 rows=1000 width=78) (actual time=0.046..160.706 rows=100489 loops=1) |
| Total runtime: 577.212 ms                                                                                              |
+------------------------------------------------------------------------------------------------------------------------+
(4 rows)





Tom Lane wrote:
Rajesh Kumar Mallah <mallah@trade-india.com> writes: 
here is something which was the original problem
and is replicable.   
Hm.  Does EXPLAIN show that the GROUP BY query is using hash
aggregation?  Does its behavior change if you turn off enable_hashagg?
		regards, tom lane
 

Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Tom Lane
Дата:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> It does.

And?

> Tom Lane wrote:
>> Hm.  Does EXPLAIN show that the GROUP BY query is using hash
>> aggregation?  Does its behavior change if you turn off enable_hashagg?
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

            regards, tom lane

Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Rajesh Kumar Mallah
Дата:
Tom Lane wrote:




Rajesh Kumar Mallah <mallah@trade-india.com> writes: 
It does.   
And?
 
Yep the problem of original posting could be replicated on
disabling hash aggregates. On disabling i could get the repeated rows.


Regds
Mallah.


rt2=# SET enable_hashagg TO off;
SET
Time: 329.533 ms
rt2=# explain analyze SELECT name from users_sample group by name having count(*) > 1 ;
+------------------------------------------------------------------------------------------------------------------------------+
|                                                          QUERY PLAN                                                          |
+------------------------------------------------------------------------------------------------------------------------------+
| GroupAggregate  (cost=69.83..78.33 rows=200 width=78) (actual time=2411.411..2749.733 rows=4 loops=1)                        |
|   Filter: (count(*) > 1)                                                                                                     |
|   ->  Sort  (cost=69.83..72.33 rows=1000 width=78) (actual time=2411.300..2532.821 rows=100489 loops=1)                      |
|         Sort Key: name                                                                                                       |
|         ->  Seq Scan on users_sample  (cost=0.00..20.00 rows=1000 width=78) (actual time=0.024..170.258 rows=100489 loops=1) |
| Total runtime: 2915.439 ms                                                                                                   |
+------------------------------------------------------------------------------------------------------------------------------+
(6 rows)

Time: 3265.102 ms
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+----------------------+
|         name         |
+----------------------+
| ���           |
| ����163.com     |
| �@hotmail.com |
| p�                |
+----------------------+
(4 rows)

Time: 3358.030 ms
rt2=# SET enable_hashagg TO on;
SET
Time: 330.148 ms
rt2=# SELECT name from users_sample group by name having count(*) > 1 ;
+------+
| name |
+------+
+------+
(0 rows)

Time: 752.395 ms
rt2=#








Tom Lane wrote:   
Hm.  Does EXPLAIN show that the GROUP BY query is using hash
aggregation?  Does its behavior change if you turn off enable_hashagg?     
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		regards, tom lane
 

Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Tom Lane
Дата:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> Yep the problem of original posting could be replicated on
> disabling hash aggregates. On disabling i could get the repeated rows.

Okay.  What I suspect is happening is that there are entries in the
column that are equal according to the datatype's comparison function,
but are not bitwise equal and therefore yield different hash codes.
This makes it a crapshoot whether they are seen to be equal or not
when hash aggregation is used.  We identified a similar bug in the
inet/cidr datatypes just a few weeks ago.

What exactly is the datatype of the "name" column?  If it's a text
type, what database encoding and locale settings (LC_COLLATE/LC_CTYPE)
are you using?  Can you investigate exactly what's stored within each
of these groups of matching names?

            regards, tom lane

Re: weired behavior... after pg_resetxlog-> dump->initdb-->reload.

От
Rajesh Kumar Mallah
Дата:
Tom Lane wrote:
Rajesh Kumar Mallah <mallah@trade-india.com> writes: 
Yep the problem of original posting could be replicated on
disabling hash aggregates. On disabling i could get the repeated rows.   
Okay.  What I suspect is happening is that there are entries in the
column that are equal according to the datatype's comparison function,
but are not bitwise equal and therefore yield different hash codes.
This makes it a crapshoot whether they are seen to be equal or not
when hash aggregation is used.  We identified a similar bug in the
inet/cidr datatypes just a few weeks ago.

What exactly is the datatype of the "name" column?

name                  | character varying(120)   | not null
  If it's a text
type, what database encoding and locale settings (LC_COLLATE/LC_CTYPE)
are you using? 

 List of databases
+-----------------+----------+-----------+
|      Name       |  Owner   | Encoding  |
+-----------------+----------+-----------+
| bric            | postgres | UNICODE   |

+--------------------------------+-----------------+
|              name              |     setting     |
+--------------------------------+-----------------+
| lc_collate                     | en_US.UTF-8     |
| lc_ctype                       | en_US.UTF-8     |
| lc_messages                    | en_US.iso885915 |
| lc_monetary                    | en_US.iso885915 |
| lc_numeric                     | en_US.iso885915 |
| lc_time                        | en_US.iso885915 |


 Can you investigate exactly what's stored within each
of these groups of matching names? 

Can you tell me how to do it please?
		regards, tom lane
 
Regds
Mallah.