Discussion: weird empty return from select problem; periodically get no data returned - could it be a network issue?

I have a number of Perl programs of similar form to this:

 

$dbh=DBI->connect("dbi:Pg:dbname=$dbname;host=${dbserver};", $dbuser, $dbpasswd,
        {PrintError => 0, PrintWarn => 0, AutoCommit => $autocommit}) or
     errexit( "Unable to connect to dbname $dbname, err: $DBI::errstr");
errexit("No db handle") unless ($dbh);

#update statement definition here
my $update_info_sth=$dbh->prepare($stmt) or errexit("Cannot prepare handle for $stmt; ", $DBI::errstr);

#stmt=select statement definition here; selects some data ordered by date, limit n, where n is about 300 or so, depending on the exact program
my $select_info_sth=$dbh->prepare($stmt) or errexit("Cannot prepare handle for $stmt; ", $DBI::errstr);
trace_output("after prepare of select stmt");

$select_info_sth->execute() or errexit("Cannot execute select_info_sth; ",$select_info_sth->errstr);
trace_output("after execute of select stmt");

my (%info, @data);
trace_output("fetching domain info");

while (@data = $select_info_sth->fetchrow_array) {
  #replace NULL columns with empty strings, and skip rows without an id
  foreach (@data) { $_='' unless defined}
  next if ($data[0] eq '');
  $info{$data[0]}=$data[1];
  $update_info_sth->execute($data[0]) or errexit("Cannot update table processing column for id $data[0]; ",$update_info_sth->errstr);
  trace_output("processing set true for id $data[0], dom: $data[1]");
}

##check for problems with premature termination
errexit("Error in fetching:", $select_info_sth->errstr) if $select_info_sth->err;

#not really an error, just nothing to process:
if ((scalar keys %info) == 0) {
  trace_output("No ids returned");
  $dbh->disconnect;
  exit 0;
}

 

The trace_output and errexit subroutines are standard logging-type things.

 

 

After the SELECT runs, the program should take the ids returned, and process each, doing whatever it is supposed to do.  The SELECT, in this case, is ordering data by a date, so that we are processing the oldest data.  Therefore, data should always be returned.

 

This is a pg cluster installation, using version 8.3.5.

 

Many instances of these programs run all day long, some on a regular Debian Lenny server, others through exec hosts in a Sun Grid.  Most of the time, data is returned, and the program proceeds along its way, no problem.

 

Periodically (I see no pattern to the times), the program will exit with the “No ids returned” message in the log.  No errors or anything are in the database log, that I can find.  I have seen in the log processes connecting and running the main SELECT at apparently the appropriate time, then a “rollback” (presumably due to the disconnect), and disconnect.

 

I don’t really understand why the query returns nothing periodically, then works fine again seconds later.  The database server is quite busy, doing thousands of queries all the time.

 

Any explanations or ideas?  The processing works, because other iterations of the program are constantly running, so the next attempt returns data, and runs as normal.  However, it bugs me that sometimes a query that should work is returning no results, for no discernable reason.

 

Thanks,

Susan

 

 

 

 

 

 

On 9/07/2010 2:58 AM, Susan Cassidy wrote:

> This is a pg cluster installation, using version 8.3.5.

"Pg cluster"?

There are quite a few different clustering setups for Pg.

Do you mean PgCluster from http://pgfoundry.org/projects/pgcluster/ ? If
so, which version and how is it set up?

Or some other Pg-based cluster using Bucardo, Slony-II, etc?


As for your issue: have you excluded the possibility that there is no
data to return? Issues sometimes arise where the data you're expecting
to retrieve hasn't been committed by another transaction yet, so it's in
the database but not yet visible. As you haven't provided your queries
or schema it's hard to know what's going on there.

--
Craig Ringer

Please reply to the list, not just to me. "reply all" or (in smarter
mail clients) "reply to list" will do the trick.

I've cc'd the list.

On 10/07/10 00:15, Susan Cassidy wrote:
> I didn't set up the cluster (just started working here a few months ago), so I don't know for sure.  A comment in one of the scripts in the bin directory for the pg_* cluster commands says something about the postgresql-common package.  This is Debian 4.0 (etch).  dpkg -l has a line with:
> postgresql-common        91       PostgreSQL database-cluster manager

Oh, you're probably not using clustering at all, then, just being thrown
by the terminology.

PostgreSQL uses the term "cluster" to refer to a group of databases
managed by a postmaster. The chosen terminology is becoming increasingly
confusing as real clustering increases in prevalence.

It'd still be helpful to confirm that it's just a vanilla install of
PostgreSQL from Debian packages on etch, as it sounds like.

As for the empty row sets ... in your position, I'd be increasing my
tracing/logging levels both on the database backend and in the DBI
driver, then trying to match up empty row return incidents with those
trace logs. When you can't reproduce a problem on demand, tracing is
often the only option unless you can figure it out with the information
at hand.
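
On the DBI side, a minimal sketch of what raising the trace level could look like is below. The helper name enable_dbi_trace, the directory argument, and the file naming scheme are assumptions made for illustration; the only DBI call used is the handle's trace($level, $filename) method.

```perl
use strict;
use warnings;

# Hypothetical helper: send DBI trace output for one handle to a
# per-process file, so overlapping program instances do not interleave.
sub enable_dbi_trace {
    my ($dbh, $dir, $level) = @_;
    $level = 2 unless defined $level;      # level 2 also logs method entry with parameters
    my $file = "$dir/dbi_trace.$$.log";    # $$ is this client's process id
    $dbh->trace($level, $file);            # DBI handle method: trace(level, filename)
    return $file;
}
```

Calling something like enable_dbi_trace($dbh, '/var/tmp') straight after connect would capture the execute and fetch calls for that run. DBD::Pg also exposes the backend process id as $dbh->{pg_pid}; logging it alongside the trace file name should make it easier to line an empty run up with server log entries, provided log_line_prefix on the server includes %p.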

I'd also want to verify that my indexes were in good condition, as a
damaged index can cause all sorts of wacky results. Damaged indexes
shouldn't happen, but in reality they have been known to, whether due
to hardware/filesystem issues or the occasional PostgreSQL bug. The
easiest way to make sure your indexes are good - if you can schedule
some downtime - is to REINDEX. If you can't, there are other
alternatives.
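
As a rough sketch of the REINDEX route from Perl, assuming an already-connected DBI handle: the helper name reindex_tables and any table name passed to it are hypothetical, since the schema was not posted. It uses only the standard DBI methods quote_identifier, do, errstr and commit.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild all indexes on each named table.
# REINDEX blocks writes to the table while it runs, hence the
# need for a maintenance window.
sub reindex_tables {
    my ($dbh, @tables) = @_;
    my @done;
    for my $table (@tables) {
        # quote_identifier protects against unusual characters in names
        my $sql = 'REINDEX TABLE ' . $dbh->quote_identifier($table);
        $dbh->do($sql) or die "REINDEX failed for $table: " . $dbh->errstr;
        push @done, $sql;
    }
    # Only needed when the handle was opened with AutoCommit off
    $dbh->commit unless $dbh->{AutoCommit};
    return @done;
}
```

The returned list of statements can be written to the program's own log so there is a record of what was rebuilt and when.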

--
Craig Ringer

    Susan Cassidy wrote:

> Any explanations or ideas?  The processing works, because other iterations
> of the program are constantly running, so the next attempt returns data, and
> runs as normal.  However, it bugs me that sometimes a query that should work
> is returning no results, for no discernable reason.

In a producer-consumer model, at any point in time there should be between 0
and N items to consume. 0 items is no more weird than 1, or 2, or any other
particular value.
Personally, I would find it weird if a consumer process never had nothing to
do. I would see this as a hint that the producer is over-producing.
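
To illustrate, here is a small sketch of a consumer step that treats an empty batch as an ordinary outcome rather than a fault. The names consume_batch, $fetch_batch and $process are made up for the example; in the original program the batch would come from the SELECT and the per-row work from the update.

```perl
use strict;
use warnings;

# Hypothetical consumer step. $fetch_batch is a code ref returning an
# array ref of [id, value] rows; $process is called once per usable row.
# Returns the number of rows processed; 0 simply means "nothing queued".
sub consume_batch {
    my ($fetch_batch, $process) = @_;
    my $rows = $fetch_batch->();
    return 0 unless @$rows;              # empty batch: normal idle case
    my $done = 0;
    for my $row (@$rows) {
        my ($id, $value) = @$row;
        next unless defined $id && $id ne '';   # skip rows without an id
        $process->($id, $value);
        $done++;
    }
    return $done;
}
```

Errors from the fetch itself are better raised inside $fetch_batch (for example with die), so that a return value of 0 always means an empty queue and never a hidden failure.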

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org