Search then Delete Performance

From: Michael Hull
Subject: Search then Delete Performance
Date:
Msg-id: AANLkTikV95RDMcoaDjXeLX+3xe-yHmG4Ffdqg2AtSYMb@mail.gmail.com
Replies: Re: Search then Delete Performance  (Arjen Nienhuis <a.g.nienhuis@gmail.com>)
Re: Search then Delete Performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Search then Delete Performance  (John R Pierce <pierce@hogranch.com>)
List: pgsql-general
Hi Everyone,
I am fairly new to practical databases, but I am trying out the C
interface to Postgres and am wondering how to improve performance. I
am a researcher, and I am trying to perform a large parameter sweep.
Since this will involve a couple of thousand simulations, I have a
process that manages which simulations have been done and which still
need to be done, so I can run it easily on a cluster.

So, I have a fairly simple schema of 4 tables.

-- na, ca, ks, kf, lk, iinj are the parameters for my simulation.
CREATE TABLE alljobs (
    id SERIAL,
    ca int,
    na int,
    lk int,
    ks int,
    kf int,
    iinj int,
    PRIMARY KEY(id)
);


CREATE TABLE assignedjobs (
    jobid int,
    nodeid varchar(100),
    assignedtime timestamp,
    PRIMARY KEY(jobid)
);


CREATE TABLE completedjobs (
    jobid int,
    PRIMARY KEY(jobid)
);

CREATE TABLE unassignedjobs(
    jobid int,
    PRIMARY KEY(jobid)
);



alljobs is initially populated, and contains all the simulations that
will ever be run.
unassignedjobs contains the ids in alljobs that haven't been run yet.
assignedjobs contains the ids in alljobs that have been dispatched to
some CPU on the cluster and are currently simulating.
completedjobs contains all the completed jobs.
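
As an illustration only (the actual sweep values are not given in the post), a table like alljobs could be populated with a cross join of `generate_series` calls, one per parameter, and every job would start out unassigned:

```sql
-- Hypothetical parameter ranges; the real sweep values are not in the post.
INSERT INTO alljobs (ca, na, lk, ks, kf, iinj)
SELECT ca, na, lk, ks, kf, iinj
FROM generate_series(1, 5)  AS ca,
     generate_series(1, 5)  AS na,
     generate_series(1, 2)  AS lk,
     generate_series(1, 2)  AS ks,
     generate_series(1, 2)  AS kf,
     generate_series(1, 10) AS iinj;

-- Every job starts out unassigned.
INSERT INTO unassignedjobs (jobid) SELECT id FROM alljobs;
```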

So, fairly simply, I have a daemon running on a machine, which accesses
this DB. Clients connect and request the details for, say, 1000
simulations, at which point the daemon takes 1000 entries from the
unassigned table and moves them to the assigned table. Then, once the
client is finished with those jobs, it signals this to the daemon,
which moves those jobs from 'assigned' to 'completed'.
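
For reference, the whole "take N entries from unassigned and move them to assigned" step can be expressed as a single SQL statement on PostgreSQL 9.1 or later using a writable CTE. This is a sketch rather than code from the post; the node name and batch size are placeholders:

```sql
-- Claim up to 1000 jobs atomically: the DELETE's RETURNING clause
-- feeds the INSERT, all in one statement and one round trip.
WITH claimed AS (
    DELETE FROM unassignedjobs
    WHERE jobid IN (SELECT jobid FROM unassignedjobs LIMIT 1000)
    RETURNING jobid
)
INSERT INTO assignedjobs (jobid, nodeid, assignedtime)
SELECT jobid, 'node-01', now()
FROM claimed;
```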

So this is fairly simple to implement, but my problem is that it is very slow.


In particular, I have a select() loop waiting for network connections,
which then calls this function:

// Requires: #include <libpq-fe.h>, <cstdio>, <cstdlib>,
//           #include <string>, <vector>, <iostream>
typedef std::vector<long> VectorLong;

VectorLong assignJobs(PGconn* pDB, int number, const std::string& nodename)
{
    char buffer[1000];
    snprintf(buffer, sizeof(buffer),
             "SELECT jobid FROM unassignedjobs LIMIT %d", number);
    PGresult* pRes = PQexec(pDB, buffer);

    printf("assigning jobs\n");

    int nRes = PQntuples(pRes);
    printf("Results found: %d\n", nRes);

    VectorLong jobs;
    for (int i = 0; i < nRes; i++)
    {
        long id = atol(PQgetvalue(pRes, i, 0));
        std::cout << id << " ";
        jobs.push_back(id);

        // One DELETE and one INSERT per job: two round trips (and two
        // implicit transactions) for every row.
        snprintf(buffer, sizeof(buffer),
                 "DELETE FROM unassignedjobs WHERE jobid = %ld", id);
        PQclear(PQexec(pDB, buffer));

        // nodename must be quoted to be valid SQL (and should really be
        // escaped, e.g. with PQescapeStringConn).
        snprintf(buffer, sizeof(buffer),
                 "INSERT INTO assignedjobs (jobid, nodeid) VALUES (%ld, '%s')",
                 id, nodename.c_str());
        PQclear(PQexec(pDB, buffer));
    }

    PQclear(pRes);
    return jobs;
}

but it is painfully slow. I was wondering if there is a way to improve
this? I feel there should be, since I already have a 'pointer' to the
rows I want to delete.


Any help would be greatly appreciated.

Many thanks


Mike Hull
