Re: Update with last known location?
От | James David Smith |
---|---|
Тема | Re: Update with last known location? |
Дата | |
Msg-id | CAMu32ADBsx7Bx3G1az__7xVNL-ATcR0QyLwRmd4F+f+45=i+_g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Update with last known location? (Kevin Grittner <kgrittn@ymail.com>) |
Ответы |
Re: Update with last known location?
|
Список | pgsql-novice |
Hi Kevin / all, I tried to run the update with your code but got the following error (which I realise is now diverging from my original question): ERROR: could not write to hash-join temporary file: No space left on device ********** Error ********** Any thoughts? I didn't tweak the memory allocation or cost factors before I ran it - maybe that would help? I didn't bother doing it initially as I wasn't too fussed if it took say 4 hours instead of 3 etc. I'm not in a huge rush. Regarding the system, it's a virtual Ubuntu 12.04 desktop with PostgreSQL 9.3 and PostGIS 2.0. I asked our IT guy for the details about it and he gave the below. I'm the only user of the visualization set-up at the moment, so all the resource *should* be available to me he says: 20 cores of Intel Xeon E-2690 v2 processor @ 3GHz 48GB of memory at 1866 MHz Hard drive based on 10 x 15K RPM SAS hard disks Cheers James On 30 January 2014 20:08, Kevin Grittner <kgrittn@ymail.com> wrote: > James David Smith <james.david.smith@gmail.com> wrote: >> On 30 January 2014 16:45, Kevin Grittner <kgrittn@ymail.com> wrote: > >>> update test_data x >>> set the_geom = y.the_geom >>> from test_data y >>> where x.the_geom is null >>> and y.ppid = x.ppid >>> and y.the_geom is not null >>> and y.point_time < x.point_time >>> and not exists >>> ( >>> select * from test_data z >>> where z.ppid = y.ppid >>> and z.the_geom is not null >>> and z.point_time > y.point_time >>> and z.point_time < x.point_time >>> ) >>> ; > >> I can't run it on my actual data at the moment, or rather I don't want >> too, as I set the below query going an hour or so ago and thought I >> should let it finish really. >> >> If it hasn't finished when I come into work tomorrow (I'm leaving the >> office shortly) then I'll cancel it and give yours a crack instead. To >> my novice eye, your code looks like it'll be quicker than the below >> anyway. > >> SELECT >> data.ppid, >> data.point_time, >> CASE >> WHEN data.the_geom IS NULL >> THEN ( >> --Get all locations with an earlier time stamp for that ppid >> SELECT geom.the_geom >> FROM hybrid_location geom >> WHERE data.ppid = geom.ppid >> AND geom.point_time < data.point_time >> AND geom.the_geom IS NOT NULL >> AND NOT EXISTS ( >> -- Cull all but the most recent one >> SELECT * >> FROM hybrid_location cull >> WHERE cull.ppid = geom.ppid >> AND geom.the_geom IS NOT NULL >> AND cull.point_time < data.point_time >> AND cull.point_time > geom.point_time >> AND cull.the_geom IS NOT NULL >> ) >> ) >> ELSE data.the_geom >> end >> FROM hybrid_location data; > > Yeah, that's basically the same approach, but it uses a subquery > which I don't think can get pulled up -- so I think it will need to > do a lot of the work once for each row with a NULL the_geom column > where my version can do it once, period. > > If you haven't tuned your configuration, you can probably speed up > any of these versions with a few tweaks to memory allocation and > cost factors. The most significant for this query would probably > be to set work_mem to something around 25% of machine RAM divided > by the number of active connections you can have. > > -- > Kevin Grittner > EDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
В списке pgsql-novice по дате отправления: