Thread: SELECTing every Nth record for better performance
I have a big table that is used for datalogging. I'm designing a graphing
interface that will visualise the data. When the user is looking at a small
date range I want the database to be queried for all records, but when the
user is 'zoomed out', looking at an overview, I want to run a query that
skips every nth record and returns a manageable dataset that still gives a
correct overview of the data without slowing the program down. Is there an
easy way to do this that I have overlooked? I looked at:

1. Taking the primary key (which is an auto-incrementing integer) and
running modulo on it. This will not work in my case, as the database is
being used for a number of different logging 'runs', and these runs are
not necessarily equally distributed over the primary keys.

2. Subqueries that do SELECT COUNT to generate a row number are too slow,
as I'm dealing with thousands to tens of thousands of records.

3. My last idea was to create a sequence with CYCLE, min_value 0 and
max_value 1:

SELECT * FROM
  (SELECT *, (SELECT nextval('counter_seq')) AS counter FROM table)
WHERE counter = 0

This didn't work (counter was always the same for all rows), so I put
SELECT nextval('counter_seq') in a function called counter():

SELECT *, counter() AS counter FROM table

This gives the table I'm looking for; however, I am unable to use
WHERE counter = 0. When I run EXPLAIN, it tells me that it is actually not
looking at the values in the table but just running the function again to
filter. So I tried this:

SELECT *, (counter() + id - id) AS counter FROM table

where id is the primary key of the table. I'm trying to fool the planner
into looking at the table instead of running the function itself. Again,
this query generates the right table. So, I tried adding WHERE counter = 0.
Again it doesn't work: it returns the same number of rows, but changes the
values of all rows in the 'counter' column to 1. EXPLAIN does not help me
(gives no information about the filtering).

Any general thoughts on how to achieve my original goal, or on how to fix
the issues with my 3rd attempt, are appreciated.

Tom
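A side note on the third attempt: the planner is free to push the outer
WHERE down into the subquery, which is why the filter re-evaluates
counter() instead of reusing the computed column. One way to sketch a fix
is the "OFFSET 0" optimization fence, which forces the subquery to be
evaluated as written. The table name below (logdata) is a placeholder, and
this is a sketch of the idea rather than tested code:

```sql
-- Cycling sequence: emits 0,1,2,3,4,0,1,... so "counter = 0" keeps every 5th row
CREATE SEQUENCE counter_seq MINVALUE 0 MAXVALUE 4 CYCLE;

-- VOLATILE so the planner evaluates it once per row instead of folding it
CREATE FUNCTION counter() RETURNS bigint AS
$$ SELECT nextval('counter_seq') $$ LANGUAGE sql VOLATILE;

-- OFFSET 0 acts as an optimization fence: the WHERE clause is applied to
-- the already-computed counter column rather than pushed down onto the
-- function call
SELECT *
FROM (SELECT t.*, counter() AS counter FROM logdata t OFFSET 0) sub
WHERE counter = 0;
```

Note that row order inside a sequential scan is not guaranteed, so which
rows land in each bucket is arbitrary unless an ORDER BY is added inside
the subquery.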
On Thu, Dec 3, 2009 at 9:26 PM, Tom <tom@cstcomposites.com> wrote:
> I want to run a query that skips every nth record and returns a
> manageable dataset that still gives a correct overview of the data
> without slowing the program down. Is there an easy way to do this that
> I have overlooked? I looked at:

I've played with datalogging. It was very easy to find nth records when
using date_trunc() on a timestamp. The only minor problem with
date_trunc() was that I couldn't create arbitrary granularity. For
example, it is easy to date_trunc() to a year, month, week, day, hour or
minute, but I wanted 5, 10 and 15 minute increments. I bet there could be
a solution to this, but I never looked into it.

To improve the select performance, I created functional indexes using
different date_trunc() granularities.

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
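The functional indexes Richard mentions could look like the following
sketch, assuming a hypothetical logdata table with a ts timestamp column
(names are illustrative):

```sql
-- One expression index per zoom level that gets queried often
CREATE INDEX logdata_ts_hour_idx ON logdata (date_trunc('hour', ts));
CREATE INDEX logdata_ts_day_idx  ON logdata (date_trunc('day', ts));

-- A query filtering on the same expression can then use the index
SELECT * FROM logdata
WHERE date_trunc('hour', ts) = '2009-12-01 13:00';
```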
In response to Tom :
> I have a big table that is used for datalogging. I'm designing a
> graphing interface that will visualise the data. When the user is
> looking at a small date range I want the database to be queried for all
> records, but when the user is 'zoomed out', looking at an overview, I
> want to run a query that skips every nth record and returns a
> manageable dataset that still gives a correct overview of the data
> without slowing the program down. Is there an easy way to do this that
> I have overlooked? I looked at:

Do you have 8.4? If yes:

test=# create table data as select s as s from generate_series(1,1000) s;
SELECT

test=*# select s from (select *, row_number() over (order by s) from
data) foo where row_number % 3 = 0 limit 10;
 s
----
  3
  6
  9
 12
 15
 18
 21
 24
 27
 30
(10 rows)

-- or skip every 5th record:

test=*# select s from (select *, row_number() over (order by s) from
data) foo where row_number % 5 != 0 limit 10;
 s
----
  1
  2
  3
  4
  6
  7
  8
  9
 11
 12
(10 rows)

Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99
In response to Richard Broersma :
> On Thu, Dec 3, 2009 at 9:26 PM, Tom <tom@cstcomposites.com> wrote:
> > I want to run a query that skips every nth record and returns a
> > manageable dataset that still gives a correct overview of the data
> > without slowing the program down. Is there an easy way to do this
> > that I have overlooked? I looked at:
>
> I've played with datalogging. It was very easy to find nth records
> when using date_trunc() on a timestamp. The only minor problem with
> date_trunc() was that I couldn't create arbitrary granularity. For
> example, it is easy to date_trunc() to a year, month, week, day, hour
> or minute, but I wanted 5, 10 and 15 minute increments. I bet there
> could be a solution to this, but I never looked into it.

How about:

test=# select * from data limit 10;
         ts
---------------------
 2009-12-01 00:00:00
 2009-12-01 00:01:00
 2009-12-01 00:02:00
 2009-12-01 00:03:00
 2009-12-01 00:04:00
 2009-12-01 00:05:00
 2009-12-01 00:06:00
 2009-12-01 00:07:00
 2009-12-01 00:08:00
 2009-12-01 00:09:00
(10 rows)

-- now with 5 minute increments, using date_trunc and extract:

test=# select * from data where extract(epoch from date_trunc('minute',
ts))::int % (5*60) = 0 limit 10;
         ts
---------------------
 2009-12-01 00:00:00
 2009-12-01 00:05:00
 2009-12-01 00:10:00
 2009-12-01 00:15:00
 2009-12-01 00:20:00
 2009-12-01 00:25:00
 2009-12-01 00:30:00
 2009-12-01 00:35:00
 2009-12-01 00:40:00
 2009-12-01 00:45:00
(10 rows)

Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99
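If the 5-minute filter above is run frequently, the functional-index idea
from earlier in the thread could be applied to the whole expression; a
sketch, assuming the same data table:

```sql
-- The index expression must match the WHERE clause exactly for the
-- planner to consider it (note the extra parentheses an expression
-- index requires)
CREATE INDEX data_ts_5min_idx
  ON data ((extract(epoch FROM date_trunc('minute', ts))::int % 300));

SELECT * FROM data
WHERE (extract(epoch FROM date_trunc('minute', ts))::int % 300) = 0;
```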
A. Kretschmer wrote:
> In response to Tom :
>> I have a big table that is used for datalogging. I'm designing a
>> graphing interface that will visualise the data. When the user is
>> looking at a small date range I want the database to be queried for
>> all records, but when the user is 'zoomed out', looking at an
>> overview, I want to run a query that skips every nth record and
>> returns a manageable dataset that still gives a correct overview of
>> the data without slowing the program down. Is there an easy way to do
>> this that I have overlooked? I looked at:
>
> Do you have 8.4? If yes:
>
> test=# create table data as select s as s from generate_series(1,1000) s;
> SELECT
>
> test=*# select s from (select *, row_number() over (order by s) from
> data) foo where row_number % 3 = 0 limit 10;

Won't this still read in the entire table and only then filter the
records out?
In response to Ivan Voras :
> A. Kretschmer wrote:
> > Do you have 8.4? If yes:
> >
> > test=# create table data as select s as s from generate_series(1,1000) s;
> > SELECT
> >
> > test=*# select s from (select *, row_number() over (order by s) from
> > data) foo where row_number % 3 = 0 limit 10;
>
> Won't this still read in the entire table and only then filter the
> records out?

Yes. But I think the problem is the graphing interface. He only wants a
'zoomed out' overview; he needs a 'manageable dataset'.

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99
Why create another table? This is straightforward on 8.4:
SELECT * FROM (
  SELECT *, (row_number() OVER (ORDER BY id)) % 10 AS rn FROM table
) sa
WHERE sa.rn = 1
LIMIT 10;
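A possible variant of the same idea for graphing: instead of keeping one
row out of every ten, average each bucket of ten rows, which may give a
smoother overview. The table and column names below (logdata, value) are
placeholders:

```sql
SELECT min(id) AS id, avg(value) AS value
FROM (
  SELECT id, value,
         (row_number() OVER (ORDER BY id) - 1) / 10 AS bucket  -- integer division: buckets of 10
  FROM logdata
) sa
GROUP BY bucket
ORDER BY id;
```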
Tom wrote:
> 1. Taking the primary key (which is an auto-incrementing integer) and
> running modulo on it. This will not work in my case as the database is
> being used for a number of different logging 'runs', and these runs
> are not necessarily equally distributed over the primary keys.

Problem #1 with what you're trying to do is that it's tricky to get SQL to
have a notion of "row number" in a result set, so that you can then filter
on that number. The best approach to this is to use PostgreSQL 8.4, where
the SQL window functions can be used for this purpose.

> 2. Subqueries that do SELECT COUNT to generate a row number are too
> slow as I'm dealing with thousands to tens of thousands of records.

Problem #2 is that if you're looking at only a mildly filtered version of
your data, you're going to pull the whole set in anyway. Random data point
in this area: if you do a scan on a table that needs to look at 20% of a
table using an index, what will happen when you execute it? The correct
answer is likely "sequential scan of the entire table", because that's
actually more efficient than trying to grab only a few records once the
percentage gets large enough. The way multiple records get packed onto a
single page, you're likely to read every page of the data anyway even when
trying to grab a subset of them, unless the subset is very small relative
to the data and you can traverse an index usefully.

Once the "n" in your zoom gets large enough, it's possible to make this
worthwhile. Unless your records are really wide, I would guess that it
would take a 1000:1 compression or more before you'd end up with a query
that's truly shorter than scanning the whole set. And that wouldn't work
like what you're trying to do right now at all: you'd instead have to know
the bounds of the data set, generate a sequence of points from within that
set, and then grab the records best matching those to get a useful
zoomed-out subset.

Basically, determine where the sequence of records you need should be,
then go into the data set to find just them using something like
"WHERE ts>x LIMIT 1"; that's the only way to not scan the whole thing. I
think this whole approach isn't likely to ever converge on what you want.

The direction I think you should be going is to consider whether it's
possible to create materialized views of your data that summarize it at
wider time scales. You can't compute such a thing in real time usefully
without reading the whole data set, and once you realize that, you might
as well figure out how to only compute the summarized version once. The
last comment in this thread as I write this, from Grzegorz, suggests one
approach for something like that.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
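The summary-table direction Greg describes might be sketched like this,
assuming a hypothetical logdata(ts, value) table; the epoch arithmetic
also solves the arbitrary-granularity problem raised earlier in the
thread, and the table would be refreshed on a schedule rather than rebuilt
per query:

```sql
-- One row per 5-minute bucket; floor(epoch / 300) * 300 gives
-- granularities that date_trunc() alone cannot (5, 10, 15 minutes, ...)
CREATE TABLE logdata_5min AS
SELECT to_timestamp(floor(extract(epoch FROM ts) / 300) * 300) AS bucket,
       avg(value) AS avg_value,
       min(value) AS min_value,
       max(value) AS max_value
FROM logdata
GROUP BY 1;

CREATE INDEX logdata_5min_bucket_idx ON logdata_5min (bucket);
```

(PostgreSQL 8.4 has no built-in materialized views, so a plain table
rebuilt or incrementally updated on a schedule stands in for one.)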
Hi list,
I'm using FWTools ogr2ogr to import an S-57 chart. I'm getting a warning
when importing in Linux but not in Windows XP, see below. Could anyone
tell me why, and what the difference is in the result (I can't see it
because lnam_refs is empty)? Will I miss anything?

ogr2ogr -f "PostgreSQL" PG:"host=localhost dbname=US3MI01M user=ddd
password=pwd" ./US3MI01M/US3MI01M.000 -lco OVERWRITE=yes -nln depare
DEPARE -t_srs "EPSG:4326"

Warning 6: Can't create field lnam_refs with type StringList on
PostgreSQL layers. Creating as VARCHAR.
On Monday 21 December 2009 2:24:02 am Malm Paul wrote:
> Hi list,
> I'm using FWTools ogr2ogr to import an S-57 chart. I'm getting a
> warning when importing in Linux but not in Windows XP, see below.
> Could anyone tell me why, and what the difference is in the result (I
> can't see it because lnam_refs is empty)? Will I miss anything?
>
> ogr2ogr -f "PostgreSQL" PG:"host=localhost dbname=US3MI01M user=ddd
> password=pwd" ./US3MI01M/US3MI01M.000 -lco OVERWRITE=yes -nln depare
> DEPARE -t_srs "EPSG:4326"
>
> Warning 6: Can't create field lnam_refs with type StringList on
> PostgreSQL layers. Creating as VARCHAR.

The StringList type is not a native Postgres type. It would seem to be
something FWTools/OGR is supposed to supply and is not, in your Linux
setup. My guess is you would get a better answer from the FWTools mailing
list:

http://lists.maptools.org/mailman/listinfo/fwtools

--
Adrian Klaver
aklaver@comcast.net