Обсуждение: A speed comparison with sqlite

Поиск
Список
Период
Сортировка

A speed comparison with sqlite

От
Shane Ambler
Дата:
Just thought I would share some rough numbers here.
A bit of an unusual edge case but a big time difference...


A guy using RealBasic (GUI development IDE) which uses sqlite as it's
inbuilt db engine wanted to generate a series of unique codes and
decided sql will help (wanting 30 million codes starts to rule out ram
based solutions)

This is a 7 character alphanumeric code.

His program generated 30M codes in about 15 minutes into the sqlite db
file without any duplicate checks.

Select distinct(pincode) from codes; returned the results (that is to
his client not out to file) after 22 hours and he was after a faster
solution.

Using a unique index to check as he went, inserts dropped from 18,000
per second to about 200 a second after 8 hours (without completing)




(the following times are taken from psql's timing option)

With a P4 3Ghz - 80GB IDE drive - running XP pro - pg 8.2.6 -

Using postgresql (with psql not Realbasic) I created the table and
inserted 30M codes in 5.9 mins

select count(distinct(pincode)) from codes; took about 7.2 minutes
(29,993,182 unique codes of the 30M created)

setting output to a text file - select distinct(pincode) from codes took
18.45 minutes to send them all out (283MB file).



The same hardware running FreeBSD 7.0RC1 GENERIC Kernel -

create the table and insert 30M codes in 5.6 mins

select count(distinct(pincode)) from codes; took 5.9 minutes
(29,993,279 unique codes of the 30M created)

setting output to a text file - select distinct(pincode) from codes took
9.7 minutes to send them all out to file.




It would appear that the method of locating distinct results is the key
here - postgresql sorts the distinct column allowing it to identify a
duplicate or new value quicker - sqlite returns the distinct columns in
the order they are inserted - so it must be scanning all previous
entries in the output to ensure distinct.




--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz


Re: A speed comparison with sqlite

От
drh@hwaci.com
Дата:
> A guy using RealBasic (GUI development IDE) which uses sqlite as it's
> inbuilt db engine wanted to generate a series of unique codes and
> decided sql will help (wanting 30 million codes starts to rule out ram
> based solutions)
>
> This is a 7 character alphanumeric code.
>
> His program generated 30M codes in about 15 minutes into the sqlite db
> file without any duplicate checks.
>
> Select distinct(pincode) from codes; returned the results (that is to
> his client not out to file) after 22 hours and he was after a faster
> solution.
>
> Using a unique index to check as he went, inserts dropped from 18,000
> per second to about 200 a second after 8 hours (without completing)
>
> (the following times are taken from psql's timing option)
>
> With a P4 3Ghz - 80GB IDE drive - running XP pro - pg 8.2.6 -
> ...
> The same hardware running FreeBSD 7.0RC1 GENERIC Kernel -
>
> create the table and insert 30M codes in 5.6 mins
>
> select count(distinct(pincode)) from codes; took 5.9 minutes
> (29,993,279 unique codes of the 30M created)
>
> setting output to a text file - select distinct(pincode) from codes took
> 9.7 minutes to send them all out to file.
>
> It would appear that the method of locating distinct results is the key
> here - postgresql sorts the distinct column allowing it to identify a
> duplicate or new value quicker - sqlite returns the distinct columns in
> the order they are inserted - so it must be scanning all previous
> entries in the output to ensure distinct.
>

Come on, guys.  SQLite does have performance issues with sorting
large tables.  It is something we are working on.  But it isn't
*that* bad.  I reran the test with no special tricks and SQLite
took 15 minutes.  Still 3x slower than PostgreSQL, I admit.  But
it is within a factor of 3....

Here is the output I get when I create 2^25 random 7-character
strings, then did a CREATE TABLE t2 AS SELECT DISTINCT * FROM t1:

   2008-01-18 23:29:22: i=16
   2008-01-18 23:29:22: i=32
   2008-01-18 23:29:22: i=64
   2008-01-18 23:29:22: i=128
   2008-01-18 23:29:22: i=256
   2008-01-18 23:29:22: i=512
   2008-01-18 23:29:22: i=1024
   2008-01-18 23:29:22: i=2048
   2008-01-18 23:29:22: i=4096
   2008-01-18 23:29:22: i=8192
   2008-01-18 23:29:22: i=16384
   2008-01-18 23:29:22: i=32768
   2008-01-18 23:29:23: i=65536
   2008-01-18 23:29:23: i=131072
   2008-01-18 23:29:24: i=262144
   2008-01-18 23:29:25: i=524288
   2008-01-18 23:29:27: i=1048576
   2008-01-18 23:29:31: i=2097152
   2008-01-18 23:29:40: i=4194304
   2008-01-18 23:29:59: i=8388608
   2008-01-18 23:30:37: i=16777216
   2008-01-18 23:31:54: start the distinct...
   2008-01-18 23:46:46: finish the distinct...

So you can see, the SELECT distinct took about 15 minutes.
The TCL script used to run this test is:

   sqlite db test.db
   db eval {
     CREATE TABLE t1(pincode TEXT);
     INSERT INTO t1 VALUES(randstr(7,7));
   }
   proc msg {txt} {
     set now [db one {SELECT datetime('now')}]
     puts "$now: $txt"
     flush stdout
   }
   for {set i 1} {$i<30000000} {incr i $i} {
     msg "i=$i"
     flush stdout
     db eval {
       INSERT INTO t1 SELECT randstr(7,7) FROM t1;
     }
   }
   msg "start the distinct..."
   db eval {
     CREATE TABLE t2 AS SELECT distinct(pincode) FROM t1;
   }
   msg "finish the distinct..."

--
D. Richard Hipp <drh@hwaci.com>



Re: A speed comparison with sqlite

От
Shane Ambler
Дата:
Shane Ambler wrote:
> Just thought I would share some rough numbers here.
> A bit of an unusual edge case but a big time difference...
>
>

Just following up with this comparison.


I have been in contact with the SQLite creator about this issue and have
found that SQLite's design leans it's default settings toward low memory
usage keeping with it's design goals of having a small footprint.

Even with that design goal, it doesn't lock you into that. You can tell
SQLite to use any memory it needs as well as how much of your db is
cached in RAM. You can also build SQLite to use these options as default.

One drawback with large datasets is that temp_store can get then get
larger than available ram invoking swapping and still negating the
benefits. (though I found this to still be quicker than the original tests)

Having said that - setting SQLite's temp_store to memory allows SQLite
to return SELECT DISTINCT(pincode) FROM codes; in around 15 minutes
compared to PostgreSQL's 9.7 minutes.


It seems that SQLite's sorting algorithm's are better suited for in
memory use than on disk use. This issue really only appears in large
datasets as used in my original timing examples. More specifically when
temp storage exceeds the OS disk cache size.
They are aware of this situation and are working on improvements.



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: A speed comparison with sqlite

От
Fernando Ike
Дата:
Em Fri, 25 Jan 2008 06:26:26 +1030
Shane Ambler <pgsql@Sheeky.Biz> escreveu:


  Added...



  When acess competitor uses this change because the SQLite does not
have a backend to manage that.


Regards,
--
Fernando Ike
http://www.midstorm.org/~fike/weblog

Re: A speed comparison with sqlite

От
Decibel!
Дата:
On Fri, Jan 25, 2008 at 06:26:26AM +1030, Shane Ambler wrote:
> Shane Ambler wrote:
> >Just thought I would share some rough numbers here.
> >A bit of an unusual edge case but a big time difference...
> >
> >
>
> Just following up with this comparison.
>
>
> I have been in contact with the SQLite creator about this issue and have
> found that SQLite's design leans it's default settings toward low memory
> usage keeping with it's design goals of having a small footprint.
>
> Even with that design goal, it doesn't lock you into that. You can tell
> SQLite to use any memory it needs as well as how much of your db is
> cached in RAM. You can also build SQLite to use these options as default.

Yeah, at least SQLite doesn't try or promise to be something that it's
not. They're intended for embedded apps, and they seem to do that quite
well.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Вложения