Need another way to do this, my sql much to slow...

Поиск
Список
Период
Сортировка
От Jerry Wintrode
Тема Need another way to do this, my sql much to slow...
Дата
Msg-id 4E676B0AAF74B443A18D7BC7AAB3CFFD1D44D6@s01-exch01.tripos.com
обсуждение исходный текст
Список pgsql-sql
I am attempting to write a spam management add-on for my corporate
server, I have the database and the apps written but one part is WAY to
slow to be usefull.

The following view:
       Column         |  Type   | Modifiers
-----------------------+---------+-----------env_sender_num        | integer |msg_from_claims_count | bigint  |

is built on the follow table:
    Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------msg_sender_num  | integer
|env_sender_num | integer                     |msg_from_domain | character varying(255)      |env_from_domain |
charactervarying(255)      |recorded_date   | timestamp without time zone | 
Indexes: record_of_claims_env_sender_num_idx btree (env_sender_num),        record_of_claims_msg_sender_num_idx btree
(msg_sender_num)

With the following SQL:


SELECT record_of_claims.env_sender_num, count(DISTINCT
record_of_claims.msg_sender_num)AS msg_from_claims_count FROM record_of_claims GROUP BY
record_of_claims.env_sender_num;

A sample dataset follows:

msg_sender_num | env_sender_num |    msg_from_domain    |
env_from_domain    |    recorded_date
----------------+----------------+-----------------------+--------------
---------+---------------------             1 |              1 | yahoo.com             | yahoo.com
| 2003-11-18 13:21:07             2 |              2 | mx128.optinvc13y.com  |
mx128.optinvc13y.com  | 2003-11-18 13:21:16             3 |              3 | yahoo.com             | yahoo.com
| 2003-11-18 13:21:17             4 |              4 | yahoo.com             | yahoo.com
| 2003-11-18 13:21:21             5 |              5 | biomarketgroup.com    |
biomarketgroup.com    | 2003-11-18 13:21:24             6 |              6 | sohu.com              | sohu.com
| 2003-11-18 13:21:28             7 |              7 | lycos.com             | lycos.com
| 2003-11-18 13:21:38             8 |              8 | mail.expressrx.info   |
mail.expressrx.info   | 2003-11-18 13:21:41             9 |              9 | approveddeals.com     |
approveddeals.com     | 2003-11-18 13:21:41            10 |             10 | conceptholidays.co.uk |
conceptholidays.co.uk | 2003-11-18 13:21:48


The msg_sender_num and env_sender_num come from another table of unique
names of senders. What I am attempting to do is see how many times
msg_sender_num 1 claims to be a different env_sender_num. So I have to
find all the entries in msg_sender_num equal to 1 and build a count of
the distinct numbers in env_sender_num. This number is then used later
to say that if a msg_sender_num claims to be more then 2
env_sender_num's then the sender is a spammer and gets added to a list.

Everything is working fine except the SQL above. It takes WAY to long to
process on a 500000+ record database. Hell it takes 12 seconds or so on
a 50000 record database. I have included the query plan to show that the
indexes are being used.

Query Plan:

Aggregate  (cost=0.00..166.16 rows=264 width=8) (actual
time=0.98..7768.19 rows=62911 loops=1)  ->  Group  (cost=0.00..159.57 rows=2635 width=8) (actual
time=0.56..3179.14 rows=80466 loops=1)        ->  Index Scan using record_of_claims_env_sender_num_idx on
record_of_claims  (cost=0.00..152.99 rows=2635 width=8) (actual
time=0.55..2240.15 rows=80466 loops=1)

Total runtime: 7931.63 msec


Is there a better, read "Faster", way to achieve this?


Jerry Wintrode
Very Newbie Postgres User








В списке pgsql-sql по дате отправления:

Предыдущее
От: Erik Thiele
Дата:
Сообщение: current_date timezone documentation suggestion
Следующее
От: Christoph Haller
Дата:
Сообщение: Re: Need another way to do this, my sql much to slow...