Обсуждение: searching archives should be a weeeee bit faster ...

Поиск
Список
Период
Сортировка

searching archives should be a weeeee bit faster ...

От
"Marc G. Fournier"
Дата:
I spent this aft enabling debugging on the postmaster side, and determined
that there was a critical index missing on one of the tables ... there was
no index on the url.rec_id field, so a query that looks like:

SELECT rec_id, site_id, pop_rank FROM url WHERE rec_id IN

('31356','31364','32786','32787','32857','32858','32871','32872','32873','32874','32877','32878','32891','32892','32894','32922','32923','32928','32929','33003','33004','33030','33090','33091','35

146','37026','37028','39730','44317','44339','44348','44359','44413','44420','44438','45314','45322','45861','47914','48756','49587','50564','50567','51222','52330','53029','53550','53581','53898'

,'53901','53932','54235','54944','54951','54955','54965','54973','55041','55295','55370','55374','55609','55885','55889','55898','55903','55905','55906','55907','56442','57144','57228','57230','57

233','58497','58499','58512','58519','58540','58581','58585','59281','59865','59874','60476','60478','60482','60486','60590','60827','61465','61531','61778','62272','62585','62602','62609','62731'

,'63933','66733','66740','66743','66747','66758','66760','66763','66765','66768','66774','66777','68100','68191','68195','68212','68213','68257','68266','68288','68295','68300','68315','68332','68

335','68349','68354','69422','69435','69446','69447','69987','69991','69995','70003','70007','70010','70084','70089','70095','70468','70692','70699','70739','71022','74531','74810','77576','77736'

,'78346','78608','79208','79291','79312','79349','80034','80038','82203','82852','84155','84456','85945','86709','87055','87061','87065','87078','87406','87413','87518','89229','89740','91262','94

205','94209','94215','94217','94940','96242','96295','96303','97442','97827','97833','97854','98262','98845','98846','98847','98848','98849','98850','98851','98852','98854','98855','98856','98857'

,'98858','98859','98860','99239','100360','100407','100459','100731','100840','102020','102026','103782','106017','108523','109645','109654','109667','109670','111213','111232','111233','111349','

111351','111356','111620','116673','116677','116734','117709','117733','118075','118103','122444','126754','127945','127949','128132','131062','131066','131067','131068','131079','131084','131085'

,'131094','131221','131222','131223','131225','131226','131227','134426','135465','135466','135468','135470','137034','137035','137536','142057')
ORDERBY rec_id; 

was doing a sequence scan over 230k+ records in the URL file each time ...
created the index, and now I can actually get results ~32sec for the
single string (mvcc) that I've been using for testing, instead of it never
seeming to come back ...

hopefully that was the major part of it ...


Re: searching archives should be a weeeee bit faster ...

От
Sean Chittenden
Дата:
Is "index" a stop word in the archives?  It doesn't seem to return any
results when typing in "index" with anything (ex: index correlation).
FWIW, just a data point.  -sc

--
Sean Chittenden

Re: searching archives should be a weeeee bit faster ...

От
"Marc G. Fournier"
Дата:
yes it is/was ...I've just removed it from the list of stop words, but am
not going to force a re-index of 250k URLs ;(  So, any new index'ng will
pull in index as a valid search word, and when the current URLs
expire/re-index,they will get pulled in then also ...



On Thu, 4 Sep 2003, Sean Chittenden wrote:

> Is "index" a stop word in the archives?  It doesn't seem to return any
> results when typing in "index" with anything (ex: index correlation).
> FWIW, just a data point.  -sc
>
> --
> Sean Chittenden
>

Re: searching archives should be a weeeee bit faster ...

От
Sean Chittenden
Дата:
> yes it is/was ...I've just removed it from the list of stop words,
> but am not going to force a re-index of 250k URLs ;( So, any new
> index'ng will pull in index as a valid search word, and when the
> current URLs expire/re-index,they will get pulled in then also ...

Hrm... if I search for "correlation," I get some results.  If I search
for "index," I don't get anything (expected), but if I search for
"index correlation," I get nothing.  Why isn't it returning the same
results as when I searched for "correlation?"

Just a scratch that I finally had to scratch and ask about...

-sc

--
Sean Chittenden

Re: searching archives should be a weeeee bit faster ...

От
"Marc G. Fournier"
Дата:

On Thu, 4 Sep 2003, Sean Chittenden wrote:

> > yes it is/was ...I've just removed it from the list of stop words,
> > but am not going to force a re-index of 250k URLs ;( So, any new
> > index'ng will pull in index as a valid search word, and when the
> > current URLs expire/re-index,they will get pulled in then also ...
>
> Hrm... if I search for "correlation," I get some results.  If I search
> for "index," I don't get anything (expected), but if I search for
> "index correlation," I get nothing.  Why isn't it returning the same
> results as when I searched for "correlation?"
>
> Just a scratch that I finally had to scratch and ask about...

my understanding of how it searches is it defaults to an 'and' ... so it
would have to find all docs that have both index (0) *and* correlation (n)
... which would equal zero results ...