Re: NOT IN doesn't use index? (fwd)

Поиск
Список
Период
Сортировка
От Joe Conway
Тема Re: NOT IN doesn't use index? (fwd)
Дата
Msg-id 3EB42586.6080301@joeconway.com
обсуждение исходный текст
Ответ на Re: NOT IN doesn't use index? (fwd)  (Becky Neville <rebecca.neville@yale.edu>)
Список pgsql-performance
Becky Neville wrote:
> I think that list is actually (gulp) hard coded.  It's not my query.
> I am trying to speed it up for someone else - to hopefully learn something
> in the process that isn't dependent on what version of postgres i'm
> running :)
>
> I assume it's from another table but can't find it on their data
> model at the moment.  Those are all valid billing codes.  The query
> is checking to see if anyone was billed under an invalid code.  So if
> everything is ok, the query returns nothing.

Yeah -- that sounds like there has to be a table of valid codes
somewhere. In that case you can substitute the "valid_codes" table in
the left join where I had the subselect with all the UNIONs.
Alternatively you might find a NOT EXISTS method would work best. If
there isn't a "valid_codes" table, but that hard coded list is static,
perhaps you could build one and use that.

> But there must be more to it than that...otherwise, they could just
> add a Valid flag to the lookup table.

Well I certainly wouldn't query a whole table of historical information
over and over. Can you use and date column (suitably indexed) to just
check recent transactions (like since the last time you checked)?

> If you have any ideas for speeding it up other than using another
> table please let me know.  It only takes me 9 min to run with 2 mil
> rows but it takes them 7 hours (51 mil rows in Oracle with many other
> jobs running and poor system maintenance.)

As above, are all 51 million rows recent transactions, or is that all of
eternity? If its the latter, I'd scan the whole thing once and produce a
report, or maybe a "transactions_with_invalid_codes" table.

 From that point on, I'd only check the transactions since the last time
I'd checked, either based on a timestamp or even a sequence generated id
field. All you need to do is save off the max value each time you run,
and then use that as the starting point next time.

HTH,

Joe


В списке pgsql-performance по дате отправления:

Предыдущее
От: Becky Neville
Дата:
Сообщение: Re: NOT IN doesn't use index? (fwd)
Следующее
От: brew@theMode.com
Дата:
Сообщение: Re: why is the db so slow?