Обсуждение: am i creating a performance bottleneck?

Поиск
Список
Период
Сортировка

am i creating a performance bottleneck?

От
Mary Anderson
Дата:
Hi all,

    In order to save a fair amount of space I am creating a schema
against which I will want to run the following simplified query.  Think
of the size of the tables as data ~ 30 M rows,  series has 10,00
entries, data_has_dimensions has 300 K rows, dimensions is tiny

select   population = da.value,
          residence = di.value
from data da, data_has_dimension dhd, dimensions di, series si
where da.series_id = si.series_id
and si.series = 'my-series'
and dhd.data_id = da.data_id
and dhd.dimension_id = di.dimension_id
and di.dimension = 'residence'

UNION

select population = da.value,
        residence = 'total'
where da.series_id = si.series_id
   and  si.series = 'my_series'
   and NOT EXISTS(select dhd.da_id
                  from data_has_dimensions dhd, dimensions di
                  where di.dimension = 'residence'
                  and dhd.dimension_id = di.dimension_id
                  and dhd.da_id = da.da_id)

I am most worried about the second select, with its 'NOT EXISTS'
statement slowing everything down.  I would put appropriate indexes on
this -- namely an index on series for data and an index on dhd for
da_id.   My user community is a bunch of academics, so I am not under
the performance constraints I would have for a business application.

Would it help performance if I denormalized the database by attaching
series to the data_has_dimensions table?

Thanks,
Mary Anderson

Re: am i creating a performance bottleneck?

От
n0g0013
Дата:
On 17.07-17:31, Mary Anderson wrote:
[ ... ]
> I am most worried about the second select, with its 'NOT EXISTS'
> statement slowing everything down.  I would put appropriate indexes on
> this -- namely an index on series for data and an index on dhd for
> da_id.   My user community is a bunch of academics, so I am not under
> the performance constraints I would have for a business application.
>
> Would it help performance if I denormalized the database by attaching
> series to the data_has_dimensions table?

to be honest i couldn't really understand your SQL but that probably
says more about me than you.  either way i don't think you need to
worry about the NOT EXISTS statement.  generally, serialising select
statements is better than joining and would suggest that if your series
data has 10,000 rows (assuming 10,00 was a typo) then i wouldn't merge
that data with another table (it also will simply mean a larger,
disjoint table, essentially equivelant to the join table required your
queries and thus may well cost you in overall performance).  i would
suggest you try to restructure you queries to serialise the 'series'
select joins (and preferably only doing it once), thereby reducing
the final join table to a minimum.

--
        t
 t
                 w