Re: Performance problem with query

Поиск

Список

Период

Сортировка

От	Q
Тема	Re: Performance problem with query
Дата	19 июля 2006 г. 11:52:40
Msg-id	8ACBB832-F600-4C90-AFD0-621CB966E082@gmail.com обсуждение исходный текст
Ответ на	Re: Performance problem with query ("Christian Rengstl" <Christian.Rengstl@klinik.uni-regensburg.de>)
Ответы	Re: Performance problem with query
Список	pgsql-general

Дерево обсуждения

On 19/07/2006, at 8:49 PM, Christian Rengstl wrote:

> Obviously it had something to do with the rule, because now
> everything finished within 20 minutes. the problem is just that i
> don't really want to give up the inheritance design. is there a way
> to maintain the inheritance that doesn't cause this huge
> performance problem?

That is hard to say unless you post the rule and table schema you are
currently using.

> Q <qdolan@gmail.com> wrote on 07/19/06 11:54 am:
>> On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:
>>
>>> The analyze is from the exact query and i dropped the indexes
>>> before the insert as well without imrpvement. The target table is
>>> as well completely empty and the insert is supposed to write, in
>>> this case, more or less 8 million lines in the table. There is a
>>> rule though, because i have inheritance table structure with one
>>> master table and around 20 child tables.
>>
>> I would say the problem is in the rule.  Try doing the insert into a
>> duplicate table with no rules or inheritance and see how long it
>> takes.
>>
>> Perhaps you should provide the actual schema of tables and rules that
>> are involved in the query in question.
>>
>>> Q <qdolan@gmail.com> wrote on 07/19/06 4:37 am:
>>>>
>>>> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>>>>
>>>>> now finally after a long time i have the query plan for the whole
>>>>> filled table. I hope somebody can explain me why it takes so much
>>>>> longer...
>>>>
>>>>
>>>> These explain analyze results don't appear to be from the
>>>> queries you
>>>> posted previously. For these results to mean anything you need to
>>>> include the EXACT queries you used to generate them.
>>>>
>>>>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual
>>>>> time=0.056..655772
>>>>> .273 rows=8044000 loops=1)
>>>>>   One-Time Filter: ((split_part(($1)::text, '_'::text,
>>>>> 2))::smallint = 1)
>>>>>   InitPlan
>>>>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual
>>>>> time=0.003..0.003 rows
>>>>> =1 loops=1)
>>>>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00
>>>>> rows=8044000 width=
>>>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
>>>>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual
>>>>> time=0.006..0.007 rows
>>>>> =1 loops=1)
>>>>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00
>>>>> rows=8044000 width=
>>>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
>>>>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000
>>>>> width=39) (act
>>>>> ual time=0.002..191672.344 rows=8044000 loops=1)
>>>>> Total runtime: 62259544.896 ms
>>>>
>>>>
>>>> This is the query you want to be interested in, the others took no
>>>> time at all.
>>>>
>>>> As a guess I would say the query is an INSERT INTO ... FROM
>>>> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
>>>> The majority of the time appears to be taken up on the actual
>>>> INSERT
>>>> and not the SELECT part.
>>>>
>>>> How many rows are in the target table and what indexes does it
>>>> have?
>>>> Does it have any triggers, check constraints, or rules applied
>>>> to it?
>>>> All these things can make the insert take longer as the number of
>>>> rows you have already in the table increases.
>>>>
>>>> More than likely you have a target table with a LOT of rows and a
>>>> bunch of indexes on it and your disks are being thrashed because
>>>> the
>>>> indexes are not able to stay cached in RAM. At this point you
>>>> should
>>>> ensure your machine is not swapping do disk, and at the very least
>>>> you should go through one of the many tuning guidelines
>>>> available and
>>>> ensure you have allocated the appropriate amount of memory to
>>>> postgresql for your needs.
>>>>
>>>> You may also want to consider dropping the indexes before you do
>>>> the
>>>> INSERT and recreate them afterwards.
>>>>
>>>>
>>>>
>>>>>>>> "Christian Rengstl" <Christian.Rengstl@klinik.uni-
>>>>>>>> regensburg.de>
>>>>>>>> 13.07.06 8.37 Uhr >>>
>>>>> Good morning list,
>>>>>
>>>>> the following query takes about 15 to 20 minutes for around 2
>>>>> million lines in the file myfile.txt, but with 8 million lines it
>>>>> takes around 5 hours and i just don't understand why there is such
>>>>> a huge discrepancy in performance.
>>>>>
>>>>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>>>>
>>>>> INSERT INTO public.master(pid,smid, val1, val2, chr)
>>>>>             SELECT pid, smid, val1, val12, CAST(split_part((SELECT
>>>>> chr from public.temp_table LIMIT 1), '_', 2) as int2)
>>>>>            FROM public.temp_table;
>>>>>
>>>>> INSERT INTO public.values(smid, pos, chr)
>>>>>         SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr
>>>>> from
>>>>> public.temp_table LIMIT 1), '_', 2) as int2)
>>>>>         FROM public.temp_table;
>>>>>
>>>>> I came up with this query, because i wanted to use the COPY
>>>>> command
>>>>> to load huge files into the db, but i don't want to load all the
>>>>> columns contained in the file in only one table but copy some of
>>>>> them into one table and some in a second table. As i found out
>>>>> with
>>>>> logging, the data is loaded into temp_table within 15 minutes, but
>>>>> to transfer it from the temp_table toagain only something like 10
>>>>> minutes. Can it be that the cast takes up so much more time than
>>>>> when reading and transferring 2 million lines?
>>>>
>>>>
>>>> --
>>>> Seeya...Q
>>>>
>>>>                 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>
>>>>                            _____  /  Quinton Dolan -
>>>> qdolan@gmail.com
>>>>    __  __/  /   /   __/   /      /
>>>>       /    __  /   _/    /      /        Gold Coast, QLD, Australia
>>>>    __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
>>>>                      _______  /
>>>>                              _\
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 2: Don't 'kill -9' the postmaster
>>>
>>>
>>> --
>>> Christian Rengstl M.A.
>>> Klinik und Poliklinik für Innere Medizin II
>>> Kardiologie - Forschung
>>> Universitätsklinikum Regensburg
>>> B3 1.388
>>> Franz-Josef-Strauss-Allee 11
>>> 93053 Regensburg
>>> Tel.: +49-941-944-7230
>>
>>
>> --
>> Seeya...Q
>>
>>                 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>>                            _____  /  Quinton Dolan - qdolan@gmail.com
>>    __  __/  /   /   __/   /      /
>>       /    __  /   _/    /      /        Gold Coast, QLD, Australia
>>    __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
>>                      _______  /
>>                              _\
>>
>>
>
>
> --
> Christian Rengstl M.A.
> Klinik und Poliklinik für Innere Medizin II
> Kardiologie - Forschung
> Universitätsklinikum Regensburg
> B3 1.388
> Franz-Josef-Strauss-Allee 11
> 93053 Regensburg
> Tel.: +49-941-944-7230


--
Seeya...Q

                -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

                           _____  /  Quinton Dolan - qdolan@gmail.com
   __  __/  /   /   __/   /      /
      /    __  /   _/    /      /        Gold Coast, QLD, Australia
   __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
                     _______  /
                             _\

В списке pgsql-general по дате отправления:

Предыдущее

От: "Christian Rengstl"
Дата: 19 июля 2006 г., 10:50:20
Сообщение: Re: Performance problem with query

Следующее

От: Timothy Smith
Дата: 19 июля 2006 г., 12:00:28
Сообщение: problem creating users via pythons script

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Performance problem with query

Предыдущее

Следующее