Re: Slow concurrent processing

From: Misa Simic
Subject: Re: Slow concurrent processing
Date:
Msg-id: CAH3i69mhbuiAcKhS1wfiHsLqpe1TAjDGNvQff=HTr-2=yMJ0EA@mail.gmail.com
In reply to: Slow concurrent processing (Misa Simic <misa.simic@gmail.com>)
Responses: Re: Slow concurrent processing (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-performance
Hi,

Researching my problem with concurrent processing more deeply, I have found:

"The more likely suspect is a foreign key conflict. 
Are both transactions inserting/updating rows that could reference 
the same row(s) in a master table?" - Tom Lane

This is exactly the case here - several connections try to insert rows into the same table... but some columns reference settings tables... and there is a possibility that two rows we want to insert reference the same row in a settings table...
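
For context, each insert's foreign-key check effectively share-locks the referenced parent row; roughly like this (table and column names are illustrative):

    -- What the FK trigger does on the referenced settings row, more or less:
    SELECT 1 FROM settings WHERE id = $1 FOR SHARE;
    -- Two shared locks don't block each other outright, but concurrent sharers
    -- of one row are tracked via multixacts (extra serialized bookkeeping),
    -- and any UPDATE/DELETE of that settings row blocks behind all of them.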

Any idea how to make this process faster?

Maybe we could make new tables with the same structure as the results tables... with no indexes, FKs, etc... during processing, insert into those unreferenced tables - and when the full process finishes, move the rows from the unreferenced tables into the real results tables? A rough sketch of that idea follows.
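
A minimal sketch of the staging approach (results_staging and the column list are made-up names, not from our schema):

    -- Same columns as the real table, but no FKs and no indexes:
    CREATE TABLE results_staging (LIKE results INCLUDING DEFAULTS);

    -- During processing, each connection inserts here, with no FK checks:
    --   INSERT INTO results_staging (record_id, setting_id, value) VALUES (...);

    -- When the full process finishes, move the rows in one pass:
    INSERT INTO results SELECT * FROM results_staging;
    TRUNCATE results_staging;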


Thanks,

Misa

2013/3/12 Misa Simic <misa.simic@gmail.com>
Hi all,

We have one table with a list of "records for processing"...

We loop through that table and call one long-running function:

 do_the_math_for_record(record_id) 

which selects related rows from different tables for the input record_id, does some calculations, and inserts the results into two tables...

and we have made one function, process_all_records(),

which simply does: SELECT do_the_math_for_record(record_id) FROM records_for_processing
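
A plausible shape for that driver function, using PERFORM as in the snippets below (the body shown is an assumption, not the exact code):

    CREATE OR REPLACE FUNCTION process_all_records() RETURNS void AS $$
    BEGIN
        -- Run the long-running worker once per queued record, discarding results:
        PERFORM do_the_math_for_record(record_id)
        FROM records_for_processing;
    END;
    $$ LANGUAGE plpgsql;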

When we run that function - it takes about 4 minutes...


There are about 300 rows in records_for_processing... we have logged the time at the beginning of do_the_math and the time at the end... and noticed that processing each row takes between 0.5 and 2 seconds...

so our do_the_math looks like:

PERFORM log_time(record_id, clock_timestamp(), 1);

PERFORM do_the_math_and_save_results(record_id);

PERFORM log_time(record_id, clock_timestamp(), 2);
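
(log_time itself could be as simple as this; the processing_log table and its columns are assumptions on my part:)

    CREATE OR REPLACE FUNCTION log_time(p_record_id int, p_ts timestamptz, p_phase int)
    RETURNS void AS $$
        -- phase 1 = start of do_the_math, phase 2 = end
        INSERT INTO processing_log (record_id, ts, phase)
        VALUES ($1, $2, $3);
    $$ LANGUAGE sql;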


Then we thought: if we take all "records for processing" and process each in a separate connection - it should take less time...

but - we got a worse result! (using 30 concurrent connections...)... about 7 mins...

if we reduce the concurrent connections to 10 - we get the result in approximately the same time as sequential processing...

but - if we replace do_the_math_and_save_results with pg_sleep(1), to simulate a long-running function so that processing each row takes 1 second...

sequential processing takes the expected 300 seconds!

Concurrent processing gets faster with a higher number of concurrent connections - about 30 seconds with 30 connections! (much faster - and expected...)
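
(The sleep-only variant of the test was essentially this; exact form reconstructed:)

    -- One second of pure waiting per row, no table access at all:
    SELECT record_id, pg_sleep(1) FROM records_for_processing;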

however, when we put back our do_the_math_and_save_results - we can't get better results from concurrent processing...

with a higher number of concurrent connections - the result is worse... we have also noticed that for some records the difference between end_time and start_time is even longer than 1 min - but it is random - not always on the same id... i.e. in one concurrent run it takes 1 min - in the next, 1 sec - but some other record takes about 1 min...

Any idea - why? :)

It says to me that there is a lock somewhere on some tables - so our concurrent connections probably wait for the others to finish... but I can't figure out: what, and why...
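
(One way to confirm the lock theory while a concurrent run is in flight; the column names shown are the 9.2-era ones - newer releases expose wait_event instead of waiting:)

    -- Which backends are currently blocked on a lock:
    SELECT pid, waiting, query
    FROM pg_stat_activity
    WHERE waiting;

    -- And which ungranted locks they are queued on:
    SELECT pid, locktype, mode, relation::regclass
    FROM pg_locks
    WHERE NOT granted;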

do_the_math_and_save_results selects data from 10 other tables, calculates something, and inserts the results into other tables...

there are about 3 tracking tables (record_id plus other data... and about 7 settings tables that we join to the tracking tables to get all the info...), then we do the math with that info - and insert the results...

we don't do any updates... (so there should be no possibility of two connections wanting to update the same row in the same table)

the data from the tracking tables should be separate sets of rows for two different record_ids...

(joined rows from the settings tables could be common to two sets for different record_ids)

but - even if they are the same set - SELECTs should not lock the rows in those tables...

There are places where we do:

INSERT INTO result_table (columns)
SELECT query (tracking and settings tables joined)

Is there a chance it takes a lock somewhere?

Can the above query be run "concurrently"?
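
(Spelled out with illustrative names, the pattern in question is:)

    -- The SELECT side takes no row locks; only the FK checks on the rows
    -- being inserted share-lock the referenced settings rows:
    INSERT INTO result_table (record_id, setting_id, amount)
    SELECT t.record_id, s.id, t.qty * s.rate
    FROM tracking t
    JOIN settings s ON s.id = t.setting_id
    WHERE t.record_id = $1;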

Many thanks,

Misa